dc.creator |
Hansen, Dorte Haltrup |
dc.creator |
Offersgaard, Lene |
dc.date.accessioned |
2018-06-11T15:12:05Z |
dc.date.available |
2018-06-11T15:12:05Z |
dc.date.issued |
2011 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12115/18 |
dc.description |
The DK-CLARIN Parallel Financial Corpus comprises about 4.3 million Danish and 4.85 million English tokens from translated (parallel) documents, mainly annual reports, published between 2002 and 2010 by 12 of the biggest Danish companies.
All texts are in TEI P5 XML (the TEIP5DKCLARIN format), with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation, and termhood annotation stored in separate, text-external span groups.
The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen.
The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities integrating written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010. |
dc.language.iso |
dan |
dc.language.iso |
eng |
dc.publisher |
Centre for Language Technology, NorS, University of Copenhagen |
dc.rights |
CLARIN-ACA-NC |
dc.rights.uri |
https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1 |
dc.rights.label |
ACA |
dc.subject |
Economics |
dc.title |
DK-CLARIN Parallel Financial Corpus (da-en) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN-DK |
contact.person |
Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen |
sponsor |
n/a; n/a; DK-CLARIN; nationalFunds; |
size.info |
4343072; tokens |
size.info |
4854172; tokens |
size.info |
90; files |
files.size |
190604631 |
files.count |
7 |
annotationInfo.annotationType |
tokenization |
annotationInfo.annotationType |
sentence and paragraph segmentation |
annotationInfo.annotationType |
POS-tagging |
annotationInfo.annotationType |
lemmatization |
annotationInfo.annotationType |
termhood scoring |