
 
dc.creator Hansen, Dorte Haltrup
dc.creator Offersgaard, Lene
dc.date.accessioned 2018-06-11T15:12:05Z
dc.date.available 2018-06-11T15:12:05Z
dc.date.issued 2011
dc.identifier.uri http://hdl.handle.net/20.500.12115/18
dc.description The DK-CLARIN Parallel Financial Corpus comprises 4,343,072 Danish and 4,854,172 English tokens from translated (parallel) documents, mainly annual reports from the period 2002-2010, published by 12 of the largest Danish companies. All texts are encoded in XML in TEI P5 format (the TEIP5DKCLARIN format). Tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and termhood annotation are placed in separate text-external span groups (standoff annotation). The corpus was collected and processed by the Centre for Language Technology, University of Copenhagen, in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english). The aim of the Danish CLARIN consortium was to build a Danish research infrastructure for the humanities that integrates written, spoken and visual records into a coherent, systematic digital repository. The project ran from January 2008 until the end of 2010.
dc.language.iso dan
dc.language.iso eng
dc.publisher Centre for Language Technology, NorS, University of Copenhagen
dc.rights CLARIN-ACA-NC
dc.rights.uri https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1
dc.rights.label ACA
dc.subject Economics
dc.title DK-CLARIN Parallel Financial Corpus (da-en)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-DK
contact.person Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen
sponsor n/a; n/a; DK-CLARIN; nationalFunds;
size.info 4343072; tokens
size.info 4854172; tokens
size.info 90; files
files.size 190604631
files.count 7
annotationInfo.annotationType tokenization
annotationInfo.annotationType sentence and paragraph segmentation
annotationInfo.annotationType POS-tagging
annotationInfo.annotationType lemmatization
annotationInfo.annotationType termhood scoring
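The description states that the annotation layers live in separate text-external span groups rather than inline in the text. The record itself does not show the markup (it is presumably documented in text-format.pdf and the bundled schemas), so the Python sketch below uses a small hypothetical fragment to show how such standoff layers can be resolved back to tokens. The TEI P5 elements `standOff`, `spanGrp` and `span` and the `from`/`to` pointer attributes are real TEI P5 constructs, but the layer names (`token`, `pos`, `sentence`), the tag values and the way tokens are stored are assumptions made for illustration and may not match the actual TEIP5DKCLARIN files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical fragment: layer names, tag values and token storage are illustrative only.
SAMPLE = """<standOff xmlns="http://www.tei-c.org/ns/1.0">
  <spanGrp type="token">
    <span xml:id="t1">Omsætningen</span>
    <span xml:id="t2">steg</span>
    <span xml:id="t3">.</span>
  </spanGrp>
  <spanGrp type="pos">
    <span from="#t1" to="#t1">N</span>
    <span from="#t2" to="#t2">V</span>
    <span from="#t3" to="#t3">PUNCT</span>
  </spanGrp>
  <spanGrp type="sentence">
    <span from="#t1" to="#t3"/>
  </spanGrp>
</standOff>"""


def token_layer(root):
    """Return token ids in document order and a map from token id to token text."""
    order, text = [], {}
    for span in root.find("tei:spanGrp[@type='token']", NS).findall("tei:span", NS):
        tid = span.get(XML_ID)
        order.append(tid)
        text[tid] = span.text
    return order, text


def annotation_layer(root, layer, order):
    """Resolve each span's from/to pointers (inclusive) to the token ids it covers."""
    position = {tid: i for i, tid in enumerate(order)}
    result = []
    for span in root.find(f"tei:spanGrp[@type='{layer}']", NS).findall("tei:span", NS):
        start = position[span.get("from").lstrip("#")]
        end = position[span.get("to").lstrip("#")]
        result.append((order[start:end + 1], span.text))
    return result


root = ET.fromstring(SAMPLE)
order, text = token_layer(root)
for ids, tag in annotation_layer(root, "pos", order):
    print(" ".join(text[i] for i in ids), tag)
```

Because every layer points at token ids instead of wrapping the text, new layers (such as termhood scores) can be added or removed without touching the token layer or each other.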


Files in this item

Total size of all files: 181.77 MB
This item is for Academic Use and is licensed under CLARIN-ACA-NC (Attribution Required, Noncommercial).
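The record states the total size twice: `files.size` gives 190604631 bytes, and the file listing gives 181.77 MB. The short Python check below suggests that the listing uses binary units (1 MB = 1024 × 1024 bytes, 1 MB = 1024 KB); this is an inference from the arithmetic, not something the record states.

```python
# Values copied from this record: files.size (bytes) and the per-file sizes in the listing.
FILES_SIZE_BYTES = 190_604_631
LISTED_MB = [93.86, 87.23]                          # the two corpus zip files
LISTED_KB = [2.38, 375.79, 111.77, 142.26, 59.88]   # documentation and schema files

binary_mb = FILES_SIZE_BYTES / 1024**2    # assumes 1 MB = 1,048,576 bytes
decimal_mb = FILES_SIZE_BYTES / 1000**2   # SI megabytes, for comparison
listed_total_mb = sum(LISTED_MB) + sum(LISTED_KB) / 1024  # assumes 1 MB = 1024 KB

print(f"files.size in binary MB:  {binary_mb:.2f}")
print(f"files.size in decimal MB: {decimal_mb:.2f}")
print(f"sum of listed file sizes: {listed_total_mb:.2f}")
```

Only the binary interpretation reproduces the 181.77 MB total, and the seven rounded per-file sizes add up to the same figure; in SI units the download is about 190.60 MB.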
Name                          Size       Format           Description       MD5
annual-reports-da.zip         93.86 MB   application/zip  Corpus - Danish   441f9b22e1f510d83a9e5dba1725b7a3
annual-reports-en.zip         87.23 MB   application/zip  Corpus - English  e79821e3d1b912536f56254760b2e85e
README_financial-reports.txt  2.38 KB    Text file        Documentation     a8048d2626384dbaa8cb0d0b9dccbef7
text-header.pdf               375.79 KB  PDF              Documentation     47825d0010a398bf10ce1564da2a15f0
text-format.pdf               111.77 KB  PDF              Documentation     c4c4b5f1cd83ff232c44bc7692621da7
textCorpusProfile.xsd         142.26 KB  XML              Schema            7d6b452b88175041133ea8020e453cd8
teiHeader.xsd                 59.88 KB   XML              Schema            9fc5374ad34319278f437b963454f972
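Each file in the listing is published with an MD5 checksum, so a download can be checked for corruption before use. The following Python sketch is one way to do that with the standard library; it assumes the files were saved under their listed names in a single directory (the directory argument is a hypothetical convenience, not part of the repository).

```python
import hashlib
from pathlib import Path

# MD5 checksums as published in the file listing of this record.
EXPECTED_MD5 = {
    "annual-reports-da.zip": "441f9b22e1f510d83a9e5dba1725b7a3",
    "annual-reports-en.zip": "e79821e3d1b912536f56254760b2e85e",
    "README_financial-reports.txt": "a8048d2626384dbaa8cb0d0b9dccbef7",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "textCorpusProfile.xsd": "7d6b452b88175041133ea8020e453cd8",
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir):
    """Map each listed file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify(target).items():
        print(f"{labels[ok]:8} {name}")
```

Reading in chunks keeps memory use constant, which matters for the two corpus archives of roughly 90 MB each.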