dc.creator Haltrup Hansen, Dorte
dc.creator Offersgaard, Lene
dc.date.accessioned 2018-06-25T13:41:09Z
dc.date.available 2018-06-25T13:41:09Z
dc.date.issued 2012
dc.identifier.uri http://hdl.handle.net/20.500.12115/30
dc.description The aligned corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The corpus comprises 5330 + 2200 press releases (files) for each of the languages Danish, English and German, with approx. 5,000,000 words per language and 260,000 - 270,000 aligned sentences for the language pairs Danish - English and Danish - German. All documents were processed with Uplug (https://bitbucket.org/tiedemann/uplug/wiki/Home) and aligned with HunAlign. Files with more than 10 % negative alignments have been removed, as have all 0-alignments. The documents are in txt format for each language and in tmx format for the aligned language pairs (da-en and da-de).
dc.language.iso dan
dc.language.iso eng
dc.language.iso deu
dc.publisher Centre for Language Technology, NorS, University of Copenhagen
dc.publisher European Commission
dc.rights CLARIN-ACA-NC
dc.rights.uri https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1
dc.rights.label ACA
dc.subject MT
dc.subject EU
dc.subject press release
dc.subject alignment
dc.subject politics
dc.title DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-DK
contact.person Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen
size.info 5000000; tokens
size.info 270000; sentences
files.size 112987350
files.count 3
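
The description states that the aligned language pairs (da-en and da-de) are distributed in tmx format. The following is a minimal sketch of how such aligned sentence pairs could be read with the Python standard library. It assumes the files follow the generic TMX layout (tu elements containing one tuv per language, each with a seg child); the internal layout of the corpus files is not documented in this record, and the function name read_tmx_pairs and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:lang attribute as ElementTree reports it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="da", tgt_lang="en"):
    """Yield (source_text, target_text) for each translation unit.

    `source` is a file name or a binary file object. Assumes generic TMX
    structure; this is an illustration, not the corpus's documented schema.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "da-DK" -> "da".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]
        elem.clear()  # release memory held by the processed unit


# Hypothetical sample, not taken from the corpus.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="da"/>
  <body>
    <tu>
      <tuv xml:lang="da"><seg>God morgen.</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE)))
print(pairs)
```

For the Danish - German pair, the same function could be called with tgt_lang="de". Streaming with iterparse keeps memory use low, which may matter for files holding several hundred thousand aligned sentences.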


 Files in this item

Total size of all files in item: 107.75 MB
This item is for Academic Use and licensed under: CLARIN-ACA-NC (Attribution Required, Noncommercial)

Name Rapid-1993-2003.zip
Size 68.55 MB
Format application/zip
Description Corpus 1993 - 2003
MD5 d73a47ab17a22afeff024a360100e907

Name Rapid-2004-2011.zip
Size 39.2 MB
Format application/zip
Description Corpus 2004 - 2011
MD5 ce84f48a004e249fcbe511faf0856e77

Name README.txt
Size 1.01 KB
Format Text file
Description Documentation
MD5 8a7d86a2ef03a56751b93a15b60a4d63
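
The MD5 values listed for each file can be used to check that a download is complete and uncorrupted. The sketch below shows one way to do this with Python's hashlib; the file names and checksums are copied from the listing above, while the function names md5_of and verify_downloads and the assumption that the files sit in a single local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksums as published in this item record.
EXPECTED_MD5 = {
    "Rapid-1993-2003.zip": "d73a47ab17a22afeff024a360100e907",
    "Rapid-2004-2011.zip": "ce84f48a004e249fcbe511faf0856e77",
    "README.txt": "8a7d86a2ef03a56751b93a15b60a4d63",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

A result of False suggests the file should be downloaded again; None only means the file was not found in the given directory.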
