dc.creator |
Haltrup Hansen, Dorte |
dc.creator |
Offersgaard, Lene |
dc.date.accessioned |
2018-06-25T13:41:09Z |
dc.date.available |
2018-06-25T13:41:09Z |
dc.date.issued |
2012 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12115/30 |
dc.description |
The aligned corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm).
The corpus comprises 5330 + 2200 press releases (files) for each of the languages Danish, English and German, with approx. 5,000,000 words per language and 260,000 - 270,000 aligned sentences for the language pairs Danish - English and Danish - German.
All documents are processed with Uplug (https://bitbucket.org/tiedemann/uplug/wiki/Home) and aligned with HunAlign.
Files with more than 10 % negative alignments have been removed, as have all 0-alignments.
The documents are in txt-format for each language and in tmx-format for the aligned language pairs (da-en and da-de). |
dc.language.iso |
dan |
dc.language.iso |
eng |
dc.language.iso |
deu |
dc.publisher |
Centre for Language Technology, NorS, University of Copenhagen |
dc.publisher |
European Commission |
dc.rights |
CLARIN-ACA-NC |
dc.rights.uri |
https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1 |
dc.rights.label |
ACA |
dc.subject |
MT |
dc.subject |
EU |
dc.subject |
press release |
dc.subject |
alignment |
dc.subject |
politics |
dc.title |
DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de) |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
text |
has.files |
yes |
branding |
CLARIN-DK |
contact.person |
Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen |
size.info |
5000000; tokens |
size.info |
270000; sentences |
files.size |
112987350 |
files.count |
3 |