
 
dc.creator Hansen, Dorte Haltrup
dc.creator Offersgaard, Lene
dc.date.accessioned 2018-06-22T09:44:11Z
dc.date.available 2018-06-22T09:44:11Z
dc.date.issued 2011
dc.identifier.uri http://hdl.handle.net/20.500.12115/28
dc.description The corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 (http://europa.eu/rapid/search.htm). Each of the 5330 press releases (files) exists in Danish, English and German, with approx. 3,000,000 words per language. All texts are in XML TEI P5 format (TEIP5DKCLARIN format). The Danish and English texts are annotated with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and termhood; the German texts are annotated with tokenisation and sentence and paragraph segmentation. The annotations are placed in separate, text-external span groups. The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the University of Copenhagen, Centre for Language Technology. The aim of the Danish CLARIN consortium was to build a Danish research infrastructure for the humanities, integrating written, spoken and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
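The description says the annotations are kept in separate, text-external span groups rather than inline. A minimal sketch of how such standoff annotation can be resolved with Python's xml.etree.ElementTree, assuming tokens carry xml:id values and each TEI span points at a token range through its from/to attributes. The sample document, the spanGrp type values (pos, term), the tag set and the helper spans_for are hypothetical and simplified (not validated against the TEI schema); the actual TEIP5DKCLARIN layout is documented in the item's text-format.pdf and may differ, for example by storing span groups in separate files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # Clark notation for xml:id
NS = {"tei": TEI_NS}

# Hypothetical, simplified sample: the running text holds only tokens, while the
# annotations live in span groups outside the text and point at token ids.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body><p>
    <w xml:id="t1">The</w> <w xml:id="t2">European</w>
    <w xml:id="t3">Commission</w> <w xml:id="t4">adopted</w>
  </p></body></text>
  <spanGrp type="pos">
    <span from="#t1" to="#t1">DT</span>
    <span from="#t2" to="#t2">NNP</span>
    <span from="#t3" to="#t3">NNP</span>
    <span from="#t4" to="#t4">VBD</span>
  </spanGrp>
  <spanGrp type="term">
    <span from="#t2" to="#t3">term</span>
  </spanGrp>
</TEI>"""


def spans_for(xml_text: str, grp_type: str) -> list[tuple[str, str]]:
    """Resolve each span in the given spanGrp to (covered token text, span value)."""
    root = ET.fromstring(xml_text)
    # Token text keyed by xml:id; dicts preserve insertion (document) order.
    tokens = {w.get(XML_ID): w.text for w in root.iterfind(".//tei:w", NS)}
    order = list(tokens)
    resolved = []
    for span in root.iterfind(f".//tei:spanGrp[@type='{grp_type}']/tei:span", NS):
        start = span.get("from").lstrip("#")
        end = span.get("to", span.get("from")).lstrip("#")
        first, last = order.index(start), order.index(end)
        # A span may cover several consecutive tokens (from..to inclusive).
        covered = " ".join(tokens[t] for t in order[first:last + 1])
        resolved.append((covered, span.text))
    return resolved


print(spans_for(SAMPLE, "pos"))
print(spans_for(SAMPLE, "term"))
```

Resolving the pos group yields one tag per token, while the single term span from t2 to t3 covers the two-token string "European Commission". Keeping annotations outside the text lets several layers point at the same tokens without nesting conflicts.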
dc.language.iso dan
dc.language.iso eng
dc.language.iso deu
dc.publisher Centre for Language Technology, NorS, University of Copenhagen
dc.publisher European Commission
dc.rights CLARIN-ACA-NC
dc.rights.uri https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1
dc.rights.label ACA
dc.subject press release
dc.subject politics
dc.subject EU
dc.title DK-CLARIN Rapid Parallel Corpus 1993-2003 (da-en-de)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-DK
contact.person Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen
size.info 5330; files
size.info 3000000; words
files.size 352451311
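A quick consistency check on the size fields, assuming the repository reports sizes in binary units (1 KB = 1024 bytes, 1 MB = 1024² bytes): converting files.size this way gives the 336.12 MB total shown for the full download, and the six per-file sizes in the listing sum to the same figure, whereas a decimal reading of files.size would give about 352.45 MB. The constant and function names below are illustrative.

```python
# Binary units: assumption, but it is the only reading that reproduces 336.12 MB.
KIB = 1024
MIB = 1024 ** 2

FILES_SIZE_BYTES = 352451311  # files.size field of the record

# Per-file sizes as shown in the item's file listing (rounded to two decimals).
LISTED_MB = {"da.zip": 132.07, "de.zip": 67.45, "en.zip": 136.07}
LISTED_KB = {"teiHeader.xsd": 59.88, "text-format.pdf": 111.77, "text-header.pdf": 375.79}


def field_total_mib() -> float:
    """files.size converted from bytes to binary megabytes (MiB)."""
    return FILES_SIZE_BYTES / MIB


def listing_total_mib() -> float:
    """Sum of the listed per-file sizes, converting the KB entries to MiB."""
    return sum(LISTED_MB.values()) + sum(LISTED_KB.values()) * KIB / MIB


print(f"files.size:        {field_total_mib():.2f} MB")
print(f"sum of file sizes: {listing_total_mib():.2f} MB")
print(f"decimal reading:   {FILES_SIZE_BYTES / 1000 ** 2:.2f} MB")
```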
files.count 6
annotationInfo.annotationType tokenization
annotationInfo.annotationType sentence and paragraph segmentation
annotationInfo.annotationType POS-tagging
annotationInfo.annotationType lemmatization
annotationInfo.annotationType termhood scoring


Files in this item

Total size of all files in item: 336.12 MB
This item is Academic Use and licensed under CLARIN-ACA-NC (Attribution Required, Noncommercial).
Name             Size        Format           Description      MD5
da.zip           132.07 MB   application/zip  Danish corpus    340ac9cd92f3dd4974f0d0ffcd391d78
de.zip           67.45 MB    application/zip  German corpus    a33313a5c1cd6760856bc68876096d34
en.zip           136.07 MB   application/zip  English corpus   0406ef3fc8fb5eebe4f3d10fb952a994
teiHeader.xsd    59.88 KB    XML              Schema           9fc5374ad34319278f437b963454f972
text-format.pdf  111.77 KB   PDF              Documentation    c4c4b5f1cd83ff232c44bc7692621da7
text-header.pdf  375.79 KB   PDF              Documentation    47825d0010a398bf10ce1564da2a15f0
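The MD5 values in the listing can be used to check that a download arrived intact. A minimal sketch using Python's standard hashlib; the helper names md5_of and verify, and the assumption that all files are saved under their listed names in a single folder, are illustrative rather than part of the record.

```python
import hashlib
from pathlib import Path

# MD5 checksums as published in the item's file listing.
EXPECTED_MD5 = {
    "da.zip": "340ac9cd92f3dd4974f0d0ffcd391d78",
    "de.zip": "a33313a5c1cd6760856bc68876096d34",
    "en.zip": "0406ef3fc8fb5eebe4f3d10fb952a994",
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir: Path) -> dict[str, bool]:
    """Return {filename: checksum matches} for each expected file found in download_dir."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = download_dir / name
        if path.exists():
            results[name] = md5_of(path) == expected
    return results
```

For example, verify(Path("downloads")) reports one entry per listed file present in that folder; a False entry indicates a corrupted or incomplete download, and files that are missing are simply absent from the result.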
