DK-CLARIN Rapid Parallel Corpus 1993-2003 (da-en-de)

Hansen, Dorte Haltrup; Offersgaard, Lene

dc.creator	Hansen, Dorte Haltrup
dc.creator	Offersgaard, Lene
dc.date.accessioned	2018-06-22T09:44:11Z
dc.date.available	2018-06-22T09:44:11Z
dc.date.issued	2011
dc.identifier.uri	http://hdl.handle.net/20.500.12115/28
dc.description	The corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 (http://europa.eu/rapid/search.htm). Each of the 5330 press releases (files) exists in Danish, English and German, with approx. 3,000,000 words for each language. All texts are in XML TEI P5 format (TEIP5DKCLARIN-format). The Danish and English texts have tokenisation, POS-tagging, sentence and paragraph segmentation, lemmatisation and termhood annotation; the German texts have tokenisation and sentence and paragraph segmentation. The annotations are placed in separate text-external spangroups. The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen. The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities, integrating written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
dc.language.iso	dan
dc.language.iso	eng
dc.language.iso	deu
dc.publisher	Centre for Language Technology, NorS, University of Copenhagen
dc.publisher	European Commission
dc.rights	CLARIN-ACA-NC
dc.rights.uri	https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1
dc.rights.label	ACA
dc.subject	press release
dc.subject	politics
dc.subject	EU
dc.title	DK-CLARIN Rapid Parallel Corpus 1993-2003 (da-en-de)
dc.type	corpus
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN-DK
contact.person	Administrator; CLARIN-DK; info@clarin.dk; Centre for Language Technology, NorS, University of Copenhagen
size.info	5330; files
size.info	3000000; words
files.size	352451311
files.count	6
annotationInfo.annotationType	tokenization
annotationInfo.annotationType	sentence and paragraph segmentation
annotationInfo.annotationType	POS-tagging
annotationInfo.annotationType	lemmatization
annotationInfo.annotationType	termhood scoring