DK-CLARIN LSP Corpus - Construction domain
Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and The Danish Language Council, 2011,
DK-CLARIN LSP Corpus - Construction domain, CLARIN-DK-UCPH Centre Repository,
http://hdl.handle.net/20.500.12115/9.
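Authors
Centre for Language Technology, NorS, University of Copenhagen; The Danish Language Council
Item identifier
http://hdl.handle.net/20.500.12115/9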
Date issued
2011
Size
35 files, 577,392 tokens
Language(s)
Danish
Description
Texts in the construction domain come from Statens Byggeforskningsinstitut, Erhvervs- og byggestyrelsen and Murerfagets Oplysningsråd, and were collected in the DK-CLARIN project (WP2.2) between 2008 and 2011.
The corpus consists of 577,392 words in 35 files.
Communicative setting (number of files): expert->expert (18), expert->advanced (6), expert->basic (11).
All texts are in XML TEI P5 format (the TEIP5DKCLARIN format). Tokenisation, sentence and paragraph segmentation, part-of-speech tagging, lemmatisation and termhood annotation are stored as stand-off annotation in separate, text-external span groups.
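Because the annotation is stand-off, a reader has to join each layer back onto the tokens by identifier. The following is a minimal Python sketch of that join under stated assumptions. Only the TEI namespace URI and the `<spanGrp>`/`<span>` element names come from TEI P5. The sample document, the `<w xml:id="...">` token elements, the `type` values `pos` and `lemma`, the `from="#id"` pointers and the tag values are all hypothetical illustrations rather than the actual TEIP5DKCLARIN schema. The real files may also keep each layer in a separate document.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace and the XML namespace used by xml:id.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# HYPOTHETICAL sample: element layout, layer names and tag values are
# illustrative only and are not taken from the TEIP5DKCLARIN schema.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w xml:id="t1">Muren</w>
    <w xml:id="t2">skal</w>
    <w xml:id="t3">pudses</w>
  </p></body></text>
  <spanGrp type="pos">
    <span from="#t1">NOUN</span>
    <span from="#t2">VERB</span>
    <span from="#t3">VERB</span>
  </spanGrp>
  <spanGrp type="lemma">
    <span from="#t1">mur</span>
    <span from="#t2">skulle</span>
    <span from="#t3">pudse</span>
  </spanGrp>
</TEI>"""


def load_tokens(root):
    """Map xml:id -> surface form for every <w> token, in document order."""
    return {w.get(XML_ID): w.text for w in root.iterfind(".//tei:w", TEI_NS)}


def load_layer(root, layer_type):
    """Map token id -> value for the <spanGrp> whose (assumed) type matches."""
    layer = {}
    path = f".//tei:spanGrp[@type='{layer_type}']/tei:span"
    for span in root.iterfind(path, TEI_NS):
        # Each span points at its token with a "#id" reference (assumed).
        layer[span.get("from").lstrip("#")] = span.text
    return layer


def annotate(xml_text):
    """Join the stand-off lemma and PoS layers back onto the tokens by id."""
    root = ET.fromstring(xml_text)
    tokens = load_tokens(root)
    lemma = load_layer(root, "lemma")
    pos = load_layer(root, "pos")
    return [(form, lemma.get(tid), pos.get(tid)) for tid, form in tokens.items()]


for row in annotate(SAMPLE):
    print(row)
```

The termhood layer could be read the same way with a third `load_layer` call. If each layer lives in its own file, `load_layer` would be called on that file's root instead of the text's root.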
"DK-CLARIN LSP Corpus - Construction domain" is part of the Danish DK-CLARIN LSP corpus, which consists of seven sub-corpora from the following subject domains: Agriculture, Construction, Economics, Environment, Health, IT and Nanotechnology.
Acknowledgement
n/a
Project code: n/a
Project name: DK-CLARIN