What's New
corpus
Description:
The ensiwiki dataset contains Wikipedia pages sampled from Simple-English and regular English Wikipedia. For each Simple-English page, a paired page was sampled from the regular English Wikipedia if available. The result ...
This item contains 2 files (917.12 MB).
Publicly Available
corpus
Description:
CopCo is an eye-tracking corpus tailored to both psycholinguistics and natural language processing. The goal is to investigate reading behavior of Danish texts in various populations. To this end, we record eye movements ...
This item contains 58 files (99.12 MB).
Publicly Available
corpus
Description:
Danmarks Nyere Tid, from the image archive of the National Museum of Denmark (Nationalmuseet), consists of 6834 cultural-historical black-and-white photographs of crafts, industry, and trade, with accompanying descriptions. The photographs depict everything from tools to ...
This item contains 26 files (5.14 GB).
Publicly Available
Most Viewed Items
Top Last Week
toolService
Description:
CSTlemma is a lemmatizer that treats prefixes, infixes, and suffixes alike.
The CST lemmatizer can be (and already has been) trained for tens of languages, including ones that require lemmatization rules that change words by adding or ...
This item contains 1 file (163.48 KB).
Publicly Available
corpus
Description:
The ensiwiki dataset contains Wikipedia pages sampled from Simple-English and regular English Wikipedia. For each Simple-English page, a paired page was sampled from the regular English Wikipedia if available. The result ...
This item contains 2 files (917.12 MB).
Publicly Available
corpus
Description:
Texts in the Health and Medicine domain come from netpatient.dk, Søfartsstyrelsen, Sundhedsstyrelsen, regionH, Libris, and Aktuel Naturvidenskab, and were collected in the DK-CLARIN project, WP2.2, 2008-2011.
The corpus ...
This item contains 15 files (188.39 MB).
Academic Use