Frequently Asked Questions



What is the repository?

It is like a library for linguistic data and tools.

  • Search for data and tools and easily download them.
  • Deposit your data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).

What submissions do we accept?

CLARIN-DK is an infrastructure where researchers can deposit, share and download language-based material, i.e. texts, transcriptions, lexicons, word lists, audio and video files.

Data must belong to one of the four accepted data types:
  • corpus, i.e. text, audio, and video corpora with or without annotations
  • languageDescription, i.e. grammars, language models, etc.
  • lexicalConceptualResource, i.e. lexica, word lists, wordnets, etc.
  • toolService, i.e. software for language processing

In practice, there are some restrictions on the data that can be deposited. Data must be available for research purposes. It must be possible to document that the data can be used as research data in compliance with the Danish Code of Conduct for Research Integrity.

Data to be deposited must be available for sharing under the conditions of a chosen license. See info on licenses: License Agreement and Contracts and Available Licenses.

The curators of the repository can decide not to receive data:
  • if the size of the data exceeds the repository limit for free storage (data packages larger than 100 GB will be assessed; some will be offered storage for a yearly negotiated fee, others will be offered free storage or rejected).
  • if the data need to be assigned a specific and restrictive license which the curators estimate will hinder other researchers’ use of the data.
  • if the data are in a non-open data format which the curators of the repository do not regard as broadly accessible to other researchers.
  • if the data are not within the scope of the repository, which is language-based data.

If the user does not readily accept the decision of the curators, he/she can contact the national coordinator of CLARIN in Denmark, see the Contact page.

A log is kept of the cases of data that are rejected for deposit. Once a year a report is sent to the national coordinator of CLARIN in Denmark in which the cases of rejected deposits and the reasons for the rejections are explained.

Do I need to create an account to download and/or make a submission?

  • You can download data and tools with a license that allows free sharing without any obstacles. Just read the license and download. This applies to all data with Creative Commons licenses and all tools with open source licenses.
  • To download data and tools that require you to sign a license, you need to log in. To make a submission, you also need to log in. However, if you are from the academic world, you probably don't need a new account.
  • Just click "Login" and search for your academic institution. To sign in, you can use any account with an Identity Provider that is a member of the EduGAIN federation.
  • If you don't have an academic account that works with us, let us know. We will make you a local account.

I see an error logging in

Please let us know through our Help Desk if you have any trouble logging in.

Occasionally (usually when you are the first person to log in from your home institution) you might see an error stating "The authentication was successful; however, your identity provider did provide neither your email, eppn nor targeted id." This means your home institution did not send us enough data about you to operate our service; the institution does so to protect your personal data. We only require an email address, and we follow the Data Protection Code of Conduct, which helps us convince the institution that we won't abuse data about you.

If you have accounts with multiple providers and you log in with a different one each time, you might see an error stating "Your email is already associated with a different user." Please try to use the same provider each time. If that is not possible, let us know and we'll change the default one.

Why should I submit my data into your repository?

  • It is free and safe.
  • We respect your license. We encourage Free Data and believe it benefits not only users, but also the data providers. However, we also accept more restricted data, and we can make users sign a license before downloading your data, if that is what you need.
  • The data is visible, giving you maximal credit for your work (Google, VLO, DataCite, OLAC, Data Citation Index, arXive).
  • The data is easy to cite. We provide ready-to-use one-click citations in BibTeX, RIS, and other popular reference formats. All the citations include permanent links created from persistent identifiers (we use handles for PIDs). These PIDs are future-proof.
  • For some data, like text corpora or treebanks, we can provide additional services, like full-text or even tree-query search.

Why should I submit my tools?

  • See "Why should I submit my data into your repository?". Everything applies to software tools too.
  • You can just link your version control system (svn, git), if it is publicly accessible. You can also link your project page, or demo site.

What is the PID (handle) good for?

It is a special permanent URL. It provides a permanent link that will resolve correctly even if the data is moved at some point in the distant future. It should therefore be used as the URL in citations.
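
As an illustration, a handle can be turned into a citable link by prepending the address of the public Handle proxy, hdl.handle.net. The following is a minimal sketch, not part of the repository software; the function name and the handle value used in the example are hypothetical.

```python
from urllib.parse import quote

# Public proxy of the Handle System; it redirects to the current location.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Turn a handle such as 'prefix/suffix' into a resolvable URL."""
    handle = handle.strip()
    # Accept a few common notations and reduce them to the bare handle.
    for lead in ("hdl:", "https://hdl.handle.net/", "http://hdl.handle.net/"):
        if handle.startswith(lead):
            handle = handle[len(lead):]
            break
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    # Percent-encode characters that are not safe in a URL path.
    return HANDLE_PROXY + quote(prefix) + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical handle, used for illustration only.
    print(handle_to_url("hdl:20.500.12345/example-corpus"))
```

Because the link goes through the proxy, a citation keeps working when the landing page of the resource moves to a new address.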

What is the actual depositing/archiving procedure?

During the submission of digital language resources to the repository, the data undergo a curation process to ensure quality and consistency. We assist you in meeting the requirements for sustainable resource archiving: data must be provided with metadata in standard formats adopted in the respective communities, persistent identifiers (PIDs) must be assigned, IPR issues must be resolved, and clear statements must be made about licensing and the possible use of the resources. The depositor is also required to electronically sign a deposition agreement acknowledging that (s)he is the holder of the rights to the data and has the right to grant the rights contained in this license. Once the data are deposited in the repository, they are assigned a PID for stable reference.

What data formats are accepted?

See the Recommended data formats page.

What if I want/need to update the archived data?

Our policy with regard to metadata is to allow changes that correct misspellings, make minimal corrections or add further information. Any changes to the status or metadata of a resource are recorded in its provenance information. Updates to a published resource, including any changes made to metadata, are logged.

Data may not be modified after being published in the repository. For major modifications to an item, the depositor is instead requested to submit a new version.

When a new version of a resource is submitted, its metadata will include a reference to the version it replaces. The metadata of the superseded version will in turn receive a reference to the newer version that replaces it. The new version obtains a new persistent identifier (PID) when the submission is accepted and published.
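
The mutual references between versions can be pictured with a small data model. This is only a sketch under stated assumptions: the class, the field names `replaces` and `is_replaced_by`, and the PIDs are hypothetical and do not reflect the repository's actual metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified metadata record of a published resource."""
    pid: str
    title: str
    replaces: Optional[str] = None        # PID of the version this one supersedes
    is_replaced_by: Optional[str] = None  # PID of the version superseding this one


def publish_new_version(old: Record, new_pid: str, title: Optional[str] = None) -> Record:
    """Create a new version with its own PID and link both records to each other."""
    if old.is_replaced_by is not None:
        raise ValueError(f"{old.pid} has already been superseded by {old.is_replaced_by}")
    new = Record(pid=new_pid, title=title or old.title, replaces=old.pid)
    old.is_replaced_by = new.pid  # the old record stays available under its own PID
    return new


if __name__ == "__main__":
    v1 = Record(pid="20.500.12345/corpus-v1", title="Example corpus")
    v2 = publish_new_version(v1, "20.500.12345/corpus-v2")
    print(v1)
    print(v2)
```

Both records remain resolvable; only the links between them tell a reader which version is the most recent.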

If a resource has multiple versions, a notice is displayed to the user when accessing the landing page. The landing page lists the available versions of the resource so that the user can load an alternate version, e.g. a superseding version.

Withdrawal requests will be evaluated on a case-by-case basis. Furthermore, we reserve the right to keep the metadata of published submissions available, provided that there is no legal requirement to delete the metadata. Changes made to data are logged.

What if I want to withdraw the resources in the future? Can I delete the data?

Yes, in this case contact our Help Desk with the submission PID and the reason. However, we need to keep a record that the data was in our repository (because a persistent identifier was issued), so the administrative metadata will be retained, indicating that the data itself was removed.

I don't want / cannot make the data publicly available or make them available after a specific date. Would you still archive them for me?

In line with the position of the research infrastructures and the general development towards Open Access, we strongly encourage data producers to be as open as possible. However, where this is not possible, we will archive your data even if they will not be publicly available. Please contact our Help Desk prior to completing the submission.

How to cite a submission?

See About Citations

How safe is my data, if I store it with you?

Quite safe, probably much safer than on your own computer. Our storage plan:

  • All the data in the repository have an on-site backup copy.
  • There is another off-site copy, so even complete destruction of our building does not destroy your data.
  • We check all the copies regularly and should any of them become corrupted we delete it and make a new one.
  • We keep at least three copies, one of them off-site, at all times.
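
A regular integrity check of this kind is often done by comparing checksums. The following is a minimal sketch assuming SHA-256 checksums and plain file copies; it illustrates the principle only and is not a description of the repository's actual backup software. All function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List, Sequence


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(copies: Sequence[Path], expected: str) -> List[Path]:
    """Return the copies that are missing or whose checksum differs."""
    return [p for p in copies if not p.is_file() or sha256_of(p) != expected]


def repair_copies(copies: Sequence[Path], expected: str) -> List[Path]:
    """Replace every corrupted copy with a fresh copy of an intact one."""
    bad = find_corrupted(copies, expected)
    good = [p for p in copies if p not in bad]
    if bad and not good:
        raise RuntimeError("no intact copy left to restore from")
    for path in bad:
        path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(good[0], path)  # overwrites the damaged file
    return bad
```

The expected checksum would be recorded once, when the data is first stored, and compared against every copy on each later check.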

What license should I pick for my data/tool?

We encourage using a free license. A representative selection of free licenses as well as CC licenses (more appropriate for data) is available directly during submission. There is a great OPEN License Selector which can guide you through the selection of an appropriate license.
If for some reason you need a different license, Contact Us.

Where can I find more information about supported licenses?

The list of licenses currently supported is here. However, do not hesitate to Contact Us in case you need a specific license of your own. The licenses can be accompanied by various requirements, e.g. restricting access to logged-in users or asking users to fill in additional details (such as the purpose of use).

How do I get the most out of my searches?

In contrast to other search engines, this one uses OR as the default operator, so a query with several words returns items that match any of them. If you are not satisfied with the results of your searches, you might wish to go beyond plain text searches. You can search only in certain fields, use negation, add score (emphasis) to some parts of the query, and more. The search engine is Solr, so use its syntax if you know it, or look it up in the Solr documentation.
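
The query features mentioned above can be sketched with a few helpers that build query strings in the standard Solr query syntax. The helper names are hypothetical, and so are the field names `title` and the search words; the fields actually indexed by the repository may be named differently.

```python
def phrase(text: str) -> str:
    """Quote a phrase so that the words must occur together, in this order."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def in_field(field: str, text: str) -> str:
    """Restrict a phrase to one metadata field (the field name is an assumption)."""
    return f"{field}:{phrase(text)}"


def require_all(*clauses: str) -> str:
    """Override the default OR: every clause must match."""
    return " AND ".join(clauses)


def exclude(clause: str) -> str:
    """Negation: items matching the clause are left out."""
    return f"-{clause}"


def boost(clause: str, factor: int) -> str:
    """Emphasis: matches on this clause count more towards the score."""
    return f"{clause}^{factor}"


examples = {
    "any of the words (default OR)": "danish corpus",
    "both words": require_all("danish", "corpus"),
    "exact phrase": phrase("danish corpus"),
    "only in one field": in_field("title", "word list"),
    "negation": f"corpus {exclude('audio')}",
    "emphasis": f"{boost('danish', 4)} corpus",
}

if __name__ == "__main__":
    for label, query in examples.items():
        print(f"{label}: {query}")
```

Each resulting string can be typed directly into the search box. With the default OR operator, the first example returns items containing either word, while the second returns only items containing both.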

How are quality control checks implemented?

Submitted metadata are validated against schemas during submission. The content of the metadata fields is reviewed with regard to relevance and understandability. Metadata should preferably be written in English. If metadata are provided in a language other than English, a translation of the content may be requested.

File formats are validated using the list of Recommended and accepted formats. Filenames are reviewed and changes might be suggested by a curator.

If audio or video data are included in a deposit, the curator will ask whether the submission contains sensitive content and, if so, whether consent has been given that allows the data to be shared in compliance with privacy and data protection laws.

Curation tasks can be run by a curator during review of a submission and some curation tasks are run regularly on the repository.

What is the preservation plan?

CLARIN-DK is committed to the long-term care of items deposited in our repository and strives to adopt current best practice in digital preservation. The National Coordinator for CLARIN-DK, together with the repository manager, regularly reviews the current resource types and the file formats available. On this basis, they decide whether updates to the repository are needed with respect to supported data types, recommended file formats or guidance to users. These decisions form the basis for the implementation plan for the repository for the current year. As part of the implementation plan, technical changes or updates to the repository are also considered to ensure that the implementation is up to date. It is the responsibility of the repository manager to follow up on reports about any security changes that are requested by UCPH IT or specified in the reports from DKCERT. See the Preservation Policy here.

What are the future development plans?

A yearly implementation or development plan for the repository is formulated in January and reported to the DIGHUMLAB secretary and management board. The implementation plan takes into consideration the CLARIN infrastructure workplan decided by CLARIN ERIC at the CLARIN Annual Conference in the autumn of the preceding year. The implementation plans are described in a DIGHUMLAB internal document which is not published. If you wish to submit proposals for future developments, see the Contact page for contact details.