dc.creator Schneidermann, Nina
dc.date.accessioned 2019-11-18T12:30:09Z
dc.date.available 2019-11-18T12:30:09Z
dc.date.issued 2019-11-18
dc.identifier.uri http://hdl.handle.net/20.500.12115/39
dc.description The Danish similarity dataset is a gold standard resource for evaluating Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges according to their semantic similarity, i.e. the extent to which the two words are similar in meaning, with scores reported in a normalized 0-1 range. Note that this dataset measures similarity rather than relatedness/association. Description of files included in this material: (Note: In both of the included files, rows correspond to items (word pairs) and columns to properties of each item.) All_sims_da.csv: Contains the non-normalized mean similarity score over all judges, along with the non-normalized scores given by each of the 38 judges on a 0-6 scale, where 0 is given to the most dissimilar items and 6 to the most similar items. Gold_sims_da.csv: Contains the similarity gold standard for each item, which is the normalized mean similarity score for that item over all judges. Scores are normalized to a 0-1 range, where 0 denotes the minimum degree of similarity and 1 denotes the maximum degree of similarity.
dc.language.iso dan
dc.publisher Centre for Language Technology, NorS, University of Copenhagen
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/kuhumcst/Danish-Similarity-Dataset
dc.subject similarity data set
dc.subject Danish
dc.title Danish Similarity Data Set (date: Nov. 18, 2019)
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType other
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-DK
contact.person Bolette Sandford; Pedersen; bspedersen@hum.ku.dk; Centre for Language Technology, NorS, University of Copenhagen
size.info 198; other
files.size 8369
files.count 1
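
The relationship described above between the raw 0-6 judge ratings and the 0-1 gold standard can be sketched as follows. This is a minimal illustration only: the record does not state which normalization was used, so dividing the mean rating by the scale maximum (6) is an assumption, and the function name and the sample ratings are hypothetical and not taken from the dataset.

```python
from statistics import mean

SCALE_MAX = 6  # upper end of the 0-6 rating scale named in dc.description


def normalized_mean(ratings, scale_max=SCALE_MAX):
    """Average raw judge ratings and rescale the result to the 0-1 range.

    Assumption: normalization is a plain division by the scale maximum.
    The actual scheme used for Gold_sims_da.csv is not stated in the record.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 0 or r > scale_max for r in ratings):
        raise ValueError("ratings must lie between 0 and scale_max")
    return mean(ratings) / scale_max


# Invented ratings from four hypothetical judges for one word pair.
example_ratings = [3, 4, 2, 3]
print(normalized_mean(example_ratings))
```

Under this assumption a pair rated 6 by every judge maps to 1 and a pair rated 0 by every judge maps to 0, which matches the endpoints given in the description.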


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0), Attribution Required
Name Danish_Similarity_Dataset.zip
Size 8.17 KB
Format application/zip
MD5 e5eb2594717dcc6e9b16b854d5f75eb8