-------------------------------------------------------------------------------------
The Danish Parliament Corpus 2009 - 2017, v2, w. subject annotation
-------------------------------------------------------------------------------------
April 2021

The corpus contains transcripts of parliamentary speeches of the Danish Parliament, Folketinget, from session 20091 to session 20161 (6 October 2009 to 7 September 2017), downloaded from the Danish Parliament's FTP server: ftp://oda.ft.dk.

The corpus has extensive metadata about the MPs (name, gender, age, role, title, party affiliation), the timing of the speeches, and a subject annotation of each agenda item. The information on age and gender was added from external sources. The subject annotation was added semi-automatically to each speech on the basis of a manual annotation of the agenda titles.

The corpus is organized into UTF-8 formatted, tab-separated txt-files, one file per meeting and one zip-file per session. In addition, the Excel file "Subjects.xlsx" lists the subject categories used in the corpus, and a list of party names and abbreviations can be found in the file "Paties.txt".
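A minimal sketch in Python of reading one session archive, assuming the layout described above (one zip-file per session holding one UTF-8, tab-separated txt-file per meeting). The function name iter_session_lines is hypothetical and not part of the corpus distribution. The "utf-8-sig" codec is an assumption chosen so that a byte order mark, if present, is dropped.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_session_lines(zip_source) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, line) for each non-empty line of each txt-file.

    zip_source may be a path or a binary file-like object holding the
    zip-file of one session.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in sorted(archive.namelist()):
            # Skip anything that is not a meeting file.
            if not member.lower().endswith(".txt"):
                continue
            with archive.open(member) as raw:
                # newline="" keeps line endings untranslated; they are
                # stripped explicitly below.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                for line in text:
                    line = line.rstrip("\r\n")
                    if line:
                        yield member, line
```

Each yielded line is one speech with its metadata. Whether the meeting files begin with a header row is not stated in this README, so check the first line of a file before treating it as a speech.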
Each line contains a speech with the following tab-separated metadata:

ID            the id, consisting of the date and start time of the speech
Date          date of the speech
Start time    start time of the speech
End time      end time of the speech
Time          duration of the speech
Agenda item   the date and agenda number
Case no       the case number according to the parliament
Case type     the case type according to the parliament
Agenda title  the agenda title under which the speech was held
Subject 1     one of 19 subject categories (or other)
Subject 2     if necessary, an additional subject category (or none)
Name          name of the speaker
Gender        gender of the speaker
Party         party of the speaker
Role          role of the speaker
Title         title of the speaker
Birth         birth of the speaker
Age           age of the speaker at the time of the speech
Text          the speech

The Danish Parliament Corpus 2009 - 2017 follows the license for Open Data, which states the following:

"The Danish Parliament grants a world-wide, free, non-exclusive and otherwise unrestricted right of use of the data in the Danish Parliament's open data catalogue. The data can be freely:
• copied, distributed and published,
• adapted and combined with other material,
• exploited commercially and non-commercially."

Under the copyright act, the speeches can be distributed without the consent of the speaker, but only in a way where the author/speaker of each text/speech is clearly stated. Furthermore, the Danish Parliament must be acknowledged as the source.

Papers that reference the corpus:
http://lrec-conf.org/workshops/lrec2018/W2/pdf/3_W2.pdf
http://ceur-ws.org/Vol-2364/15_paper.pdf
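
The tab-separated fields listed in this README can be mapped to names with a few lines of Python. This is a minimal sketch under stated assumptions: the 19 columns appear in exactly the order of the field list, the line is a speech and not a header row, and any tab character beyond the first 18 belongs to the speech text. The names FIELDS and parse_speech are hypothetical and not part of the corpus distribution.

```python
from typing import Dict, List

# Column names in the order given by the field list of this README.
FIELDS: List[str] = [
    "ID", "Date", "Start time", "End time", "Time", "Agenda item",
    "Case no", "Case type", "Agenda title", "Subject 1", "Subject 2",
    "Name", "Gender", "Party", "Role", "Title", "Birth", "Age", "Text",
]


def parse_speech(line: str) -> Dict[str, str]:
    """Split one line of a meeting file into a field-name -> value dict."""
    # Split at most 18 times so that stray tabs stay inside "Text"
    # (an assumption; the README does not say whether they occur).
    values = line.rstrip("\r\n").split("\t", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, found {len(values)}"
        )
    return dict(zip(FIELDS, values))
```

All values are returned as strings. Converting "Date", "Time" or "Age" to other types is left out, because this README does not specify their exact formats.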