Recommended data formats
The table below lists the file formats that the CLARIN-DK-UCPH Repository recommends and accepts for data sharing and reuse. You may need to convert your data files to a preservation file format before depositing them. Please get in touch if you have questions or suggestions about appropriate file formats.
Type of data | Recommended formats | Acceptable formats |
---|---|---|
Textual data | Text Encoding Initiative (TEI) P5 Guidelines (TEI P5); eXtensible Mark-up Language (.xml) text according to an appropriate Document Type Definition (DTD) or schema; plain text, ASCII (.txt); CSV format | Hypertext Mark-up Language (.html); Rich Text Format (.rtf); widely-used formats: MS Word (.doc/.docx); NVivo |
Audio | Free Lossless Audio Codec (FLAC) (.flac) | MPEG-1 Audio Layer 3 (.mp3) if original created in this format; Waveform Audio Format (.wav); Audio Interchange File Format (.aif) |
Video | MPEG-4 (.mp4); OGG video (.ogv, .ogg); motion JPEG 2000 (.mj2) | AVCHD video (.avchd) |
Annotations | Text Encoding Initiative (TEI) P5 Guidelines (TEI P5); CSV format; CQP corpus format | CLAN; PRAAT |
Images (e.g. of texts) | TIFF 6.0 uncompressed (.tif); JPEG (.jpeg, .jpg, .jp2) if original created in this format | GIF (.gif); TIFF other versions (.tif, .tiff); RAW image format (.raw); Photoshop files (.psd); BMP (.bmp); PNG (.png); Adobe Portable Document Format (PDF/A, PDF) (.pdf) |
Documentation | PDF/UA, PDF/A or PDF (.pdf); Rich Text Format (.rtf); XHTML or HTML (.xhtml, .htm); OpenDocument Text (.odt) | plain text (.txt); widely-used formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx); XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0 |
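
Before depositing, it can help to check each file against the table above. The following is a minimal sketch, not a repository tool: the `FORMATS` mapping and the `classify` function are hypothetical names, and the extension sets are transcribed by hand from the table. It looks only at file extensions, so it cannot verify conditions such as "TIFF 6.0 uncompressed" or "if original created in this format", and it leaves out formats the table lists without an extension (CSV, CQP, CLAN, PRAAT, NVivo).

```python
from pathlib import Path

# Extensions transcribed from the table above, grouped by data type.
# Hypothetical helper data: only formats with an explicit extension are included.
FORMATS = {
    "textual": {
        "recommended": {".xml", ".txt"},
        "acceptable": {".html", ".rtf", ".doc", ".docx"},
    },
    "audio": {
        "recommended": {".flac"},
        "acceptable": {".mp3", ".wav", ".aif"},
    },
    "video": {
        "recommended": {".mp4", ".ogv", ".ogg", ".mj2"},
        "acceptable": {".avchd"},
    },
    "images": {
        # .tif is recommended only for TIFF 6.0 uncompressed; other TIFF
        # versions are merely acceptable. The extension cannot tell them apart.
        "recommended": {".tif", ".jpeg", ".jpg", ".jp2"},
        "acceptable": {".gif", ".tiff", ".raw", ".psd", ".bmp", ".png", ".pdf"},
    },
    "documentation": {
        "recommended": {".pdf", ".rtf", ".xhtml", ".htm", ".odt"},
        "acceptable": {".txt", ".doc", ".docx", ".xls", ".xlsx", ".xml"},
    },
}


def classify(filename: str, data_type: str) -> str:
    """Return 'recommended', 'acceptable', or 'not listed' for a file."""
    ext = Path(filename).suffix.lower()  # case-insensitive extension match
    levels = FORMATS[data_type]
    if ext in levels["recommended"]:
        return "recommended"
    if ext in levels["acceptable"]:
        return "acceptable"
    return "not listed"
```

For example, `classify("notes.txt", "textual")` returns `"recommended"`, while the same file classified as `"documentation"` returns `"acceptable"`, because the table ranks plain text differently for the two data types. A result of `"not listed"` is a prompt to contact the repository, not proof that the file would be rejected.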