Published September 15, 2020 | Version v1
Dataset | Open access

Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Part 3

  • 1. University of Tübingen

Description

The embeddings were trained with finalfrontier on the CONLL2017 corpora with more than 100M tokens. For every language, embeddings were trained with both the skipgram and the structgram algorithm, and they include subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with the tools provided by the finalfusion ecosystem.
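Each file is several GB, so it can be handy to check which chunks a finalfusion file contains before loading it. The sketch below reads only the file header. It assumes the layout given in the finalfusion format specification: the four magic bytes `FiFu`, a little-endian `u32` format version, a `u32` chunk count, and then one `u32` identifier per chunk. The function name `read_fifu_header` is hypothetical, and the meaning of each numeric chunk identifier should be looked up in the specification. For actual embedding lookups, use the finalfusion libraries rather than this parser.

```python
import struct
from typing import BinaryIO, List, Tuple

MAGIC = b"FiFu"  # assumed magic bytes at the start of a finalfusion file


def read_fifu_header(f: BinaryIO) -> Tuple[int, List[int]]:
    """Return (format version, chunk identifiers) from a finalfusion header.

    Assumed layout (all little-endian): 4 magic bytes, u32 version,
    u32 chunk count, then one u32 identifier per chunk. Only the header
    is read; the (multi-GB) embedding data is never touched.
    """
    magic = f.read(4)
    if magic != MAGIC:
        raise ValueError(f"not a finalfusion file (magic bytes {magic!r})")
    version, n_chunks = struct.unpack("<II", f.read(8))
    ids = list(struct.unpack(f"<{n_chunks}I", f.read(4 * n_chunks)))
    return version, ids
```

Usage: open a downloaded file in binary mode (`with open(path, "rb") as f: read_fifu_header(f)`); the path is a placeholder, since the file names are not listed on this page.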

Training hyperparameters:

  • N-gram range (inclusive): 3-6
  • Number of hashing buckets: 2^21
  • Hashing function: FNV-1a
  • Window size: 10
  • Negative samples: 5
  • Dimensions: 300
  • Minimum token frequency: 30
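The subword settings above (n-grams of length 3 to 6, FNV-1a hashing, 2^21 buckets) can be illustrated with a minimal sketch. It assumes fastText-style bracketing of each word with `<` and `>`, plain 64-bit FNV-1a over the UTF-8 bytes of each n-gram, and bucket selection by masking the hash to 21 bits. These are assumptions: finalfrontier's implementation may feed strings to the hasher differently, so the bucket indices computed here are not guaranteed to match those stored in the files. All function names are hypothetical.

```python
from typing import List

FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime
N_BUCKETS_EXP = 21                 # 2^21 hashing buckets
MIN_N, MAX_N = 3, 6                # n-gram range, inclusive


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h


def char_ngrams(word: str, min_n: int = MIN_N, max_n: int = MAX_N) -> List[str]:
    """All character n-grams (by Unicode code point) of the bracketed word."""
    chars = "<" + word + ">"  # assumed boundary markers
    return [
        chars[i : i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(chars) - n + 1)
    ]


def bucket(ngram: str, exp: int = N_BUCKETS_EXP) -> int:
    """Map an n-gram to one of 2^exp buckets by masking its hash."""
    return fnv1a_64(ngram.encode("utf-8")) & ((1 << exp) - 1)
```

For example, `[bucket(g) for g in char_ngrams("Tübingen")]` yields one bucket index per n-gram (26 in total for this word). Because many n-grams share the 2^21 buckets, unrelated n-grams can collide; this is the usual trade-off of the hashing trick, which bounds the size of the subword embedding matrix.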


Other (English)

Research carried out in work package A03 of the SFB 833.

Files

CMDI_Part3.xml

Files (94.8 GB)

  • 32.3 kB (md5:d3e2afc9b10b713b188b207a7ec46b58)
  • 6.4 GB (md5:94956fe63496c55681c0b1aadae8805d)
  • 10.8 GB (md5:d770e24d7ff460314cc8f377541654e1)
  • 13.4 GB (md5:fe04617b5c0815d0f530acf0410ca940)
  • 17.2 GB (md5:90300040756128cdd8658eecc3cf57bc)
  • 11.8 GB (md5:f63f2c123b325f494a29c18426d63f3b)
  • 16.5 GB (md5:93b920f41d97be2e6bcd7c16fc2ef864)
  • 12.2 GB (md5:bdf2f6fd8cfb52d52308b70e439dda9f)
  • 6.6 GB (md5:9138c3ac609298775eb0fcf131f42889)
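Given file sizes of up to 17.2 GB, a download should be checked against its listed MD5 checksum before use. The sketch below uses Python's standard `hashlib` and reads the file in 1 MiB chunks so it never has to fit in memory. The function names `md5_of_file` and `verify` are illustrative, and any path passed to them is a placeholder for wherever the file was saved.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    (1 MiB by default) so that multi-GB files are never loaded whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with a checksum from the list above."""
    return md5_of_file(path) == expected_md5.lower()
```

Usage: `verify("downloaded.fifu", "94956fe63496c55681c0b1aadae8805d")` returns `True` only if the (hypothetically named) file matches the 6.4 GB entry. On Linux, the `md5sum` command gives the same digest.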

Additional details

Related works

Is part of
Collection: 10.57754/FDAT.n64dr-wre27 (DOI)

Funding

Deutsche Forschungsgemeinschaft
SFB 833: Bedeutungskonstitution - Dynamik und Adaptivität sprachlicher Strukturen (The Construction of Meaning: Dynamics and Adaptivity of Linguistic Structures), project ID 75650358

Data quality

Not specified for any criterion (accuracy, completeness, conformity, consistency, credibility, processability, relevance, timeliness, understandability).