Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Part 2

Pütz, Sebastian

doi:10.57754/FDAT.2gr88-44y24

Published September 15, 2020 | Version v1

Dataset | Open access

Pütz, Sebastian (Researcher)¹

1. University of Tübingen

The embeddings were trained with finalfrontier on the CoNLL 2017 corpora with more than 100M tokens. For all languages, embeddings were trained with both the skipgram and the structgram algorithms, and they include subword n-gram embeddings. All embeddings are stored in the finalfusion format and can be used and processed with the tools provided by the finalfusion ecosystem.
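To illustrate how the two algorithms differ, the following Python sketch generates training pairs for skipgram and for structgram. It assumes the usual definition of structgram as a structured skipgram, in which each context word is paired with its relative position so that predictions are position-specific. The helper names are hypothetical, and finalfrontier's subsampling, negative sampling, and optimization are not reproduced.

```python
from typing import List, Tuple


def context_window(n_tokens: int, i: int, window: int) -> range:
    """Indices within `window` positions of token i (clipped to the sentence)."""
    return range(max(0, i - window), min(n_tokens, i + window + 1))


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skipgram: the focus word predicts each context word, ignoring its position."""
    pairs = []
    for i, focus in enumerate(tokens):
        for j in context_window(len(tokens), i, window):
            if j != i:
                pairs.append((focus, tokens[j]))
    return pairs


def structgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, Tuple[str, int]]]:
    """Structured skipgram (assumed definition): the prediction target is
    (context word, relative position), so the same word at offset -1 and
    at offset +2 counts as two different targets."""
    pairs = []
    for i, focus in enumerate(tokens):
        for j in context_window(len(tokens), i, window):
            if j != i:
                pairs.append((focus, (tokens[j], j - i)))
    return pairs


if __name__ == "__main__":
    sentence = "the cat sat".split()
    print(skipgram_pairs(sentence, window=1))
    # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
    print(structgram_pairs(sentence, window=1))
    # [('the', ('cat', 1)), ('cat', ('the', -1)), ('cat', ('sat', 1)), ('sat', ('cat', -1))]
```

With a window size of 10, a focus word in the middle of a long sentence yields up to 20 pairs under either scheme; structgram differs only in that each target also carries its offset.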

N-gram range (inclusive): 3 - 6
Number of hashing buckets: 2^21
Hashing function: FNV-1a
Window size: 10
Number of negative samples: 5
Dimensions: 300
Minimum token frequency: 30
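The subword parameters above can be illustrated with a short sketch: each word is split into character n-grams of length 3 to 6, and each n-gram is mapped to one of 2^21 buckets with FNV-1a. This sketch assumes fastText-style `<` and `>` word-boundary markers and the 64-bit FNV-1a variant applied to the UTF-8 bytes of each n-gram. The exact bytes that finalfusion feeds to the hash may differ, so the bucket numbers computed here are not guaranteed to match the released files; the function names are hypothetical.

```python
from typing import List

# 64-bit FNV-1a constants (offset basis and prime from the FNV specification).
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2^64)."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of length min_n..max_n (inclusive).
    Assumption: the word is wrapped in '<' and '>' boundary markers,
    and n-grams are taken over Unicode characters, not bytes."""
    bracketed = f"<{word}>"
    return [
        bracketed[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(bracketed) - n + 1)
    ]


def ngram_bucket(ngram: str, bucket_exp: int = 21) -> int:
    """Map an n-gram to one of 2**bucket_exp buckets by masking the hash.
    Assumption: the hash input is the n-gram's UTF-8 bytes."""
    return fnv1a_64(ngram.encode("utf-8")) & ((1 << bucket_exp) - 1)


if __name__ == "__main__":
    for ng in char_ngrams("cat"):
        # "cat" yields <ca, cat, at>, <cat, cat>, <cat>
        print(ng, ngram_bucket(ng))
```

Because the bucket index keeps only the low 21 bits of the hash, different n-grams can share a bucket; this collision risk is the usual price of hashing, which in return keeps the subword matrix at a fixed 2^21 rows regardless of vocabulary size.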

Other (English)

Research carried out in work package A03 of the SFB 833.

Files (63.3 GB)

Name	MD5 checksum	Size
CMDI_Part2.xml	041145d1bb911ff7b428e17815e4eafc	28.9 kB
G-H-gl-he.zip	b87313439a4cbe11bdd872eef336f1b7	10.7 GB
H-hn-hr-hu.zip	82f91dd495c3b06b5e9816f80837d630	13.4 GB
I-id-it.zip	7fac23f9dc5ab9d8d55965f0355150b3	15.1 GB
J-ja.zip	0d656c127940cbc2ab2e48961bf25b1d	8.4 GB
R-ro-ru.zip	b0cf845fda7999ca447ed56d0ab7fbbb	15.7 GB
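The MD5 checksums listed for each file can be used to verify a download. The following is a minimal Python sketch using the standard `hashlib` module. It reads files in 1 MiB chunks because the archives are several gigabytes each. It assumes all files sit in one local directory under their original names; the dictionary and function names are illustrative only.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional, Union

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5: Dict[str, str] = {
    "CMDI_Part2.xml": "041145d1bb911ff7b428e17815e4eafc",
    "G-H-gl-he.zip": "b87313439a4cbe11bdd872eef336f1b7",
    "H-hn-hr-hu.zip": "82f91dd495c3b06b5e9816f80837d630",
    "I-id-it.zip": "7fac23f9dc5ab9d8d55965f0355150b3",
    "J-ja.zip": "0d656c127940cbc2ab2e48961bf25b1d",
    "R-ro-ru.zip": "b0cf845fda7999ca447ed56d0ab7fbbb",
}


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte archives need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Union[str, Path]) -> Dict[str, Optional[bool]]:
    """Per file: True if the checksum matches, False if not, None if the file is missing."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reporting missing files separately from mismatches makes it possible to check a partial download, for example when only the archive for one language group has been fetched.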

Additional details

Related works

Is part of: Collection: 10.57754/FDAT.n64dr-wre27 (DOI)

Funding

Deutsche Forschungsgemeinschaft (German Research Foundation)
SFB 833: Bedeutungskonstitution - Dynamik und Adaptivität sprachlicher Strukturen (The Construction of Meaning: the Dynamics and Adaptivity of Linguistic Structures), award number 75650358

Data quality

Accuracy: Not specified.
Completeness: Not specified.
Conformity: Not specified.
Consistency: Not specified.
Credibility: Not specified.
Processability: Not specified.
Relevance: Not specified.
Timeliness: Not specified.
Understandability: Not specified.
