Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Collection

Pütz, Sebastian

doi:10.57754/FDAT.n64dr-wre27

Published September 15, 2020 | Version v1

Collection Open

Pütz, Sebastian (Researcher)¹

1. University of Tübingen

This collection groups three research data sets.

The embeddings were trained with finalfrontier on the CONLL2017 corpora with more than 100 million tokens. For all languages, embeddings were trained with the skipgram and structgram algorithms and contain subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with tools provided by the finalfusion ecosystem.

N-Gram range (inclusive): 3 - 6
Number of hashing buckets: 2^21
Hashing function: FNV-1a
Window size: 10
Negative Samples: 5
Dimensions: 300
Minimum Token Frequency: 30
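
As a rough illustration of how the n-gram range, hashing function, and bucket count listed above could fit together, the following Python sketch extracts character n-grams of length 3 to 6 from a word and maps each one to one of 2^21 buckets with FNV-1a. The `<` and `>` word-boundary markers, the 64-bit hash width, the modulo reduction, and all function names are assumptions made for illustration. finalfrontier's actual implementation may differ in these details, so the indices computed here are not guaranteed to match those used in the published embeddings.

```python
# Illustrative sketch only: not finalfrontier's actual indexing code.
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF

MIN_N, MAX_N = 3, 6    # n-gram range (inclusive), from the list above
N_BUCKETS = 2 ** 21    # number of hashing buckets, from the list above


def fnv1a_64(data: bytes) -> int:
    """Compute the 64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                      # FNV-1a: XOR first ...
        h = (h * FNV_PRIME) & MASK_64  # ... then multiply, wrapping at 64 bits
    return h


def subword_ngrams(word: str, min_n: int = MIN_N, max_n: int = MAX_N) -> list[str]:
    """Return all character n-grams of the bracketed word (assumed markers)."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(marked) - n + 1)
    ]


def bucket_indices(word: str) -> list[int]:
    """Map each subword n-gram to a bucket index in [0, N_BUCKETS)."""
    return [fnv1a_64(ng.encode("utf-8")) % N_BUCKETS for ng in subword_ngrams(word)]


if __name__ == "__main__":
    for ngram, idx in zip(subword_ngrams("cat"), bucket_indices("cat")):
        print(f"{ngram!r} -> bucket {idx}")
```

Hashing n-grams into a fixed number of buckets keeps the subword embedding matrix at a bounded size (here 2^21 rows of 300 dimensions), at the cost of occasional collisions between different n-grams.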

Other (English)

Research carried out in work package A03 of the SFB 833.

Files

Name	Size
CMDI.xml (md5:6626910845b67d2e3c98265899cf5b36)	13.7 kB

Additional details

Has part: Dataset: 10.57754/FDAT.eh5fz-7ec28 (DOI); Dataset: 10.57754/FDAT.2gr88-44y24 (DOI); Dataset: 10.57754/FDAT.q21vw-0fp88 (DOI)

Deutsche Forschungsgemeinschaft
SFB 833: Bedeutungskonstitution - Dynamik und Adaptivität sprachlicher Strukturen (project number 75650358)

Accuracy: Not specified.
Completeness: Not specified.
Conformity: Not specified.
Consistency: Not specified.
Credibility: Not specified.
Processability: Not specified.
Relevance: Not specified.
Timeliness: Not specified.
Understandability: Not specified.

