Embeddings trained on CONLL2017 Corpora (conll2017-embeddings) - Part 1

Pütz, Sebastian

doi:10.57754/FDAT.eh5fz-7ec28

Published September 15, 2020 | Version v1

Dataset Open

Pütz, Sebastian (Researcher)¹

1. University of Tübingen

The embeddings were trained with finalfrontier on those CONLL2017 corpora that contain more than 100 million tokens. For all languages, embeddings were trained with both the skipgram and the structgram algorithm, and they contain subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with the tools provided by the finalfusion ecosystem.

N-Gram range (inclusive): 3 - 6
Number of hashing buckets: 2^21
Hashing function: FNV-1a
Window size: 10
Negative Samples: 5
Dimensions: 300
Minimum Token Frequency: 30
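
To illustrate how the n-gram range, the hashing function, and the bucket count listed above fit together, here is a minimal Python sketch of hashed subword lookup. It is not finalfrontier's actual indexing code. The function names, the "<" and ">" word-boundary markers, and the choice to hash the UTF-8 bytes of each n-gram are assumptions made for illustration, so the resulting indices will not necessarily match those stored in the published embeddings.

```python
# Sketch of hashed subword indexing with the parameters from this record:
# n-grams of length 3 to 6 (inclusive), FNV-1a hashing, 2^21 buckets.
# Assumption: words are wrapped in "<" and ">" and n-grams are hashed as UTF-8 bytes.

FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    state = FNV_OFFSET_BASIS_64
    for byte in data:
        state ^= byte
        state = (state * FNV_PRIME_64) & MASK_64
    return state


def subword_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list:
    """All character n-grams of the bracketed word with min_n <= length <= max_n."""
    bracketed = f"<{word}>"
    return [
        bracketed[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(bracketed) - n + 1)
    ]


def bucket_indices(word: str, buckets_exp: int = 21) -> list:
    """Map every subword n-gram of the word to one of 2^buckets_exp buckets."""
    n_buckets = 1 << buckets_exp
    return [fnv1a_64(ngram.encode("utf-8")) % n_buckets for ngram in subword_ngrams(word)]
```

Because n-grams are mapped to a fixed number of buckets, a word that was never seen during training can still be assigned a vector from the bucket vectors of its n-grams, at the cost of occasional collisions between unrelated n-grams.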

Other (English)

Research carried out in work package A03 of the SFB 833.

Files (94.9 GB)

Name	MD5	Size
A-B-ar-bg.zip	723f6154a22664631bc1ed7ec86012ee	11.4 GB
C-ca-cs.zip	c8017a9eaf60fe786429d9903c4890c5	12.6 GB
CMDI_Part1.xml	7f3c6a491afe8d98e1fabee988beb15c	32.1 kB
D-da-de.zip	d6d851e829c1dcb5e2d3a805a9afd951	16.7 GB
E1-el-en.zip	da2ae4803fc557de9ac5e86eb440617b	15.2 GB
E2-es-et.zip	2fab0a5480b6641ab1406289dce5ceb9	13.4 GB
E3-eu.zip	8e18e7a2e376043a86e839b322efdecc	5.2 GB
F1-fa-fi.zip	7dc2a89784b8c17074772726b299f811	12.8 GB
F2-fr.zip	41322143d6f9a91ca27079f7a9ab1dc4	7.6 GB
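
Since the archives are several gigabytes each, it can be worth checking a download against the MD5 checksum listed above before unpacking it. The following is a small sketch using only the Python standard library. The helper names are hypothetical, and the file is read in chunks so that a multi-gigabyte archive does not have to fit into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed checksum, with or without an 'md5:' prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, assuming E3-eu.zip was saved to the current directory, `matches_listed_md5("E3-eu.zip", "md5:8e18e7a2e376043a86e839b322efdecc")` should return True for an intact download.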

Additional details

Is part of: Collection: 10.57754/FDAT.n64dr-wre27 (DOI)

Funding: Deutsche Forschungsgemeinschaft
SFB 833: Bedeutungskonstitution - Dynamik und Adaptivität sprachlicher Strukturen (75650358)
