Published December 10, 2014 | Version v1
Dataset Open

Tübingen Dependency-Parsed German Wikipedia Treebank

  • 1. University of Tübingen

Description

TüBa-D/W is a large treebank of modern written German that follows common annotation standards and is freely available under a permissive license. The treebank is based on Wikipedia text and consists of 36.1 million sentences (615 million tokens) in CoNLL-X format. The annotation layers are part-of-speech tags, morphology, lemmas, and dependency structure.
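
As a rough illustration of the format named above, the following sketch reads CoNLL-X data one sentence at a time. It assumes the standard CoNLL-X layout (ten tab-separated columns per token, sentences separated by blank lines). The function name `read_conllx` and the sample sentence with its tags are made up for illustration and are not taken from the treebank.

```python
import io
from typing import Dict, Iterator, List, TextIO

# Column names of the CoNLL-X format (ten tab-separated fields per token).
CONLLX_COLUMNS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(stream: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Hypothetical helper: reads lazily, so a multi-gigabyte file never
    has to fit into memory.
    """
    sentence: List[Dict[str, str]] = []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLX_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        sentence.append(dict(zip(CONLLX_COLUMNS, fields)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Invented sample data; the tags and labels are illustrative only.
SAMPLE = (
    "1\tDas\tder\tPRO\tPDS\t_\t2\tSUBJ\t_\t_\n"
    "2\tist\tsein\tV\tVAFIN\t_\t0\tROOT\t_\t_\n"
    "3\tgut\tgut\tADJ\tADJD\t_\t2\tPRED\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for tokens in read_conllx(io.StringIO(SAMPLE)):
        for token in tokens:
            print(token["form"], token["lemma"], token["postag"],
                  token["head"], token["deprel"])
```

For the real corpus, the same generator could be fed an open file handle instead of the in-memory sample; the exact file layout inside the download is not described here and would need to be checked first.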

Files

CMDI.xml

Files (6.1 GB)

Name Size
md5:de2e32cf34f0d830142cd95272476833 7.5 kB
md5:dcb60a17c28b25c0e1dfbdd3b369c72f 6.1 GB
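
The MD5 digests listed for the files can be used to check a download for corruption. The sketch below is one way to do that with Python's standard `hashlib` module. The helper names and the local path in the usage note are hypothetical, since the record does not show the file names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a digest written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_md5("downloaded_file", "md5:dcb60a17c28b25c0e1dfbdd3b369c72f")` would then return `True` for an intact copy of the 6.1 GB file, where `downloaded_file` stands in for wherever the download was saved.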

Additional details

Data quality

Accuracy

Not specified.

Completeness

Not specified.

Conformity

Not specified.

Consistency

Not specified.

Credibility

Not specified.

Processability

Not specified.

Relevance

Not specified.

Timeliness

Not specified.

Understandability

Not specified.