Published December 10, 2014 | Version v1
Dataset
Open
Tübingen Dependency-Parsed German Wikipedia Treebank
Description
TüBa-D/W is a large treebank of modern written German that follows common annotation standards and is freely available under a permissive license. The treebank is based on Wikipedia text and consists of 36.1 million sentences (615 million tokens) in CoNLL-X format. The annotation layers are part-of-speech tags, morphology, lemmas, and dependency structure.
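As a rough illustration of how the annotation layers can be read, the sketch below parses CoNLL-X input, which stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and separates sentences with blank lines. The names `Token` and `read_conllx` are hypothetical, and the sample sentence and its tags are made up for the example rather than taken from the treebank.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Token(NamedTuple):
    """One CoNLL-X token; the projective columns PHEAD/PDEPREL are skipped."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: Optional[int]  # None when the column holds the "_" placeholder
    deprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of tokens) from an iterable of CoNLL-X lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        head = int(cols[6]) if cols[6].isdigit() else None
        sentence.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3],
                  cols[4], cols[5], head, cols[7])
        )
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Made-up sample input (illustrative tags, not copied from the treebank).
SAMPLE = (
    "1\tDas\tdas\tPRO\tPDS\t_\t2\tSUBJ\t_\t_\n"
    "2\tist\tsein\tV\tVAFIN\t_\t0\tROOT\t_\t_\n"
    "3\tgut\tgut\tADV\tADJD\t_\t2\tPRED\t_\t_\n"
    "\n"
    "1\tJa\tja\tPTK\tPTKANT\t_\t0\tROOT\t_\t_\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a file of this size can be streamed sentence by sentence instead of being loaded into memory.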
Files
Total size: 6.1 GB

| Name | MD5 | Size |
|---|---|---|
| CMDI.xml | de2e32cf34f0d830142cd95272476833 | 7.5 kB |
| | dcb60a17c28b25c0e1dfbdd3b369c72f | 6.1 GB |
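The MD5 values listed for the files can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the file path in the usage comment are hypothetical, and the expected value is the checksum shown for the 6.1 GB file.

```python
import hashlib
import io
from typing import BinaryIO

# Checksum listed for the 6.1 GB file on this page.
EXPECTED_MD5 = "dcb60a17c28b25c0e1dfbdd3b369c72f"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # Reading fixed-size chunks keeps memory use flat for multi-gigabyte files.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


# Small in-memory demonstration so the sketch runs without the download.
demo_digest = md5_of_stream(io.BytesIO(b"hello"))

# Hypothetical usage once the archive has been downloaded:
# assert md5_of_file("path/to/downloaded-file") == EXPECTED_MD5
```

MD5 is adequate for detecting transfer errors, which is the purpose here; it is not a safeguard against deliberate tampering.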