TüBa-J/S: Tübinger Baumbank des Japanischen/Spontansprache

Kawata, Yasuhiro; Bartels, Julia

doi:10.57754/FDAT.mn7xy-y0b67

Published February 15, 2013 | Version v1

Dataset Restricted

1. University of Tübingen

The TüBa-J/S treebank was created in the Verbmobil project. Verbmobil was a long-term machine translation project for spontaneous speech, funded by the Federal Ministry for Education and Research (BMBF). The Tübingen Treebank of Spoken Japanese (TüBa-J/S) is a syntactically annotated corpus based on spontaneous dialogues that were manually transliterated. The treebank comprises approximately 18,000 sentences (ca. 160,000 words). The syntactic annotation was performed manually and is based on HPSG principles. The annotation scheme distinguishes three levels of syntactic constituency: the lexical level, the phrasal level, and the clausal level. In addition to constituent structure, the edges between nodes in the annotated trees carry labels. These edge labels encode grammatical functions (as relations between phrases) and the distinction between heads and non-heads (as phrase-internal relations). The annotations were used as training data in the CoNLL-X Shared Task: Multi-lingual Dependency Parsing in 2006 and are included in the normal treebank license.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Alternative title (English): TüBa-J/S: Tübingen Treebank of Spoken Japanese

Funding: Bundesministerium für Forschung, Technologie und Raumfahrt

Accuracy: Not specified.
Completeness: Not specified.
Conformity: Not specified.
Consistency: Not specified.
Credibility: Not specified.
Processability: Not specified.
Relevance: Not specified.
Timeliness: Not specified.
Understandability: Not specified.

