Published March 14, 2017 | Version v1
Dataset Open

GerCo: German Adjective-Noun Collocations Dataset

  • 1. University of Tübingen
  • 2. Berlin-Brandenburg Academy of Sciences and Humanities

Description

The dataset contains 4732 adjective-noun pairs extracted from the DWDS corpora [1] with the Wortprofil application [2]. Two experts annotated every phrase as either a collocation or a non-collocation. One of the annotators then classified the non-collocations further as free phrases, idioms, named entities, or terms.
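The two-level annotation scheme described above can be pictured as a small record type. This is only a sketch: the class names, field names, and the example pair are hypothetical and are not taken from the dataset files, whose actual layout is not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NonCollocationType(Enum):
    """The four subclasses named in the description (labels are assumed spellings)."""
    FREE_PHRASE = "free phrase"
    IDIOM = "idiom"
    NAMED_ENTITY = "named entity"
    TERM = "term"


@dataclass(frozen=True)
class AnnotatedPair:
    """Hypothetical record for one adjective-noun pair."""
    adjective: str
    noun: str
    is_collocation: bool
    # Only meaningful for non-collocations; None otherwise.
    subtype: Optional[NonCollocationType] = None

    def __post_init__(self):
        # A collocation has no non-collocation subtype.
        if self.is_collocation and self.subtype is not None:
            raise ValueError("only non-collocations carry a subtype")


# Invented pair for illustration only; its label is not a dataset annotation.
example = AnnotatedPair("starker", "Kaffee", is_collocation=True)
```

Keeping the binary decision and the subtype in separate fields mirrors the description, in which the first level was annotated by two experts and the second by only one of them.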

If you use this dataset for research purposes, please cite the following paper:

- Yana Strakatova, Neele Falk, Isabel Fuhrmann, Daniela Rossmann, Erhard Hinrichs. All That Glitters is Not Gold: A Gold Standard of Adjective-Noun Collocations for German. 2019.

References:

[1]: DWDS – Digitales Wörterbuch der deutschen Sprache. Das Wortauskunftssystem zur deutschen Sprache in Geschichte und Gegenwart, hrsg. v. d. Berlin-Brandenburgischen Akademie der Wissenschaften.

[2]: DWDS-Wortprofil, erstellt durch das Digitale Wörterbuch der deutschen Sprache.

Abstract (German)

Gegenstand des Projektes ist die lexikalisch-semantische Modellierung und Beschreibung unterschiedlicher Aspekte von Kollokationen.

Die wesentlichen Forschungsfragen in diesem Projekt lauten:

  • a) auf welche Weise und in welchem theoretischen Rahmen die Kollokanten einer Kollokationsbasis lexikalisch-semantisch gruppiert und beschrieben werden können;
  • b) ob und in welcher Weise Kollokanten, die mehreren Kollokationsbasen auf einer paradigmatischen Achse gemeinsam sind, zu einer generalisierten Beschreibung dieser Kollokationsbasen beitragen können.

Im Rahmen dieses Projekts werden die Meaning-Text-Theorie von Igor Mel'čuk, besonders die Lexikalischen Funktionen, und die Theorie des Generativen Lexikons von James Pustejovsky, besonders die Qualiarollen, als gemeinsamer Rahmen zusammengeführt.

Es wird untersucht, ob eine angemessene Modellierung oben beschriebener Aspekte von Kollokationen mit einer Synthese der beiden genannten theoretischen Ansätze möglich ist.

Ein zentrales Ergebnis des Projektes wird eine Handreichung für die Modellierung von Kollokationen mit den aus dem übergreifenden theoretischen Rahmen abgeleiteten Beschreibungsmitteln sein.

Abstract (English)

The project aims at modeling and describing lexical-semantic properties of collocations. The main research topics of the project are:

  • a) how and according to which theoretical framework the lexical-semantic grouping of collocations can be performed and described;
  • b) whether and how it is possible to group collocators by their semantic similarity and to compare and represent (sets of) collocations based on the semantic relatedness of their collocational bases.

The research will draw on Meaning-Text Theory, elaborated by Igor Mel'čuk, which provides the concept of the "Lexical Function", and on James Pustejovsky's theory of the "Generative Lexicon", which provides the concept of "qualia structure".

The project will investigate whether a synthesis of the two theoretical frameworks can yield an adequate model of the aspects of collocations described above.

A central outcome of the project will be a set of guidelines for modeling collocations based on the resulting specification of the cross-theoretical framework.

Files

CMDI.xml

Files (14.3 MB)

Checksum  Size
md5:6213f29ba052896d53ca3c44d1695df4  132.9 kB
md5:4c39be3a3246ebae18e166f6b486513f  133.6 kB
md5:511aa886d8f722f8855dadafbb76e072  132.0 kB
md5:885c37f74798d30362f97ab6aa9801d6  134.5 kB
md5:41e2fc72fcd8af92bde77940eb26b68a  135.6 kB
md5:31cd8693eeaef24080d270ff247267d4  132.8 kB
md5:c525fccf089dd995664b4a19aef2f387  269.1 kB
md5:c5a9fc1af7920052940ce0a27266af10  271.0 kB
md5:ee46d0efa35f8e49ccb1fb3c4027a9d8  282.2 kB
md5:374050eb57dc40c0b459c204df6b52e1  262.9 kB
md5:e1fb9d98bdf7004a70df15ab8e0512c9  241.2 kB
md5:d6e41f81d8226afa0f13280976135cab  271.9 kB
md5:b959b635d812f9602d406c38b11caf2c  1.2 MB
md5:59cded7fda02cc8092285fdb3a159789  1.2 MB
md5:8f51116dfd101eab5e983872eadbdd68  1.2 MB
md5:cc4ed0b92fa21d7cd45b333508c9e0c2  1.2 MB
md5:e7b9d0a684cfde5db5a89c21be3f13b2  1.2 MB
md5:fd9d4300df5aae2a4b85007f9d9b78c8  1.2 MB
md5:4da30d2dbeb2a3766e5bd515ed91f0cc  46.8 kB
md5:133cb42265d6d76a2a1f822e2c8a5d4d  63.3 kB
md5:3531498a39f0c8f74ac861ef8337ac1b  65.2 kB
md5:cc6d74310da48f33b370f4e68656ec11  62.2 kB
md5:9a15a7b830e708a54a05adf92a370f62  63.9 kB
md5:a63ea88997c40e6c738dcf7fb91446f3  64.0 kB
md5:107c9178c2b95ec1b349dca03530d984  60.4 kB
md5:82141d8d94397a383a9fc44cc1966f69  95.0 kB
md5:f3a25e72ea94372a60fbf5836b8a2bca  4.8 kB
md5:b6f4e78e422d5d6bb76513c1066781f4  128.0 kB
md5:64e7c945e343b9b2457fff493ebb0b97  127.1 kB
md5:8095a5ad9f473606fa57d9bd5fdbbcdc  135.0 kB
md5:c757d3f734080e07eebe1d8230b04853  121.2 kB
md5:6aad48f8252d29d62a7ffd730012972f  115.6 kB
md5:7740c1be832d9d1d0bfb7bb3b603b77f  131.4 kB
md5:2ae1a5e381b6ac7c00ec80f0f15c0ed6  566.9 kB
md5:40b429aab475755c13d554e1c6279fbb  565.9 kB
md5:c054737a858b7ceb9295f8ddf57de692  561.0 kB
md5:a170310f2897adb81b93fec8ca49a915  573.0 kB
md5:6250ae5891a862918c74a40830aa5281  578.5 kB
md5:347edb364b3fffa09ff425e659e3918f  566.4 kB
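
A downloaded file can be checked against its MD5 checksum from the listing above. The following is a minimal sketch using only the Python standard library; the function names and the local file path in the usage note are hypothetical, since the listing does not give file names.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:6213f2...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloaded_file", "md5:6213f29ba052896d53ca3c44d1695df4")` would return `True` only if the local file (a placeholder path here) is byte-identical to the first entry in the listing.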

Additional details

Related works

Is cited by
Data paper: https://aclanthology.org/2020.lrec-1.538/

Funding

Deutsche Forschungsgemeinschaft
Modellierung lexikalisch-semantischer Beziehungen von Kollokationen (MoKo) 322096725

Data quality

Accuracy

Not specified.

Completeness

Not specified.

Conformity

Not specified.

Consistency

Not specified.

Credibility

Not specified.

Processability

Not specified.

Relevance

Not specified.

Timeliness

Not specified.

Understandability

Not specified.