<?xml version="1.0" ?><?xml-stylesheet type='text/xsl' href='/cmdixsl/templates.xsl'?><CMD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.clarin.eu/cmd/1" xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1527668176123" CMDVersion="1.2" xsi:schemaLocation="http://www.clarin.eu/cmd/1 https://infra.clarin.eu/CMDI/1.x/xsd/cmd-envelop.xsd http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1527668176123 https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1527668176123/1.2/xsd">
<Header>
<MdCreator>nnsdg01@uni-tuebingen.de</MdCreator>
<MdCreationDate>2019-05-01</MdCreationDate>
<MdSelfLink>https://doi.org/10.57754/FDAT.0yjbq-rze07</MdSelfLink>
<MdProfile>clarin.eu:cr1:p_1527668176123</MdProfile>
<MdCollectionDisplayName>Tübingen Archive of Language Resources (TALAR)</MdCollectionDisplayName>
</Header>
<Resources>
<ResourceProxyList>
<ResourceProxy id="metadata110220000-0007-D3BB-8">
<ResourceType>Metadata</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="landingpage110220000-0007-D3BB-8">
<ResourceType>LandingPage</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="dev_texttxt110220000-0007-D3BB-8">
<ResourceType>Resource</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="encow-adj-nbin110230000-0007-D3BB-8">
<ResourceType>Resource</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="eng-adj-n-readmetxt110240000-0007-D3BB-8">
<ResourceType>Resource</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="test_texttxt110250000-0007-D3BB-8">
<ResourceType>Resource</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
<ResourceProxy id="train_texttxt110260000-0007-D3BB-8">
<ResourceType>Resource</ResourceType>
<ResourceRef>https://doi.org/10.57754/FDAT.0yjbq-rze07</ResourceRef>
</ResourceProxy>
</ResourceProxyList>
<JournalFileProxyList/>
<ResourceRelationList/>
</Resources>
<IsPartOfList>
<IsPartOf>https://doi.org/10.57754/FDAT.721tn-jef87</IsPartOf>
</IsPartOfList>
<Components>
<cmdp:LexicalResourceProfile>
<cmdp:GeneralInfo>
<cmdp:ResourceName xml:lang="en">eng-adj-n</cmdp:ResourceName>
<cmdp:ResourceTitle xml:lang="en">English Adjective-Noun Phrase Dataset for Compositionality Tests</cmdp:ResourceTitle>
<cmdp:ResourceClass>Lexicon</cmdp:ResourceClass>
<cmdp:Version xml:lang="en">1</cmdp:Version>
<cmdp:LifeCycleStatus>development</cmdp:LifeCycleStatus>
<cmdp:StartYear>2019</cmdp:StartYear>
<cmdp:CompletionYear>2019</cmdp:CompletionYear>
<cmdp:PublicationDate>May 2019</cmdp:PublicationDate>
<cmdp:LastUpdate>May 2019</cmdp:LastUpdate>
<cmdp:FieldOfResearch>Computational Linguistics</cmdp:FieldOfResearch>
<cmdp:Location>
<cmdp:Address>Seminar für Sprachwissenschaft, Wilhelmstr. 19, D-72074 Tübingen</cmdp:Address>
<cmdp:Country>
<cmdp:CountryName xml:lang="en">Germany</cmdp:CountryName>
<cmdp:CountryCoding>DE</cmdp:CountryCoding>
</cmdp:Country>
</cmdp:Location>
<cmdp:Descriptions>
<cmdp:Description type="short" xml:lang="en">
If you want to use this dataset for research purposes, please refer to the following sources:
- Roland Schäfer. 2015. Processing and querying large web corpora with the COW14 architecture. In Proceedings of Challenges in the Management of Large Corpora 3 (CMLC-3), Lancaster. UCREL, IDS.
- Roland Schäfer and Felix Bildhauer. 2012. Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 486–493, Istanbul, Turkey. European Language Resources Association (ELRA).
- Corina Dima, Daniël de Kok, Neele Witte, Erhard Hinrichs. 2019. No word is an island — a transformation weighting model for semantic composition. Transactions of the Association for Computational Linguistics.
The dataset is distributed under the Creative Commons Attribution-NonCommercial (CC-BY-NC) license.
This dataset contains 238,975 English adjective-noun phrases (split into 167,292 train, 47,803 test, 23,880 dev instances) that were automatically extracted from the ENCOW16AX treebank (Schäfer and Bildhauer, 2012; Schäfer, 2015).
The phrases were extracted with the help of the part-of-speech tag information provided by the treebank.
Each line of the train/test/dev files consists of three tab-separated fields:
adjective, noun and adjective-noun phrase, where the phrase joins the adjective and the noun with the string _adj_n_ (e.g. good networking good_adj_n_networking).
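The line format described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset: the names PhraseEntry, parse_line and read_phrases are hypothetical, and the check that the third field equals adjective + _adj_n_ + noun is an assumption drawn from the example, not a documented guarantee.

```python
from typing import NamedTuple

SEPARATOR = "_adj_n_"  # joining string named in the dataset description


class PhraseEntry(NamedTuple):
    adjective: str
    noun: str
    phrase: str


def parse_line(line):
    """Split one tab-separated line into adjective, noun and joined phrase."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 tab-separated fields, got %d" % len(fields))
    adjective, noun, phrase = fields
    # Assumption: the third field is exactly adjective + SEPARATOR + noun.
    if phrase != adjective + SEPARATOR + noun:
        raise ValueError("phrase field does not match adjective and noun")
    return PhraseEntry(adjective, noun, phrase)


def read_phrases(path):
    """Yield PhraseEntry records from a file such as dev_text.txt."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_line(line)
```

Calling list(read_phrases("dev_text.txt")) would collect all dev instances in memory. If the files turn out to deviate from the assumed format, the consistency check in parse_line can simply be removed.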
For results of different composition models on this dataset, see Dima et al. (2019), listed above.
The word embeddings were trained on ENCOW16AX, which contains crawled web data from different sources. The training corpus was filtered to only contain sentences with a document quality of a or b to avoid noisy data.
To ensure that trained word embeddings for enough adjective-noun phrases are available, the embeddings were trained on word forms, instead of lemmas. The final training corpus for the word embeddings contains 89.0M sentences and 2.2B tokens.
The embeddings for the adjectives, nouns and phrases were trained jointly with the word2vec package (Mikolov et al., 2013), using the skip-gram model with negative sampling, a symmetric context window of 10, 25 negative samples per positive training instance and a subsampling threshold of 0.0001. The resulting embeddings have 200 dimensions and the vocabulary size is 478,372.
The minimum frequency cut-off was set to 50 for all words and phrases.
The embeddings are stored in the word2vec binary format in encow-adj-n.bin. This format can be loaded by several packages (e.g. the gensim package; Řehůřek and Sojka, 2010).
</cmdp:Description>
</cmdp:Descriptions>
<cmdp:tags>
<cmdp:tag xml:lang="en">English adjective-noun phrases</cmdp:tag>
<cmdp:tag xml:lang="en">semantic composition</cmdp:tag>
<cmdp:tag xml:lang="en">compositional distributional representations</cmdp:tag>
<cmdp:tag xml:lang="en">phrase representations</cmdp:tag>
</cmdp:tags>
<cmdp:ModalityInfo>
<cmdp:Modalities>written</cmdp:Modalities>
<cmdp:Descriptions>
<cmdp:Description type="short"/>
</cmdp:Descriptions>
</cmdp:ModalityInfo>
</cmdp:GeneralInfo>
<cmdp:Project>
<cmdp:ProjectName>SFB 833 A3</cmdp:ProjectName>
<cmdp:ProjectTitle xml:lang="de">Korpusbasierte Semantische Kompositionsmodelle für Phrasen</cmdp:ProjectTitle>
<cmdp:ProjectTitle xml:lang="en">Corpus-based Semantic Composition Models for Phrases</cmdp:ProjectTitle>
<cmdp:ProjectID>75650358</cmdp:ProjectID>
<cmdp:Url>http://www.sfb833.uni-tuebingen.de/a-bereich-kontext/a3-hinrichsde-kok.html</cmdp:Url>
<cmdp:Funder>
<cmdp:fundingAgency>Deutsche Forschungsgemeinschaft (DFG)</cmdp:fundingAgency>
</cmdp:Funder>
<cmdp:Institution>
<cmdp:Department xml:lang="de">Sonderforschungsbereich 833: Bedeutungskonstitution - Dynamik und Adaptivität sprachlicher Strukturen</cmdp:Department>
<cmdp:Department xml:lang="en">SFB 833: The construction of meaning - the dynamics and adaptivity of linguistic structures</cmdp:Department>
<cmdp:Url>http://www.sfb833.uni-tuebingen.de/</cmdp:Url>
<cmdp:Organisation>
<cmdp:name>Eberhard Karls Universität Tübingen</cmdp:name>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/155435537</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://d-nb.info/gnd/36187-2</cmdp:id>
<cmdp:issuingAuthority>GND</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://isni.org/isni/0000000121901447</cmdp:id>
<cmdp:issuingAuthority>ISNI</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Organisation>
<cmdp:Descriptions>
<cmdp:Description xml:lang="de">Der SFB 833 untersucht die Emergenz von
Bedeutung. Mit einer Neuakzentuierung der Zeitabhängigkeit der
Interpretation von Sprache richten wir unser gemeinsames
Erkenntnisinteresse auf die Erforschung sprachlicher Bedeutung in ihrer
dynamischen Anpassung an die sie beeinflussenden Faktoren. Gegenstand
des SFB ist die Frage, wie Bedeutung entsteht, (a) im Kontext, (b)
während der Sprachverarbeitung und (c) unter den spezifischen
Bedingungen einer Einzelgrammatik. An dem Forschungsverbund sind
Sprachwissenschaft – Allgemeine Sprachwissenschaft, Computerlinguistik
und Einzelphilologien – sowie Kognitionswissenschaften – Psychologie und
Neurowissenschaften – beteiligt.
</cmdp:Description>
<cmdp:Description xml:lang="en">The SFB 833 investigates the emergence of
meaning. Focusing on the time dimension in the interpretation of
language, we pursue the common goal of exploring linguistic meaning and
its dynamic response to the factors which impact upon it. The central
research question of the SFB is thus how meaning arises (a) in context,
(b) during linguistic processing, (c) in the specific circumstances of
an individual language variety. The joint research group contains
linguists – general linguists, computational linguists and specialists in
individual languages – and cognitive scientists – psychologists and
neuroscientists.
</cmdp:Description>
</cmdp:Descriptions>
</cmdp:Institution>
<cmdp:Person>
<cmdp:firstName>Erhard</cmdp:firstName>
<cmdp:lastName>Hinrichs</cmdp:lastName>
<cmdp:role>Project leader</cmdp:role>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>http://d-nb.info/gnd/143840657</cmdp:id>
<cmdp:issuingAuthority>GND</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/37069402</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://isni.org/isni/0000000118749683</cmdp:id>
<cmdp:issuingAuthority>ISNI</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Person>
<cmdp:Person>
<cmdp:firstName>Daniël</cmdp:firstName>
<cmdp:lastName>de Kok</cmdp:lastName>
<cmdp:role>Project leader</cmdp:role>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id/>
<cmdp:issuingAuthority>GND</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/305807824</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id/>
<cmdp:issuingAuthority>ISNI</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Person>
<cmdp:Duration>
<cmdp:StartYear>2009</cmdp:StartYear>
</cmdp:Duration>
</cmdp:Project>
<cmdp:Publications>
<cmdp:Publication>
<cmdp:PublicationTitle xml:lang="en">Building Large Corpora from the Web Using a New Efficient Tool Chain</cmdp:PublicationTitle>
<cmdp:resolvablePID/>
<cmdp:Author>
<cmdp:firstName>Roland</cmdp:firstName>
<cmdp:lastName>Schäfer</cmdp:lastName>
</cmdp:Author>
<cmdp:Author>
<cmdp:firstName>Felix</cmdp:firstName>
<cmdp:lastName>Bildhauer</cmdp:lastName>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>https://orcid.org/0000-0002-6567-5987</cmdp:id>
<cmdp:issuingAuthority>ORCID</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Author>
<cmdp:Descriptions>
<cmdp:Description xml:lang="en" type="long">Over the last decade, methods of web corpus construction and the evaluation of web corpora have been actively researched. Prominently, the WaCky initiative has provided both theoretical results and a set of web corpora for selected European languages. We present a software toolkit for web corpus construction and a set of significantly larger corpora (up to over 9 billion tokens) built using this software. First, we discuss how the data should be collected to ensure that it is not biased towards certain hosts. Then, we describe our software toolkit which performs basic cleanups as well as boilerplate removal, simple connected text detection as well as shingling to remove duplicates from the corpora. We finally report evaluation results of the corpora built so far, for example w. r. t. the amount of duplication contained and the text type/genre distribution. Where applicable, we compare our corpora to the WaCky corpora, since it is inappropriate, in our view, to compare web corpora to traditional or balanced corpora. While we use some methods applied by the WaCky initiative, we can show that we have introduced incremental improvements.</cmdp:Description>
</cmdp:Descriptions>
</cmdp:Publication>
<cmdp:Publication>
<cmdp:PublicationTitle xml:lang="en">Processing and querying large web corpora with the COW14 architecture</cmdp:PublicationTitle>
<cmdp:resolvablePID/>
<cmdp:Author>
<cmdp:firstName>Roland</cmdp:firstName>
<cmdp:lastName>Schäfer</cmdp:lastName>
</cmdp:Author>
<cmdp:Descriptions>
<cmdp:Description xml:lang="en" type="long">In this paper, I present the COW14 tool chain, which comprises a web corpus creation tool called texrex, wrappers for existing linguistic annotation tools as well as an online query software called Colibri2. By detailed descriptions of the implementation and systematic evaluations of the performance of the software on different types of systems, I show that the COW14 architecture is capable of handling the creation of corpora of up to at least 100 billion tokens. I also introduce our running demo system which currently serves corpora of up to roughly 20 billion tokens in Dutch, English, French, German, Spanish, and Swedish.</cmdp:Description>
</cmdp:Descriptions>
</cmdp:Publication>
<cmdp:Publication>
<cmdp:PublicationTitle xml:lang="en">No word is an island — a transformation weighting model for semantic composition</cmdp:PublicationTitle>
<cmdp:resolvablePID/>
<cmdp:Author>
<cmdp:firstName>Corina</cmdp:firstName>
<cmdp:lastName>Dima</cmdp:lastName>
</cmdp:Author>
<cmdp:Author>
<cmdp:firstName>Daniël</cmdp:firstName>
<cmdp:lastName>de Kok</cmdp:lastName>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/305807824</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Author>
<cmdp:Author>
<cmdp:firstName>Neele</cmdp:firstName>
<cmdp:lastName>Witte</cmdp:lastName>
</cmdp:Author>
<cmdp:Author>
<cmdp:firstName>Erhard</cmdp:firstName>
<cmdp:lastName>Hinrichs</cmdp:lastName>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/37069402</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://d-nb.info/gnd/143840657</cmdp:id>
<cmdp:issuingAuthority>GND</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
<cmdp:AuthoritativeID>
<cmdp:id>http://isni.org/isni/0000000118749683</cmdp:id>
<cmdp:issuingAuthority>ISNI</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Author>
<cmdp:Descriptions>
<cmdp:Description xml:lang="en" type="long">Composition models of distributional semantics are used to construct phrase representations from the representations of their words.
Composition models are typically situated on two ends of a spectrum.
They either have a small number of parameters but compose all phrases in the same way, or they perform word-specific compositions at the cost of a far larger number of parameters.
In this paper we propose transformation weighting (TransWeight), a composition model that consistently outperforms existing models on nominal compounds, adjective-noun phrases and adverb-adjective phrases in English, German and Dutch.
TransWeight drastically reduces the number of parameters needed compared to the best model in the literature by composing similar words in the same way.</cmdp:Description>
</cmdp:Descriptions>
</cmdp:Publication>
</cmdp:Publications>
<cmdp:Creation>
<cmdp:Creators>
<cmdp:Person>
<cmdp:firstName>Daniël</cmdp:firstName>
<cmdp:lastName>de Kok</cmdp:lastName>
<cmdp:role>creator</cmdp:role>
<cmdp:AuthoritativeIDs>
<cmdp:AuthoritativeID>
<cmdp:id>http://viaf.org/viaf/305807824</cmdp:id>
<cmdp:issuingAuthority>VIAF</cmdp:issuingAuthority>
</cmdp:AuthoritativeID>
</cmdp:AuthoritativeIDs>
</cmdp:Person>
<cmdp:Descriptions>
<cmdp:Description>Researcher in the A3 project.</cmdp:Description>
</cmdp:Descriptions>
</cmdp:Creators>
<cmdp:CreationToolInfo>
<cmdp:CreationTool xml:lang="en"/>
<cmdp:ToolType xml:lang="en"/>
<cmdp:Url/>
<cmdp:Descriptions>
<cmdp:Description type="short" xml:lang="en"/>
</cmdp:Descriptions>
</cmdp:CreationToolInfo>
<cmdp:Source>
<cmdp:MediaFiles>
<cmdp:MediaFile>
<cmdp:CatalogueLink/>
<cmdp:Type>Unknown</cmdp:Type>
<cmdp:Quality>Unknown</cmdp:Quality>
<cmdp:RecordingConditions/>
<cmdp:Position>
<cmdp:PositionType/>
<cmdp:StartPosition/>
<cmdp:EndPosition/>
</cmdp:Position>
<cmdp:Access>
<cmdp:Availability>public</cmdp:Availability>
<cmdp:Contact>
<cmdp:email/>
<cmdp:role/>
<cmdp:Address>
<cmdp:street/>
<cmdp:ZIPCode/>
<cmdp:city/>
</cmdp:Address>
</cmdp:Contact>
</cmdp:Access>
</cmdp:MediaFile>
</cmdp:MediaFiles>
</cmdp:Source>
</cmdp:Creation>
<cmdp:Documentations>
<cmdp:Documentation/>
</cmdp:Documentations>
<cmdp:LexicalResourceContext>
<cmdp:SubjectLanguages>
<cmdp:NumberOfLanguages>1</cmdp:NumberOfLanguages>
<cmdp:SubjectLanguage>
<cmdp:Language>
<cmdp:LanguageName xml:lang="en">English</cmdp:LanguageName>
<cmdp:ISO639>
<cmdp:iso-639-3-code>eng</cmdp:iso-639-3-code>
</cmdp:ISO639>
</cmdp:Language>
</cmdp:SubjectLanguage>
</cmdp:SubjectLanguages>
<cmdp:TypeSpecificSizeInfo>
<cmdp:TypeSpecificSize>
<cmdp:Size>238975</cmdp:Size>
<cmdp:SizeUnit>phrases</cmdp:SizeUnit>
</cmdp:TypeSpecificSize>
</cmdp:TypeSpecificSizeInfo>
</cmdp:LexicalResourceContext>
<cmdp:Access>
<cmdp:Availability xml:lang="en">public</cmdp:Availability>
<cmdp:DistributionMedium>download</cmdp:DistributionMedium>
<cmdp:Licence>CC-BY-NC</cmdp:Licence>
<cmdp:Contact>
<cmdp:firstname>Daniël</cmdp:firstname>
<cmdp:lastname>de Kok</cmdp:lastname>
<cmdp:email>daniel.de-kok@uni-tuebingen.de</cmdp:email>
<cmdp:telephoneNumber/>
<cmdp:role>creator</cmdp:role>
<cmdp:Address>
<cmdp:street>Wilhelmstr. 19</cmdp:street>
<cmdp:ZIPCode>72074</cmdp:ZIPCode>
<cmdp:city>Tübingen</cmdp:city>
</cmdp:Address>
</cmdp:Contact>
</cmdp:Access>
<cmdp:ResourceProxyListInfo>
<cmdp:ResourceProxyInfo xmlns:ns1="http://www.clarin.eu/cmd/1" ns1:ref="dev_texttxt110220000-0007-D3BB-8">
<cmdp:ResProxItemName/>
<cmdp:ResProxFileName>dev_text.txt</cmdp:ResProxFileName>
<cmdp:SizeInfo>
<cmdp:TotalSize>
<cmdp:Size>921642</cmdp:Size>
<cmdp:SizeUnit>B</cmdp:SizeUnit>
</cmdp:TotalSize>
</cmdp:SizeInfo>
<cmdp:Checksums>
<cmdp:md5>e6bd54aec5f35e6fb917bc0054ce701e</cmdp:md5>
<cmdp:sha1>70ed70a9936cfc63314f763d52a095a766b0948c</cmdp:sha1>
<cmdp:sha256>5a9d6012852f8a761b30572784893a4116cb7c34ef9d0691c5ec9c676f5e473c</cmdp:sha256>
</cmdp:Checksums>
</cmdp:ResourceProxyInfo>
<cmdp:ResourceProxyInfo xmlns:ns1="http://www.clarin.eu/cmd/1" ns1:ref="encow-adj-nbin110230000-0007-D3BB-8">
<cmdp:ResProxItemName/>
<cmdp:ResProxFileName>encow-adj-n.bin</cmdp:ResProxFileName>
<cmdp:SizeInfo>
<cmdp:TotalSize>
<cmdp:Size>390549412</cmdp:Size>
<cmdp:SizeUnit>B</cmdp:SizeUnit>
</cmdp:TotalSize>
</cmdp:SizeInfo>
<cmdp:Checksums>
<cmdp:md5>5d59d00b17b1bdf6813cadcfff9a5d9c</cmdp:md5>
<cmdp:sha1>c00a02baa8f8ebd8d00b6abbdf917eb95d1edf57</cmdp:sha1>
<cmdp:sha256>855a0254441623c57e93dff86fe35a3a08a86badb1644069804de11685841b85</cmdp:sha256>
</cmdp:Checksums>
</cmdp:ResourceProxyInfo>
<cmdp:ResourceProxyInfo xmlns:ns1="http://www.clarin.eu/cmd/1" ns1:ref="eng-adj-n-readmetxt110240000-0007-D3BB-8">
<cmdp:ResProxItemName/>
<cmdp:ResProxFileName>eng-adj-n-readme.txt</cmdp:ResProxFileName>
<cmdp:SizeInfo>
<cmdp:TotalSize>
<cmdp:Size>2951</cmdp:Size>
<cmdp:SizeUnit>B</cmdp:SizeUnit>
</cmdp:TotalSize>
</cmdp:SizeInfo>
<cmdp:Checksums>
<cmdp:md5>d2c7a67ad9458501a860f1d7e60251ef</cmdp:md5>
<cmdp:sha1>4177463de0b4f66394837d2a188f7c901594a1b4</cmdp:sha1>
<cmdp:sha256>8a48ae8d56dbe2acdc3b0ec54589daab6ee8fb2284fc9dc3dfaf5129e3bb9cb7</cmdp:sha256>
</cmdp:Checksums>
</cmdp:ResourceProxyInfo>
<cmdp:ResourceProxyInfo xmlns:ns1="http://www.clarin.eu/cmd/1" ns1:ref="test_texttxt110250000-0007-D3BB-8">
<cmdp:ResProxItemName/>
<cmdp:ResProxFileName>test_text.txt</cmdp:ResProxFileName>
<cmdp:SizeInfo>
<cmdp:TotalSize>
<cmdp:Size>1843248</cmdp:Size>
<cmdp:SizeUnit>B</cmdp:SizeUnit>
</cmdp:TotalSize>
</cmdp:SizeInfo>
<cmdp:Checksums>
<cmdp:md5>3cbdc5fad50b3e49f015fd998a58a7d3</cmdp:md5>
<cmdp:sha1>b9ada946207e333dde50ad5ead58118e2cb363cb</cmdp:sha1>
<cmdp:sha256>982c241d2bed2ae360e09baf2e2a12d18fa65d3aec256155cceb9d1af9ca791e</cmdp:sha256>
</cmdp:Checksums>
</cmdp:ResourceProxyInfo>
<cmdp:ResourceProxyInfo xmlns:ns1="http://www.clarin.eu/cmd/1" ns1:ref="train_texttxt110260000-0007-D3BB-8">
<cmdp:ResProxItemName/>
<cmdp:ResProxFileName>train_text.txt</cmdp:ResProxFileName>
<cmdp:SizeInfo>
<cmdp:TotalSize>
<cmdp:Size>6455176</cmdp:Size>
<cmdp:SizeUnit>B</cmdp:SizeUnit>
</cmdp:TotalSize>
</cmdp:SizeInfo>
<cmdp:Checksums>
<cmdp:md5>ba2142066543aaecfb2e155efb1ece3e</cmdp:md5>
<cmdp:sha1>f2973e5195e7e687c4dd1eddc89df894cedcbeb4</cmdp:sha1>
<cmdp:sha256>9b0674bd0cdd02f04cde16012f4244f892c21d4db63bb469ad19b3c615678320</cmdp:sha256>
</cmdp:Checksums>
</cmdp:ResourceProxyInfo>
</cmdp:ResourceProxyListInfo>
</cmdp:LexicalResourceProfile>
</Components>
</CMD>