OntoNotes Release 2.0
Description
The corpus contains 400k words of Chinese newswire data (from Xinhua News Agency and Sinorama Magazine) and 300k words of English newswire data (from the Wall Street Journal). OntoNotes Release 2.0 adds the following to the corpus: 274k words of Chinese broadcast news data (from China Broadcasting System, China Central TV, China National Radio, China Television System and Voice of America) and 200k words of English broadcast news data (from ABC, CNN, NBC, Public Radio International and Voice of America).
Additional details
Related works
- Is described by
- Service: https://catalog.ldc.upenn.edu/LDC2008T04