Computer Science
Completed Project

Crowdsourcing to Create Linguistic Annotations

Isabella Cuba, Vassar College ’17 and Prof. Nancy Ide
The field of natural language processing (NLP) concerns the development of applications that enable computers to understand human language, such as machine translation, search and retrieval software (e.g., Google), and voice recognition, and to perform analyses that detect emotions, opinions, and the like in social media and other language data. NLP application development currently relies heavily on machine learning, which “teaches” computers to recognize various linguistic phenomena, usually by providing the machine with a substantial body of correct instances from which to learn. The creation of language corpora with valid annotations for phenomena such as syntactic elements and named entities, which can serve as a “gold standard,” is therefore an active area of work in the field.

However, automatic generation of linguistic annotations is often far from 100% accurate, so substantial manual work is required to produce annotated data that can be used to train machine learning algorithms. Manual creation or validation of linguistic annotations is very costly because it requires several highly trained annotators. Research has shown, however, that annotations created by multiple unskilled workers can be mined to obtain highly accurate results.

We exploit this fact by using Amazon’s Mechanical Turk to distribute HITs (Human Intelligence Tasks) to anonymous online workers, who correct automatically generated noun chunks (shallow parse elements). Workers first take a qualification test that assesses reliability by comparing their performance on a small subset of data to human-generated results. Qualified workers are then given new data to correct; occasionally, an instance with a known result is included to serve as a reliability benchmark. For each instance, we collect responses from 10 to 20 workers.
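The qualification step described above amounts to scoring a worker's annotations against gold-standard data and admitting only workers above some agreement threshold. A minimal Python sketch of that idea follows; the function names, the encoding of noun chunks as (start, end) token-offset pairs, and the 0.8 threshold are illustrative assumptions, not details from the project.

```python
def chunk_agreement(worker_chunks, gold_chunks):
    """Fraction of gold noun chunks the worker reproduced exactly.

    Chunks are (start, end) token-offset pairs -- a hypothetical
    encoding chosen for this sketch.
    """
    worker = set(worker_chunks)
    gold = set(gold_chunks)
    if not gold:
        return 1.0
    return len(worker & gold) / len(gold)


def is_qualified(worker_chunks, gold_chunks, threshold=0.8):
    """Admit a worker whose agreement with the gold data meets the
    threshold (the 0.8 cutoff is an assumed value)."""
    return chunk_agreement(worker_chunks, gold_chunks) >= threshold
```

The same comparison can be reused for the embedded reliability benchmarks: an instance with a known result is scored exactly like a qualification item.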
A companion URSI project focused on developing methods to analyze the collected responses in order to determine the best answer from those provided by multiple annotators.