Distributive Crowdsourcing in Natural Language Processing
Nathan Bazan, Vassar College ’15 and Prof. Nancy Ide
In the field of natural language processing (NLP), programs and methods are developed to enable computers to “understand” human languages. These programs rely on language models built by analyzing large quantities of data annotated for linguistic properties such as part of speech, syntactic structure, and various types of semantic information. Many software programs and systems have been developed to produce these annotations automatically, but the accuracy of the results is often very low, especially for semantic phenomena. Therefore, to create the resources required for NLP research and development, automatically produced annotations must be manually validated and their errors corrected.
In the past, the American National Corpus (ANC) project has relied on hourly workers to manually validate automatically produced annotations, which is very costly and time-consuming. In recent years, crowdsourcing has provided a far cheaper and more efficient means of accomplishing this task. In this project, we set up Amazon’s Mechanical Turk (AMT) platform to distribute HITs (Human Intelligence Tasks) over the web to online workers, who are asked to correct annotations for phenomena such as part of speech and the identification of noun and verb phrases. Because our task requires some linguistic knowledge, workers must first pass a qualification test by processing several HITs, and we compare their answers to previously human-processed results to assess their reliability. Qualified workers then go on to correct data annotations. The completed HITs are collected and processed into a CSV file, which is then passed to a Grails web app developed in this project for examining the results.
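The qualification step described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names, the dictionary layout (HIT identifier mapped to a label), and the 0.8 pass threshold are all assumptions made for illustration, since the abstract does not state how reliability is scored.

```python
def qualification_accuracy(worker_answers, gold_answers):
    """Return the fraction of gold-labelled HITs the worker answered identically.

    Both arguments are hypothetical dicts mapping a HIT id to a label.
    A HIT the worker skipped counts as incorrect.
    """
    if not gold_answers:
        raise ValueError("gold_answers must not be empty")
    correct = sum(
        1 for hit_id, gold in gold_answers.items()
        if worker_answers.get(hit_id) == gold
    )
    return correct / len(gold_answers)


def is_qualified(worker_answers, gold_answers, threshold=0.8):
    """Assumed rule: a worker passes if accuracy meets the threshold."""
    return qualification_accuracy(worker_answers, gold_answers) >= threshold


# Illustrative data only: part-of-speech tags for five qualification HITs.
gold = {"hit1": "NN", "hit2": "VB", "hit3": "JJ", "hit4": "NN", "hit5": "RB"}
worker = {"hit1": "NN", "hit2": "VB", "hit3": "NN", "hit4": "NN", "hit5": "RB"}

print(is_qualified(worker, gold))
```

Exact-match accuracy is the simplest possible measure; a real qualification test might instead weight harder items or use a chance-corrected agreement statistic.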
Using crowdsourcing to handle quality control enables us to validate far larger amounts of annotated data in the same time, and at far lower cost, than hiring hourly workers. We also increase reliability by obtaining results from up to 10 AMT workers per HIT and using the result for which there is the greatest consensus.
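One simple way to pick the result with the greatest consensus is a plurality vote over the answers collected for each HIT. The Python sketch below assumes a CSV with hypothetical columns `hit_id`, `worker_id`, and `answer`; the column names, the sample rows, and the tie-breaking behavior are illustrative assumptions, not the format or rule used in the project.

```python
import csv
import io
from collections import Counter, defaultdict


def consensus_labels(csv_text):
    """Map each HIT id to (most frequent answer, share of votes it received).

    Ties are resolved in favor of the answer seen first in the file,
    which is an arbitrary choice made for this sketch.
    """
    votes = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes[row["hit_id"]][row["answer"]] += 1

    result = {}
    for hit_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[hit_id] = (label, count / sum(counter.values()))
    return result


# Illustrative data only: three workers on h1, two workers on h2.
sample = """hit_id,worker_id,answer
h1,w1,NN
h1,w2,NN
h1,w3,VB
h2,w1,JJ
h2,w2,JJ
"""

for hit_id, (label, share) in consensus_labels(sample).items():
    print(hit_id, label, round(share, 2))
```

Returning the vote share alongside the label makes it possible to flag HITs with weak agreement for further review instead of accepting a narrow plurality outright.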