A Web Service Infrastructure for Language Processing Application Development
Alexandru Mahmoud ’18, Sule Marshall ’18, and Nancy Ide (Computer Science)
This work is part of a large NSF-funded project, the Language Applications (LAPPS) Grid, to develop an open, web-based infrastructure through which massive, distributed resources can be easily accessed, in whole or in part, and within which tailored language services can be efficiently composed, evaluated, disseminated, and consumed by researchers, developers, and students across a wide variety of disciplines. The project is a collaboration among Vassar’s Computer Science Department, Carnegie Mellon University, Brandeis University, and the University of Pennsylvania, together with colleagues in Japan, Thailand, Indonesia, China, Germany, France, and the Czech Republic, with the goal of creating a massive international network of interoperable tools and data for natural language processing (NLP) research and development. Our current work involves augmenting the LAPPS Grid with a wider range of NLP tools that will be used in graduate-level Data Science courses. We wrap each tool as a web service and then integrate it into our instance of the Galaxy workflow engine, which serves as our front end. Galaxy, which was originally developed to facilitate genomics and other life sciences research, provides an intuitive interface for pipelining tools to accomplish tasks such as information extraction and question answering. Another of our current activities is to augment Galaxy to better accommodate the needs of NLP, and to provide a means of combining our information extraction facilities with Galaxy’s analytic and visualization tools so that data from online life science publications can be used effectively.