Georgiana Puscasu

Resources for Multilingual Processing

Georgiana Puscasu is a PhD student and a member of the Research Group in Computational Linguistics at the University of Wolverhampton, UK. Her current research interests include temporal processing and coreference resolution, with special emphasis on improving question answering.

Georgiana earned an M.Sc. degree in Computational Linguistics from "Alexandru Ioan Cuza" University, Iasi, Romania. She participated in the EU-funded BalkaNet project, which developed the Eastern-European wordnets. Her activities at the University of Wolverhampton also include teaching and involvement in commercial NLP projects.

Resources for Multilingual Processing

The ever-spreading tentacles of the Internet have spurred research in multilingual information processing, with a corresponding renewed interest in relevant multilingual resources. Ongoing research in Natural Language Processing requires vast amounts of data, as well as tools to deal with the existing variety of languages. Because producing such data and resources is costly, it is important that the research community be aware of their existence and exploit them appropriately, without reinventing the wheel. To avoid massive and wasteful duplication of effort, it is critical to know what is already publicly available (not necessarily at no cost) and exploitable.

In light of this, the purpose of this tutorial is to present an inventory of existing resources that can support multilingual Natural Language Processing. An extensive range of corpora, tools and off-the-shelf modules, together with their descriptions and specific features, will be presented to the summer school attendees. For corpora, the features examined will include genre, languages covered, size and availability. For tools and off-the-shelf modules, they will include the languages supported, the programming language in which they are implemented, their availability, and the platforms on which they run. The tutorial will attempt to cover resources in or for most of the languages spoken by the summer school participants.

The afternoon practical session will give attendees the chance to get hands-on experience with the different tools presented during the tutorial.