Georgiana Puscasu
Resources for Multilingual Processing
Georgiana Puscasu is a PhD student and member of the Research
Group in Computational Linguistics at the University of Wolverhampton.
Georgiana earned an M.Sc. degree in Computational
Linguistics from "Alexandru Ioan Cuza" University,
Resources for Multilingual Processing
The ever-spreading tentacles of the Internet
have stimulated research in multilingual information processing, with a
corresponding renewed interest in multilingual resources. Ongoing
research in Natural Language Processing requires vast amounts of data, as well
as tools that can handle the existing variety of languages. Because
producing such data and resources is costly, it is important
that the research community be aware of their existence and exploit them
appropriately, without reinventing the wheel. To avoid massive
and wasteful duplication of effort, it is critical to know what is already
publicly available (not necessarily at no cost) and exploitable.
In light of this, the purpose of this
tutorial is to present an inventory of existing resources that can support
multilingual Natural Language Processing. An extensive range of corpora, tools
and off-the-shelf modules, together with their descriptions and specific
features, will be presented to the summer school attendees. For corpora, the
features examined will include genre, languages covered, size and
availability; for tools and off-the-shelf modules, they will include the
languages supported, the programming language in which they are implemented,
their availability and the platforms on which they can be run. The
tutorial will attempt to cover
resources for most of the languages spoken by the summer
school participants.
The afternoon practical session will
give attendees the chance to get hands-on experience with several of the tools
presented during the tutorial.