Daniel Marcu and Dragos Stefan Munteanu

State of the art in Statistical-Based Machine Translation: A Romanian-English Experiment

Daniel Marcu

A native of Cluj, Daniel Marcu is a Research Project Leader at ISI/USC and a Research Assistant Professor in USC's Computer Science Department. Daniel's published work includes an MIT Press book, /The Theory and Practice of Discourse Parsing and Summarization/, and best paper awards at AAAI-2000 and ACL-2001 for work on statistical-based summarization and translation. Daniel is also a founder and Chief Operations and Technology Officer of Language Weaver Inc., a US company that commercializes statistical machine translation software.

Dragos Stefan Munteanu

Dragos Stefan Munteanu is a Ph.D. candidate in USC's Computer Science Department. He has an M.Sc. degree from the University of Iowa and a B.Sc. from the University of Bucharest, Department of Mathematics. His research interests include statistical machine translation, discourse processing, and summarization. His current work focuses on the exploitation of comparable corpora in statistical machine translation.

 

At the beginning of the nineties, the first attempts to translate natural language sentences using statistical models trained on large bilingual corpora were received with skepticism. Today, statistical approaches have become the dominant paradigm in machine translation. In this tutorial, I will review the most representative mathematical models and algorithms that lie at the foundation of this area. I will also discuss practical issues in building end-to-end statistical systems by describing the work involved in developing a Romanian-English machine translation system from scratch.

 

Working session: Participants will implement, test, and evaluate one of the main components of an end-to-end statistical machine translation system.