Babes-Bolyai University of Cluj-Napoca
Faculty of Mathematics and Computer Science
Study Cycle: Master

SUBJECT

Code: MI282
Subject: Natural Language Processing
Section: Intelligent Systems - in English
Semester: 1
Hours (C+S+L): 2+2+0
Category / Type: compulsory
Teaching Staff in Charge
Prof. TATAR Doina, Ph.D., dtatar@cs.ubbcluj.ro
Aims
Natural language processing is now accepted as one of the most studied and active fields of Computer Science. The notion of the feature structure as a linguistic object underlies most of the recent approaches surveyed in this course. Web search optimization, natural language interfaces, and text mining are only some of the motivations for studying natural language processing.
Content
1. Introduction to Natural Language Processing: Stages, Domains, Chapters. Corpora.
2. Word Sense Disambiguation. Machine learning approach: supervised (Naive Bayes classifier, NBC, and k-NN) and unsupervised (by clustering). Dictionary-based approach (Lesk, Yarowsky, bilingual dictionaries).
3. Statistics in NLP: Markov chains, Hidden Markov Models (HMM). Evaluation, Estimation and Training with HMMs. The probability of input sequences, the most likely path. Applications to POS tagging.
4. Probabilistic Context-free Grammars. Syntactic analysis: active charts. Earley's algorithm.
5. Unification grammars. Feature structures as objects of linguistic knowledge representation. Feature structures as graphs, AVMs and descriptors. Subsumption and unification. Proof theory of descriptors. Well-typed and totally well-typed feature structures. Parsing with unification grammars (top-down and bottom-up).



References
1. J. ALLEN: Natural Language Understanding, 2nd ed., Benjamin/Cummings, 1995.
2. E. CHARNIAK: Statistical Language Learning, MIT Press, 1996.
3. B. CARPENTER: ALE: The Attribute Logic Engine. User's Guide, Carnegie Mellon University, 1994.
4. D. JURAFSKY, J. MARTIN: Speech and Language Processing, Prentice Hall, 2000.
5. C. MANNING, H. SCHÜTZE: Foundations of Statistical Natural Language Processing, MIT Press, 1999.
6. S. J. RUSSELL, P. NORVIG: Artificial Intelligence: A Modern Approach, Prentice-Hall International, 1995.
7. D. TATAR: Inteligenta artificiala: demonstrare automata de teoreme, prelucrarea limbajului natural, Editura Albastra, Microinformatica, 2001.
8. D. TATAR: Unification Grammars in Natural Language Processing, in "Recent Topics in Mathematical and Computational Linguistics", eds. C. Martin-Vide, G. Paun, Editura Academiei, 2000, pp. 289-300.
9. D. TATAR: Inteligenta artificiala. Aplicatii in prelucrarea limbajului natural, Editura Albastra, Microinformatica, 2003, ISBN 973-650-100-0.
10. R. MITKOV (ed.): The Oxford Handbook of Computational Linguistics, Oxford University Press, 2003.
Assessment
Assessment consists of a written exam covering all of the course material (60%) and an evaluation of the student's understanding and presentation of some recent papers in the field, together with the implementation of some applications (40%).