Babes-Bolyai University of Cluj-Napoca
Faculty of Mathematics and Computer Science
Study Cycle: Graduate

SUBJECT

Code: MII0007
Subject: Technics for Information Retrieval

Section | Semester | Hours (C+S+L) | Category | Type
Computer Science - in Romanian | 5 | 2+0+2 | speciality | optional
Mathematics-Computer Science - in Romanian | 5 | 2+0+2 | speciality | optional
Information engineering - in Romanian | 7 | 2+0+2 |  | optional
(C+S+L = weekly hours of course + seminar + laboratory)
Teaching Staff in Charge
Lect. LUPSA Dana, Ph.D., danacs.ubbcluj.ro
Aims
Information retrieval and natural language processing are now widely recognized as among the most active and intensively studied fields of computer science. This course covers traditional material as well as recent advances in intelligent Information Retrieval (IR), focusing on the underlying retrieval algorithms and models. Because information retrieval depends closely on how the information to be found is organized and described, the course also presents some techniques specific to natural language processing.
Content
1. Introduction: natural language processing (NLP) and information retrieval (IR).
2. Information in free text: linking syntax and semantics; semantic information, with special attention to lexical semantics. Models for information representation.
3. Text properties; text processing. Statistical characteristics of texts; term weighting; classification algorithms applied to text classification.
4. File organization for IR.
5. Probabilistic models in NLP and IR.
6. Practical applications of IR techniques in NLP.
7. IR and the World Wide Web.
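Topics 3 and 4 (term weighting and file organization for IR) are commonly introduced together through an inverted index whose postings are scored with tf-idf weights. The sketch below is a minimal illustration under stated assumptions, not the course's own material: it assumes lowercase whitespace tokenization and the common weighting tf × log(N/df), and the function names build_index and tfidf_search and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: raw term frequency}.

    Assumption: tokenization is a plain lowercase whitespace split; real
    systems usually also strip punctuation, stem and remove stop words.
    """
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tfidf_search(index, n_docs, query):
    """Score documents by summing tf * log(N / df) over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})  # .get does not create new entries
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by ascending document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample collection, for illustration only.
docs = [
    "information retrieval with an inverted index",
    "natural language processing of free text",
    "retrieval models and term weighting for text retrieval",
]
index = build_index(docs)
print(tfidf_search(index, len(docs), "retrieval text"))
```

Because each term maps directly to its postings, a query only touches documents that contain at least one query term, and a term occurring in every document receives weight log(1) = 0, which is the intuition behind inverse document frequency.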

References
1. J. ALLEN: Natural Language Understanding, 2nd ed., Benjamin/Cummings Publ., 1995.
2. S. COZENS, P. WAINWRIGHT: Beginning Perl, Wrox Press, 2000.
3. C. MANNING, P. RAGHAVAN, H. SCHÜTZE: Introduction to Information Retrieval, Cambridge University Press, 2008.
http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
4. D. JURAFSKY, J. MARTIN: Speech and Language Processing, Prentice Hall, 2000.
5. C. J. VAN RIJSBERGEN: Information Retrieval, 2nd ed., Butterworths, 1979.
6. D. TATAR: Inteligenta artificiala. Aplicatii in prelucrarea limbajului natural [Artificial Intelligence: Applications in Natural Language Processing], Ed. Albastra, Microinformatica, 2003.
Assessment
The final grade will be determined based on the following components:
final examination = 70%
practical work = 30%
The practical work will be graded based on programs or projects developed during laboratory classes.