Babes-Bolyai University of Cluj-Napoca
Faculty of Mathematics and Computer Science
Study Cycle: Master

SUBJECT

Code: MIH1005
Subject: Data Mining

Section | Semester | Hours: C+S+L | Category | Type
Component-Based Programming | 3 | 2+0+2 | speciality | optional
Intelligent Systems | 3 | 2+0+2 | speciality | optional
Database | 3 | 2+0+2 | speciality | optional
Distributive Systems in Internet | 3 | 2+0+2 | speciality | optional
Teaching Staff in Charge
Lect. CÂMPAN Alina, Ph.D., alina@cs.ubbcluj.ro
Aims
The extensive development of data mining was driven by the wide availability of huge amounts of data, coupled with ever-increasing computational power that allows these data sets to be analyzed. These data, adequately explored and analyzed, can be turned into useful information and knowledge in different areas and for various applications: decision making, process control, production control, business management, market analysis, scientific exploration, information management, query processing, etc.
This course presents recent developments in the knowledge discovery in databases (KDD) domain, with a focus on an essential step of the KDD process: the data mining step. Related topics that are relevant to the KDD process are also presented: data warehouses, OLAP, and data preprocessing.
The course introduces data mining concepts, methods, and techniques from a database perspective. The focus is on different data mining problems (tasks) and their corresponding solutions. Students will learn various data analysis techniques and will apply them to solve data mining problems using specialized software systems and tools. Students will come to see data mining both as a strong application field and as a significant database research domain.
Content
1. Introduction
Data mining - what it is, the factors that favoured the development of this domain, data mining and the KDD (Knowledge Discovery in Databases) process
Types of data explored in data mining
Data mining functionalities
Patterns and interesting patterns
Data mining from a database perspective
2. Data warehouses and OLAP technology - overview
What are data warehouses
A multidimensional data model
Data warehouse architecture
Data warehouse implementation
From data warehouses to data mining
3. Concept description - characterization and comparison
Definitions
Data generalization and summarization-based characterization
Analytical characterization: attribute relevance analysis
Class comparison: discriminating between classes
Descriptive statistical measures in large databases
4. Data preprocessing
Data cleaning
Data transformation and integration
Data reduction
Discretization and concept hierarchy generation
5. Mining association rules (association analysis)
Problem definition
Algorithms for mining single-dimensional boolean association rules from transaction databases - Apriori, FP-Growth
Algorithms for mining multi-level association rules, multi-dimensional association rules, association rules with constraints
Correlation analysis
ODM and association analysis in ODM
6. Classification and prediction
Problem definition
Classification using decision tree induction
Bayes classification
Other classification methods
Prediction - linear regression
Classifier accuracy
ODM and classification in ODM
7. Clustering (cluster analysis)
Problem definition
Types of data in cluster analysis
Classification of clustering methods
Classes of clustering methods: partitioning, hierarchical, density-based, grid-based, model-based
Outlier detection
ODM and cluster analysis in ODM
8. Data mining standards and software - ODM, Microsoft OLE DB
9. Applications and trends in data mining
Applications: telecommunications, financial data analysis, biological data analysis, etc.
Data mining in statistical, audio, video databases
Data mining, data security and privacy
References
1. Han, J., Kamber, M., Data Mining: Concepts and Techniques, 1st Edition, Morgan Kaufmann, 2000.
2. ODM (Oracle Data Mining) Documentation (electronic format).
3. Tan, P., Steinbach, M., Kumar, V., Introduction to Data Mining, Addison-Wesley, 2006.
4. Adriaans, P., Zantinge, D., Data Mining, Addison-Wesley, 1996.
5. Conference and journal papers (provided by the instructor).
6. Weka system and documentation (http://www.cs.waikato.ac.nz/ml/weka/). Weka is a suite of machine learning / data mining software. It contains Java implementations of various mining algorithms, data preprocessing filters, and experimentation capabilities. Weka is free, open-source software released under the GNU General Public License (GPL).
Assessment
The activity ends with a written exam (grade E). During the semester, students will prepare and present a theoretical report (grade R) and several practical (lab) projects (grade P). The projects consist of implementing data mining algorithms (association analysis, classification, cluster analysis) and performing data analysis using specialized software tools. The final grade is a weighted mean of the three grades: Final Grade = 40%E + 25%R + 35%P. Students who show considerable research abilities, through involvement in project development and publication of research results, will be granted an additional 10% on the final grade. To pass, the final grade must be at least 5.
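As an illustration of the weighted mean above, the following minimal Python sketch computes the final grade from the three component grades. The function names and the sample grades are hypothetical, and a 1 to 10 grading scale is assumed. The research bonus is left out because the text does not say exactly how the additional 10% is applied.

```python
def final_grade(exam: float, report: float, projects: float) -> float:
    """Weighted mean from the syllabus: 40% E + 25% R + 35% P."""
    return 0.40 * exam + 0.25 * report + 0.35 * projects


def passes(grade: float, threshold: float = 5.0) -> bool:
    """The final grade has to be at least 5 to pass."""
    return grade >= threshold


# Hypothetical sample grades (assumed 1-10 scale): E = 7, R = 8, P = 9
grade = final_grade(exam=7, report=8, projects=9)
print(round(grade, 2), passes(grade))
```

With these sample values the weighted mean is 0.40*7 + 0.25*8 + 0.35*9 = 7.95, which is above the passing threshold.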