DATA MINING
CSE5DMI
2019
Credit points: 15
Subject outline
Data mining refers to a range of techniques used to uncover hidden information in databases. The data to be mined may be complex, including big data, multimedia, spatial and temporal data, and biological and health data. Data mining has evolved from several areas, including databases, artificial intelligence, algorithms, information retrieval and statistics. This subject is designed to give students a solid understanding of data mining concepts and tools. It covers algorithms and techniques for data preprocessing, classification, association rule mining and clustering, as well as domain applications in which data mining techniques are used.
School: School of Engineering and Mathematical Sciences
Subject Co-ordinator: Phoebe Chen
Available to Study Abroad Students: Yes
Subject year level: Year Level 5 - Masters
Exchange Students: Yes
Subject particulars
Subject rules
Prerequisites: CSE1OOF or CSE4OOF or CSE5CES or equivalent (discuss with the Subject Co-ordinator)
Co-requisites: N/A
Incompatible subjects: CSE4DMI
Equivalent subjects: N/A
Special conditions: N/A
Learning resources
Readings
| Resource Type | Title | Resource Requirement | Author and Year | Publisher |
|---|---|---|---|---|
| Readings | Introduction to Data Mining | Recommended | Tan, P-N, Steinbach, M & Kumar, V; 2006 | Pearson Addison-Wesley |
| Readings | Data Mining: Concepts and Techniques | Recommended | Han, J, Kamber, M & Pei, J; 2011 | Morgan Kaufmann |
Graduate capabilities & intended learning outcomes
01. Perform critical and effective data preprocessing tasks.
- Activities:
- Students will learn about different types of data and their related issues, such as sampling, similarity metrics, feature selection and dimensionality. They will also learn and practise effective data preprocessing techniques in lecture 1, laboratory class 3 and the assignments.
02. Evaluate major data mining classification methodologies.
- Activities:
- In lectures 3 to 6, students learn about a wide range of classification approaches, such as decision trees, rule-based classification, nearest-neighbour classification, Bayesian classification, artificial neural networks (ANNs) and support vector machines (SVMs). Related issues, including underfitting and overfitting, are also discussed. Students apply various classification approaches to different datasets in laboratory classes 4 to 6 and in the assignments.
03. Critique association rule mining approaches.
- Activities:
- In lectures 7 and 8, students learn the concept of association analysis for transaction data, including frequent itemsets, association rule mining, rule generation and evaluation, and the Apriori algorithm. Students practise association rule mining in laboratory classes 7 and 8 and in the assignments.
04. Evaluate data mining algorithms based on data clustering techniques.
- Activities:
- In lectures 9 to 11, students learn major data clustering techniques, such as K-means clustering, hierarchical clustering and DBSCAN, for pattern extraction and knowledge discovery from unlabelled data. Students apply these approaches to real datasets in laboratory classes 9 and 10 and in the assignments.
05. Apply advanced data mining techniques for pattern discovery from selected datasets.
- Activities:
- The techniques are demonstrated in lectures, and students apply them in the laboratory classes.
Melbourne, 2019, Semester 2, Blended
Overview
Online enrolment: Yes
Maximum enrolment size: N/A
Enrolment information:
Subject Instance Co-ordinator: Phoebe Chen
Class requirements
Lecture: Weeks 31 - 43
One 2-hour lecture per week, on weekdays during the day, from week 31 to week 43, delivered face-to-face.
Computer Laboratory: Weeks 32 - 43
One 2-hour computer laboratory class per week, on weekdays during the day, from week 32 to week 43, delivered face-to-face.
Assessments
| Assessment element | Comments | Weight (%) | ILOs addressed |
|---|---|---|---|
| Assignment 1 - Data preprocessing and decision trees (1,200-word equivalent) | Source code and a written report on data preprocessing and decision trees | 20 | 02, 03, 05 |
| Assignment 2 - Classification and clustering (1,200-word equivalent) | Source code and a written report on classification and clustering | 20 | 02, 04, 05 |
| One 3-hour examination (3,000-word equivalent) | Hurdle requirement: to pass the subject, students must pass the examination. | 50 | 01, 02, 03, 04, 05 |
| Completion of laboratory class tasks (1,000 words total) | | 10 | 05 |