DATA EXPLORATION AND VISUALISATION

CSE5DEV

2019

Credit points: 15

Subject outline

The goal of this subject is to equip graduate students with in-depth practical knowledge and a solid understanding of the latest data exploration techniques and tools. The subject presents the problems that data scientists most frequently encounter in practice and offers effective solutions drawn from the most up-to-date techniques proposed in the machine learning community. The problems discussed include input dimensionality, unstructured data, big data, data streams, and the bias-variance trade-off. To solve these problems, students learn the fundamentals of statistical learning, logistic regression, deep learning, sample selection, active learning, online learning and ensemble learning. The subject also covers data visualisation and related technologies used in the field of visual analytics. Numerous real-world data sets, drawn from the manufacturing industry, financial time series, web news articles and similar sources, will be used. Students are taught how to implement data exploration methods and visualisation techniques in the R environment.

School: School of Engineering and Mathematical Sciences

Subject Co-ordinator: Nasser Sabar

Available to Study Abroad Students: Yes

Subject year level: Year Level 5 - Masters

Exchange Students: Yes

Subject particulars

Subject rules

Prerequisites: CSE4DBF or MAT4NLA

Co-requisites: N/A

Incompatible subjects: N/A

Equivalent subjects: N/A

Special conditions: N/A

Learning resources

Readings

Resource Type	Title	Resource Requirement	Author and Year	Publisher
Readings	The Elements of Statistical Learning: Data Mining, Inference, and Prediction	Prescribed	Trevor Hastie, Robert Tibshirani, Jerome Friedman	Springer
Readings	Pattern Classification	Recommended	Richard O. Duda, Peter E. Hart, David G. Stork	Wiley
Readings	Evolving Fuzzy Systems: Fundamentals, Reliability, Interpretability, Useability, Applications	Recommended	Edwin Lughofer	Springer
Readings	Data Mining: Practical Machine Learning Tools and Techniques	Recommended	Ian H. Witten, Eibe Frank, Mark A. Hall	Morgan Kaufmann
Readings	Machine Learning: A Probabilistic Perspective	Prescribed	Kevin P. Murphy	The MIT Press

Graduate capabilities & intended learning outcomes

01. Analyse common problems encountered by data scientists in practice.

Activities: Lectures 7, 9, 10, 13, 16 and 21 focus on the challenges of data exploration in practice. These include the relationship between complexity and overfitting, unstructured data, big data, data stream mining, structural complexity, and the bias-variance trade-off.

02. Synthesize possible solutions for problems frequently encountered by data scientists.

Activities: Lectures 3 and 4 cover regularization techniques (ridge regression and LASSO) and put cross-validation into perspective as a parameter selection technique. Lectures 7 and 8 cover techniques for the dimensionality problem: feature selection and feature extraction. Lectures 9-12 cover how to handle unstructured data using popular deep learning methods. Lectures 13-15 discuss techniques for handling big data: conventional sample selection and active learning. Lectures 16-20 outline how to handle data streams in an online learning scenario, introducing the concepts of incremental learning and evolving systems. Lectures 21-23 introduce various ensemble methods: bagging, boosting, stacking, and bootstrapping.

03. Evaluate visualisation techniques and related tools in the context of data analytics.

Activities: Every problem presented in lectures is illustrated using visualisation tools. Students are also taught how to use these tools to describe the problems and to explain why the proposed algorithms work.

04. Evaluate several machine learning and data exploration techniques to solve a given problem.

Activities: Lectures 3 and 4 discuss the strengths and weaknesses of ridge regression and LASSO. Lectures 7 and 8 discuss the advantages and disadvantages of feature selection versus feature extraction for handling the dimensionality problem. Lectures 9-12 analyse the pros and cons of several deep learning techniques: autoencoders, LSTMs and convolutional networks. Lectures 13-15 present the sample selection and active learning approaches. Lectures 16-20 discuss the incremental learning technique and the evolving approach for data stream mining. Lectures 21-23 put forward several ensemble approaches, namely bagging, boosting and stacking, to deal with the bias-variance trade-off.

05. Analyse performance of proposed data exploration techniques to solve real-world problems.

Activities: Labs 1 and 2 cover the implementation of maximum likelihood and Bayesian learning in R, including regularization techniques and cross-validation. Lab 3 covers the implementation of logistic regression in R. Lab 4 covers the implementation of feature extraction and selection techniques. Labs 5 and 6 cover the implementation of deep learning algorithms in Torch 7. Lab 7 focuses on machine learning approaches for big data, involving the implementation of sample selection and active learning in R. Labs 8-10 cover the implementation of incremental learning and evolving learning algorithms in R. Lab 11 covers the implementation of ensemble techniques in R.

Melbourne, 2019, Semester 2, Blended

Overview

Online enrolment: Yes

Maximum enrolment size: N/A

Enrolment information:

Subject Instance Co-ordinator: Nasser Sabar

Class requirements

Lecture
Week: 31 - 43
One 1-hour lecture per week, on weekdays during the day, from week 31 to week 43, delivered face-to-face.

Computer Laboratory
Week: 31 - 43
One 2-hour computer laboratory per week, on weekdays during the day, from week 31 to week 43, delivered face-to-face.

Assessments

Assessment element	%	ILO*
Assignment 1 equivalent to 1,200 words	22	01, 02, 04
Assignment 2 equivalent to 1,200 words	22	01, 02, 04
Weekly quizzes equivalent to 1,120 words	20	03, 05
One 2-hour examination equivalent to 2,000 words	36	01, 02, 04