DATA EXPLORATION AND VISUALISATION

CSE5DEV

2018

Credit points: 15

Subject outline

The goal of this subject is to equip graduate students with in-depth practical knowledge and a solid understanding of the latest data exploration techniques and tools. The subject presents the problems most frequently encountered by data scientists in practice and offers effective solutions drawn from the most up-to-date techniques proposed in the machine learning community. These problems include high input dimensionality, unstructured data, big data, data streams, and the bias-variance trade-off. To solve them, students learn the fundamentals of statistical learning, logistic regression, deep learning, sample selection, active learning, online learning and ensemble learning. The subject also covers data visualisation and the related technologies used in the field of visual analytics. Numerous real-world data sets will be used, drawn from sources such as the manufacturing industry, financial time series and web news articles. Students are taught how to implement data exploration methods and visualisation techniques in the R environment.

School: School of Engineering and Mathematical Sciences

Subject Co-ordinator: Fei Liu

Available to Study Abroad Students: Yes

Subject year level: Year Level 5 - Masters

Exchange Students: Yes

Subject particulars

Subject rules

Prerequisites: CSE4DBF or MAT4NLA

Co-requisites: N/A

Incompatible subjects: N/A

Equivalent subjects: N/A

Special conditions: N/A

Readings

Resource type | Title | Requirement | Authors | Publisher
Readings | The Elements of Statistical Learning: Data Mining, Inference, and Prediction | Prescribed | Trevor Hastie, Robert Tibshirani, Jerome Friedman | Springer
Readings | Pattern Classification | Recommended | Richard O. Duda, Peter E. Hart, David G. Stork | Wiley
Readings | Evolving Fuzzy Systems - Fundamentals, Reliability, Interpretability, Useability, Applications | Recommended | Edwin Lughofer | Springer
Readings | Data Mining: Practical Machine Learning Tools and Techniques | Recommended | Ian H. Witten, Eibe Frank, Mark A. Hall | Morgan Kaufmann
Readings | Machine Learning: A Probabilistic Perspective | Prescribed | Kevin P. Murphy | The MIT Press

Graduate capabilities & intended learning outcomes

01. Analyse common problems encountered by data scientists in practice.

Activities:
Lectures 7, 9, 10, 13, 16 and 21 focus on the challenges of data exploration in practice. These include the relationship between model complexity and overfitting, unstructured data, big data, data stream mining, problems with structural complexity, and the bias-variance trade-off.

02. Synthesize possible solutions for problems frequently encountered by data scientists.

Activities:
Lectures 3 and 4 cover regularisation techniques: ridge regression and LASSO. These lectures also put cross-validation into perspective as a parameter selection technique. Lectures 7 and 8 cover several techniques for solving the dimensionality problem: feature selection and feature extraction. Lectures 9-12 cover how to handle unstructured data using popular deep learning methods. Lectures 13-15 discuss techniques for handling big data: conventional sample selection techniques and the active learning scenario. Lectures 16-20 outline how to handle data streams in the online learning scenario, including the concepts of incremental learning and evolving systems. Lectures 21-23 introduce various ensemble methods: bagging, boosting, stacking, and bootstrapping.

03. Evaluate visualisation techniques and related tools in the context of data analytics.

Activities:
During lectures, every problem discussed is visualised using visualisation tools. Students are also taught how to use these tools to describe the problems and to explain why the proposed algorithms work.

04. Evaluate several machine learning and data exploration techniques to solve a given problem.

Activities:
Lectures 3 and 4 discuss the strengths and weaknesses of ridge regression and LASSO. Lectures 7 and 8 discuss the advantages and disadvantages of feature selection versus feature extraction for handling the dimensionality problem. Lectures 9-12 analyse the pros and cons of several deep learning techniques: autoencoders, LSTMs and convnets. Lectures 13-15 present the sample selection and active learning approaches. Lectures 16-20 discuss the incremental learning technique and the evolving approach for data stream mining. Lectures 21-23 put forward several ensemble approaches, namely bagging, boosting, and stacking, to deal with the bias-variance trade-off.

05. Analyse performance of proposed data exploration techniques to solve real-world problems.

Activities:
Labs 1 and 2 cover the implementation of maximum likelihood and Bayesian learning in R, including regularisation techniques and cross-validation. Lab 3 covers the implementation of logistic regression in R. Lab 4 covers the implementation of feature extraction and feature selection techniques. Labs 5 and 6 cover the implementation of deep learning algorithms in Torch7. Lab 7 focuses on machine learning approaches for big data, including the implementation of sample selection techniques and active learning in R. Labs 8-10 cover the implementation of incremental learning and evolving learning algorithms in R. Lab 11 covers the implementation of ensemble techniques in R.

Subject options

Melbourne, 2018, Semester 2, Blended

Overview

Online enrolment: Yes

Maximum enrolment size: N/A

Enrolment information

Subject Instance Co-ordinator: Fei Liu

Class requirements

Lecture (Weeks 31-43)
One 1-hour lecture per week, held on weekdays during the day from week 31 to week 43 and delivered face-to-face.

Computer Laboratory (Weeks 31-43)
One 2-hour computer laboratory per week, held on weekdays during the day from week 31 to week 43 and delivered face-to-face.

Assessments

Assessment element | Weight (%) | ILOs assessed
Assignment 1 (equivalent to 1,200 words) | 22 | 01, 02, 04
Assignment 2 (equivalent to 1,200 words) | 22 | 01, 02, 04
Weekly quizzes (equivalent to 1,120 words) | 20 | 03, 05
One 2-hour examination (equivalent to 2,000 words) | 36 | 01, 02, 04