DATA EXPLORATION AND VISUALISATION
Credit points: 15
The goal of this subject is to equip graduate students with in-depth practical knowledge and a solid understanding of the latest data exploration techniques and tools. This subject presents the problems most frequently encountered by data scientists in practice and offers effective solutions drawn from the most up-to-date techniques proposed in the machine learning community. The problems discussed include high input dimensionality, unstructured data, big data, data streams, and the bias-variance tradeoff. To solve these problems, students learn the fundamentals of statistical learning, logistic regression, deep learning, sample selection, active learning, online learning and ensemble learning. This subject also covers data visualisation and related technologies used in the field of visual analytics. Numerous real-world data sets, drawn from the manufacturing industry, financial time series, web news articles and other sources, will be used. Students are taught how to implement data exploration methods and visualisation techniques in the R environment.
School: School of Engineering and Mathematical Sciences
Subject Co-ordinator: Fei Liu
Available to Study Abroad Students: Yes
Subject year level: Year Level 5 - Masters
Prerequisites: CSE4DBF or MAT4NLA
|Resource Type||Title||Resource Requirement||Author and Year||Publisher|
|Readings||The Elements of Statistical Learning: Data Mining, Inference, and Prediction||Prescribed||Trevor Hastie, Robert Tibshirani, Jerome Friedman||Springer|
|Readings||Pattern Classification||Recommended||Richard O. Duda, Peter E. Hart, David G. Stork||Wiley|
|Readings||Evolving Fuzzy Systems --- Fundamentals, Reliability, Interpretability, Useability, Applications||Recommended||Edwin Lughofer||Springer|
|Readings||Data Mining: Practical Machine Learning Tools and Techniques||Recommended||Ian H. Witten, Eibe Frank, Mark A. Hall||Morgan Kaufmann|
|Readings||Machine Learning: A Probabilistic Perspective||Prescribed||Kevin P. Murphy||The MIT Press|
Graduate capabilities & intended learning outcomes
01. Analyse common problems encountered by data scientists in practice.
- Lectures 7, 9, 10, 13, 16 and 21 focus on the challenges of data exploration in practice. These include the relationship between model complexity and overfitting, unstructured data, big data, data stream mining, structural complexity, and the bias-variance trade-off.
02. Synthesize possible solutions for problems frequently encountered by data scientists.
- Lectures 3 and 4 cover regularization techniques: ridge regression and the LASSO. These lectures also introduce cross-validation as a parameter selection technique. Lectures 7 and 8 cover two approaches to the dimensionality problem: feature selection and feature extraction. Lectures 9-12 show how to handle unstructured data using popular deep learning methods. Lectures 13-15 discuss techniques for handling big data: conventional sample selection and active learning. Lectures 16-20 outline how to handle data streams in an online learning setting, introducing incremental learning and evolving systems. Lectures 21-23 introduce ensemble methods, including bagging, boosting and stacking, along with bootstrapping.
03. Evaluate visualization techniques and related tools in the context of data analytics
- Every problem presented in lectures is visualised using visualisation tools. Students are also taught how to use these tools to describe problems and to explain why the proposed algorithms work.
04. Evaluate several machine learning and data exploration techniques to solve a given problem
- Lectures 3 and 4 discuss the strengths and weaknesses of ridge regression and the LASSO. Lectures 7 and 8 compare the advantages and disadvantages of feature selection and feature extraction for handling the dimensionality problem. Lectures 9-12 analyse the pros and cons of several deep learning techniques: autoencoders, LSTMs and convolutional networks (convnets). Lectures 13-15 present sample selection and active learning approaches. Lectures 16-20 discuss incremental learning and evolving approaches to data stream mining. Lectures 21-23 put forward several ensemble approaches, namely bagging, boosting and stacking, for dealing with the bias-variance tradeoff.
05. Analyse performance of proposed data exploration techniques to solve real-world problems.
- Labs 1-2 cover the implementation of maximum likelihood estimation and Bayesian learning in R, including regularization techniques and cross-validation. Lab 3 covers the implementation of logistic regression in R. Lab 4 covers the implementation of feature extraction and feature selection techniques. Labs 5 and 6 cover the implementation of deep learning algorithms in Torch7. Lab 7 focuses on machine learning approaches for big data, implementing sample selection and active learning in R. Labs 8-10 cover the implementation of incremental learning and evolving learning algorithms in R. Lab 11 covers the implementation of ensemble techniques in R.
Melbourne, 2017, Semester 2, Blended
Maximum enrolment size: N/A
Subject Instance Co-ordinator: Fei Liu
One 1-hour lecture per week on weekdays during the day from week 31 to week 43, delivered face-to-face.
One 2-hour computer laboratory per week on weekdays during the day from week 31 to week 43, delivered face-to-face.
|Assessment element||Weighting (%)||Intended learning outcomes|
|Assignment 1 equivalent to 1,200 words||22||01, 02, 04|
|Assignment 2 equivalent to 1,200 words||22||01, 02, 04|
|Weekly assessment equivalent to 1,120 words||20||03, 05|
|One 2-hour examination equivalent to 2,000 words||36||01, 02, 04|