Microsoft’s top data scientist shares 4 insights for your big data career

How much data have you created in your lifetime? Take a moment to imagine the ever-expanding pool. Every video streamed, product clicked, message sent or GPS signal logged feeds an exponential ocean of big data – and the world around you is swimming in it.

Naturally, the demand for data scientists and analysts is accelerating, too. According to Microsoft Asia Pacific’s Director of Data Science, Graham Williams, having professionals who can capture and mine big data for business insights and solutions will be critical for every organisation. We spoke with Graham to find out what your career in big data can look like.

1. You’ll solve real-world problems, from VIP customers to broken powerlines

A typical day as a data scientist involves deploying state-of-the-art algorithms to bring complex data alive. Through representative models that reflect processes in the real world, you’ll generate insight, stimulate innovation and create significant value for government, businesses and industry.

“Data science is exciting because it discovers new knowledge from ever-increasing data, which can be used to improve the human endeavour. We have the opportunity to explore so many different areas, from improving outcomes for businesses to improving the health of patients,” Graham says.

Machine learning – an application of artificial intelligence (AI) that lets systems access data and use it to learn for themselves – is key to this exploration. By analysing massive amounts of image data, for example, machine learning algorithms can learn to recognise objects in images accurately, which in turn powers intelligent apps. This leads to technology like self-driving cars, or autonomous drones that scan hundreds of kilometres of powerlines to find sections needing maintenance.

The applications of big data are endless – which means the variety in your career is, too. Graham’s team has worked on projects to identify why students in India drop out of school, forecast future demand for chip fabrication, and identify key customers of a bank who could benefit from new services. Who knew data could bring such diversity?

By applying machine learning to complex image data, drones can remove the need for time-consuming and dangerous manual powerline inspections.

2. You’ll tackle cloud computing advances using ‘data genetics’

As a data scientist, your journey into what Graham calls a ‘fast-moving, data-rich, information-driven, knowledge-based, interconnected future’ will demand superior skills and knowledge in machine learning, AI and statistics.

As businesses take advantage of the cloud’s potential for massive computation over big data, you’ll help build enormous population models from their customer databases and undertake extremely refined analysis, akin to mapping the customer genome.

“Data scientists will slice and dice the population into thousands of overlapping groups, so each customer can be finely understood and modelled – a kind of data genetics. This will deliver deeper insight from the data, allowing businesses to more intelligently support the needs of individual customers,” Graham says.
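To make the idea of overlapping groups concrete, here is a minimal Python sketch. It is an illustration only, not the method Graham’s team uses: the customer records, field names and segment rules are all invented, and a real population model would involve far more groups and far richer modelling than a simple average.

```python
from statistics import mean

# Hypothetical customer records; every field and value here is invented.
customers = [
    {"id": 1, "age": 24, "city": "Melbourne", "monthly_spend": 120.0},
    {"id": 2, "age": 37, "city": "Sydney", "monthly_spend": 310.0},
    {"id": 3, "age": 29, "city": "Melbourne", "monthly_spend": 95.0},
    {"id": 4, "age": 52, "city": "Melbourne", "monthly_spend": 400.0},
]

# Each segment is a named rule (assumed for illustration). The rules are
# not mutually exclusive, so one customer can belong to several groups.
segments = {
    "under_30": lambda c: c["age"] < 30,
    "melbourne": lambda c: c["city"] == "Melbourne",
    "high_spend": lambda c: c["monthly_spend"] >= 300,
}


def build_groups(customers, segments):
    """Return a mapping of segment name to the customers matching its rule."""
    return {
        name: [c for c in customers if rule(c)]
        for name, rule in segments.items()
    }


def summarise(groups):
    """Average monthly spend per non-empty group, as a stand-in for a model."""
    return {
        name: mean(c["monthly_spend"] for c in members)
        for name, members in groups.items()
        if members
    }


groups = build_groups(customers, segments)
summary = summarise(groups)

# List every group each customer falls into, to show the overlap.
memberships = {
    c["id"]: [name for name, rule in segments.items() if rule(c)]
    for c in customers
}

for name, avg in summary.items():
    print(f"{name}: {len(groups[name])} customers, average spend {avg:.2f}")
```

In this toy data, customer 1 sits in both the under-30 and Melbourne groups, which is the kind of overlap the quote describes. Scaling the same pattern to thousands of rules over millions of customers is where cloud computing comes in.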

And while many data tasks will be automated by machine learning – such as systems like PROSE, which automatically generate computer programs to clean up data – there’ll still be a high demand for “the skills of a serious programmer, with a strong computer science background and software engineering skills”.

3. You’ll embrace open source and continuous improvement

Big data tools and technology change constantly, so you’ll need to find new ways to explore data and innovate over algorithms throughout your career. As the developer behind Rattle, a free and open source toolkit for data mining, Graham is a strong advocate for fast innovation. He believes working within the open source community is a useful way to develop collaboration skills and a growth mindset.

“Free and open source software is fundamental to the future of technology. It facilitates rapid and shared innovation and provides the freedom to build on the shoulders of the giants who have gone before us,” he says.

Adopting a collaborative approach can also help you make faster progress by building on the work of others.

“We can learn and enhance solutions with our own new ideas, then feed that back into the open source community for others to benefit almost immediately.”

This feedback loop is something Graham encourages at Microsoft. He works collaboratively with data scientists and product teams to ensure Microsoft’s leading data science platform, built on the Azure cloud, is continually enhanced. Best practice lessons are then shared through open source products that any data scientist can use and build on, from the Azure Machine Learning suite to the Team Data Science Process and a gallery of solution guides.

4. You’ll learn to listen to the data

Perhaps the most important insight for a data scientist is to listen to what the data is telling you. Being able to appreciate the exhilarating potential of data, while also understanding its limitations, is essential.

“We must avoid bringing too many of our preconceived ideas to the data and instead, by ‘living and breathing’ the data, come to understand the data intimately. The models we build can only be as good as the data we build them from, so it’s also crucial we understand the limitations,” Graham says.

It’s this inherent possibility in data that makes a career like Graham’s so exciting. If you’re seeking diverse challenges, a collaborative, fast-paced environment and the chance to continually challenge the status quo, a career in data science and analytics could be your ultimate clever move.

Ready to make big data meaningful? Study Data Science or Business Analytics at La Trobe University.

Graham Williams

Dr Graham Williams is Director of Data Science, Cloud AI, Microsoft Asia. He has a PhD in machine learning and is an artificial intelligence researcher, practitioner, and educator with over 30 years in the industry. He has authored many books and papers as well as a number of popular software packages, including Rattle for data mining.