Data Scientist Profile
Michael Clark
What is your academic background? What and where did you study?
I completed a Bachelor of Science (Honours), first class, in Physics at the University of Canterbury in Christchurch, New Zealand. Then I completed a Master of Science with a focus on Geoscience at Victoria University of Wellington, NZ.
Have you completed any other training in data science? (Up-skilling, MOOCs, short courses, etc.)
I did the online course CS231n “Convolutional Neural Networks for Visual Recognition” and read through a lot of other courses, such as fast.ai, to find advice on debugging machine learning models.
If you pivoted into a data science role from another area, how did you go about this and what advice would you give others looking to do the same?
After leaving my job in the oil industry I started with a base of programming skills. Then I completed online courses to fill gaps in my knowledge. My goal was to complete interesting projects and show them to people at meetups, hackathons, Kaggle, or online on GitHub or blogs.
My advice would be for people to do the same: get good at programming, then do projects, and show your work.
You can do this at hackathons, competitions, meetups, and by sharing your code on GitHub. It can be embarrassing to show your code, but it’s necessary to use and show your skills in public if you want to get good. My code varies in quality, and is embarrassing at times, but you can see it here: http://github.com/wassname.
My only caveat is that it does take some time to hone a skill like programming. There are many places online that exaggerate and claim you can learn programming in a weekend, but like any other skill, it can take years to get truly proficient. To keep up practice over that time, it’s essential to allow yourself to enjoy it. This will become easier as you get better at it.
What sparked your interest in working with data?
I like programming and have always been interested in Artificial General Intelligence and the control problem. Data interested me because it’s a way to have a career in Perth that lets me stay near these interests, as well as contribute to problems that matter in industry.
How did you come to work in your current role?
After working in the Oil & Gas industry I founded ThinkCDS in 2016 to provide machine learning and data science services in the field. Then I merged with Three Springs Technology in 2019.
What sort of projects have you been working on?
Many of my projects apply machine learning to engineering problems in oil and gas, mining, or utilities. For example, I have applied reinforcement learning to the control systems of bucketwheel reclaimers in mining, and machine learning to finding leaks in satellite images. I’ve put more details on our company website: https://threespringstechnology.com/projects/. Some of the most interesting projects have not been approved by clients for public release, unfortunately.
What tools/platforms do you use in your work?
I use Python because it’s the more general of the two de facto data science languages, with the other being R. I generally use Linux, GitHub, PyTorch, Docker, Visual Studio Code and a range of other popular programming tools.
What has been a highlight of your data science career so far?
I’ve enjoyed treading new ground by applying reinforcement learning to bucketwheel reclaimers. I also used a transformer-based model to build a comment moderation product that achieves human-level performance: https://threespringstechnology.com/products/ai-moderation/.
There have been some other exciting results that I hope to have cleared for public release soon.
What challenges have you faced as a data scientist?
Lack of data, messy data, and complex problem domains are some of the biggest challenges in taking data science into the real world. These are best faced by bringing good programming skills into a project, and engaging with domain experts in the early stages of a project.
What are some of the big areas of opportunity/questions you want to tackle in this space?
I’m interested in applying reinforcement learning to control systems in mining, since reinforcement learning has shown a wide ability to generalise and mining is a big area for Perth. Unfortunately, you need a simulator, and the algorithms are quite data-intensive.
What excites you most about recent developments in Data Science?
In 2020, advances in transformers for NLP (natural language processing) show promise for disrupting how text data is handled in the near term. They can be useful in Perth for things like work order classification or document search. Google is already using them in its search.
In the longer term, I’m excited by the progress towards data efficiency in reinforcement learning. This may have impacts in control systems, factories, self-driving cars and eventually may contribute to AGI (artificial general intelligence).
What does the future of data science look like?
A lot of existing roles and projects have been relabelled as data science due to its popularity. As the field has become more popular, it has generated hype as well as some real value, and that hype presents a risk to the whole field. We need to focus on real value by managing expectations and validating projects.
For people considering a career in data science, what is one piece of advice you would give?
Start by seeing if you could enjoy programming, perhaps by trying the exercises at https://codingbat.com/python. Programming is a prerequisite skill with wide applicability, so it should be time well spent.
We would like to acknowledge and pay respect to the traditional owners of the land on which the WADSIH office is located, the Whadjuk people of the Nyungar Nation.