Data Scientist Profile

Shereif Khorshid

Founder of Three Springs Technology

What is your academic background? 

BSc in Geology from UWA. 20+ years of commercial software development experience in pension funds, hedge funds, investment banks and AI consulting.

 

Have you completed any other training in data science?

I had a strong science and maths background and a lot of experience, and I managed my own learning in data science. There are a lot of resources if you know how to learn. Personally, I read online documentation, have joined many Kaggle competitions and have watched many hours of lectures and training videos on YouTube.

 

If you pivoted into a data science role from another area, how did you go about this and what advice would you give others looking to do the same?

I think you need strong maths, computer science fundamentals and programming skills (ideally in Python) before you attempt to learn AI. Once you have the fundamentals sorted out, I would recommend leveraging free online learning resources like fastai and Kaggle. I found Kaggle particularly useful because you get to see many different approaches to the same problem. You can see state-of-the-art approaches and compare them to your own. You learn a lot from seeing others’ work.

 

What sparked your interest in working with data?

I find it rewarding to develop algorithms and software that make business processes more efficient. Solving difficult problems that were thought to be impossible is very satisfying.

 

How did you come to work in your current role?

I founded Three Springs Technology in 2016 after returning to Australia from 15 years of working internationally.

 

What sort of projects have you been working on?

We have been working on projects across many sectors, including Health, Finance, Utilities and Mining / Oil & Gas. Check out our website for specific project details.

 

What tools/platforms do you use in your work?

Typically we use Python and open source libraries like Keras/TensorFlow and PyTorch. We usually work on Linux, and our products are typically shipped in Docker containers.

 

What has been a highlight of your data science career so far?

A medical imaging model we developed for our client, Resonance Health, was cleared by the TGA, CE Mark and the FDA back in 2018. It was only the 14th AI model in the world to be cleared for use on patients by the FDA.

 

What challenges have you faced as a data scientist?

Finding clients with realistic expectations, suitable datasets and an appropriate budget to fund AI projects is the biggest challenge. Sometimes prospective clients find it difficult to articulate what their real problem is. They can lack data or labelled datasets, or have unrealistic expectations around cost, accuracy requirements or application integration effort. For some business problems, such as summarising the liabilities defined in a set of legal documents, having an AI solution generate a probabilistic response is not suitable. Thankfully, in the past four years we have found and worked with a set of high-quality clients that have facilitated the growth of our business.

 

What are some of the big areas of opportunity/questions you want to tackle in this space?

As AI technology improves to near-human or better-than-human performance in several subfields, previously non-viable commercial applications will become viable. For example, in the NLP space we used a GPT-2-based model (only released in Nov 2019 by OpenAI) to build a comment moderation product that can achieve human-level performance in implementing moderation policies.

 

What excites you most about recent developments in Data Science?

I like the fact that solving AI problems is becoming easier with the development and continuous improvement of open source libraries, GitHub repos and software.

 

For people considering a career in data science, what is one piece of advice you would give?

Always persevere. Initial results are often poor. I find that the third or fourth attempt at a problem yields the best results, so don’t give up too soon.

Be inquisitive and independent. You will encounter many error messages and issues as you go through the development process of AI solutions. Learning to understand and fix these yourself will make you a better data scientist.

If it seems too good to be true, it probably is. Avoid publishing seemingly excellent results until you have thoroughly checked your code. Remember to check for data leakage and to keep your test and training sets separate. Peer code reviews are also a good idea.
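One simple way to act on the advice about data leakage is to confirm that no record identifier appears in both the training and test sets. The snippet below is a minimal sketch of such a check, not code from the interview. The record layout, the `id` key and the function names `split_train_test` and `find_leaked_ids` are all hypothetical, chosen only for illustration.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split them into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def find_leaked_ids(train, test, key="id"):
    """Return the identifiers that appear in both the train and test sets."""
    return {r[key] for r in train} & {r[key] for r in test}


# Hypothetical toy data: id 3 appears twice, a common source of leakage
# when a naive random split puts the two copies on opposite sides.
records = [{"id": i, "label": i % 2} for i in range(10)]
records.append({"id": 3, "label": 1})

train, test = split_train_test(records)
leaked = find_leaked_ids(train, test)
print(f"{len(train)} train rows, {len(test)} test rows, leaked ids: {sorted(leaked)}")
```

If the check reports any leaked identifiers, one option is to deduplicate the data first or to split by identifier rather than by row, so that all copies of a record end up on the same side of the split.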

Keep on top of the latest developments in the field. If you are in Perth, going to the Perth Machine Learning Group is a great way to stay informed about progress and breakthroughs, and to meet other practitioners. Maybe you can even find a data science job there.
