
What I learnt from Failures as a Data Science Consultant


✍️ Achyuthuni Sri Harsha (MSc Business Analytics Part Time, online)

Being a data science consultant in one of the Big 4 is a challenging yet exciting job. I interact
with multiple data science and business teams across different organisations and apply
the same technologies to different business problems in unexpected ways.

How do we avoid the common pitfalls of data science consulting?


I have been a data scientist for four years, mostly working in the retail, supply chain and
manufacturing domains. The first few projects I was involved in were failures, and they taught me a lot. Since then, I have been able to complete projects successfully (fingers crossed). Before discussing what I learnt, what does success mean in a consulting job for a data scientist?

In my opinion, a successful project is either a working (deployed) version or a POC which the client development team can (and knows how to) easily integrate into their systems. Additionally, the business partners should trust the systems we built enough to start using them.

This is a problem that every data scientist faces. A Gartner study says that around 85% of
data science projects never make it to production.

https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/


Nowadays, when I get a proposal for a data science project, I ask some questions:

  1. Is this a data science problem? Although data literacy among businesses is growing, some managers across industries still think every problem can be solved using AI and data science. Not every business problem is a data science problem, and some problems are too complex to be solved with existing AI and data science technologies.
  2. Can this be solved efficiently using other techniques? Many business problems can be answered using a dashboard and some SQL queries. Many others can be modelled using simple if-else rules. We should always check whether data science is overkill for the problem.
  3. How would this be consumed/deployed? Another aspect that affects the data science process is how the solution will be deployed or consumed. In some of my earlier projects, I realised after building the models that I had to redo all of my work because of the way customers wanted to consume the product. By looking at deployment before we start a project, we can clarify the requirements and align better with the business’s goals.
  4. Do we have data? Many organisations have challenges with the quality, veracity, and type of data available. Furthermore, even if they have good quality data, they might have issues sharing it with a consultant outside the company or geography. Of all these points, this is the biggest problem I have faced.
  5. Do we have relevant data? There is a methodology called CRISP-DM that I use when starting a new project. The first thing many data scientists do is look at the data. I don’t do that. Instead, I have discussions with the business and write down the features I think are essential for solving the problem. I try to keep my features mutually exclusive as well as collectively exhaustive. Only once I have all the features that the business feels are relevant do I look at the data. This helps me in two ways. First, almost every project I was given was missing data for one or two features, which were captured or used by a different team. Second, feature engineering becomes easier, as we already know how each of these features affects the problem.
  6. Do I know the relevant tools and technologies? Data science is a vast field, ranging from optimisation, regression, classification, clustering and forecasting to ML and AI. I don’t know every topic in data science. Every problem is unique and will require a different technology/model/tool. Using the wrong tools will make the problem more complex and will eventually lead to the solution not being deployed. The top 5 algorithms I use are optimisation, linear/logistic regression, decision trees (random forest and XGBoost), and clustering.
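
As a rough illustration of point 5, the feature checklist written down with the business can be compared against the columns in the client's data before any modelling starts. The sketch below is a minimal Python example of that comparison. The feature names, their rationales and the column list are all hypothetical and are not taken from a real project.

```python
# Hypothetical feature wish-list gathered from business discussions.
# Keys are feature names, values are the reason the business thinks they matter.
wanted_features = {
    "store_id": "identifies where the sale happened",
    "promotion_flag": "promotions shift demand",
    "lead_time_days": "supplier delays drive stock-outs",
    "weather": "footfall depends on the weather",
}

# Hypothetical columns actually present in the data shared by the client.
available_columns = ["store_id", "sale_date", "units_sold", "promotion_flag"]


def coverage_report(wanted, available):
    """Split the wish-list into features found in the data and features missing."""
    available_set = set(available)
    found = sorted(name for name in wanted if name in available_set)
    missing = sorted(name for name in wanted if name not in available_set)
    return found, missing


found, missing = coverage_report(wanted_features, available_columns)

print("Found in data:", found)
for name in missing:
    # Each missing feature is a question to take back to the business:
    # which team captures this, and can it be shared?
    print(f"Missing: {name} ({wanted_features[name]})")
```

The missing list is the useful output here: each entry is a question to take back to the business about which team captures that data and whether it can be shared.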


These are the top 5 (+1) points I look out for when starting a consulting project.


Check out Harsha’s website for more articles about Data Science!


Message from the Committee

This article was contributed by one of the students from Imperial College. The ICDSS Team would like to thank Harsha for sharing some of the expertise he gained whilst working in industry. If you have some stories to share and would like to contribute to our blog, we would love to hear from you!