Why do most data science projects fail?

In September 2015, Gartner published an article in which they predicted that "through 2017, 60 percent of big data projects will fail to go beyond piloting and experimentation, and will be abandoned." That is a massive number of projects. Think about all the wasted effort, time, and money.

It is hard to tell whether they were right, but many data projects have certainly failed. Perhaps not all of them were called "big data projects." Probably far more than 60% of projects failed. It is even possible that some people never realized that their project had failed.

In 2017, Kaggle published their first data science survey. One of the questions asked was: "At work, which barriers or challenges have you faced this past year?"

People pointed out a diverse set of reasons, such as:

  • Lack of data science talent in the organization

  • Company politics / Lack of management/financial support for a data science team

  • Unavailability of/difficult access to data

  • Data Science results not used by business decision makers

  • Lack of significant domain expert input

  • Limitations of tools

  • Need to coordinate with IT

  • Lack of funds to buy useful datasets from external sources

  • Difficulties in deployment/scoring

  • Limitations in state of the art in machine learning

For me, the most surprising answer was: "The lack of a clear question to be answering or a clear direction to go in with the available data."

The one problem

Honestly, I think this is the reason most projects fail. It is all about setting the goal. Once you decide what questions you want to answer or what you want to achieve, dealing with the other problems becomes significantly easier.

The goal helps you decide what skills are required, so you know who should be hired for the job. Hence, a lack of skills should no longer be a problem.

A lack of management or financial support is much easier to overcome when the desired outcome is known. You can make a plan, decide who should be involved in the project, and even estimate the required budget.

If you can't tell what the purpose of a project is, it may indicate that the project does not support any of the business goals. It should not be a surprise that business decision makers do not use the results of such a project.

Problems related to cooperation with other teams, lack of funds, or limited access to experts largely disappear when you have a clear goal that supports the primary purpose of the business, because such initiatives typically have both management support and large budgets.

Conclusion

I am not saying that having a purpose and knowing what you want to achieve magically solves all of your problems. It does not. But without a goal, you don't even know what problems you have, so good luck solving any of them.

It is easy to fail a project if you have no idea what you want to achieve. What if the 60% of projects that were predicted to fail shouldn't have been started in the first place? What if the outcome of those projects didn't matter because they were trying to solve issues that looked cool but were irrelevant from the perspective of the whole business?

Perhaps we should ask "why" more often. After all, curiosity is the fundamental trait of a scientist, and many of us dare to call ourselves data scientists. It is not just a trendy job title. Those words have actual meaning, and people have certain expectations when they hear them.

Everything gets easier when you have a goal that you deeply understand. We should know not only what we are doing but also why we are doing it and in what context.

To achieve that, we must be interested not only in data science techniques and tools but also in business. We must understand not only how the company operates but also how the whole industry works.

Additionally, we must be aware of company politics, because sometimes projects are started when the Big Boss with a Big Ego wants an extraordinary success just to brag in front of the Board of Directors. That is OK too, as long as you know the real purpose and understand how it affects you.
