“Data Science” Project Cycle (Part 1)

No single workflow can cover every Data Science project. Still, this article series tries to answer one question: what should the project cycle look like for Data Science projects whose output is a Machine Learning model? The series consists of five articles, and this is the first one. In this part, we will talk about “Business Understanding”.


Note: I will use the following abbreviations below:

  • Data Science — DS
  • Artificial Intelligence — AI
  • Machine Learning — ML
  • Big Data — BD
  • Deep Learning — DL
  • Statistical Learning — SL

Let’s start…

The following figure is taken from Hadley Wickham’s book R for Data Science:


Which tools can be used for each step of the cycle above (for R):


It’s hard to plan ML projects; this checklist makes it easier:


1. Business Understanding

All the items in this article are especially important for projects developed within corporate companies.

1.1. Understanding the Business Domain and Defining the Problem Clearly


I think this quote from Einstein is enough to show where most of our preparation, thinking, and effort should go.

1.2. DS Project Expectation Management

Once the problem has been fully stated, the second item, which is just as important, is managing the expectations that the relevant units have of the DS project. Many of us have seen expectations drift, many times over, between the start and the end of a project. This is why everyone involved should agree on the same expected results from the project. To reach that agreement, ideas should be exchanged with everyone who has experience in the relevant subject, whether a lot or a little. The mindset of making decisions by looking at data alone should turn into one of making decisions by looking at data and applying business knowledge.


You may not be able to meet everyone’s expectations, but it is very important to try your best, and to do it systematically!

1.3. Determining the Success Metric of the Project

In other words: will the success of the project be measured by the prediction performance of the ML model used in the project, or by the return the project generates once it is integrated into a live system? Or will both be used to evaluate success? Everyone with a stake in the project should know the answers to these questions at the very beginning of the work. This is another very important item.
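To make the idea of agreeing on success up front a little more concrete, here is a minimal Python sketch. It assumes a classification model judged by accuracy and a single business number measured after going live (for example, a relative revenue uplift). The names `SuccessCriteria`, `accuracy`, and `project_succeeded`, as well as the thresholds, are hypothetical and only illustrate the idea; they do not come from any library or from a real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Hypothetical success criteria agreed with all stakeholders before modeling starts."""
    min_model_metric: float     # assumed: minimum accuracy on a held-out test set
    min_business_return: float  # assumed: minimum return measured after going live
    require_both: bool = True   # True: both must pass; False: either one is enough


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def project_succeeded(criteria, model_metric, business_return):
    """Apply the agreed rule to the measured model metric and business return."""
    model_ok = model_metric >= criteria.min_model_metric
    business_ok = business_return >= criteria.min_business_return
    if criteria.require_both:
        return model_ok and business_ok
    return model_ok or business_ok


if __name__ == "__main__":
    # Illustrative numbers only, not real project data.
    criteria = SuccessCriteria(min_model_metric=0.80, min_business_return=0.05)
    acc = accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])
    print(acc, project_succeeded(criteria, acc, business_return=0.03))
```

With these illustrative thresholds, a model that reaches 0.80 accuracy but only a 0.03 return counts as a failure when both criteria are required, and as a success when either one is enough. The point is that this rule is written down and agreed on before the work starts, not chosen after the results come in.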


Once the problem has been fully defined, the stakeholders’ expectations of the project output have been aligned, and the success metric has been agreed on, the next steps are the literature review and the search for best practices.


Let me stop here; see you in the next part, “Literature & Best Practice, and Data Understanding.”

References:

1. https://bigdataanalyticsnews.com/10-essential-skills-you-need-to-be-a-data-scientist/
2. https://books.google.nl/books?id=vfi3DQAAQBAJ&printsec=frontcover&hl=nl&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false
3. https://spark.rstudio.com/post/
4. https://spark.rstudio.com/guides/data-lakes/
5. https://towardsdatascience.com/the-machine-learning-project-checklist-d9ee6e33a2b2
6. https://www.slideshare.net/AjoyBasu/problem-solving-using-six-sigma
7. https://peadarcoyle.com/2018/05/21/expectations-management-in-data-science-projects/
8. https://www.ranorex.com/blog/metrics-measure-automation-success/
9. http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

Experienced Ph.D. with a demonstrated history of working in the higher education industry. Skilled in Data Science, AI, Deep Learning, Big Data, & Mathematics.
