“Data Science” Project Cycle (Part 4)

Mehmet Akturk
Jan 17, 2021


The Data Science Project Cycle series consists of five separate articles, and this is the fourth. In this part, we will talk about “Modelling and Model Evaluation”.

https://area.autodesk.com/blogs/the-maya-blog/rigging-and-grooming-a-3d-cartoon-character-in-maya/

Note: I will use the following abbreviations:

  • Data Science — DS
  • Artificial Intelligence — AI
  • Machine Learning — ML
  • Big Data — BD
  • Deep Learning — DL
  • Statistical Learning — SL

Let’s continue…

Modelling and Model Evaluation

We now have data free of all kinds of pollution, mess and anomalies, and it’s time to make predictions!

Here is the question: I’m going to build a predictive model, and I have many algorithms to choose from. The coolest one is algorithm X, which I saw used in a Kaggle competition. Should I build my predictive model with this algorithm?

https://www.dailymail.co.uk/tvshowbiz/article-7774673/Robert-Downey-Jr-puts-dapper-display-paisley-suit-Rainforest-Fund-benefit-concert-NYC.html

Answer: You are going out in the evening to a rock concert with friends. There are plenty of outfits in the closet, but you put on the coolest one: the tuxedo. The coolest choice is not necessarily the right one for the occasion.

One of the most important steps in a DS project is developing an approach that solves the business problem with existing analytical methods (SL + ML).

One of the most common problems we encounter, both in our observations and in recruitment processes, is the inability to turn a business problem into a solution using statistics and machine learning.

https://www.apm.org.uk/blog/10-common-problems-project-teams-face/

To give a few examples:

  • The most important item: the relevant theory behind the problem must be examined first! What does economic theory say? What does behavioral science say?
  • Machine learning obsession!
  • A tendency to use the same algorithm for every problem. Instead, the model should be selected by comparing the performance of tens, maybe hundreds, of candidate models.
  • A persistent focus on classification algorithms for problems whose dependent variable is continuous.
  • Jumping into algorithm selection without first checking how many classes the dependent variable has, even when its classes are fixed and there are more than two of them (for example, 10).
  • Confusion of clustering and classification problems.
  • Not being aware of the distinction between machine learning (prediction) and causation. We can clearly state that awareness of this issue is generally very low, especially in the business world.
  • Not examining the theoretical assumptions of a model when necessary.
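
To make the item about comparing many models concrete, here is a minimal sketch that scores several candidate models by k-fold cross-validated mean squared error instead of committing to one favourite algorithm. It uses only the Python standard library and synthetic data; the three candidates (a mean baseline, one-feature least squares, and k-nearest-neighbour regression) and all helper names are illustrative assumptions, not a prescribed set. In a real project a library would handle this and the candidate list would be much longer.

```python
import random
import statistics

# Each candidate "model" is a fit function: fit(xs, ys) returns predict(x).

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def make_knn(k):
    """k-nearest-neighbour regression on one feature."""
    def fit_knn(xs, ys):
        pairs = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean(y for _, y in nearest)
        return predict
    return fit_knn

def cv_mse(fit, xs, ys, folds=5):
    """Average test-fold MSE over interleaved folds (data assumed in random order)."""
    scores = []
    for f in range(folds):
        test = set(range(f, len(xs), folds))
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(statistics.fmean((model(xs[i]) - ys[i]) ** 2 for i in test))
    return statistics.fmean(scores)

# Synthetic data (assumption): y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

candidates = {"mean baseline": fit_mean, "linear": fit_linear, "5-NN": make_knn(5)}
results = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(results, key=results.get)
for name, mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:>13}: CV MSE = {mse:.3f}")
print("selected:", best)
```

The point is the procedure, not the winner: every candidate is evaluated on data it was not fitted on, and the choice follows the scores rather than the algorithm's popularity.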
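
The items about continuous targets, the number of classes, and clustering versus classification all come down to looking at the dependent variable before choosing an algorithm. The helper below, `suggest_task`, is a hypothetical heuristic sketch, not a library function: no labels points to clustering, a numeric target with fractional or many distinct values points to regression, and otherwise the number of distinct classes separates binary from multiclass classification. The `max_classes=20` cutoff is an arbitrary assumption; a real project should decide this from the business definition of the target.

```python
def suggest_task(target, max_classes=20):
    """Hypothetical heuristic: suggest a problem type from the dependent variable.

    target: a sequence of labels/values, or None when no labels exist.
    max_classes: assumed cutoff above which a numeric target counts as continuous.
    """
    if target is None:
        # No dependent variable at all: an unsupervised (clustering) problem.
        return "clustering (no labels)"
    values = list(target)
    if not values:
        raise ValueError("empty target")
    distinct = set(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    has_fraction = any(isinstance(v, float) and not v.is_integer() for v in values)
    if numeric and (has_fraction or len(distinct) > max_classes):
        # Continuous dependent variable: regression, not classification.
        return "regression (continuous target)"
    if len(distinct) < 2:
        raise ValueError("only one class: nothing to predict")
    if len(distinct) == 2:
        return "binary classification"
    # More than two fixed classes: the algorithm must support multiclass targets.
    return f"multiclass classification ({len(distinct)} classes)"

print(suggest_task(None))                         # -> clustering (no labels)
print(suggest_task([12.5, 7.25, 30.0, 18.75]))    # -> regression (continuous target)
print(suggest_task(["churn", "stay", "stay"]))    # -> binary classification
print(suggest_task([d % 10 for d in range(100)])) # -> multiclass classification (10 classes)
```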
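
For the item on machine learning versus causation, a small simulation shows why a variable that predicts well is not evidence of a causal effect. In this synthetic setup (all variables and coefficients are assumptions chosen for illustration), a hidden confounder `z` drives both `x` and `y`, while `y` does not depend on `x` at all. The observed correlation between `x` and `y` is strong, so `x` is a useful predictor; but when `x` is set independently of `z`, as in a randomized experiment, the correlation disappears.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(42)
n = 5000

# Hidden confounder z drives both x and y; y does NOT depend on x.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [2 * zi + rng.gauss(0, 0.5) for zi in z]
observed = pearson(x, y)

# Intervention: set x at random, independently of z (like a randomized experiment).
x_do = [rng.gauss(0, 1) for _ in range(n)]
y_do = [2 * zi + rng.gauss(0, 0.5) for zi in z]
intervened = pearson(x_do, y_do)

print(f"observational correlation: {observed:.2f}")
print(f"correlation under intervention: {intervened:.2f}")
```

A model trained on the observational data would predict `y` from `x` quite well, yet changing `x` would not change `y`; prediction quality alone cannot answer the causal question.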
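
As one example of the last item, linear regression assumes that the error variance is constant (homoscedasticity). The sketch below is a crude check in the spirit of the Goldfeld-Quandt test, not the formal test itself (there is no F-distribution p-value): it fits one least-squares line, sorts the observations by `x`, and divides the mean squared residual of the upper half by that of the lower half. The data and the reading "a ratio far from 1 suggests a violation" are illustrative assumptions.

```python
import random
import statistics

def ols(xs, ys):
    """One-feature least squares; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope

def residual_variance_ratio(xs, ys):
    """Mean squared residual for the upper half of x divided by the lower half."""
    intercept, slope = ols(xs, ys)
    resid = [y - (intercept + slope * x) for x, y in sorted(zip(xs, ys))]
    half = len(resid) // 2
    low, high = resid[:half], resid[-half:]
    return statistics.fmean(r * r for r in high) / statistics.fmean(r * r for r in low)

rng = random.Random(1)
xs = [rng.uniform(1, 10) for _ in range(1000)]
# Constant noise: the assumption holds.
homo = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
# Noise grows with x: the assumption is violated.
hetero = [2 * x + 1 + rng.gauss(0, 0.5 * x) for x in xs]

print(f"constant variance: ratio = {residual_variance_ratio(xs, homo):.2f}")
print(f"growing variance:  ratio = {residual_variance_ratio(xs, hetero):.2f}")
```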

See you in “Running on Production”, the last article of this series.

http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

References
1. https://area.autodesk.com/blogs/the-maya-blog/rigging-and-grooming-a-3d-cartoon-character-in-maya/
2. https://www.dailymail.co.uk/tvshowbiz/article-7774673/Robert-Downey-Jr-puts-dapper-display-paisley-suit-Rainforest-Fund-benefit-concert-NYC.html
3. https://www.veribilimiokulu.com/blog/veri-bilimi-proje-dongusu/
4. https://www.apm.org.uk/blog/10-common-problems-project-teams-face/
5. http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/


Written by Mehmet Akturk

Experienced Ph.D. with a demonstrated history of working in the higher education industry. Skilled in Data Science, AI, NLP, Deep Learning, Big Data, and Mathematics.
