“Data Science” Project Cycle (Part 4)

Jan 17, 2021

The Data Science Project Cycle series consists of five articles, and this is the fourth. In this part, we will talk about “Modelling and Model Evaluation”.

https://area.autodesk.com/blogs/the-maya-blog/rigging-and-grooming-a-3d-cartoon-character-in-maya/

Note: I will use the following abbreviations below:

Data Science — DS
Artificial Intelligence — AI
Machine Learning — ML
Big Data — BD
Deep Learning — DL
Statistical Learning — SL

Let’s continue…

Modelling and Model Evaluation

Our data is now free of all kinds of noise, mess and anomalies, and it’s time to make predictions!

The question is this: I’m going to build a predictive model, and I have many algorithms to choose from. The coolest one is algorithm X, which I saw used in a Kaggle competition. Should I build my predictive model with this algorithm?

https://www.dailymail.co.uk/tvshowbiz/article-7774673/Robert-Downey-Jr-puts-dapper-display-paisley-suit-Rainforest-Fund-benefit-concert-NYC.html

Answer: You are going out in the evening to a rock concert with friends. There are plenty of outfits in the closet, but the tuxedo is the coolest, so you wear it. The coolest outfit is not necessarily the right one for the occasion, and the same goes for algorithms.
One of the most important steps in a DS process is working out how to solve a business problem with existing analytical approaches (SL + ML).

One of the most common problems we encounter in our observations and recruitment processes is the inability to develop a solution to a business problem using statistics and machine learning.

https://www.apm.org.uk/blog/10-common-problems-project-teams-face/

To give a few examples:

The most important item: the theory behind the relevant problem must be consulted! What does economic theory say? What does behavioral science say?
Machine learning obsession!
A tendency to use the same algorithm for every problem. The model should be selected by comparing performance across tens, maybe hundreds, of candidate models.
A persistent focus on classification algorithms for problems whose dependent variable is continuous.
When the dependent variable has known classes and more than two of them (for example, 10), moving on to algorithm selection without questioning whether the dependent variable really has 10 classes.
Confusing clustering problems with classification problems.
Not being aware of the distinction between machine learning and causality. We can clearly state that, in the business world in general, awareness of this issue is very low.
Failing to examine theoretical assumptions where necessary.
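
To make the point about comparing models concrete, here is a minimal sketch in plain Python. It is not taken from the original series: the synthetic data, the three candidate models and every function name (`make_data`, `cv_rmse`, `compare_models` and so on) are assumptions chosen for illustration. The dependent variable is continuous, so the candidates are regression models and they are ranked by cross-validated RMSE rather than by how cool they look.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic data (an assumption): continuous target, roughly linear in x."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


# Each candidate takes training data and returns a predict(x) function.
def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Simple least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbours regression on a single feature."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)

    return predict


def cv_rmse(fit, xs, ys, folds=5):
    """Average RMSE over interleaved cross-validation folds."""
    n = len(xs)
    scores = []
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        mse = statistics.fmean((predict(xs[i]) - ys[i]) ** 2 for i in test_idx)
        scores.append(mse ** 0.5)
    return statistics.fmean(scores)


def compare_models(candidates, xs, ys):
    """Return (name, rmse) pairs sorted from best to worst."""
    results = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
    return sorted(results.items(), key=lambda item: item[1])


CANDIDATES = {
    "mean_baseline": fit_mean,
    "linear_regression": fit_linear,
    "knn_regression": fit_knn,
}

if __name__ == "__main__":
    xs, ys = make_data()
    for name, rmse in compare_models(CANDIDATES, xs, ys):
        print(f"{name}: RMSE = {rmse:.3f}")
```

In a real project the candidate list would be much longer and would come from a library rather than hand-written functions, but the idea is the same: every candidate is scored on the same held-out folds, and the choice follows from the comparison.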

See you in “Running on Production”, the last article of this series.

http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

References
1. https://area.autodesk.com/blogs/the-maya-blog/rigging-and-grooming-a-3d-cartoon-character-in-maya/
2. https://www.dailymail.co.uk/tvshowbiz/article-7774673/Robert-Downey-Jr-puts-dapper-display-paisley-suit-Rainforest-Fund-benefit-concert-NYC.html
3. https://www.veribilimiokulu.com/blog/veri-bilimi-proje-dongusu/
4. https://www.apm.org.uk/blog/10-common-problems-project-teams-face/
5. http://www.plusxp.com/2011/02/back-to-the-future-the-game-episode-1-review/

Written by Mehmet Akturk
