“Data Science” Project Cycle (Part 4)
The Data Science Project Cycle series consists of five articles, and this is the fourth. In this part, we will talk about “Modelling and Model Evaluation”.
Note: I will use the following abbreviations below:
- Data Science — DS
- Artificial Intelligence — AI
- Machine Learning — ML
- Big Data — BD
- Deep Learning — DL
- Statistical Learning — SL
Let’s continue…
Modelling and Model Evaluation
Our data is now free of all kinds of pollution, mess and anomalies, so it’s time to prophesy!
The question is this: I’m going to build a predictive model, and I have many algorithms to choose from. The coolest one is algorithm X, which I saw used in a Kaggle competition. Should I build my predictive model with this algorithm?
Answer: You are going to a rock concert with friends this evening. There are plenty of outfits in the closet, but the tuxedo is the coolest, so you wear it. The coolest choice is not necessarily the right one for the occasion.
One of the most important steps in a DS process is developing an approach that solves the business problem with existing analytical methods (SL + ML).
In our own observations and in recruitment processes, one of the most common problems we encounter is the inability to turn a business problem into a solution using statistics and machine learning.
To give a few examples:
- The most important item: ignoring the theory behind the problem at hand. The relevant theory must be consulted first! What does economic theory say? What does behavioral science say?
- Machine learning obsession!
- A tendency to use the same algorithm for every problem. The model should be selected by comparing the performance of tens, maybe hundreds, of candidate models.
- A persistent focus on classification algorithms for problems whose dependent variable is continuous.
- When the classes of the dependent variable are known and there are more than two of them (for example, 10), jumping into algorithm selection without questioning whether the dependent variable really needs 10 classes.
- Confusing clustering problems with classification problems.
- Not being aware of the distinction between machine learning and the principle of causation. We can clearly state that, in the business world in general, awareness of this issue is very low.
- Not examining the theoretical assumptions of the chosen method where necessary.
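To make the point about comparing models concrete, here is a minimal sketch of selecting a model by cross-validated performance rather than by popularity. Everything in it is an assumption for illustration: the data is synthetic, the three candidates (a mean baseline, a one-feature linear regression and a 1-nearest-neighbour regressor) are hand-written stand-ins for the tens or hundreds of models mentioned above, and the helper names are hypothetical. Because the dependent variable is continuous, the candidates are regressors and are scored with RMSE. Only the Python standard library is used to keep the sketch self-contained; in practice a library such as scikit-learn would likely do this work.

```python
import random
from statistics import mean


def fit_mean_baseline(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    y_bar = mean(ys)
    return lambda x: y_bar


def fit_linear(xs, ys):
    """Candidate: ordinary least squares with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """Candidate: predict the y of the closest training point (1-NN)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cross_val_rmse(fit, xs, ys, k=5):
    """Average root-mean-squared error over k held-out folds."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_scores.append(mean(sq_err) ** 0.5)
    return mean(fold_scores)


# Synthetic data (assumption): y is roughly linear in x, plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]

candidates = {
    "mean baseline": fit_mean_baseline,
    "linear regression": fit_linear,
    "1-nearest neighbour": fit_nearest_neighbour,
}

# Score every candidate the same way, then pick the lowest error.
scores = {name: cross_val_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.3f}")
print("selected:", best)
```

The design choice worth noting is that every candidate is evaluated on the same held-out folds with the same metric, so the winner is decided by measured performance on this problem and not by how the algorithm fared in someone else's competition.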
See you in “Running on Production”, the last article of this series.