# “Data Science” Project Cycle (Part 4)

The Data Science Project Cycle series consists of five separate articles, and this is the fourth. In this part, we will talk about **“Modelling and Model Evaluation”**.

*Note: I will use the following abbreviations below:*

- *Data Science: DS*
- *Artificial Intelligence: AI*
- *Machine Learning: ML*
- *Big Data: BD*
- *Deep Learning: DL*
- *Statistical Learning: SL*

*Let’s continue…*

**Modelling and Model Evaluation**

We have data free of all kinds of pollution, mess and anomalies, and it’s time to prophesy!

The question is this: I’m going to build a predictive model, and I have many algorithms to choose from. But the coolest one is algorithm X, which I saw used in a Kaggle competition. What do you think: should I build my predictive model with this algorithm?

Answer: You are going out in the evening to a rock concert with friends. There are plenty of clothes in the closet, but the coolest is the tuxedo, so you wear it. The coolest choice is not necessarily the right one for the occasion.

One of the most important steps in a DS process is developing an approach that solves the business problem with existing analytical methods (SL + ML).

One of the most common problems we encounter in our observations and recruitment processes is the inability to develop a solution to a business problem using statistics and machine learning.

To give a few examples:

- The most important item: look at the theory behind the problem at hand! What does economic theory say? What does behavioral science say?
- Machine learning obsession!
- A tendency to use the same algorithm for every problem. The model should be selected by comparing the performance of tens, maybe hundreds, of candidate models.
- A persistent focus on classification algorithms for problems whose dependent variable is continuous.
- When the classes of the dependent variable are fixed and there are more than two of them (say, 10), jumping into algorithm selection without questioning whether the dependent variable really has 10 classes.
- Confusing clustering problems with classification problems.
- Not being aware of the distinction between machine learning and the principle of causation. We can clearly state that awareness of this issue is generally very low in the business world.
- Theoretical assumptions should be examined where necessary.
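
To make the point about comparing candidates a bit more concrete, here is a minimal sketch of choosing a model by cross-validated performance instead of picking a favorite up front. Everything in it is an assumption for illustration: the synthetic data, the three toy candidates (`fit_mean`, `fit_linear`, `fit_knn`) and the helper `cv_rmse` are hypothetical names, and the dependent variable is continuous, so the candidates are regression models scored with RMSE. In a real project you would more likely rely on a library such as scikit-learn and a much larger pool of candidates.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic data (assumption): y is roughly linear in x, plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate: ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """Candidate: k-nearest-neighbours regression (average of k closest targets)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean([y for _, y in nearest])

    return predict


def cv_rmse(fit, xs, ys, folds=5, seed=0):
    """Mean RMSE over k folds; the fixed seed gives every candidate the same splits."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in test]
        scores.append(statistics.fmean(sq_err) ** 0.5)
    return statistics.fmean(scores)


# Compare all candidates on identical folds and keep the lowest error.
candidates = {"mean_baseline": fit_mean, "linear": fit_linear, "knn_5": fit_knn}
xs, ys = make_data()
results = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(results, key=results.get)

for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE={score:.3f}")
print("selected:", best)
```

The baseline is there on purpose: if a "cool" algorithm cannot clearly beat predicting the mean, that says more about the problem framing than about the algorithm.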

See you in “Running on Production”, the last article of this series.
