The Data Science Project Cycle series consists of five articles, and this is the second. In this part, we will talk about “Literature & Best Practice Search” and “Data Understanding”.
Note: I will use the following abbreviations:
- Data Science — DS
- Artificial Intelligence — AI
- Machine Learning — ML
- Big Data — BD
- Deep Learning — DL
- Statistical Learning — SL
Literature and Best Practice Search
There is no need to reinvent the wheel. The goal of this step is to investigate what solutions others have already found for related problems, and whether the approaches known to be successful, if any, are suitable for the project at hand. Knowing how others have approached the problem gives us an idea of where to start our own project.
For problems with no direct precedent, we have to produce our own solutions by thinking analytically, asking the right questions, and using SL and ML methods. This is the most valuable skill for a Data Scientist: developing an approach, making assumptions, testing them, making decisions based on the results, and moving forward. It requires knowing the mathematics behind the work and being able to adapt existing methods to the problem.
Data Understanding
What is meant by data understanding in this article is understanding the internal data structure. If there is a large amount of data, this item is not very important. After the theoretical preliminary examination and research for the project, the task is to investigate whether the internal data is suitable for it. This involves determining the tables the project will touch, creating aggregated tables if necessary, and transferring the necessary data between data platforms if necessary.
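As a rough illustration of the "aggregated table" idea, here is a minimal sketch using Python's built-in sqlite3 module. The `transactions` table, its columns, the sample rows, and the `build_customer_summary` helper are all hypothetical and exist only for this example; a real project would run a similar query against its own tables and data platform.

```python
import sqlite3


def build_customer_summary(conn):
    """Create an aggregated table from a hypothetical raw `transactions` table."""
    conn.execute("DROP TABLE IF EXISTS customer_summary")
    conn.execute(
        """
        CREATE TABLE customer_summary AS
        SELECT customer_id,
               COUNT(*)               AS n_transactions,
               SUM(amount)            AS total_amount,
               SUM(amount IS NULL)    AS n_missing_amount  -- rows with no amount
        FROM transactions
        GROUP BY customer_id
        """
    )
    return conn.execute(
        "SELECT customer_id, n_transactions, total_amount, n_missing_amount "
        "FROM customer_summary ORDER BY customer_id"
    ).fetchall()


# Hypothetical raw data, held in an in-memory database for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10.0), (1, 15.5), (2, 7.0), (2, None), (3, 42.0)],
)

summary = build_customer_summary(conn)
for row in summary:
    print(row)
```

The missing-value count per customer is included because part of data understanding is checking whether the data is suitable at all; a column that is mostly empty may rule out an approach before any modeling starts.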
See you in the next article, “Feature Engineering”.