Roadmap to Become a “Data Scientist”

Mehmet Akturk
8 min read · Dec 6, 2020

Do you want to become a Data Scientist but don’t know where to start? Then here is a roadmap. I will draw on my own experience to lay out what you need to do and rank the steps based on what I have seen.

https://www.analyticsvidhya.com/blog/2020/01/learning-path-data-scientist-machine-learning-2020/

Hey Data Scientist Candidate!

Take the first step by updating your LinkedIn profile now: Data Scientist Candidate, Data Science Enthusiast or Data Science Researcher. Write it even if you don’t feel ready; no one is born a data scientist. This profile lets everyone see your motivation, and it also keeps you from giving up on your goal and becomes a driving force that sustains that motivation.

Those who heed the following advice will reach their goal before long.

After updating your profile:

1. Attend All Events Related to Data Science

https://www.springboard.com/blog/data-science-2020-election/

There are free events in the country you are in, and paid ones can be worth attending too. Start following them and attending without wasting time. This will teach you the language of data science, familiarize you with the concepts faster, let you learn from the experts and, most importantly, help you build the connections you will need later. Some communities can help you keep track of these events.

So why is it important?
After you have been to a few of these events, you will say, “The meetups are shallow; they never go into the details of anything…”
You are right, but their aim is to give you the general framework, and that framework is what you need right now. Try to participate in all of these events.

2. Follow Data Science Blogs and Data Scientists

https://dataconomy.com/2020/08/online-events-for-data-scientists-that-you-cant-miss-this-autumn/

There is a risk of misinformation and confusion here. For that reason, I recommend blogs that are up to date, write their own articles and publish content continuously.

Some Data Science Blogs:
- analyticsvidhya.com
- datasciencecentral.com
- kdnuggets.com
- towardsdatascience.com

Some Data Scientist / Artificial Intelligence Expert Profiles:
- Trevor Hastie
- Hadley Wickham
- Andrew Ng
- Yann LeCun
- Carla Gentry
- Jeremy Achin
- Geoff Hinton
- Vincent Granville
- Kirk Borne
- DJ Patil

You may not know the names above, but as time passes, you will see which trends Hastie and Ng represent, what innovations Hinton has made, that Wickham is the living legend of the R world, and that Granville is a troll in this field, in a positive sense. You will also see that you cannot become a Data Scientist without adding others to your network. There are many more talented data scientists, of course, but we listed the ones that first came to mind.

3. Choose a Programming Language

https://www.devsaran.com/blog/10-best-programming-languages-2020-you-should-know

Once you have thoroughly absorbed SQL, choose either R or Python and start learning right away. You have to be very good at one of them.
Our advice is to learn both. R is the best tool for the job, but it has limitations in some respects. Python and Scala are the best answer to the needs of larger projects and scalability. However, we consider Python and R indispensable, because the subject here is data science, not building a product or running a big-data-driven business.

Most of the SQL you need is offered for free at this address. For R and Python, you can check the resources for paid users.
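To give a feel for the level of SQL meant here, the following is a minimal sketch that runs a grouping query from Python using the standard-library sqlite3 module. The `sales` table, its columns and its rows are hypothetical and invented purely for illustration; they do not come from any resource mentioned in this article.

```python
import sqlite3

# Hypothetical in-memory table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 50.0),
        ("east", 75.0),
    ],
)

# Aggregate per region, keep regions whose total exceeds 100,
# and sort from the largest total to the smallest.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
for region, n_orders, total in rows:
    print(region, n_orders, total)

conn.close()
```

Filtering, grouping, aggregating and sorting of this kind cover a large share of everyday data work, which is one reason to settle SQL before moving on to R or Python.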

Resources for learning R and Python:

4. Tackle Statistical Learning

You have postponed it long enough; it’s time. As you will see, there is not that much of it. LEARN. Even if nobody teaches you, LEARN. Learning is, of course, the best!

We list the following as the skills to acquire under the heading of statistical learning:

  • Tidy Data Process and Data Pre-Processing (missing data, outliers, inconsistency checks, etc.)
  • Exploratory Data Analysis (Descriptive Statistics, Data Visualization)
  • Inferential Statistics (sampling theory, probability distributions, random variables, hypothesis testing, Bayesian inference, robust methods)
  • Multivariate Statistical Methods (correlation, dimension reduction (PCA, LDA, Kernel PCA), analysis of variance, cluster analysis, factor analysis, correspondence analysis, path analysis, discriminant analysis, etc.)
  • Regression Models: Linear regression, logit-probit, multinomial logit-probit, quantile regression, etc.
  • Resampling Methods (cross-validation, bootstrap)
  • Linear Model Selection and Regularization
  • Linearity and Causality
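As one concrete instance of the resampling methods in the list above, here is a minimal sketch of a percentile bootstrap confidence interval for the mean, written with the Python standard library only. The function name `bootstrap_ci`, its parameters and the sample values are assumptions made for this illustration; the courses listed in this article may present the method differently.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (illustrative helper, not a library API)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, used only to exercise the function.
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2, 2.7, 2.6]
low, high = bootstrap_ci(sample)
print("sample mean:", statistics.mean(sample))
print("approximate 95% interval:", low, high)
```

The idea is that the spread of the statistic across many resamples approximates its sampling variability, without assuming a particular distribution for the data.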

To build these skills, you can complete courses on the following topics on Udemy, Udacity or Coursera:

Udacity: Intro to Statistics
Udacity: Intro to Descriptive Statistics
Udacity: Intro to Inferential Statistics
Udacity: Exploratory Data Analysis
Udemy: Random Variables & Probability Distributions
Udemy: Statistics for Data Science and Business Analysis
Coursera: Bayesian Statistics: Techniques and Models

Or you can set all of them aside, accept minor differences in coverage, and take just these two comprehensive courses:

edX: Statistical Thinking for Data Science and Analytics (learn how statistics plays a central role in the data science approach)

Coursera: Statistics with R Specialization

At the very least, we strongly recommend that you consider the last two courses above.

5. Don’t Let the Machine Learn; You Learn

https://medium.com/app-affairs/9-applications-of-machine-learning-from-day-to-day-life-112a47a429d0

We list the following as the skills to acquire under the heading of machine learning:

  • Regression Models: Multiple Regression, Polynomial Regression, SVR, Regression Trees, Random Forest Regression, etc.
  • Classification: Logistic Regression, K-NN, SVM, Naive Bayes, Decision Trees, Ensemble Learning Methods (bagging, boosting, RF, etc.)
  • Clustering: Hierarchical and Non-Hierarchical Clustering Methods (Hierarchical Clustering, K-Means)
  • Association Rules: Apriori, Eclat
  • Text Mining, NLP
  • Reinforcement Learning
  • Deep Learning
  • Model Selection (validation, test error methods, model performance evaluation, parameter tuning) and recognizing learning problems (underfitting, overfitting, good fit)
  • Awareness that the simple will always be better, and of the words “All models are wrong, but some are useful” (George E. P. Box)
  • Is the goal predictive accuracy or causality? A very good understanding of which situation applies.
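The model selection and overfitting items in the list above can be made concrete with a small experiment. Below is a minimal sketch, in plain Python, of a k-nearest-neighbours classifier evaluated on a held-out validation set. The synthetic two-class data, the helper names (`knn_predict`, `accuracy`, `make_point`) and the chosen values of k are all assumptions made for illustration, not a recommended setup.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    def sq_dist(row):
        (x, y), _ = row
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    """Share of points in `data` classified correctly using `train` as reference."""
    hits = sum(knn_predict(train, features, k) == label for features, label in data)
    return hits / len(data)


rng = random.Random(42)  # fixed seed so the sketch is reproducible


def make_point(label):
    # Two overlapping Gaussian clouds: synthetic data, invented for this example.
    center = 0.0 if label == 0 else 1.5
    return ((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label)


data = [make_point(i % 2) for i in range(200)]
rng.shuffle(data)
train, valid = data[:150], data[150:]  # hold out 50 points for validation

for k in (1, 5, 15):
    print(k, accuracy(train, train, k), accuracy(train, valid, k))
```

With k = 1 every training point is its own nearest neighbour, so training accuracy is perfect by construction. That is why the validation column, not the training column, is the one to compare across values of k when tuning a parameter.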

For this topic, we strongly recommend that you complete all of the following courses in the order given:

1. Statistical Learning (SL)

Rob Tibshirani: Professor of Health Research and Policy, and Statistics, Stanford

Trevor Hastie: Professor of Statistics, Stanford

The gentlemen above, academic advisors to H2O, a company well known to those working in machine learning, cover almost every ML topic under the heading of SL. Their course builds a strong awareness of the rationale behind the methods and of how important statistics is at almost every point in ML and DS. For this reason, these resources are recommended once you have finished the statistical learning step.

As someone who has printed and scribbled on all of them, I also highly recommend the following books, which are available for free:
- An Introduction to Statistical Learning with Applications in R
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Computer Age Statistical Inference: Algorithms, Evidence and Data Science
- Statistical Learning with Sparsity: The Lasso and Generalizations

2. Machine Learning

You should definitely take this legendary Coursera course from Andrew Ng, one of the living legends of the machine learning and deep learning world and a co-founder of Coursera.

Andrew Ng, Co-founder, Coursera; Adjunct Professor, Stanford University; formerly head of Baidu AI Group / Google Brain

We also highly recommend Andrew Ng’s deep learning lessons at https://www.deeplearning.ai/.

3. Machine Learning A-Z™: Hands-On Python & R In Data Science (DS)

Finally, we recommend this course, one of the best sellers on Udemy, because it brings R, Python and DS together.

6. Acquire Big Data Skills

https://www.digitalvidya.com/blog/data-analytics-skills/

We also recommend these comprehensive courses on big data:
Coursera: Big Data Specialization
Udemy: Spark and Python for Big Data with PySpark

7. Make a Project

https://www.wrike.com/blog/how-to-write-a-project-plan-easy-steps/

Do projects, good or bad, and try to get used to the project life cycle. The biggest shortcoming of Data Scientist candidates is that they usually have no project experience. To remedy this shortcoming, you can:

  • Finish all the projects in the courses given in the previous sections
  • Review Kaggle projects

Kaggle: A platform that organizes competitions on DS, ML, and DL. By examining the projects done there, you can get an idea of how DS projects are produced using all of the tools above. My advice is to reach at least the Expert tier on Kaggle.

8. How to Make a Data Science Project?

https://www.kdnuggets.com/2018/05/descriptive-analytics-machine-learning-deep-learning-crisp-dm.html

Now that we have all the skills and tools at our disposal, it is time to do projects. But how? How is a Data Science project done? What should be considered when doing one? How should we go about it? If you are looking for a handy guide that answers these questions and walks you through a Data Science project, you can check out this article:
Data Science Project Cycle

9. Job Applications

https://www.debatingeurope.eu/2020/03/02/should-job-applications-be-anonymous-to-reduce-bias/#.X81jP7PvLIU

The network you have built through the meetups and events you have attended since the beginning of the process may help you here. You can start by using this network and preparing a good CV. More valuable than a good CV are good LinkedIn, GitHub, Medium and Kaggle accounts. Do not forget to add all the topics you have studied to your skills on LinkedIn, and even ask the people who trained you to endorse those skills.

If you want to know more about DS, you can check out the other articles in my series. For example:
What is Data Science(DS) and How can it be learned?

You can reach me through my LinkedIn account with all your questions and requests.

https://cdn.mos.cms.futurecdn.net/kMFJjsRDbNLmQ3MGCwbhsL-970-80.jpg.webp

Hope to see you in the other articles of this series…🖖🏼

References
1. https://www.analyticsvidhya.com/blog/2020/01/learning-path-data-scientist-machine-learning-2020/
2. https://www.springboard.com/blog/data-science-2020-election/
3. https://dataconomy.com/2020/08/online-events-for-data-scientists-that-you-cant-miss-this-autumn/
4. https://www.devsaran.com/blog/10-best-programming-languages-2020-you-should-know
5. https://medium.com/app-affairs/9-applications-of-machine-learning-from-day-to-day-life-112a47a429d0
6. https://www.veribilimiokulu.com/blog/veri-bilimci-olmak-icin-yol-haritasi/
7. https://www.digitalvidya.com/blog/data-analytics-skills/
8. https://www.wrike.com/blog/how-to-write-a-project-plan-easy-steps/
9. https://towardsdatascience.com/how-to-start-your-first-data-science-project-9c2afcaaa1a
10. https://www.kdnuggets.com/2018/05/descriptive-analytics-machine-learning-deep-learning-crisp-dm.html
11. https://www.debatingeurope.eu/2020/03/02/should-job-applications-be-anonymous-to-reduce-bias/#.X81jP7PvLIU
12. https://cdn.mos.cms.futurecdn.net/kMFJjsRDbNLmQ3MGCwbhsL-970-80.jpg.webp

Mehmet Akturk

Experienced Ph.D. with a demonstrated history of working in the higher education industry. Skilled in Data Science, AI, NLP, Deep Learning, Big Data, & Mathematics.