Do you want to become a Data Scientist but don’t know where to start? Then here is a roadmap. I will draw on my own experience to lay out what you have to do and rank the steps in the order that I think works best.
Hey Data Scientist Candidate!
Take the first step by updating your LinkedIn profile now: Data Scientist Candidate, Data Science Enthusiast or Data Science Researcher. Write it even if you don’t feel ready; no one is born a data scientist. This profile lets everyone see your motivation, and it also keeps you from giving up on your goal and becomes a driving force that sustains that motivation.
Those who heed the following advice will get their wish before long.
After updating your profile:
1. Attend All Events Related to Data Science
There are free events in whatever country you are in, and the paid ones are worth attending too. Start following them and attending without wasting time. This will teach you the language of data science, help you get to know the concepts faster, let you learn from the experts and, most importantly, help you build the connections you will need later. Some communities worth following:
So why is it important?
After you have attended a few of these events, you will say, “The meetups are shallow; they never go into the details of anything…”.
You are right, but their aim is to convey the general framework, and the general framework is exactly what you need right now. Try to attend as many of these events as you can.
2. Follow Data Science Blogs and Data Scientists
There is a risk of misinformation and confusion here. For this reason, I recommend blogs that are up to date, write their own articles and publish content regularly.
You may not know the names above, but as time passes, you will see which trends Hastie and Andrew represent, what innovations Hinton has made, that Wickham is the living legend of the R world, and that Vincent is a troll in this field in the positive sense of the word. Similarly, you will see that you cannot become a Data Scientist without adding others like them to your network. There are many more talented data scientists, of course, but these are the ones that came to mind first.
3. Choose a Programming Language
Once you have learned SQL thoroughly, choose either R or Python and start learning it right away. You have to be very good at one of the two.
Our advice, though, is to learn both. In our view R is the best tool for the job, but it has limitations in some areas. Python and Scala are a better answer to the needs of larger projects and scalability. Still, because the subject here is data science rather than building a product or a big-data-driven business, we consider Python and R indispensable.
Most of the SQL you need is offered for free at this address. For R and Python, you can also check the paid resources.
Resources for learning R and Python:
4. Tackle Statistical Learning
You have postponed it long enough; it is time. As you will see, there is not that much of it. LEARN. Even if nobody teaches you, LEARN it yourself. Learning it is always the best option!
The skills to acquire under the heading of statistical learning are:
- Tidy Data and Data Pre-Processing (missing data, outliers, inconsistency checks, etc.)
- Exploratory Data Analysis (Descriptive Statistics, Data Visualization)
- Inferential Statistics (sampling theory, probability distributions, random variables, hypothesis testing, Bayesian inference, robust methods)
- Multivariate Statistical Methods (correlation, dimension reduction (PCA, LDA, Kernel PCA), analysis of variance, cluster analysis, factor analysis, correspondence analysis, path analysis, discriminant analysis, etc.)
- Regression Models: Linear regression, logit and probit, multinomial logit and probit, quantile regression, etc.
- Resampling Methods (cross-validation, bootstrap)
- Linear Model Selection and Regularization
- Linearity and Causality
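To make the resampling item in the list above a little more concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean, written with only the Python standard library. The function name bootstrap_mean_ci, the sample values and the default parameters are illustrative assumptions, not something taken from a specific course or library.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n observations from the sample WITH replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up measurements, used only to exercise the function.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2, 2.7, 2.6]
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"approx. 95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The idea is that the spread of the resampled means stands in for the sampling variability of the mean, without assuming a particular probability distribution. In practice you would probably reach for an R or Python library instead of hand-rolling this.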
To build these skills, you can complete the following courses on Udemy, Udacity or Coursera:
Udacity: Intro to Statistics
Udacity: Intro to Descriptive Statistics
Udacity: Intro to Inferential Statistics
Udacity: Exploratory Data Analysis
Udemy: Random Variables & Probability Distributions
Udemy: Statistics for Data Science and Business Analysis
Coursera: Bayesian Statistics: Techniques and Models
Or you can set all of them aside, accept a few small gaps, and take just these two comprehensive courses:
edX: Statistical Thinking for Data Science and Analytics (learn how statistics plays a central role in the data science approach)
At the very least, we strongly recommend that you consider the last two courses above.
5. Don’t Let the Machine Learn; You Learn
The skills to acquire under the heading of machine learning are:
- Regression Models: Multiple Regression, Polynomial Regression, SVR, Regression Trees, Random Forest Regression, etc.
- Classification: Logistic Regression, K-NN, SVM, Naive Bayes, Decision Trees, Ensemble Learning Methods (bagging, boosting, RF, etc.)
- Clustering: Hierarchical and Non-Hierarchical Clustering Methods (Hierarchical Clustering, K-Means)
- Association Rules: Apriori, Eclat
- Text Mining, NLP
- Reinforcement Learning
- Deep Learning
- Model Selection (validation, test error methods, model performance evaluation, parameter tuning) and Recognizing Learning Problems (underfitting, overfitting, good fit)
- Awareness that the simpler model is usually the better one, and of the saying “All models are wrong, but some are useful” (George E. P. Box)
- Is the goal predictive accuracy or causality? A very good understanding of which one applies to your situation.
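As a small illustration of the model selection and overfitting items in the list above, the sketch below compares training accuracy with 5-fold cross-validated accuracy for a hand-written K-NN classifier. Everything in it is an assumption made for the example: the synthetic two-cluster data, the helper names knn_predict, accuracy and cross_val_accuracy, and the candidate values of k.

```python
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Share of test rows whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(data, k, folds=5):
    """Average held-out accuracy over `folds` interleaved folds."""
    scores = []
    for i in range(folds):
        test = data[i::folds]  # every folds-th row, starting at i
        train = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


# Synthetic data: two overlapping Gaussian clusters, one per class.
rng = random.Random(42)
data = []
for _ in range(100):
    label = rng.randint(0, 1)
    center = 0.0 if label == 0 else 1.5
    data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))

for k in (1, 5, 15):
    train_acc = accuracy(data, data, k)  # evaluated on the data it was fit on
    cv_acc = cross_val_accuracy(data, k)  # evaluated on held-out folds
    print(f"k={k:2d}  train accuracy={train_acc:.2f}  cross-validated accuracy={cv_acc:.2f}")
```

With k=1 the training accuracy is perfect by construction, because every point is its own nearest neighbour, so training accuracy alone says nothing about how well the model generalizes. The cross-validated score is the number to compare when tuning k.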
For this topic, we strongly recommend that you complete all of the following courses in the order given:
1. Statistical Learning (SL)
Rob Tibshirani: Professor of Health Research and Policy, and Statistics, Stanford
Trevor Hastie: Professor of Statistics, Stanford
The two professors above, who are academic advisors to H2O, a company well known to those working in machine learning, cover almost every ML topic under the heading of SL. Their material builds a strong understanding of the rationale behind the methods and of how important statistics is at almost every point in ML and DS. For this reason, these resources are recommended once you have finished the statistical learning step above.
As someone who has printed every one of them and scribbled all over them, I also highly recommend the following books, which are available for free:
- An Introduction to Statistical Learning with Applications in R
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Computer Age Statistical Inference: Algorithms, Evidence and Data Science
- Statistical Learning with Sparsity: The Lasso and Generalizations
2. Machine Learning
You should definitely take this legendary Coursera course by Andrew Ng, who is one of the living legends of the machine learning and deep learning world and a co-founder of Coursera.
Andrew Ng, Co-founder, Coursera; Adjunct Professor, Stanford University; formerly head of Baidu AI Group / Google Brain
Finally, we recommend this course, one of the best sellers on Udemy, because it brings R, Python and DS together.
6. Acquire Big Data Skills
7. Do a Project
Good or bad, do projects, and try to get used to the project life cycle. The biggest shortcoming of Data Scientist candidates is that they usually have no project experience. To remedy this shortcoming, you can:
- Finish All Projects in the Courses Listed in the Previous Sections
- Review Kaggle Projects
Kaggle: A platform that organizes competitions on DS, ML, and DL. By examining the projects done here, you can get an idea of how DS projects are produced using all of the tools above. My advice is to reach at least the Expert tier on Kaggle.
8. How Do You Do a Data Science Project?
Now that we have all the skills and tools at our disposal, it is time to do projects. But how? How is a Data Science Project done? What should be considered along the way? How should we go about it? If you are looking for a handy guide that answers these questions and walks you through a Data Science Project, you can check out this article:
- Data Science Project Cycle
9. Job Applications
The network you have built through the meetups and events you have attended since the beginning of the process may help you here. You can start by drawing on that network and preparing a good CV. Even more valuable than a good CV are strong LinkedIn, GitHub, Medium and Kaggle accounts. Do not forget to add all of the topics you have studied to your skills on LinkedIn, and even ask the people who trained you to endorse those skills.
If you want to know more about DS, you can check out my other article series. For example:
What Is Data Science (DS) and How Can It Be Learned?
You can reach me via my LinkedIn account with all your questions and requests.
Hope to see you in my other series and articles…🖖🏼