Weak Learners & Strong Learners for Machine Learning
This article is the second in a six-part series, “Bagging & Boosting Ensemble Methods and What Is the Difference Between Them?”. In this part, we will talk about “Weak Learners & Strong Learners for Machine Learning”.
Note: I will use the following abbreviation below:
- Machine Learning — ML
Let’s start…
In both homogeneous and heterogeneous ensemble methods, we said that the individual models are called weak learners. In the homogeneous ensemble method these weak learners are built using the same ML algorithm, whereas in the heterogeneous ensemble method they are built using different ML algorithms.
The questions are:
- So what do these weak learners do?
- Why are they so important for understanding any ensemble method?
A weak learner is built like any other ML model, but unlike a strong model it does not try to generalize across all possible target cases. A weak learner only tries to predict a single target class, or a small combination of target cases, accurately.
Are you confused?
Let’s try to understand this with an example. Before that, we need to understand bootstrapping. Once we have learned about bootstrapping, we will take an example to understand the weak learner and strong learner methodology in more detail.
Bootstrapping
For building multiple models, whether with a homogeneous or a heterogeneous ensemble method, the dataset is the same.
So how can we use the same dataset to build multiple models?
For each model, we need to take a sample of the data, but we need to be very careful while creating these samples. If we take the data carelessly, a single sample may end up with only one target class, or with a target class distribution that differs from the original data. This will hurt model performance.
To overcome this, we need a smarter way to create these samples, known as bootstrap samples.
Bootstrapping is a statistical method for creating sample data without losing the properties of the actual dataset. The individual samples of data are called bootstrap samples.
Each sample is an approximation of the actual data. Each individual sample has to capture the underlying complexity of the actual data. All data points in a sample are drawn at random with replacement.
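To make this concrete, below is a minimal sketch of drawing bootstrap samples with NumPy. The toy dataset (X, y), the number of samples, and their sizes are my own illustrative assumptions; the only point here is the sampling with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 10 rows with 2 features and a binary target (0 or 1).
X = rng.normal(size=(10, 2))
y = rng.integers(0, 2, size=10)

def bootstrap_sample(X, y, size=None):
    """Draw one bootstrap sample: rows chosen at random WITH replacement."""
    n = len(X) if size is None else size
    idx = rng.integers(0, len(X), size=n)  # the same row index may appear more than once
    return X[idx], y[idx]

# Three bootstrap samples, as in the figure; their sizes do not have to be equal.
samples = [bootstrap_sample(X, y) for _ in range(3)]
```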
In the image above, we created 3 bootstrap samples from the actual dataset. In this case, we created samples of equal size, but there is no hard rule saying all bootstrap samples should be the same size.
Among the bootstrapping properties, we said that data points are taken randomly and with replacement. In the image above, the second bootstrap sample has a repeated data point (the light green one).
As in the image above, we create bootstrap samples from the actual dataset. Each bootstrap sample is then used to build one model, giving us multiple models in total; a short sketch of this step follows below.
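Continuing the sketch above, each bootstrap sample fits its own weak learner. The scikit-learn decision stumps (depth-1 trees) used here are just an assumed stand-in for a deliberately simple model; any ML algorithm could take their place.

```python
from sklearn.tree import DecisionTreeClassifier

# One weak learner per bootstrap sample (reusing the `samples` list from above).
weak_learners = []
for X_s, y_s in samples:
    stump = DecisionTreeClassifier(max_depth=1)  # a very shallow tree: weak on its own
    stump.fit(X_s, y_s)
    weak_learners.append(stump)
```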
By now we have learned how the individual sample datasets are created, and that these datasets are used for building the multiple weak learners. The combination of all the weak learners makes a strong learner, or strong model.
1. Weak Learners
Let’s try to understand weak learners with the help of the above example.
Weak learners are the individual models that predict the target outcome, but they are not optimal models. In other words, they are not generalized enough to predict accurately for all the target classes and all the expected cases.
They focus on predicting accurately for only a few cases, as you can see in the above example.
The original dataset has two possible outcomes:
- Red
- Green
The above representation predicts the target, red or green, from some features.
The first weak learner accurately predicts green, and the second weak learner also accurately predicts green, whereas the last weak learner accurately predicts red. As we said before, a weak learner accurately predicts one target class.
Combining all the weak learners makes a strong model that is generalized and optimized well enough to accurately predict all the target classes.
2. Strong Learners
We said that a combination of all the weak learners builds a strong model, and I think the above figure explains this well.
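As a rough illustration of that combination step (continuing the earlier sketch, and assuming a classification task), the weak learners’ predictions can be merged with a simple majority vote; averaging would be the analogue for regression.

```python
import numpy as np

def strong_predict(weak_learners, X_new):
    """Combine the weak learners' predictions by majority vote."""
    votes = np.stack([model.predict(X_new) for model in weak_learners])  # (n_models, n_rows)
    # For each row, pick the class that received the most votes.
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])

# Example: the combined ("strong") prediction for two new points.
X_new = rng.normal(size=(2, 2))
print(strong_predict(weak_learners, X_new))
```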
But how are these individual models trained, all at once or one after another, and how do they perform their predictions?
The bagging and boosting methods vary based on the way the individual models (weak learners) go through their training phase.
Let me put a stop to our topic here and say we will see you in the next topic, What Is the “Bagging” Ensemble Method?