This series (“Bagging & Boosting Ensemble Methods and What is the Difference Between Them?”) consists of 6 separate articles, and this is the third one. In this part, we will talk about what the “Bagging” ensemble method is.
In fact, the bagging diagram above already shows visually what the method tries to do. But let’s try to explain the idea behind it.
The idea behind bagging is to combine the results of multiple models (e.g., several decision trees) to produce a more general result.
The questions that come to mind are:
- Would it be useful to create and merge all the models on the same dataset?
- Since they take the same input, these models have a high chance of producing the same result.
- How can we solve this problem?
One answer is bootstrapping: a sampling technique in which we create subsets of observations from the original dataset by drawing with replacement. Classically, each subset is the same size as the original set. Bagging (bootstrap aggregating) uses these subsets to get a fair idea of the whole set, although the subsets generated for bagging may be smaller than the original set.
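To make bootstrapping concrete, here is a minimal sketch of drawing one bootstrap sample with NumPy; the toy dataset of ten observations and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed only for reproducibility
data = np.arange(10)             # toy "original dataset" of 10 observations

# Draw a bootstrap sample: same size as the original, with replacement,
# so some observations repeat and others are left out entirely.
sample = rng.choice(data, size=len(data), replace=True)
print(sorted(sample))

# The observations never drawn are called "out-of-bag" and can later
# be used to estimate the model's error without a separate test set.
print(len(np.unique(sample)))
```

Because sampling is with replacement, on average only about 63% of the distinct observations appear in any one bootstrap sample.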
- Multiple subsets (samples) are created from the original dataset (the super population) by selecting observations with replacement.
- A base model (weak model) is created on each of these subsets.
- The models run in parallel and are independent of each other.
- The final predictions are determined by combining the predictions from all the models.
Briefly, here are the pros and cons:
If you want to see an example of how the bagging algorithm works, you can look at Random Forest and other models in my notebooks on my Kaggle page.
Let me stop here and say: see you in the next article, “What Is the ‘Boosting’ Ensemble Method?”.