After explaining ensemble methods and the underlying bagging and boosting approaches, it is time to explain (in question-and-answer form) the differences between them. This article is the sixth and final part of the series "Bagging & Boosting Ensemble Methods and What is the Difference Between Them?".
In this article, I will discuss how algorithms such as Random Forests, GBM, XGBoost, LightGBM, and CatBoost relate to these approaches.
Here we will pose a general question and try to answer it.
In tree-based methods, the terms "bagging" and "boosting" come up constantly. What do they mean, and what is the difference between them?
Both are ensemble approaches that aim to increase model performance and to ease the problem of overfitting.
Bagging is short for "bootstrap aggregation". What does this mean? It means bringing together trees obtained through resampling. In this respect, it brought a completely different perspective to single-tree structures and moved the machine learning world forward. In this method, new trees are built on samples drawn from the data set with replacement, and a community of trees emerges. For this reason, it is referred to as an ensemble learning method. All of the resulting trees are used for modeling: each tree is asked for its prediction, the predictions are combined by voting (for classification) or averaging (for regression), and the result is presented as a single output. Bagging should be seen as a methodology. For example, when the CART algorithm is plugged in, bagged trees appear. When the CART algorithm is combined with randomness on the basis of both observations and variables, Random Forests emerges.
Put another way: randomness is introduced at the level of both observations and variables, the problem of overfitting is eased, and prediction success increases.
Boosting can be expressed in a single sentence: it tries to increase performance through optimization, so the models are adaptive and errors are evaluated cumulatively. Boosting should also be seen as a methodology: AdaBoost, Gradient Boosting Machines, and XGBoost all derive from this "boosting" approach. At its core is the idea of combining weak learners. The algorithmic equivalent of this idea is to create a series of models that together form a single predictive model: each model in the series is fit on the prediction residuals (errors) of the previous models. This is where the concept of adaptive learning becomes clear. The trees are dependent, not independent.
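The fit-on-the-residuals idea above can be sketched from scratch (a simplified gradient boosting loop for regression with squared error; the data and hyperparameter values are illustrative assumptions, not a production recipe). Each new shallow tree is trained on the errors of the ensemble built so far, so the trees are dependent by construction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction          # errors of the ensemble so far
    # Fit a weak learner (shallow tree) to the residuals, not to y.
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # Add its (damped) correction on top of the previous predictions.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# Training error shrinks cumulatively as dependent trees are added.
print(np.mean((y - prediction) ** 2))
```

Removing the loop's dependence on `prediction` would turn this back into independent trees; that single dependency is what makes it boosting rather than bagging.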
This is the key difference between bagging and boosting: whether each new tree depends on the previous trees, and whether error optimization accumulates over the residuals.
As a result, algorithms such as GBM, XGBoost, and LightGBM reach the top by blending these ideas: working with more than one tree, using randomness in observation and variable selection, and building independent or dependent trees. The most fundamental, and subtle, point is providing RANDOMNESS.
This is all I have written about "Bagging & Boosting Ensemble Methods and What is the Difference Between Them?". If you want to know more about data science and related topics, you can check out my other series articles. For example:
Roadmap to Become a “Data Scientist”
You can reach me via my LinkedIn account for all your questions and requests.
Hope to meet you in other series articles and articles…🖖🏼