Bagging meta-estimator

Bagging meta-estimator is an ensemble algorithm that can be used for both classification (BaggingClassifier) and regression (BaggingRegressor) problems. It follows the standard bagging technique to make predictions. The steps of the bagging meta-estimator algorithm are:

  1. Random subsets are created from the original dataset by sampling rows with replacement (bootstrapping).

  2. By default, each subset includes all features.

  3. A user-specified base estimator is fitted on each of these subsets.

  4. The predictions of the individual models are combined (by voting or averaging) to get the final result.
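The four steps can be sketched from scratch with NumPy and a scikit-learn decision tree. This is a simplified illustration, not scikit-learn's implementation. The helper names bagging_fit and bagging_predict are hypothetical, and the data come from make_classification. Step 4 uses a plain majority vote, whereas BaggingClassifier itself averages the members' predicted class probabilities when the base estimator provides them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=10, seed=0):
    """Hypothetical helper: fit one decision tree per bootstrap sample (steps 1-3)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Step 1: bootstrap, i.e. draw row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # Step 2: all feature columns are kept for the subset.
        X_sub, y_sub = X[idx], y[idx]
        # Step 3: fit the base estimator on this subset.
        models.append(DecisionTreeClassifier(random_state=0).fit(X_sub, y_sub))
    return models


def bagging_predict(models, X):
    """Hypothetical helper: step 4, combine predictions by majority vote."""
    all_preds = np.stack([m.predict(X) for m in models])  # shape (n_models, n_rows)
    # For each row, pick the class label predicted by the most models.
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


X, y = make_classification(n_samples=300, n_features=8, random_state=1)
models = bagging_fit(X, y, n_estimators=15)
preds = bagging_predict(models, X)
print("training accuracy:", (preds == y).mean())
```

np.bincount is used for the vote because make_classification produces non-negative integer labels. String labels would need a different counting step.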

Sample code for a classification problem:

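from sklearn.ensemble import BaggingClassifier
from sklearn import tree
# x_train, y_train, x_test, y_test come from an earlier train/test split
model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
model.fit(x_train, y_train)
model.score(x_test, y_test)  # mean accuracy on the test set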
0.75135135135135134
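The snippet above expects x_train, y_train, x_test and y_test from an earlier split of a dataset that is not shown. The version below is a self-contained sketch. It substitutes a synthetic dataset from make_classification (an assumption, so its accuracy will not match the 0.75 reported above) and splits it with train_test_split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the article's dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# The base estimator is passed positionally, as in the snippet above.
model = BaggingClassifier(DecisionTreeClassifier(random_state=1), random_state=1)
model.fit(x_train, y_train)
accuracy = model.score(x_test, y_test)  # mean accuracy on the test split
print(f"test accuracy: {accuracy:.3f}")
```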

Sample code for a regression problem:

from sklearn.ensemble import BaggingRegressor
from sklearn import tree
model = BaggingRegressor(tree.DecisionTreeRegressor(random_state=1))
model.fit(x_train, y_train)
model.score(x_test, y_test)  # coefficient of determination (R^2) on the test set
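A self-contained version of the regression example follows. It assumes a synthetic dataset from make_regression in place of real data. It also checks step 4 by hand: a BaggingRegressor's prediction is the mean of its members' predictions, and each member predicts on the feature columns recorded for it in estimators_features_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for a real dataset (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = BaggingRegressor(DecisionTreeRegressor(random_state=1), random_state=1)
model.fit(x_train, y_train)
r2 = model.score(x_test, y_test)  # coefficient of determination (R^2)
print(f"test R^2: {r2:.3f}")

# Step 4 by hand: average the members' predictions, giving each member
# the columns it was trained on.
manual_mean = np.mean(
    [est.predict(x_test[:, feats])
     for est, feats in zip(model.estimators_, model.estimators_features_)],
    axis=0,
)
print(np.allclose(manual_mean, model.predict(x_test)))
```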

Parameters used in the algorithms:

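  • base_estimator:

    • It defines the base estimator to fit on random subsets of the dataset.

    • When nothing is specified, the base estimator is a decision tree (DecisionTreeClassifier or DecisionTreeRegressor).

    • In scikit-learn 1.2 this parameter was renamed to estimator, and the old name base_estimator was removed in 1.4.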

  • n_estimators:

    • It is the number of base estimators in the ensemble (10 by default).

    • The number of estimators should be tuned carefully: a large number takes longer to train, while a very small number may not give the best results.

  • max_samples:

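    • This parameter controls the size of each random subset.

    • It is the number of samples drawn to train each base estimator. An integer is an absolute count; a float is a fraction of the training rows (the default, 1.0, draws as many rows as the training set has).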

  • max_features:

    • Controls the number of features drawn from the whole dataset.

    • It is the number of features used to train each base estimator. An integer is an absolute count; a float is a fraction of the features (the default, 1.0, uses all features).

  • n_jobs:

    • The number of jobs to run in parallel for both fit and predict.

    • Set this value equal to the number of cores in your system.

    • If -1, the number of jobs is set to the number of cores.

  • random_state:

    • It controls the random resampling of the original dataset (the rows and features drawn for each base estimator). When two models use the same random_state value, they make the same random selections.

    • This parameter is useful when you want to compare different models under the same random draws.
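The parameters above can be exercised on one synthetic dataset. The sketch below rests on several assumptions: make_classification data, KNeighborsClassifier as an alternative base estimator, and arbitrary parameter values. It inspects the fitted attributes estimators_, estimators_samples_ and estimators_features_ to show what each parameter controls. The base estimator is passed positionally, so the code works whether the installed scikit-learn names that parameter base_estimator or estimator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

# base_estimator left unspecified: every member is a decision tree.
default_model = BaggingClassifier(random_state=0).fit(X, y)
print(type(default_model.estimators_[0]).__name__)

# Any other classifier can be bagged instead (k-nearest neighbours here).
knn_model = BaggingClassifier(KNeighborsClassifier(), n_estimators=5, random_state=0).fit(X, y)

# n_estimators, max_samples, max_features and n_jobs together.
model = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=20,   # 20 bagged trees
    max_samples=0.5,   # float = fraction of rows: 0.5 * 400 = 200 rows per tree
    max_features=6,    # int = absolute count: 6 of the 12 columns per tree
    n_jobs=-1,         # fit the trees on all available CPU cores
    random_state=0,
).fit(X, y)
print(len(model.estimators_))              # number of fitted trees
print(len(model.estimators_samples_[0]))   # rows drawn for the first tree
print(len(model.estimators_features_[0]))  # columns drawn for the first tree

# random_state: identical settings and seed give identical predictions.
model_a = BaggingClassifier(n_estimators=10, random_state=42).fit(X, y)
model_b = BaggingClassifier(n_estimators=10, random_state=42).fit(X, y)
print(np.array_equal(model_a.predict(X), model_b.predict(X)))
```

Holding random_state fixed while changing one parameter at a time means that two fitted models differ only in that parameter's effect and not in their random draws.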
