# XGBoost hyperparameter tuning in Python using grid search

Fortunately, XGBoost implements the scikit-learn API, so tuning its hyperparameters is very easy.

I assume that you have already preprocessed the dataset and split it into training and test sets, so I will focus only on the tuning part.

First, we have to import the XGBoost classifier and GridSearchCV from scikit-learn.

```python
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
```


After that, we have to specify the constant parameters of the classifier. We need the objective: in this case, I use the “binary:logistic” function because I am training a classifier that handles only two classes. Additionally, I specify the number of threads to speed up the training, and the seed for the random number generator, to get the same results in every run.

```python
estimator = XGBClassifier(
    objective='binary:logistic',
    nthread=4,  # number of threads used for training
    seed=42
)
```


In the next step, I have to specify the tunable parameters and the range of values.

```python
parameters = {
    'max_depth': range(2, 10, 1),
    'n_estimators': range(60, 220, 40),
    'learning_rate': [0.1, 0.01, 0.05]
}
```
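This grid defines 8 values for max_depth, 4 for n_estimators, and 3 for learning_rate, so the search will evaluate 8 × 4 × 3 = 96 candidate combinations — the count you will see in the training log. A quick sanity check (just a sketch; GridSearchCV does this enumeration for you):

```python
from itertools import product

# Enumerate every combination in the grid, exactly as GridSearchCV will
n_candidates = len(list(product(
    range(2, 10, 1),        # 8 max_depth values
    range(60, 220, 40),     # 4 n_estimators values
    [0.1, 0.01, 0.05],      # 3 learning rates
)))
print(n_candidates)  # 96
```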


In the last setup step, I configure the GridSearchCV object. I choose the best hyperparameters using the ROC AUC metric to compare the results of 10-fold cross-validation, and set n_jobs=10 to evaluate candidates in parallel.

```python
grid_search = GridSearchCV(
    estimator=estimator,
    param_grid=parameters,
    scoring='roc_auc',
    n_jobs=10,
    cv=10,
    verbose=True
)
```


Now, we can run the search on the training data (here, X and Y are the training features and labels).

```python
grid_search.fit(X, Y)
```


Here are the results:

```
Fitting 10 folds for each of 96 candidates, totalling 960 fits
[Parallel(n_jobs=10)]: Using backend LokyBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   11.0s
[Parallel(n_jobs=10)]: Done 180 tasks      | elapsed:   40.1s
[Parallel(n_jobs=10)]: Done 430 tasks      | elapsed:  1.7min
[Parallel(n_jobs=10)]: Done 780 tasks      | elapsed:  3.1min
[Parallel(n_jobs=10)]: Done 960 out of 960 | elapsed:  4.0min finished
```


The `best_estimator_` attribute contains the best model found by the grid search, already refit on the full training data.

```python
grid_search.best_estimator_
```

