# Using a surrogate model to interpret a machine learning model

In my opinion, training a surrogate model is the easiest method of interpreting the behavior of an existing machine learning model.

To apply this method, we are going to need:

• an existing machine learning model
• input data that can be processed by the existing model (for example, the test dataset held out while training the model, or a sample of real-world data from the production environment)

We don’t need to know anything about the existing model. It is just a black box. It has an input, and when we pass the data, we get an output. That is all we need.

In the first step, I am going to pass the data into the black box model and get the prediction.

                =======================
                =                     =
    data   =>   =      black box      =   =>   prediction
                =        model        =
                =======================
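As a minimal sketch of this first step, assuming scikit-learn is available: since I don't have a real black box at hand here, the code below trains a random forest on synthetic data as a stand-in. The names `black_box` and `black_box_prediction` and the synthetic dataset are my own illustration. In practice you would load your existing model and your own data instead.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Assumption: synthetic data and a random forest stand in for the real
# input data and the existing black box model.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Step 1: pass the input data into the black box and keep its output.
# From here on, only X and black_box_prediction are needed.
black_box_prediction = black_box.predict(X)
```

Note that the true labels `y` are used only to build the stand-in. The surrogate method itself never needs them.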



Now, I have to decide what kind of model I want to train as the surrogate model. It should be a model that I know how to interpret and explain to people who have no machine learning knowledge, for example, linear regression or decision trees.

I am going to train the surrogate model using the independent variables from the input data as features and the prediction from the black box as the dependent variable.

    independent variables       prediction
    from input dataset          from black box model
            ||                      ||
            ||                      ||
            \/                      \/
          =============================
          =                           =
          =         surrogate         =
          =           model           =
          =============================
                       ||
                       \/
                  surrogate's
                  prediction
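The training step could look roughly like this, again assuming scikit-learn and the same synthetic stand-in for the black box. I picked a shallow decision tree as the surrogate, but that is only one possible choice, and the `max_depth=3` limit is an arbitrary value chosen to keep the tree readable.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data and a random forest stand in for the real
# input data and the existing black box model.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
black_box_prediction = black_box.predict(X)

# The surrogate is trained on the black box's predictions,
# not on the true target values.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box_prediction)

surrogate_prediction = surrogate.predict(X)
```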


After that, I can calculate the prediction error of the surrogate model, measured against the predictions of the black box rather than against the true labels. The smaller the error, the better the surrogate model approximates the black box.
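Here is one way to measure that error, sketched with scikit-learn's `mean_squared_error` and `r2_score`. Which metric to use and what counts as "acceptable" are judgment calls. The variable names and the synthetic setup are assumptions carried over for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data and a random forest stand in for the real
# input data and the existing black box model.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
black_box_prediction = black_box.predict(X)

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box_prediction)
surrogate_prediction = surrogate.predict(X)

# The "ground truth" here is the black box output, so both metrics
# describe how closely the surrogate mimics the black box.
mse = mean_squared_error(black_box_prediction, surrogate_prediction)
r2 = r2_score(black_box_prediction, surrogate_prediction)

print(f"MSE vs. black box: {mse:.2f}")
print(f"R^2 vs. black box: {r2:.3f}")
```

An R² close to 1 means the surrogate reproduces most of the variation in the black box's predictions. If it is low, a more flexible surrogate (for example a deeper tree) may be worth trying, at the cost of being harder to explain.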

When I get a surrogate model with an acceptable prediction error, I can look at its parameters to understand which features are important and, approximately, how the black box model works.
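For a decision tree surrogate, "looking at the parameters" could mean reading the feature importances and the learned split rules. The sketch below uses scikit-learn's `feature_importances_` attribute and `export_text` helper. The feature names are made up for this example, and the setup is the same synthetic stand-in as before.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

# Assumption: synthetic data and a random forest stand in for the real
# input data and the existing black box model.
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
black_box = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Hypothetical feature names, used only to make the output readable.
feature_names = ["feature_0", "feature_1", "feature_2", "feature_3"]

# Relative importance of each feature in the surrogate (sums to 1).
importances = surrogate.feature_importances_
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.3f}")

# Human-readable if/else rules learned by the surrogate.
rules = export_text(surrogate, feature_names=feature_names)
print(rules)
```

With a linear regression surrogate, the equivalent step would be to inspect the fitted coefficients instead.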

