Deploying your first ML model in production

What do you need to deploy your first machine learning model in production? It can’t be easy, can it? After all, tons of MLOps tools exist. You can’t possibly get your first model running in production without them. Can you?

If this is really your first model in production, don’t look for MLOps tools yet. Let’s focus on deploying the model and getting business results. You have probably postponed the release a few times already. Most likely, you promised way better results than the ones you got. We have all done it ;) Let’s not delay it any further.

Things you don’t need yet

Let’s be pragmatic here. We want the model in production as fast as possible. Overengineering is fun, but right now, we need results. Fast.

We will add the missing tools later, after the second or third deployment, when you actually need them. By then, the lack of tooling will hurt, so you will feel the need for them anyway.

You don’t need a fully automated training pipeline for your first model. It is okay if a data scientist trained the model on their laptop, uploaded it to S3, and decided it is good enough to deploy.

The data scientists probably don’t remember the exact SQL query they used to get the training data anyway. Because of that, you can skip data lineage and feature stores too. We will need them soon, but don’t postpone the first deployment for that reason. You may create a Jira ticket as a reminder, though.

The data scientists, most likely, didn’t track the training parameters and the evaluation metrics during training either. We will work on experiment tracking soon, but we don’t need it before the first deployment. Don’t worry about it yet.

Things you actually need

You need to deploy the model somewhere. Will you deploy it as a module in an existing backend application, or will you deploy it as a separate service? If you want a separate service, I suggest using BentoML to build a Docker image with your model. The image will also contain the necessary data preprocessing code.

Remember to get the preprocessing code from the data scientist! Just don’t expect to get a single function. They will probably send you the entire Jupyter notebook, and you will have to search for the relevant parts.

When you have your Docker image with the model and the custom code, it is time to deploy it somewhere. I recommend services like Amazon SageMaker endpoints. They take care of logging, runtime metrics, and instance scaling. You have to pay extra, but I think it’s worth it.

What’s next? Now, you have to decide whether you will immediately start using the model in production. You don’t have to do it. In fact, it is wise to run a shadow deployment first.

In the shadow deployment phase, you will send real data to the model, but you won’t use the predictions for anything. The results get logged, and you can review them to check whether the model works as expected.
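
Here is a minimal sketch of that idea in Python. The names `handle_request`, `current_logic`, and `shadow_model` are hypothetical; I assume both are plain callables that take the request payload. Your setup will differ, and a managed service may do this for you.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("shadow")


def handle_request(
    payload: dict,
    current_logic: Callable[[dict], Any],
    shadow_model: Callable[[dict], Any],
) -> Any:
    """Return the existing result; run the model on the side and only log it."""
    result = current_logic(payload)
    try:
        prediction = shadow_model(payload)
        logger.info(
            "shadow input=%s served=%s predicted=%s", payload, result, prediction
        )
    except Exception:
        # A failing shadow model must never break the real response.
        logger.exception("shadow model failed for input=%s", payload)
    return result
```

The caller always gets the result of the existing logic. The prediction ends up only in the logs, where you can compare it with what was actually served.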

The next phase may be a canary release. It is a release strategy resembling A/B testing. You will use the model to handle some percentage of the real traffic, but the rest will use the previous implementation. It may be tricky when this is your only model and there is no previous implementation. What will happen to the traffic that the model doesn’t handle? Do you ignore it? Return a constant value?
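
One way to split the traffic is to hash a stable identifier, so the same user always lands in the same group. This is only a sketch: `in_canary`, `route`, and the `fallback` callable are hypothetical names, and the fallback stands for whatever you decide to do with the traffic the model doesn’t handle.

```python
import hashlib
from typing import Any, Callable


def in_canary(request_id: str, percent: int) -> bool:
    """Deterministically assign an id to the canary group."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def route(
    request_id: str,
    payload: dict,
    model: Callable[[dict], Any],
    fallback: Callable[[dict], Any],
    percent: int = 10,
) -> Any:
    """Send roughly `percent` percent of ids to the model, the rest to the fallback."""
    if in_canary(request_id, percent):
        return model(payload)
    return fallback(payload)
```

Hashing instead of calling a random number generator keeps the assignment stable between requests, which makes the logs much easier to compare.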

You can find more information about shadow deployment and canary releases in my other article.

What to do next

I have just told you to cut corners, so there will be some cleanup to do before you start working on the next model. First, I recommend implementing a proper, deterministic training pipeline. It would be good to have a button you can click to get a model trained automatically using predefined data.
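
To show what I mean by deterministic, here is a toy sketch. The "model" is just the mean of the target column, which is a stand-in I made up for illustration, and I assume the rows are `(feature, target)` pairs with at least two rows. What matters is the fixed seed, the canonical row order, and the data fingerprint stored next to the result.

```python
import hashlib
import json
import random


def train(rows, seed=42, test_fraction=0.2):
    """Train a stand-in model so that the same rows and seed give the same output."""
    ordered = sorted(rows)  # canonical order, independent of how the query returned rows
    fingerprint = hashlib.sha256(json.dumps(ordered).encode("utf-8")).hexdigest()

    rng = random.Random(seed)  # local generator, no hidden global state
    rng.shuffle(ordered)
    n_test = max(1, int(len(ordered) * test_fraction))
    test_rows, train_rows = ordered[:n_test], ordered[n_test:]

    # Stand-in "model": predict the mean target of the training rows.
    mean = sum(y for _, y in train_rows) / len(train_rows)
    mae = sum(abs(y - mean) for _, y in test_rows) / len(test_rows)
    return {
        "model": {"mean": mean},
        "test_mae": mae,
        "seed": seed,
        "data_sha256": fingerprint,
    }
```

Running `train` twice on the same data returns identical results, even if the rows arrive in a different order. The stored hash tells you later which data produced which model.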

Later, you will parameterize the input data. When the input data starts changing, you will have to track it. You can start thinking about a feature store and data lineage when that happens.

The next thing is deployment automation. If you can click another button and get the trained model deployed in production, it’s even better than having a training pipeline. It is better because you can quickly revert to the previous version in case of a failure.
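
The revert part only works if you remember what was deployed before. A tiny sketch of that bookkeeping follows; the `Deployments` class is hypothetical, and in a real setup the version strings would be image tags or model artifact paths and `deploy` would call your platform.

```python
class Deployments:
    """Keep the history of deployed model versions so a rollback is one call."""

    def __init__(self):
        self._history = []

    def deploy(self, version: str) -> str:
        # In a real setup, this is where the platform-specific deployment happens.
        self._history.append(version)
        return version

    @property
    def current(self):
        return self._history[-1] if self._history else None

    def rollback(self) -> str:
        """Drop the current version and return the one that is active again."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._history.pop()
        return self._history[-1]
```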

Do you want to create a perfect MLOps setup? You already have a Docker image with the model. Why don’t you start the Docker container during the deployment and run test scripts to check whether you get correct predictions?
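
Such a test script can be very small. In the sketch below, I assume the container accepts a JSON body and returns `{"prediction": <number>}`; the URL, the response shape, and the function names are my assumptions, so adjust them to whatever your service exposes.

```python
import json
import urllib.request


def http_predict(url: str, features: dict, timeout: float = 5.0) -> float:
    """Send one request to the running container (assumed JSON in, JSON out)."""
    body = json.dumps(features).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return float(json.loads(response.read())["prediction"])


def failed_cases(predict, cases, tolerance=1e-6):
    """Return (features, expected, actual) for every case outside the tolerance."""
    failures = []
    for features, expected in cases:
        actual = predict(features)
        if abs(actual - expected) > tolerance:
            failures.append((features, expected, actual))
    return failures
```

During the deployment, you would pass something like `lambda f: http_predict(url, f)` as `predict`, with a handful of inputs whose expected outputs the data scientist has confirmed, and stop the deployment when the returned list is not empty.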

In the meantime, the data scientist starts working on the next model. They may already have a mess in the Excel files or whatever they use to track the experiments. At this point, you may introduce a tool for experiment tracking. It is easier to convince people to use it when they already feel the pain of not having it.

Start small. You don’t need everything at once. After all, your first model may be a flop, and the business may decide to give up on machine learning. (Imagine that…)
