Ludwig machine learning model in Kaggle

There is something beautifully simple about using Ludwig in Kaggle. It is almost like calling a “make_me_a_model(data)” function.

Almost. I decided to give it a try. It was my first time using Ludwig, so I decided to make it a little bit hardcore. It is advertised as a “magical” tool that trains deep learning models without the need to write code, so let’s use it exactly that way. I am going to provide a minimal configuration and see what happens.

First of all, we must install the Ludwig library. It turned out that the version available in the pip repository threw errors when running in a Kaggle kernel, so I installed the most recent code from the GitHub repository.

!pip install https://github.com/uber/ludwig/archive/master.zip

After that, it was easy. I loaded the training dataset, defined the features, and ran Ludwig. Note that I used the “PUBG Finish Placement Prediction” dataset.

import pandas as pd
from ludwig import LudwigModel

# Load the training set of the competition.
data = pd.read_csv('../input/train_V2.csv')

# Minimal configuration: only the feature names and their types.
model_definition = {
    'input_features': [
        {'name': 'assists', 'type': 'numerical'},
        {'name': 'boosts', 'type': 'numerical'},
        ...
        {'name': 'matchType', 'type': 'category'}
    ],
    'output_features': [{'name': 'winPlacePerc', 'type': 'numerical'}]
}

# Everything else (preprocessing, architecture, training loop) is left to Ludwig.
model = LudwigModel(model_definition)
model.train(data)
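
The feature list above is written by hand. A minimal sketch of how it could be derived from the column types instead, assuming that numeric columns map to 'numerical' and everything else to 'category'. The helper name build_input_features and the tiny sample frame are made up for illustration; they are not part of Ludwig or of the competition data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def build_input_features(df, target, skip=()):
    """Map each column to a Ludwig-style feature entry based on its pandas dtype."""
    features = []
    for name in df.columns:
        if name == target or name in skip:
            continue  # the target and identifier columns are not inputs
        kind = 'numerical' if is_numeric_dtype(df[name]) else 'category'
        features.append({'name': name, 'type': kind})
    return features


# Tiny stand-in for the training data; the real file has many more columns.
sample = pd.DataFrame({
    'Id': ['a', 'b'],
    'assists': [0, 1],
    'boosts': [2, 0],
    'matchType': ['solo', 'duo'],
    'winPlacePerc': [0.4, 0.9],
})

input_features = build_input_features(sample, target='winPlacePerc', skip=('Id',))
print(input_features)
```

Identifier-like columns should go into skip, otherwise they end up as category features with one value per row.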

After that, I loaded the test dataset, ran the “predict” function and saved the results in an output file.

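import pandas as pd

# Load the test set and keep the row identifiers for the submission file.
data = pd.read_csv('../input/test_V2.csv')
ids = data['Id']

predictions = model.predict(data)
model.close()

# Note: depending on the Ludwig version, the prediction column may be named
# differently (for example 'winPlacePerc_predictions'); check predictions.columns.
output = pd.concat([
    pd.DataFrame(ids, columns = ['Id']),
    pd.DataFrame(predictions, columns = ['winPlacePerc'])
], axis = 1)

output.to_csv('submission.csv', index=False)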
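
The submission step depends on how the prediction column is named. A small sketch of a more defensive version, assuming the predictions come back as a DataFrame whose column is called either 'winPlacePerc' or 'winPlacePerc_predictions'. The helper build_submission and the fake predictions frame are hypothetical, used only so the example runs without Ludwig.

```python
import pandas as pd


def build_submission(ids, predictions, target='winPlacePerc'):
    """Return a two-column frame (Id, target) from a predictions DataFrame."""
    # Assumption: the predictions live either in a column named after the
    # target or in '<target>_predictions'; adjust to what predictions.columns shows.
    candidates = [target, target + '_predictions']
    column = next((c for c in candidates if c in predictions.columns), None)
    if column is None:
        raise KeyError(f'no prediction column among {candidates}')
    return pd.DataFrame({
        'Id': list(ids),
        target: predictions[column].to_numpy(),
    })


# Stand-ins for the real identifiers and model output.
ids = pd.Series(['a', 'b'], name='Id')
fake_predictions = pd.DataFrame({'winPlacePerc_predictions': [0.25, 0.75]})

submission = build_submission(ids, fake_predictions)
print(submission)
```

Failing loudly on a missing column is deliberate: a silently empty column would still produce a valid-looking CSV file.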

It is still running, so I have no idea what the score is. Maybe it is going to run out of memory or exceed the available processing time. I don’t know yet.

If it works well without any human-driven preprocessing, I will start to worry a little bit. I think it is going to fail miserably.


How are such tools going to change data science and machine learning engineering? I think the only change we are going to see is less time spent on writing boring code. After all, implementing a preprocessing pipeline is fun the first time you do it. Maybe the second or even the fifth time it is fun too. At some point it gets boring.

Perhaps we should start reading more business books, because it looks like simple machine learning can be successfully automated. What is left for us? In my opinion, we can now focus on the creative part of data science: finding new data, adding features to the dataset, looking for new business problems, and applying ML to solve them.


Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group