Ludwig machine learning model in Kaggle

There is something beautifully simple about using Ludwig in Kaggle. It is almost like calling a “make_me_a_model(data)” function.

Almost. I decided to give it a try. It was my first time using Ludwig, so I made it a little bit hardcore. Ludwig is advertised as a “magical” tool that trains deep learning models without the need to write code, so let’s use it exactly like that: I am going to provide a minimal configuration and see what happens.

First of all, we must install the Ludwig library. It turned out that the version available in the pip repository threw errors when running in a Kaggle kernel, so I installed the most recent code from the GitHub repository.

!pip install https://github.com/uber/ludwig/archive/master.zip

After that, it was easy. I loaded the training dataset, defined the features, and ran Ludwig. Note that I used the “PUBG Finish Placement Prediction” dataset.

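import pandas as pd
from ludwig import LudwigModel

# Load the training data of the "PUBG Finish Placement Prediction" competition.
data = pd.read_csv('../input/train_V2.csv')

# Minimal model definition: only the name and the type of every feature.
model_definition = {
    'input_features': [
        {'name': 'assists', 'type': 'numerical'},
        {'name': 'boosts', 'type': 'numerical'},
        ...
        {'name': 'matchType', 'type': 'category'}
    ],
    'output_features': [{'name': 'winPlacePerc', 'type': 'numerical'}]
}

# Train with Ludwig's defaults; no preprocessing or tuning is configured.
model = LudwigModel(model_definition)
model.train(data)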
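
Typing out one entry per column gets tedious for a wide table. As a rough sketch, the feature list could be derived from the column types instead. The helper `build_model_definition` and its `excluded` parameter are hypothetical names of mine, not part of Ludwig, and the rule that every non-numeric column becomes a `category` feature is an assumption that will not suit every dataset.

```python
def build_model_definition(column_dtypes, target, excluded=()):
    """Build a Ludwig-style model definition from a {column: dtype name} mapping.

    Hypothetical helper, not part of Ludwig. Assumption: integer and float
    columns become 'numerical' features, everything else becomes 'category'.
    """
    skipped = set(excluded) | {target}
    input_features = []
    for name, dtype in column_dtypes.items():
        if name in skipped:
            continue
        is_number = dtype.startswith(('int', 'uint', 'float'))
        input_features.append(
            {'name': name, 'type': 'numerical' if is_number else 'category'}
        )
    return {
        'input_features': input_features,
        'output_features': [{'name': target, 'type': 'numerical'}],
    }


# Tiny stand-in for the real column types. With pandas, the mapping could come
# from data.dtypes.astype(str).to_dict().
example_dtypes = {
    'Id': 'object',
    'assists': 'int64',
    'boosts': 'int64',
    'matchType': 'object',
    'winPlacePerc': 'float64',
}
example_definition = build_model_definition(
    example_dtypes, target='winPlacePerc', excluded=['Id']
)
```

Identifier columns such as `Id` go into `excluded`, because treating a unique identifier as a category feature would only add noise.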

After that, I loaded the test dataset, ran the “predict” function and saved the results in an output file.

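import pandas as pd

# Load the test set and keep the row identifiers for the submission file.
test_data = pd.read_csv('../input/test_V2.csv')
ids = test_data['Id']

predictions = model.predict(test_data)
model.close()

# Assumption: Ludwig returns a DataFrame whose prediction column is named
# '<output feature>_predictions'. Passing columns=['winPlacePerc'] to the
# DataFrame constructor would select a column that does not exist and
# produce only NaN values, so the column is picked by that name instead.
output = pd.DataFrame({
    'Id': ids.values,
    'winPlacePerc': predictions['winPlacePerc_predictions'].values
})

output.to_csv('submission.csv', index=False)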
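
In this competition `winPlacePerc` is a percentile between 0 and 1, and a regression output is not guaranteed to stay inside that range. Below is a minimal sketch of a clean-up step that could run before the file is written. `clip_to_unit_interval` is a hypothetical helper of mine, and replacing missing predictions with 0.5 is an arbitrary assumption.

```python
import math


def clip_to_unit_interval(values, fallback=0.5):
    """Clamp each prediction to [0, 1]; replace NaN with `fallback`."""
    cleaned = []
    for value in values:
        value = float(value)
        if math.isnan(value):
            cleaned.append(fallback)
        else:
            cleaned.append(min(1.0, max(0.0, value)))
    return cleaned


# Made-up predictions, used only to illustrate the behavior.
raw_predictions = [-0.03, 0.42, 1.07, float('nan')]
clipped = clip_to_unit_interval(raw_predictions)
```

With pandas, calling `clip(0, 1)` on the prediction column should have the same effect on the out-of-range values.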

It is still running, so I have no idea what the score is. Maybe it is going to run out of memory or exceed the available processing time. I don’t know yet.
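
One way to avoid waiting for the leaderboard would be to hold back part of the training data and score the model on it locally. The competition is evaluated with mean absolute error, so a rough sketch could look like this. Both helpers are hypothetical names of mine, and the 10% holdout size is an arbitrary assumption.

```python
import random


def split_indices(n_rows, holdout_fraction=0.1, seed=42):
    """Shuffle row positions and split them into (train, holdout) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_rows * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError('actual and predicted must have the same length')
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers, used only to show how the helpers are called.
train_idx, holdout_idx = split_indices(n_rows=20)
example_mae = mean_absolute_error([0.0, 0.5, 1.0], [0.1, 0.5, 0.7])
```

With pandas, `data.iloc[train_idx]` and `data.iloc[holdout_idx]` would select the two parts: the model is trained on the first one, and `predict` is run on the second one to compute the error.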

If it works well without any human-driven preprocessing, I will start to worry a little bit. I think it is going to fail miserably.

How are such tools going to change data science and machine learning engineering? I think the only change we are going to see is less time spent on writing boring code. After all, implementing a preprocessing pipeline is fun the first time you do it. Maybe the second or even the fifth time is fun too. At some point, it gets boring.

Perhaps we should start reading more business books, because it looks like simple machine learning can be successfully automated. What is left for us? In my opinion, we can now focus on the creative part of data science: finding new data, adding features to the dataset, looking for new business problems, and applying ML to solve them.


Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group