How to install scikit-automl in a Kaggle notebook

I wanted to give scikit-automl a try, but I don’t like to install many packages on my machine. Conda solves the problem of creating a mess, although it does not deal with the issue of running unknown code on my computer. Hence, I decided to try it on Kaggle.

I followed the installation instructions and did not expect any issues. After all, the instruction consists of only two steps:

!curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
!pip install auto-sklearn

What can go wrong?

Swig. Swig can go wrong. I was installing the packages, and at some point, I saw this error message. Pip was installing the pyrfr package but failed because of a problem with swig.

building 'pyrfr._regression' extension
    swigging pyrfr/regression.i to pyrfr/regression_wrap.cpp
    swig -python -c++ -modern -features nondynamic -I./include -o pyrfr/regression_wrap.cpp pyrfr/regression.i
    unable to execute 'swig': No such file or directory
    error: command 'swig' failed with exit status 1

It turns out that Kaggle has some old swig version installed on their machines, but we can quickly fix it:

!apt-get remove swig
!apt-get install swig3.0
!ln -s /usr/bin/swig3.0 /usr/bin/swig

After that change, I ran the installation script, and everything worked fine.

Older post

Predicting customer lifetime value using the Pareto/NBD model and Gamma-Gamma model

How to estimate the CLV from a list of customer transactions using the lifetimes library in Python

Newer post

Preprocessing the input Pandas DataFrame using ColumnTransformer in Scikit-learn

How to encode text/categorical variables and scale numerical values using only one Scikit-learn class