You're a data scientist.
You've built an elaborate ensemble model that combines various machine learning algorithms using scipy, scikit-learn, and xgboost.
Now you need to make it available to others for testing.
A simple RESTful API seems to be a good way to expose the functionality. Flask is a popular and lightweight web framework so you decide to go with that. So you make a simple web app with a couple of endpoints and it works on your machine.
But how do you deploy it?
Development
We are going to need this system level package.
sudo apt install build-essential -y
Ever since you made a mess using sudo pip install that forced you to reinstall Ubuntu, you've been using Anaconda because it can install packages in your home directory and everything always seems to work.
Here's how to download & install Anaconda into your home folder.
wget https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
bash Anaconda2-4.1.1-Linux-x86_64.sh -b -p $HOME/anaconda2
rm Anaconda*
echo 'export PATH="$PATH:$HOME/anaconda2/bin"' >> ~/.bashrc
bash
conda --version
Next, you create a new environment for your project and install all dependencies.
conda create -n app python -y
source activate app
conda install scipy numpy scikit-learn flask -y
pip install xgboost uwsgi
Create your web app that loads your model and does the work.
In this tutorial, I'm just returning "Hello world!".
mkdir app/
cd app/
cat > app.py <<'EOF'
# Importing the ML libraries here confirms they are installed in this environment
import scipy, numpy, sklearn, xgboost
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello world!\n'

if __name__ == '__main__':
    app.run()
EOF
Make sure that it runs (start the app in one terminal and run curl from another):
$ python app.py
$ curl localhost:5000
Hello world!
If you read the docs, you'll see a bolded warning that Flask's built-in server is not suitable for production.
So let's use uWSGI instead.
$ uwsgi --http localhost:8080 --wsgi-file app.py --callable app
$ curl localhost:8080
Hello world!
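The --callable app flag tells uWSGI which object inside app.py to invoke for each request, and a Flask application object is one such WSGI callable. To give a rough idea of what that interface looks like underneath, here is a minimal sketch that uses only the Python 3 standard library. The names application and call_app are made up for this illustration and are not part of the tutorial's app.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: the kind of object a WSGI server invokes per request."""
    body = b"Hello world!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app in-process with a fake GET request (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the keys a WSGI app may expect
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app reports instead of writing to a socket.
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    status, headers, body = call_app(application)
    print(status)
    print(body.decode("utf-8"), end="")
```

Flask hides all of this behind its routing decorators, but the object you hand to uWSGI is called in essentially the same way.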
Great. But we can do better:
cat > uwsgi.ini << EOF
[uwsgi]
http = localhost:8080
wsgi-file = app.py
callable = app
EOF
Instead of typing out or copying and pasting all the command-line arguments every time, we store them in a configuration file.
There are a lot of things you can tune with this file (the number of processes running at the same time, running user and group IDs, etc.) that I am not going to go into here.
$ uwsgi --ini uwsgi.ini
$ curl localhost:8080
Hello world!
Sweet!
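Since uwsgi.ini is an ordinary INI file, you can also inspect it from Python if you ever want to check what a deployment is configured to do. The sketch below assumes Python 3 and reads the same settings as above plus one extra option, processes = 4, which is only an illustrative value and not a recommendation. The function name read_uwsgi_settings is made up for this example, and Python's configparser is not guaranteed to accept everything that uWSGI's own parser does.

```python
import configparser

# Same contents as the uwsgi.ini above, plus one illustrative tuning option.
# "processes = 4" is an assumed example value, not a recommendation.
UWSGI_INI = """\
[uwsgi]
http = localhost:8080
wsgi-file = app.py
callable = app
processes = 4
"""


def read_uwsgi_settings(text):
    """Return the [uwsgi] section of an INI string as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["uwsgi"])


if __name__ == "__main__":
    for key, value in read_uwsgi_settings(UWSGI_INI).items():
        print(f"{key} = {value}")
```

All values come back as strings, so convert them yourself (for example with int) if you need numbers.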
The final step is to save a list of all the packages and their versions that you are using so that we can replicate this environment later.
conda env export > environment.yml
It'll look something like this:
name: app
dependencies:
- click=6.6=py27_0
- flask=0.11.1=py27_0
- itsdangerous=0.24=py27_0
- jinja2=2.8=py27_1
- libgfortran=3.0.0=1
- markupsafe=0.23=py27_2
- mkl=11.3.3=0
- numpy=1.11.1=py27_0
- openssl=1.0.2j=0
- pip=8.1.2=py27_0
- python=2.7.12=1
- readline=6.2=2
- scikit-learn=0.17.1=np111py27_2
- scipy=0.18.1=np111py27_0
- setuptools=27.2.0=py27_0
- sqlite=3.13.0=0
- tk=8.5.18=0
- werkzeug=0.11.11=py27_0
- wheel=0.29.0=py27_0
- zlib=1.2.8=3
- pip:
  - uwsgi==2.0.13.1
  - xgboost==0.6a2
prefix: /home/ubuntu/anaconda2/envs/app
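Each conda entry in this file pins a package as name=version=build, while the entries under pip use name==version. If you want to compare pins between two machines, a small helper along these lines may be enough. This is only a sketch assuming Python 3 and exactly the two formats shown above; parse_pin is a hypothetical name, not something conda provides.

```python
def parse_pin(spec):
    """Split one dependency entry from environment.yml into its parts.

    Conda pins look like "name=version=build"; pip pins look like
    "name==version". Returns (name, version, build), with build set to
    None for pip pins.
    """
    if "==" in spec:
        name, version = spec.split("==", 1)
        return name, version, None
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"unexpected pin format: {spec!r}")
    name, version, build = parts
    return name, version, build


if __name__ == "__main__":
    examples = [
        "flask=0.11.1=py27_0",
        "scikit-learn=0.17.1=np111py27_2",
        "xgboost==0.6a2",
    ]
    for spec in examples:
        print(parse_pin(spec))
```

The build string (py27_0 and so on) is what ties a conda package to a specific Python and numpy version, which is why the export includes it.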
You can exit the environment with
source deactivate
Deployment
As before, we are going to need this system-level package. Otherwise, packages installed by pip may not be built properly.
sudo apt install build-essential -y
Since we're deploying on a potentially shared server, we're not going to use our $HOME directory. Let's use /opt instead.
For convenience, let's give ourselves write permissions to /opt.
sudo mkdir -p /opt
sudo chown $USER /opt
Download & install Anaconda into /opt/anaconda.
wget https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
bash Anaconda2-4.1.1-Linux-x86_64.sh -b -p /opt/anaconda
rm Anaconda*
For convenience, add Anaconda to $PATH for this terminal session.
export PATH=/opt/anaconda/bin:$PATH
conda --version
We're going to store our web app code and models in /opt/apps/app.
mkdir -p /opt/apps/app
Copy app.py, uwsgi.ini, and environment.yml from your development machine into this directory. Then recreate the environment for our app and test that it works.
conda env create -f /opt/apps/app/environment.yml -n app
source activate app
cd /opt/apps/app
uwsgi --ini uwsgi.ini
curl localhost:8080
source deactivate
We want to make sure that our app is started at boot if the server gets rebooted for any reason.
sudo tee /etc/systemd/system/app.service << EOF
[Unit]
Description=uWSGI instance to serve app
[Service]
ExecStart=/bin/bash -c 'cd /opt/apps/app && source /opt/anaconda/bin/activate app && uwsgi --ini uwsgi.ini'
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable app.service
sudo systemctl start app.service
As always, make sure that it's working:
$ curl localhost:8080
Hello world!
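If you'd rather script this check than run curl by hand, for example after every reboot or redeploy, something like the following could work. It is a minimal sketch assuming Python 3 and the same URL and response body used in this tutorial; check_endpoint is a made-up helper name. It deliberately ignores any proxy settings so that it always talks to the local server directly.

```python
import urllib.request


def check_endpoint(url, expected_body, timeout=5.0):
    """Return True if a GET on url answers 200 with exactly expected_body."""
    # An empty ProxyHandler disables proxies picked up from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200 and response.read() == expected_body
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    ok = check_endpoint("http://localhost:8080/", b"Hello world!\n")
    print("app is up" if ok else "app is NOT responding as expected")
```

A real model endpoint would need a more forgiving comparison than an exact byte match, but the idea stays the same.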
One last optional step is to add a reverse proxy server in front of your app. It can be useful if you want to serve on port 80 without running your app as root, or if you are going to host several apps on your server under the same domain.
export DOMAIN=domain.com
Install & configure nginx, a proper web server.
sudo apt install -y nginx
sudo tee /etc/nginx/sites-available/$DOMAIN <<EOF
server {
    listen 80;
    server_name $DOMAIN www.$DOMAIN;
    location / {
        proxy_pass http://localhost:8080;
    }
}
EOF
sudo ln -s /etc/nginx/sites-available/$DOMAIN /etc/nginx/sites-enabled/$DOMAIN
sudo systemctl restart nginx
Test that it works:
$ curl $DOMAIN
Hello world!
Voila!
Docker
Did that seem messy or a lot of work?
There is another way.
You could set everything up in a virtual machine and give it to someone to deploy. But virtual machines have a lot of overhead (large image sizes as well as CPU/memory overhead).
Or you could use containers, which are isolated like virtual machines but have virtually no overhead.
Install Docker
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-engine
docker --version
There are two ways to do this. One is to create a script that will build a Docker image that has everything installed. The other way is to work inside the container and then just ship it for deployment.
We're going to take a look at the second option.
Create a new Docker container based on Ubuntu 16.04 and go into it.
sudo docker run -it ubuntu:16.04 bash
apt update
apt install python python-pip -y
pip install scipy numpy scikit-learn flask xgboost uwsgi
You could still install Anaconda inside the container, but this time I didn't and just installed the pip packages globally.
To be continued...