How to build a deep learning server based on Docker

--

The fastest way to start with deep learning is a cloud service, like AWS. It does not require any investment in hardware, but costs can quickly stack up at current prices of $0.90 per hour for a Tesla K80 GPU or $3.06 per hour for a Tesla V100 GPU. It is a good solution for scaling up your workload (e.g. when you have an established, robust processing chain and just want to run it on more data / many GPUs), but not really suitable for education or experiments on a single GPU. Below is a simple comparison of single-GPU performance between AWS GPU instances and an Nvidia GTX 1080 Ti (using the MNIST example from tensorflow).

Performance on MNIST (from tensorflow examples)

Another argument for using a cloud could be ease of remote access and freedom from machine configuration (you can just grab a suitable image available in the cloud). Both can be achieved quite easily on a private server as well.

I will only briefly cover my hardware setup and focus on setting up the software. For a more general overview of the hardware side, these are good starting points:

Deep Learning PC Build
Building your own deep learning box

Hardware

I decided to go with 1080 Ti and focus on a possibly small form factor and quiet operation. I ended up with this part list.

The only tool needed is a long Phillips screwdriver. The first step is to mount the CPU cooler on the motherboard and fix the motherboard in the case. If you care about quiet operation, replace the stock Intel cooler with something better.

Once the motherboard is fixed in place, it is time to connect all the cables. The only tricky part was connecting the case LEDs, since their orientation has to be right. For this board it was easiest to install the RAM last.

Once everything except the GPU is in place, power on the computer and change “Initial Display Output” (or similar) in the BIOS settings to the integrated graphics on the motherboard (the “IGFX” option for this board). Otherwise there will be no way to access the BIOS once the GPU is installed (at least on this setup).

Finally it is time to install the GPU; do not forget to fix it to the case and connect the power cables. Since this is a small case, the GPU has to be compatible in terms of size; the Founders Edition fits.

The last step is to put the power supply inside and fix the hard drive tray. The box easily fits inside hand luggage.

Software

Quick setup

My setup is based on Docker, so all that is necessary to get the server up and running is to install the Nvidia drivers, Docker and nvidia-docker2 on a clean Ubuntu Server 16.04. You can clone this repository and execute build-1-nvidia-driver.sh and build-2-nvidia-docker-v2.sh. More details below.

Introduction

The primary objective in setting up the software was straightforward configuration and easy maintenance. Tensorflow was a requirement. The Nvidia 1080 Ti proved especially difficult to set up at the time of writing, as it is not compatible with the drivers provided in the CUDA 8.0 package (contrary to older cards, like the 1080). It works fine with CUDA 9.0, but tensorflow does not support it yet (this will change with version 1.5). There seemed to be four options:

  • installation of CUDA 8.0 via runscript, Nvidia drivers from the package manager, cuDNN 6.0 from the Nvidia website and tensorflow 1.3 via pip (see this blog for details on how to do it),
  • installation of CUDA 9.0 from the package manager, cuDNN 7.0 from the Nvidia website and compilation of tensorflow from source,
  • installation of Anaconda / Miniconda, then installation of CUDA, cuDNN and tensorflow using the conda package manager,
  • installation of the Nvidia drivers, Docker and nvidia-docker2 from the package manager, then using a Docker image with preinstalled CUDA, cuDNN and tensorflow (or any other library).

Note that the last two solutions completely skip manual installation of CUDA / cuDNN from the Nvidia website! They also allow multiple versions of these packages to coexist on the same machine.

I decided to rely on the Docker solution, as it seems to be the cleanest way. It is also future-proof: currently I am using tensorflow 1.3 with CUDA 8.0 and cuDNN 6.0, but as soon as version 1.5 with CUDA 9.0 and cuDNN 7.0 support is published, all it will take to switch is an update of the Docker image.

Side section: Anaconda

I will focus on setting up Docker, but Anaconda deserves at least this short section. I would recommend it for beginners.

Anaconda started as a Python distribution, but now includes many non-Python packages as well. It comes with the conda package manager and allows you to set up separate environments (note that these environments are created simply by extending the PATH variable, so they are not isolated from the rest of the system).

The first step is to install Anaconda or Miniconda (remember to set up the PATH environment variable to include the conda directory; the installer will offer to do it). Running:

conda create -n my_env tensorflow-gpu

will install the GPU version of tensorflow, together with the corresponding CUDA and cuDNN packages. Now the new environment can be activated:

source activate my_env

and the GPU version of tensorflow should be good to go. It is a very easy solution, although I sometimes find conda unstable (the newest versions of packages are pushed to the conda repository very quickly, which allows bugs to slip through from time to time). Also be careful not to accidentally install the non-GPU version (“tensorflow”) on top of “tensorflow-gpu”, as it results in the non-GPU version becoming the default.
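One way to guard against that mistake is to list what conda actually resolved inside the environment; the environment name below matches the example above:

```shell
# List tensorflow packages installed in the environment created above;
# only "tensorflow-gpu" (plus its dependencies) should show up here.
conda list -n my_env | grep tensorflow
```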

Docker

Containers are isolated, lightweight environments in which applications can be launched. Contrary to virtual machines, they do not require their own operating system, resulting in fewer resources needed and faster boot times. Images are the files that define containers; they are generated from Dockerfiles and can be uploaded to an online registry like Docker Hub.

Docker containers are designed to be software- and hardware-independent, which allows the same application to run across different hardware setups and operating systems. This would not work for applications depending on Nvidia GPU acceleration, as they clearly need specialized hardware. Nvidia-docker is an extension of Docker which allows GPU-accelerated applications to run across machines equipped with Nvidia GPUs (e.g. a home desktop with a GeForce GPU and an AWS server with a Tesla GPU).

The Docker tutorial is a good starting point for learning about containerization. An overview of nvidia-docker can be found here.

The recently launched Nvidia GPU Cloud also relies on Docker containers.

Operating system

I chose Ubuntu 16.04. Another interesting option is CentOS 7. Ubuntu has a larger user base; on the other hand, development of CentOS is more stable, which makes it generally more compatible. If you plan to turn your box into a Docker Cloud node, the latter system may be a better option, as Docker Cloud does not support Ubuntu 16.04 yet.

The server version of Ubuntu is generally a better choice, as it comes with a minimal system. The following instructions assume a clean server version is installed, although there is a good chance that everything will work on a desktop version too. It is easy to install a desktop environment on top of the server version if needed (or even the full Unity desktop, although that is not recommended); more details below.

If using Windows on the same machine is required, it should be installed first; free space can then be created on the hard drive by shrinking the Windows partition in the Disk Management tool. For Windows and Ubuntu to coexist correctly, Ubuntu has to be installed in UEFI mode (it should be possible to set the Ubuntu USB stick to boot in UEFI mode in the BIOS settings; the installation screen should look like the one below).

Nvidia driver

There is an option to install the drivers from a PPA repository, although it requires specifying the driver version. Installing directly from the Nvidia repository seems to be a cleaner solution, as it provides a generic “cuda-drivers” package, which I assume will be kept up to date with the current driver version.
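As a sketch, installing from the Nvidia repository boils down to something like the following. The repository .deb name changes between CUDA releases, so treat it as a placeholder and verify against Nvidia's current instructions:

```shell
# Add the Nvidia CUDA repository (exact .deb name depends on the CUDA release)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
# Install only the generic driver metapackage, not the full CUDA toolkit
sudo apt-get install -y cuda-drivers
# Verify the driver after a reboot
nvidia-smi
```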

Nvidia docker

The current version of Docker Community Edition has to be installed. For nvidia-docker there are two options: version 1, which is currently supported, and version 2, which is in alpha state. I am using version 2, although I managed to deploy to AWS only with version 1.

To install Docker, nvidia-docker and, optionally, docker-machine for deploying to the cloud, follow the instructions on the linked websites or use the scripts available on GitHub.
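The Docker and nvidia-docker2 part of those scripts looks roughly like this; the repository lines follow the Docker and Nvidia install pages at the time of writing, so verify them against the current documentation:

```shell
# Docker CE from the official Docker repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update && sudo apt-get install -y docker-ce

# nvidia-docker2 from the nvidia-docker repository
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd   # reload the daemon so the nvidia runtime is registered

# Sanity check: the driver should be visible from inside a container
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
```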

Minimal desktop environment

After completing the previous steps, running the “nvidia-smi” command should show the state of the GPU, together with the current “performance level”, ranging from P8 (lowest) to P0 (highest). The lowest performance level consumes 10–15 W when idle, while the highest consumes around 50–60 W.

On my setup the card was stuck at the highest level, and I found out that launching an X server on the GPU solves it (this issue does not seem to affect server-grade cards like the Tesla K80, obviously). The easiest way was simply to install a minimal desktop environment, shut down, connect a monitor to the HDMI port on the GPU and boot again. This results in a “/usr/lib/xorg/Xorg” process running on the GPU and the driver functioning properly, reducing the performance level to P8 when idle. It also makes nvidia-settings accessible.

It is likely possible to “fake” a monitor by adjusting “/etc/X11/xorg.conf”, but a pragmatic solution is just to keep the monitor connected, even with its power supply disconnected.
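For reference, the “fake monitor” route usually means allowing the Nvidia X driver to start without a connected display. A hedged sketch of the relevant /etc/X11/xorg.conf section could look like this (check the option name against the README of your driver version):

```
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    # Let the X server start even with no monitor attached
    Option         "AllowEmptyInitialConfiguration" "true"
EndSection
```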

This answer on askubuntu gives good overview of possible desktop environments for a server. This one worked for me:

sudo apt install --no-install-recommends xubuntu-core

My desktop environment setup

It is a matter of personal preference, but for using the server for basic daily tasks I could not recommend the i3 window manager more. It takes some time to get used to, but it is a great productivity booster. This is a good starting point. Installing xubuntu-core prior to i3 was still necessary to get the Nvidia drivers working properly.

It is a good idea to create a script which configures the desktop environment, in case there is a need to reinstall it somewhere else (I keep mine here). My configuration is based on the zsh shell with oh-my-zsh and the pure prompt for integration with git, plus Google Chrome for basic apps (calendar, keep, inbox, messengers). It also installs Slack and Spotify. As a final touch, it displays the current Nvidia GPU temperature and available memory on the status bar (together with the corresponding values for the CPU / RAM).

Setting up remote access

To set up outgoing ssh connections it is necessary to generate a key pair (it is recommended to secure the private key with a passphrase):

ssh-keygen

It is also useful to install the “keychain” tool, with which the passphrase has to be entered only once per session.
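With keychain installed, a single line in the shell startup file (~/.bashrc or ~/.zshrc) is enough; the key file name below assumes the default ssh-keygen output:

```shell
# Start (or reuse) an ssh-agent and ask for the passphrase only once per session
eval "$(keychain --eval --quiet id_rsa)"
```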

One way of accessing the server remotely, which should always be available, is ssh. Installing an ssh server is trivial:

sudo apt install openssh-server
sudo systemctl restart ssh

From a security standpoint it is recommended to disable password authentication in /etc/ssh/sshd_config and to paste the public ssh keys of authorized machines into ~/.ssh/authorized_keys.
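The relevant /etc/ssh/sshd_config lines could look like this (restart the ssh service afterwards):

```
# Allow key-based logins only
PubkeyAuthentication yes
PasswordAuthentication no
ChallengeResponseAuthentication no
```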

This bash script generates the key pair, sets up keychain, installs the ssh server and disables password authentication.

It is useful to have a router with a built-in VPN; otherwise it might be necessary to open ports on the router for remote access (which is not recommended; if there is no other way, use non-standard ports).

For console-only access to the server, byobu, screen and tmux provide a lot of functionality. These tools are also useful for working inside containers!

Another option is to set up the server as a Docker Cloud node, although Ubuntu 16.04 is not yet officially supported.

Using Docker: deep learning example

If you are new to Docker, start here and here (note that in the examples below nvidia-docker2 is used).

Running a command:

docker run --runtime=nvidia --rm -it \
-p 8888:8888 tensorflow/tensorflow:latest-gpu-py3

will pull the latest tensorflow image with GPU support and launch a jupyter notebook, which can be accessed via the link printed in the console. Note that it was not necessary to install CUDA or cuDNN (these are provided by the image); the only software required was the Nvidia driver, Docker and nvidia-docker2.

Nvidia maintains a Docker repository with an extensive list of CUDA / cuDNN combinations to choose from.

Docker images are created using a Dockerfile, which is just a set of instructions, usually extending an already existing image. A minimal example of a Dockerfile could be:

FROM tensorflow/tensorflow:latest-gpu-py3
COPY mnist_deep.py /
CMD ["python", "/mnist_deep.py"]

which starts from the tensorflow image, copies the mnist_deep.py example from tensorflow and launches it at startup (this image was used to generate the plot at the beginning of this article; mnist_deep.py was just slightly modified to print out the training time). It is useful to keep Dockerfiles together with the files used to create the images on GitHub, and to link the repository to a Docker Hub account to enable automated builds.
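Building and running such an image could look like this (the tag mnist-example is an arbitrary name chosen for this sketch):

```shell
# Build the image in the directory containing the Dockerfile and mnist_deep.py
docker build -t mnist-example .
# Run training on the GPU; --rm removes the container when it exits
docker run --runtime=nvidia --rm mnist-example
```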

To access the jupyter notebook from tensorflow/tensorflow:latest-gpu-py3 remotely, open the link provided by:

ssh -t -L 8888:localhost:8888 \
user_name@server_ip_address \
docker run --runtime=nvidia --rm -it -p 8888:8888 \
tensorflow/tensorflow:latest-gpu-py3

Making Docker a friendly environment

Using Docker for everyday tasks can be burdensome, especially if you rely only on the default images provided by specific applications. It is convenient to use one Docker image with all the tools needed. A popular choice is dl-docker, although it is not updated to the newest versions of the tools very often. My custom Docker image can be found here.

To make working with Docker even easier, the long commands used to launch a container can be aliased. I am using tsh, a wrapper around a Docker image with recent deep learning tools, which also mounts the current directory and the X11 socket inside the container, and provides a zsh shell with a modern prompt. As a result, using Docker becomes no more difficult than launching a conda environment.
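A minimal sketch of such a wrapper as a shell function (the function name, image and mount point are placeholders for this sketch; tsh itself does considerably more):

```shell
# Hypothetical wrapper: run a deep learning image with the current directory
# and the X11 socket mounted inside the container.
dl() {
  docker run --runtime=nvidia --rm -it \
    -v "$PWD":/workspace -w /workspace \
    -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY="$DISPLAY" \
    tensorflow/tensorflow:latest-gpu-py3 "$@"
}
```

After sourcing this in the shell configuration, `dl bash` drops into a shell inside the container with the current project directory mounted.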

tsh wrapper

I hope you will find it useful!
