The $1700 great Deep Learning box: Assembly, setup and benchmarks
Posted by qiuliangliang 9 months ago


The most important reason was saving time while prototyping models — if they trained faster, the feedback time would be shorter. Thus it would be easier on my brain to connect the dots between the assumptions I had for the model and its results.

Then I wanted to save money — I was using Amazon Web Services (AWS), which offered P2 instances with Nvidia K80 GPUs. Lately the AWS bills were around $60–70/month with a tendency to get larger. Also it is expensive to store large datasets, like ImageNet.

And lastly, I hadn’t had a desktop for over 10 years and wanted to see what had changed in the meantime (spoiler alert: mostly nothing).

What follows are my choices, inner monologue and gotchas: from choosing the components to benchmarking.

Table of contents

1. Choosing components
2. Putting it together
3. Software Setup
4. Benchmarks

Choosing the components

A sensible budget for me would be about 2 years’ worth of my current compute spending. At $70/month for AWS, this put it at around $1700 for the whole thing.
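The budget arithmetic above can be sketched in a couple of lines of Python. The function name is just an illustration of mine; the $70/month and 2-year figures are the ones from the text.

```python
def hardware_budget(monthly_cloud_bill, months):
    """Return what the same period of cloud compute would cost in total."""
    return monthly_cloud_bill * months

# Figures from the text: about $70/month on AWS, over 2 years (24 months).
budget = hardware_budget(70, 24)
print(budget)  # 1680, which rounds to roughly $1700
```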

You can check out all the components used. The PC Part Picker site is also really helpful in detecting components that don’t play well together.


GPU

The GPU is the most crucial component in the box. It will train these deep networks fast, shortening the feedback cycle.

The GPU is important because: a) most calculations in DL are matrix operations, like matrix multiplication, and they can be slow if done on the CPU; b) as we are doing thousands of these operations in a typical neural network, the slowness really adds up (as we will see in the benchmarks later). GPUs, rather conveniently, are able to run all these operations in parallel. They have a large number of cores, which can run an even larger number of threads. GPUs also have much higher memory bandwidth, which enables them to run these parallel operations on a bunch of data at once.
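To make the "runs in parallel" point concrete, here is a minimal pure-Python sketch of a matrix product. It is not how any DL framework implements it; it only shows that every output cell is computed independently of the others, which is what lets a GPU work on them all at once. The layer sizes in the last line (a batch of 128 inputs with 784 values each, going into 512 units) are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Each output cell depends only on one row of `a` and one column of `b`,
    # so all rows * cols cells could be computed at the same time.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def multiply_add_count(rows, inner, cols):
    """Number of multiply-add steps a naive matrix product performs."""
    return rows * inner * cols

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(multiply_add_count(128, 784, 512))           # 51380224
```

Even this single, modest product takes about 51 million multiply-adds, and a training run repeats products like it thousands of times.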

My choice was between a few of Nvidia’s cards: GTX 1070 ($360), GTX 1080 ($500), GTX 1080 Ti ($700) and finally the Titan X ($1320).

On the performance side, the GTX 1080 Ti and Titan X are similar. Roughly speaking, the GTX 1080 is about 25% faster than the GTX 1070, and the GTX 1080 Ti is about 30% faster than the GTX 1080.
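As a rough sketch, the prices and relative speeds quoted above can be turned into a "speed per $1000" figure. The GTX 1070 is taken as the 1.0 baseline, and the Titan X is assumed to match the 1080 Ti exactly, since the text only says they are similar. This is my own back-of-the-envelope measure, not a benchmark.

```python
# Prices as quoted in the text.
PRICES = {"GTX 1070": 360, "GTX 1080": 500, "GTX 1080 Ti": 700, "Titan X": 1320}

def relative_speeds():
    """Relative training speed, with the GTX 1070 as the 1.0 baseline."""
    speed_1070 = 1.0
    speed_1080 = speed_1070 * 1.25     # about 25% faster than the 1070
    speed_1080_ti = speed_1080 * 1.30  # about 30% faster than the 1080
    return {
        "GTX 1070": speed_1070,
        "GTX 1080": speed_1080,
        "GTX 1080 Ti": speed_1080_ti,
        "Titan X": speed_1080_ti,      # assumption: same as the 1080 Ti
    }

def speed_per_1000_dollars(prices, speeds):
    """Relative speed bought by each $1000 spent on a card."""
    return {card: 1000 * speeds[card] / prices[card] for card in prices}

for card, value in speed_per_1000_dollars(PRICES, relative_speeds()).items():
    print(f"{card}: {value:.2f}")
```

On this crude measure the cheaper cards give more speed per dollar and the Titan X gives the least, so picking a 1080 Ti means paying a premium for raw training speed.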

Tim Dettmers has a great article on picking a GPU for Deep Learning, which he regularly updates as new cards come on the market.

Here are the things to consider when picking a GPU:

  1. Maker: No contest on this one: get Nvidia. They have been focusing on Machine Learning for a number of years now, and it’s paying off. Their CUDA toolkit is entrenched so deeply that it is effectively the only choice for the DL practitioner.
  2. Budget: The Titan X got a really bad mark here, as it offers the same performance as the 1080 Ti for roughly $600 more at the prices above.
  3. One or multiple: I considered picking a couple of 1070s instead of a 1080 or 1080 Ti. This would have allowed me to either train a model on two cards or train two models at once. Currently, training a model on multiple cards is a bit of a hassle, though things are changing with PyTorch and Caffe 2 offering almost linear scaling with the number of GPUs. The other option, training two models at once, seemed to have more value, but I decided to get a single more powerful card now and add a second one later.
  4. Memory: More memory is better. With more memory, we can deploy bigger models and use a sufficiently large batch size during training (larger batches give a less noisy gradient estimate).
  5. Memory bandwidth: This determines how quickly the GPU can move data between its memory and its cores, which matters when operating on large amounts of data. Tim Dettmers points out that this is the most important characteristic of a GPU.

Considering all of this, I picked the GTX 1080 Ti, mainly for the training speed boost. I plan to add a second 1080 Ti soonish.


CPU

Even though the GPU is the MVP in deep learning, the CPU still matters. For example, data preparation is usually done on the CPU. The number of cores and threads per core is important if we want to parallelize all that data prep.

To stay on budget, I picked a mid-range CPU, the Intel i5 7500 for about $190. It’s relatively cheap but good enough to not slow things down.

Memory (RAM)

It’s nice to have a lot of memory if we are to be working with rather big datasets. I got 2 sticks of 16 GB, for a total of 32 GB of RAM for $230, and plan to buy another 32 GB later.


Storage

Following Jeremy Howard’s advice, I got a fast SSD to keep my OS and current data on, and then a slow spinning HDD for those huge datasets (like ImageNet).
SSD: I remember how blown away I was by the SSD speed when I got my first Macbook Air years ago. To my delight, a new generation of SSD called NVMe has made its way to market in the meantime. A 480 GB MyDigitalSSD NVMe drive for $230 was a great deal. This baby copies files at gigabytes per second.
HDD: 2 TB for $66. While SSDs have been getting fast, HDDs have been getting cheap. To somebody who has used Macbooks with a 128 GB disk for the last 7 years, having this much space feels almost obscene.


Motherboard

The one thing that I kept in mind when picking a motherboard was the ability to support two GTX 1080 Ti cards, both in the number of PCI Express lanes (the minimum is 2x8) and in the physical space for 2 cards. Also make sure it’s compatible with the chosen CPU. An Asus TUF Z270 for $130 did it for me.

Power Supply

Rule of thumb: it should provide enough power for the CPU and the GPUs, plus 100 watts extra.
The Intel i5 7500 processor uses 65W, and the GPUs (1080 Ti) need 250W each, so I got a Deepcool 750W Gold PSU for $75. The “Gold” here refers to the power efficiency, i.e. how much of the power consumed is wasted as heat.
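The rule of thumb is easy to check with a few lines of Python. The function below is my own sketch of it; the wattages are the ones quoted above, with two GPUs because a second 1080 Ti is planned.

```python
def required_psu_watts(cpu_watts, gpu_watts, gpu_count, headroom_watts=100):
    """Rule of thumb from the text: CPU + all GPUs + 100 W extra."""
    return cpu_watts + gpu_watts * gpu_count + headroom_watts

# Intel i5 7500 (65 W) with two GTX 1080 Ti cards (250 W each).
needed = required_psu_watts(65, 250, 2)
print(needed)         # 665
print(needed <= 750)  # True: the 750 W unit covers the planned two-GPU build
```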


Case

The case should be the same form factor as the motherboard. Also having enough LEDs to embarrass a Burner is a bonus.

A friend recommended the Thermaltake N23 case for $50, which I promptly got. No LEDs sadly.

Putting it all together

If you don’t have much experience with hardware and fear you might break something, a professional assembly might be the best option. However, this was a great learning opportunity that I couldn’t pass up (even though I’ve had my share of hardware-related horror stories).

The first and most important step is to read the installation manuals that came with each component. This was especially important for me, as I’ve only done this once or twice before and have just the right amount of inexperience to mess things up.

Install the CPU on the Motherboard


The CPU in its slot, the lever refusing to go down.

This is done before installing the motherboard in the case. Next to the processor there is a lever that needs to be pulled up. The processor is then placed on the base (double-check the orientation). Finally the lever comes down to fix the CPU in place.


Me being assisted in installing the CPU



But I had quite some difficulty doing this: once the CPU was in position, the lever wouldn’t go down. I actually had a more hardware-capable friend of mine walk me through the process over video. It turns out the amount of force required to get the lever locked down was more than what I was comfortable with.


The installed fan

Next is fixing the fan on top of the CPU: the fan legs must be fully secured to the motherboard. Also consider where the fan cable will go before installing. The processor I had came with thermal paste. If yours doesn’t, make sure to put some paste between the CPU and the cooling unit. Also replace the paste if you take off the fan.

Install Power Supply in the Case


Fitting the power cables through the back side.

I put the Power Supply Unit (PSU) in before the motherboard to get the power cables snugly placed at the back side of the case.





Install the Motherboard in the case


Having fun with magnets

Pretty straightforward: carefully place it and screw it in. A magnetic screwdriver was really helpful.

Then connect the power cables and the case buttons and LEDs.



Install the NVMe Disk


Just slide it into the M.2 slot and screw it in. Piece of cake.


Install the RAM


The GTX 1080 Ti calmly waiting its turn as I struggle with the RAM in the background.

The memory proved quite hard to install, requiring too much effort to properly lock in. A few times I almost gave up, thinking I must be doing it wrong. Eventually one of the sticks clicked in and the other one promptly followed.

At this point I turned the computer on to make sure it works. To my relief, it started right away!

Install the GPU


The GTX 1080 Ti settling into its new home

Finally, the GPU slid in effortlessly. 14 pins of power later and it was running.

NB: Do not plug your monitor into the discrete graphics card right away. Most probably it needs drivers to function (see below).

Finally, it’s complete!


Software Setup

Now that we have the hardware in place, only the soft part remains. Out with the screwdriver, in with the keyboard.

Note on dual booting: If you plan to install Windows (because, you know, for benchmarks, totally not for gaming), it would be wise to do Windows first and Linux second. I didn’t and had to reinstall Ubuntu because Windows messed up the boot partition. Lifewire has a detailed article on dual boot.

Install Ubuntu

Most DL frameworks are designed to work on Linux first, and eventually support other operating systems. So I went for Ubuntu, my default Linux distribution. An old 2GB USB drive was lying around and worked great for the installation. UNetbootin (OSX) or Rufus (Windows) can prepare the Linux thumb drive. The default options worked fine during the Ubuntu install.

At the time of writing, Ubuntu 17.04 had just been released, so I opted for an earlier version (16.04), whose quirks are much better documented online.

Ubuntu Server or Desktop: The Server and Desktop editions of Ubuntu are almost identical, with the notable exception of the graphical interface (called X) not being installed with Server. I installed the Desktop edition and disabled autostarting X, so that the computer boots into terminal mode. If needed, one can launch the graphical desktop later by typing startx.

Getting up to date

Let’s get our install up to date. From Jeremy Howard’s excellent install-gpu script:

sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common
sudo apt-get --assume-yes install git

The Deep Learning stack

To deep learn on our machine, we need a stack of technologies to use our GPU:

  • GPU driver — A way for the operating system to talk to the graphics card.
  • CUDA — Allows us to run general purpose code on the GPU.
  • CuDNN — Provides deep neural networks routines on top of CUDA.
  • A DL framework: Tensorflow, PyTorch, Theano, etc. They make life easier by abstracting the lower levels of the stack.

Install CUDA

Download the CUDA repository package from Nvidia, then install it with the commands below:

sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

After CUDA has been installed the following code will add the CUDA installation to the PATH variable:

cat >> ~/.bashrc << 'EOF'
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
source ~/.bashrc
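The `${PATH:+:${PATH}}` bit in the export line above looks cryptic, so here is a small Python sketch of what that shell expansion does: it appends a colon and the old value only when the variable is already set and non-empty, which avoids leaving a stray trailing colon. The function name is hypothetical; the directories are the ones used above.

```python
def prepend_to_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':existing' only when existing is non-empty."""
    return new_dir + (":" + existing if existing else "")

print(prepend_to_path("/usr/local/cuda-8.0/bin", "/usr/bin:/bin"))
# /usr/local/cuda-8.0/bin:/usr/bin:/bin
print(prepend_to_path("/usr/local/cuda-8.0/lib64", ""))
# /usr/local/cuda-8.0/lib64
```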

Now we can verify that CUDA has been installed successfully by running

nvcc --version # Checks CUDA version
nvidia-smi # Info about the detected GPUs
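If either command is reported as missing, the PATH is the first thing I would check. A minimal sketch of such a check, assuming Python 3 is available; `find_tools` is a hypothetical helper of mine, built on the standard library's `shutil.which`:

```python
import shutil

def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

for tool, path in find_tools(["nvcc", "nvidia-smi"]).items():
    print(tool, "->", path or "not found on PATH")
```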

This should have installed the display driver as well. For me, nvidia-smi showed ERR as the device name, so I installed the latest Nvidia drivers (at time of writing) to fix it:

sudo sh
sudo reboot

Removing CUDA/Nvidia drivers

If at any point the drivers or CUDA seem broken (as they did for me — multiple times), it might be better to start over by running:

sudo apt-get remove --purge 'nvidia*' # Quoted so the shell passes the pattern to apt-get unchanged
sudo apt-get autoremove
sudo reboot


CuDNN

We install CuDNN 5.1, as Tensorflow currently doesn’t support CuDNN 6. To download CuDNN, one needs to register for a (free) developer account. After downloading, install with the following:

tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/


Anaconda

Anaconda is a great package manager for Python. I’ve moved to Python 3.6, so I will be using the Anaconda 3 version:

wget -O “”
bash -b
cat >> ~/.bashrc << 'EOF'
export PATH=$HOME/anaconda3/bin:${PATH}
EOF
source ~/.bashrc
conda upgrade -y --all
source activate root


Tensorflow

The popular DL framework by Google. Installation:

sudo apt install python3-pip
pip install tensorflow-gpu

Validate Tensorflow install: To make sure we have our stack running smoothly, I like to run the Tensorflow MNIST example:

git clone
python tensorflow/tensorflow/examples/tutorials/mnist/

We should see the loss decreasing during training:

Step 0: loss = 2.32 (0.139 sec)
Step 100: loss = 2.19 (0.001 sec)
Step 200: loss = 1.87 (0.001 sec)


Keras

Keras is a great high-level neural networks framework, an absolute pleasure to work with. Installation couldn’t be easier either:

pip install keras


PyTorch

PyTorch is a newcomer in the world of DL frameworks, but its API is modeled on the successful Torch, which was written in Lua. PyTorch feels new and exciting, mostly great, although some things are still to be implemented. We install it by running:

conda install pytorch torchvision cuda80 -c soumith

Jupyter notebook

Jupyter is a web-based IDE for Python, which is ideal for data sciency tasks. It’s installed with Anaconda, so we just configure and test it:

# Create a ~/.jupyter/ with settings
jupyter notebook --generate-config
jupyter notebook --port=8888 --NotebookApp.token='' # Start it

Now if we open http://localhost:8888 we should see a Jupyter screen.

Run Jupyter on boot

Rather than running the notebook every time the computer is restarted, we can set it to autostart on boot. We will use crontab to do this, which we can edit by running crontab -e. Then add the following after the last line in the crontab file:

# Replace 'path-to-jupyter' with the actual path to the jupyter
# installation (run 'which jupyter' if you don't know it). Also
# 'path-to-dir' should be the dir where your deep learning notebooks 
# would reside (I use ~/DL/).
@reboot path-to-jupyter notebook --no-browser --port=8888 --NotebookApp.token='' --notebook-dir=path-to-dir &

Outside access

I use my old trusty Macbook Air for development, so I’d like to be able to log into the DL box both from my home network and when on the go.

SSH Key: It’s way more secure to use an SSH key to log in instead of a password. Digital Ocean has a great guide on how to set this up.

SSH tunnel: If you want to access your jupyter notebook from another computer, the recommended way is to use SSH tunneling (instead of opening the notebook to the world and protecting with a password). Let’s see how we can do this:

  1. First we need an SSH server. We install it by running the following on the DL box (server):

sudo apt-get install openssh-server
sudo service ssh status

  2. Then to connect over SSH tunnel, run the following script on the client:

# Replace user@host with your server user and ip.
ssh -N -f -L localhost:8888:localhost:8888 user@host

To test this, open a browser on the client machine and try http://localhost:8888. Your Jupyter notebook should appear.
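If the page doesn’t load, it helps to know whether the tunnel is up at all. Here is a minimal sketch of such a check in Python, run on the client. `port_is_open` is a hypothetical helper of mine, and it only tests that something accepts TCP connections on the port, not that it is actually Jupyter.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With the tunnel up, the forwarded Jupyter port should accept connections.
print(port_is_open("localhost", 8888))
```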

Setup out-of-network access: Finally to access the DL box from the outside world, we need 3 things:

  1. Static IP for your home network (or a service to emulate that) — so that we know on what address to connect.
  2. A manual IP or a DHCP reservation giving the DL box a permanent address on your home network.
  3. Port forwarding from the router to the DL box (instructions for your router).

Setting up out-of-network access depends on the router/network setup, so I’m not going into details.


Benchmarks

Now that we have everything running smoothly, let’s put it to the test. We’ll be comparing the newly built box to an AWS P2.xlarge instance, which is what I’ve used so far for DL. The tests are computer vision related, meaning convolutional networks with a fully connected model thrown in. We time training models on: the AWS P2 instance GPU (K80), the AWS P2 virtual CPU, the GTX 1080 Ti and the Intel i5 7500 CPU.

MNIST Multilayer Perceptron

The “Hello World” of computer vision. The MNIST database consists of 70,000 handwritten digits. We run the Keras example on MNIST, which uses a Multilayer Perceptron (MLP). MLP means that we are using only fully connected layers, not convolutions. The model is trained for 20 epochs on this dataset and achieves over 98% accuracy out of the box.

We see that the GTX 1080 Ti is 2.4 times faster than the K80 on AWS P2 in training the model. This is rather surprising as these 2 cards should have about the same performance. I believe this is because of the virtualization or underclocking of the K80 on AWS.

The CPUs perform 9 times slower than the GPUs. As we will see later, that’s a really good result for the processors. It is due to the small model, which fails to fully utilize the parallel processing power of the GPUs.
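To get a feel for how small this model is, we can count its trainable parameters. The sketch below assumes the layout I believe the Keras MNIST MLP example uses (784 input pixels, two hidden layers of 512 units, 10 output classes); treat those sizes as an assumption rather than something stated above.

```python
def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units

def mlp_params(layer_sizes):
    """Total trainable parameters of a stack of fully connected layers."""
    return sum(dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:]))

# Assumed layout: 28x28 = 784 inputs, two hidden layers of 512 units, 10 classes.
print(mlp_params([784, 512, 512, 10]))  # 669706
```

Well under a million parameters is tiny by deep learning standards, so there simply isn’t enough parallel work per step to keep a big GPU busy.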

Interestingly, the desktop Intel i5 7500 achieves 2.3x speedup over the virtual CPU on Amazon.

VGG Finetuning

A VGG net will be finetuned for the Kaggle Dogs vs Cats competition. In this competition we need to tell apart pictures of dogs and cats. Running the model on CPUs for the same number of batches wasn’t feasible. Therefore we finetune for 390 batches (1 epoch) on the GPUs and 10 batches on the CPUs. The code used is on github.
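Since the CPUs and GPUs run a different number of batches, the fair way to compare them is per batch. Below is a minimal sketch of that normalization. The timings in the example are made up purely for illustration and are not my measurements.

```python
def seconds_per_batch(total_seconds, batches):
    """Average time spent on a single batch."""
    return total_seconds / batches

def speedup(slow_seconds, slow_batches, fast_seconds, fast_batches):
    """How many times faster the 'fast' device is, compared per batch."""
    return seconds_per_batch(slow_seconds, slow_batches) / seconds_per_batch(
        fast_seconds, fast_batches
    )

# Hypothetical timings, for illustration only: a CPU needing 200 s for
# 10 batches versus a GPU needing 390 s for 390 batches.
print(speedup(200.0, 10, 390.0, 390))  # 20.0
```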

The 1080 Ti is 5.5 times faster than the AWS GPU (K80). The difference in
