The $1700 great Deep Learning box: Assembly, setup and benchmarks

Published on 2017/05/30

The most important reason for building my own box was saving time while prototyping models — if they trained faster, the feedback cycle would be shorter. Thus it would be easier on my brain to connect the dots between the assumptions I had for the model and its results.

Then I wanted to save money — I was using Amazon Web Services (AWS), which offers P2 instances with Nvidia K80 GPUs. Lately, my AWS bills had been around $60–70/month with a tendency to grow. It is also expensive to store large datasets, like ImageNet, on AWS.

And lastly, I haven’t had a desktop for over 10 years, and I wanted to see what had changed in the meantime (spoiler alert: mostly nothing).

What follows are my choices, inner monologue and gotchas: from choosing the components to benchmarking.

Table of contents

1. Choosing components
2. Putting it together
3. Software Setup
4. Benchmarks

Choosing the components

A sensible budget for me would be about 2 years worth of my current compute spending. At $70/month for AWS, this put it at around $1700 for the whole thing.

You can check out all the components used. The PC Part Picker site is also really helpful in detecting when components don’t play well together.

GPU

The GPU is the most crucial component in the box. It will train these deep networks fast, shortening the feedback cycle.

The GPU is important because: a) most calculations in DL are matrix operations, like matrix multiplication, which can be slow on the CPU; and b) as we do thousands of these operations in a typical neural network, the slowness really adds up (as we will see in the benchmarks later). GPUs, rather conveniently, are able to run all these operations in parallel. They have a large number of cores, which can run an even larger number of threads. GPUs also have much higher memory bandwidth, which enables them to run these parallel operations on a bunch of data at once.
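
To get a feel for why this matters, here is a tiny sketch comparing a naive Python loop against a vectorized BLAS call on the CPU (the GPU takes the same idea much further, with thousands of cores; the sizes here are arbitrary and kept small so the slow version finishes quickly):

```python
import time
import numpy as np

# A single matrix multiply, the bread-and-butter operation of a neural net layer.
n = 64
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

def matmul_loops(a, b):
    """One scalar multiply-add at a time, as a naive CPU loop would do it."""
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            s = 0.0
            for k in range(a.shape[1]):
                s += a[i, k] * b[k, j]
            out[i, j] = s
    return out

t0 = time.time()
slow = matmul_loops(a, b)
t_loops = time.time() - t0

t0 = time.time()
fast = a @ b  # vectorized call into an optimized BLAS library
t_blas = time.time() - t0

print(f"loops: {t_loops:.4f}s  blas: {t_blas:.6f}s")
assert np.allclose(slow, fast, rtol=1e-3, atol=1e-3)
```

Even on a CPU the vectorized call is orders of magnitude faster; stacking thousands of such operations per training step is what makes the hardware choice matter.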

My choice was between a few of Nvidia’s cards: GTX 1070 ($360), GTX 1080 ($500), GTX 1080 Ti ($700) and finally the Titan X ($1320).

On the performance side: the GTX 1080 Ti and the Titan X are similar; roughly speaking, the GTX 1080 is about 25% faster than the GTX 1070, and the GTX 1080 Ti is about 30% faster than the GTX 1080.

Tim Dettmers has a great article on picking a GPU for Deep Learning, which he regularly updates as new cards come on the market.

Here are the things to consider when picking a GPU:

  1. Maker: No contest on this one — get Nvidia. They have been focusing on Machine Learning for a number of years now, and it’s paying off. Their CUDA toolkit is entrenched so deeply that it is effectively the only choice for the DL practitioner.
  2. Budget: The Titan X gets a really bad mark here, as it offers the same performance as the 1080 Ti for about $500 more.
  3. One or multiple: I considered picking a couple of 1070s instead of a 1080 or 1080 Ti. This would have allowed me to either train a model on two cards or train two models at once. Currently, training a model on multiple cards is a bit of a hassle, though things are changing, with PyTorch and Caffe 2 offering almost linear scaling with the number of GPUs. The other option — training two models at once — seemed to have more value, but I decided to get a single more powerful card now and add a second one later.
  4. Memory: More memory is better. With more memory, we can deploy bigger models and use a sufficiently large batch size during training (which helps the gradient flow).
  5. Memory bandwidth: This determines how quickly the GPU can operate on large amounts of data. Tim Dettmers points out that this is the most important characteristic of a GPU.

Considering all of this, I picked the GTX 1080 Ti, mainly for the training speed boost. I plan to add a second 1080 Ti soonish.

CPU

Even though the GPU is the MVP in deep learning, the CPU still matters. For example, data preparation is usually done on the CPU, and the number of cores and threads per core matters if we want to parallelize all that data prep.

To stay on budget, I picked a mid-range CPU, the Intel i5 7500 for about $190. It’s relatively cheap but good enough to not slow things down.

Memory (RAM)

It’s nice to have a lot of memory if we are to be working with rather big datasets. I got 2 sticks of 16 GB, for a total of 32 GB of RAM for $230, and plan to buy another 32 GB later.

Storage

Following Jeremy Howard’s advice, I got a fast SSD disk to keep my OS and current data on, and then a slow spinning HDD for those huge datasets (like ImageNet).
SSD: I remember when I got my first Macbook Air years ago, how blown away I was by the SSD speed. To my delight, a new generation of SSDs called NVMe has made its way to market in the meantime. A 480 GB MyDigitalSSD NVMe drive for $230 was a great deal. This baby copies files at gigabytes per second.
HDD: 2 TB for $66. While SSDs have been getting fast, HDDs have been getting cheap. To somebody who has used Macbooks with a 128 GB disk for the last 7 years, having this much space feels almost obscene.
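
If you want to sanity-check the speed difference between the two drives yourself, a rough sequential-write test can be done with dd (a sketch, not something from this build log; the mount point and 1 GB size are arbitrary, and oflag=direct is not supported on every filesystem):

```shell
# Write 1 GB directly to the disk, bypassing the page cache,
# so the reported speed reflects the drive rather than RAM.
dd if=/dev/zero of=/mnt/hdd/ddtest bs=1M count=1024 oflag=direct
rm /mnt/hdd/ddtest  # clean up the test file
```

Run the same command against the SSD mount to see the gap; NVMe drives typically report an order of magnitude higher throughput than spinning disks.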

Motherboard

The one thing I kept in mind when picking a motherboard was the ability to support two GTX 1080 Ti cards, both in the number of PCI Express lanes (the minimum is 2x8) and in the physical space for two cards. Also, make sure it’s compatible with the chosen CPU. An Asus TUF Z270 for $130 did it for me.

Power Supply

Rule of thumb: the PSU should provide enough power for the CPU and the GPUs, plus an extra 100 watts.
The Intel i5 7500 processor uses 65W, and the GPUs (1080 Ti) need 250W each, so I got a Deepcool 750W Gold PSU for $75. The “Gold” here refers to the power efficiency, i.e. how much of the power consumed is wasted as heat.
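
The rule of thumb above is simple enough to write down (the TDP figures are the ones quoted in this post):

```python
# Rule-of-thumb PSU sizing: CPU TDP + sum of GPU TDPs + 100 W of headroom.
def psu_watts_needed(cpu_tdp, gpu_tdps, headroom=100):
    return cpu_tdp + sum(gpu_tdps) + headroom

# i5 7500 (65 W) with one 1080 Ti (250 W):
print(psu_watts_needed(65, [250]))        # 415 W
# With a second 1080 Ti added later, still under the 750 W unit:
print(psu_watts_needed(65, [250, 250]))   # 665 W
```

The 750W unit covers the planned second card with margin to spare.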

Case

The case should be the same form factor as the motherboard. Also having enough LEDs to embarrass a Burner is a bonus.

A friend recommended the Thermaltake N23 case for $50, which I promptly got. No LEDs sadly.

Putting it all together

If you don’t have much experience with hardware and fear you might break something, professional assembly might be the best option. However, this was a great learning opportunity that I couldn’t pass up (even though I’ve had my share of hardware-related horror stories).

The first and most important step is to read the installation manuals that came with each component. This was especially important for me, as I’ve only done this once or twice before, and I have just the right amount of inexperience to mess things up.

Install the CPU on the Motherboard


The CPU in its slot, the lever refusing to go down.

This is done before installing the motherboard in the case. Next to the processor there is a lever that needs to be pulled up. The processor is then placed on the base (double-check the orientation). Finally the lever comes down to fix the CPU in place.


Me being assisted in installing the CPU



But I had quite the difficulty doing this: once the CPU was in position, the lever wouldn’t go down. I actually had a more hardware-capable friend of mine video-walk me through the process. It turns out the amount of force required to get the lever locked down was more than I was comfortable with.


The installed fan

Next is fixing the fan on top of the CPU: the fan legs must be fully secured to the motherboard. Also consider where the fan cable will go before installing. The processor I had came with thermal paste. If yours doesn’t, make sure to put some paste between the CPU and the cooling unit. Also replace the paste if you take off the fan.

Install Power Supply in the Case


Fitting the power cables through the back side.

I put the Power Supply Unit (PSU) in before the motherboard to get the power cables snugly placed in the back side of the case.





Install the Motherboard in the case


Having fun with magnets

Pretty straightforward — carefully place it and screw it in. A magnetic screwdriver was really helpful.

Then connect the power cables and the case buttons and LEDs.



Install the NVMe Disk


Just slide it in the M2 slot and screw it in. Piece of cake.


Install the RAM


The GTX 1080 Ti calmly waiting its turn as I struggle with the RAM in the background.

The memory proved quite hard to install, requiring too much effort to properly lock in. A few times I almost gave up, thinking I must be doing it wrong. Eventually one of the sticks clicked in and the other one promptly followed.

At this point I turned the computer on to make sure it works. To my relief, it started right away!

Install the GPU


The GTX 1080 Ti settling into its new home

Finally, the GPU slid in effortlessly. 14 pins of power later and it was running.

NB: Do not plug your monitor into the graphics card right away. Most probably it needs drivers to function (see below).

Finally, it’s complete!


Software Setup

Now that we have the hardware in place, only the soft part remains. Out with the screwdriver, in with the keyboard.

Note on dual booting: If you plan to install Windows (because, you know, for benchmarks, totally not for gaming), it would be wise to install Windows first and Linux second. I didn’t, and had to reinstall Ubuntu because Windows messed up the boot partition. Lifewire has a detailed article on dual booting.

Install Ubuntu

Most DL frameworks are designed to work on Linux first and eventually support other operating systems. So I went for Ubuntu, my default Linux distribution. An old 2GB USB drive was lying around and worked great for the installation. UNetbootin (OSX) or Rufus (Windows) can prepare the Linux thumb drive. The default options worked fine during the Ubuntu install.

At the time of writing, Ubuntu 17.04 had just been released, so I opted for the previous version (16.04), whose quirks are much better documented online.

Ubuntu Server or Desktop: The Server and Desktop editions of Ubuntu are almost identical, with the notable exception of the visual interface (called X) not being installed with Server. I installed the Desktop edition and disabled X from autostarting, so that the computer boots in terminal mode. If needed, the visual desktop can be launched later by typing startx.
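
For completeness, one way to stop X from autostarting on Ubuntu 16.04 is to change the systemd default target (an assumption on my part — the post doesn’t say which method was used):

```shell
# Boot to a text console instead of the graphical login:
sudo systemctl set-default multi-user.target
# Start the desktop manually for a single session when needed:
startx
# Revert to booting straight into the GUI:
sudo systemctl set-default graphical.target
```
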

Getting up to date

Let’s get our install up to date. From Jeremy Howard’s excellent install-gpu script:

sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common
sudo apt-get --assume-yes install git

The Deep Learning stack

To deep learn on our machine, we need a stack of technologies to use our GPU:

  • GPU driver — A way for the operating system to talk to the graphics card.
  • CUDA — Allows us to run general purpose code on the GPU.
  • CuDNN — Provides deep neural networks routines on top of CUDA.
  • A DL framework — Tensorflow, PyTorch, Theano, etc. They make life easier by abstracting the lower levels of the stack.

Install CUDA

Download CUDA from Nvidia, or just run the code below:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

After CUDA has been installed the following code will add the CUDA installation to the PATH variable:

cat >> ~/.bashrc << 'EOF'
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF
source ~/.bashrc

Now we can verify that CUDA has been installed successfully by running

nvcc --version # Checks CUDA version
nvidia-smi # Info about the detected GPUs

This should have installed the display driver as well. For me, nvidia-smi showed ERR as the device name, so I installed the latest Nvidia drivers (378.13 at the time of writing) to fix it:

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/378.13/NVIDIA-Linux-x86_64-378.13.run
sudo sh NVIDIA-Linux-x86_64-378.13.run
sudo reboot

Removing CUDA/Nvidia drivers

If at any point the drivers or CUDA seem broken (as they did for me — multiple times), it might be better to start over by running:

sudo apt-get remove --purge nvidia*
sudo apt-get autoremove
sudo reboot

Install CuDNN

We install CuDNN 5.1, as Tensorflow currently doesn’t support CuDNN 6. To download CuDNN, one needs to register for a (free) developer account. After downloading, install with the following:

tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/
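
A quick way to confirm the copy worked is to read the version defines out of the header (assuming the default /usr/local/cuda install path used above):

```shell
# Should report major version 5 and minor version 1 for this install.
grep -m1 -A2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h
```
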

Install Anaconda

Anaconda is a great package manager for Python. I’ve moved to Python 3.6, so I’ll be using the Anaconda 3 version:

wget https://repo.continuum.io/archive/Anaconda3-4.3.1-Linux-x86_64.sh -O "anaconda-install.sh"
bash anaconda-install.sh -b
cat >> ~/.bashrc << 'EOF'
export PATH=$HOME/anaconda3/bin:${PATH}
EOF
source ~/.bashrc
conda upgrade -y --all
source activate root

Install Tensorflow

The popular DL framework by Google. Installation:

sudo apt install python3-pip
pip install tensorflow-gpu

Validate Tensorflow install: To make sure we have our stack running smoothly, I like to run the Tensorflow MNIST example:

git clone https://github.com/tensorflow/tensorflow.git
python tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py

We should see the loss decreasing during training:

Step 0: loss = 2.32 (0.139 sec)
Step 100: loss = 2.19 (0.001 sec)
Step 200: loss = 1.87 (0.001 sec)

Keras

Keras is a great high-level neural networks framework, an absolute pleasure to work with. Installation couldn’t be easier, either:

pip install keras

PyTorch

PyTorch is a newcomer in the world of DL frameworks, but its API is modeled on the successful Torch, which was written in Lua. PyTorch feels new and exciting, and is mostly great, although some things are yet to be implemented. We install it by running:

conda install pytorch torchvision cuda80 -c soumith

Jupyter notebook

Jupyter is a web-based IDE for Python, which is ideal for data science tasks. It’s installed with Anaconda, so we just configure and test it:

# Create a ~/.jupyter/jupyter_notebook_config.py with settings
jupyter notebook --generate-config
jupyter notebook --port=8888 --NotebookApp.token='' # Start it

Now if we open http://localhost:8888 we should see a Jupyter screen.

Run Jupyter on boot

Rather than starting the notebook manually every time the computer is restarted, we can set it to autostart on boot. We will use crontab to do this, which we can edit by running crontab -e. Then add the following after the last line of the crontab file:

# Replace 'path-to-jupyter' with the actual path to the jupyter
# installation (run 'which jupyter' if you don't know it). Also
# 'path-to-dir' should be the dir where your deep learning notebooks 
# would reside (I use ~/DL/).
@reboot path-to-jupyter notebook --no-browser --port=8888 --NotebookApp.token='' --notebook-dir path-to-dir &

Outside access

I use my old trusty Macbook Air for development, so I’d like to be able to log into the DL box both from my home network and when on the road.

SSH Key: It’s way more secure to use an SSH key to log in instead of a password. Digital Ocean has a great guide on how to set this up.

SSH tunnel: If you want to access your Jupyter notebook from another computer, the recommended way is to use SSH tunneling (instead of opening the notebook to the world and protecting it with a password). Let’s see how we can do this:

  1. First we need an SSH server. We install it by running the following on the DL box (server):
sudo apt-get install openssh-server
sudo service ssh status

  2. Then to connect over the SSH tunnel, run the following script on the client:

# Replace user@host with your server user and ip.
ssh -N -f -L localhost:8888:localhost:8888 user@host

To test this, open a browser on the client machine and try http://localhost:8888. Your Jupyter notebook should appear.

Setup out-of-network access: Finally to access the DL box from the outside world, we need 3 things:

  1. Static IP for your home network (or a service to emulate that) — so that we know on what address to connect.
  2. A manual IP or a DHCP reservation giving the DL box a permanent address on your home network.
  3. Port forwarding from the router to the DL box (instructions for your router).

Setting up out-of-network access depends on the router/network setup, so I’m not going into details.

Benchmarks

Now that we have everything running smoothly, let’s put it to the test. We’ll be comparing the newly built box to an AWS P2.xlarge instance, which is what I’ve used for DL so far. The tests are computer vision related, meaning convolutional networks, with a fully connected model thrown in. We time training models on: the AWS P2 instance GPU (K80), the AWS P2 virtual CPU, the GTX 1080 Ti, and the Intel i5 7500 CPU.

MNIST Multilayer Perceptron

The “Hello World” of computer vision. The MNIST database consists of 70,000 handwritten digits. We run the Keras example on MNIST, which uses a Multilayer Perceptron (MLP). MLP means we are using only fully connected layers, not convolutions. The model is trained for 20 epochs on this dataset and achieves over 98% accuracy out of the box.
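
For reference, the forward pass of such an MLP is just a few matrix multiplies. Here is a minimal numpy sketch using the same layer sizes as the Keras mnist_mlp example (784-512-512-10); the random weights and fake batch are stand-ins, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 28x28 input pixels -> two hidden layers -> 10 digit classes.
sizes = [784, 512, 512, 10]
weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(batch):
    h = batch
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)                       # hidden layers: affine + ReLU
    return softmax(h @ weights[-1] + biases[-1])  # output: class probabilities

x = rng.random((32, 784))   # a fake batch of 32 flattened 28x28 images
probs = forward(x)
print(probs.shape)          # one 10-way distribution per image
```

Every `@` here is a matrix multiply, which is exactly the operation the GPU parallelizes; a model this small, though, leaves most of the card idle, which is why the CPUs fare comparatively well on this benchmark.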

We see that the GTX 1080 Ti is 2.4 times faster than the K80 on the AWS P2 instance in training the model. This is rather surprising, as these two cards should have about the same performance. I believe this is because of virtualization or underclocking of the K80 on AWS.

The CPUs perform 9 times slower than the GPUs. As we will see later, this is actually a really good result for the processors. It is due to the small model, which fails to fully utilize the parallel processing power of the GPUs.

Interestingly, the desktop Intel i5 7500 achieves a 2.3x speedup over the virtual CPU on Amazon.

VGG Finetuning

A VGG net will be finetuned for the Kaggle Dogs vs. Cats competition, in which we need to tell apart pictures of dogs and cats. Running the model on the CPUs for the same number of batches wasn’t feasible, so we finetune for 390 batches (1 epoch) on the GPUs and 10 batches on the CPUs. The code used is on GitHub.

The 1080 Ti is 5.5 times faster than the AWS GPU (K80).

