Deep Learning Models on Kubernetes with GPUs

发布于 2018/04/27 14:13
字数 1127
阅读 164
收藏 1

Deploying Deep Learning Models on Kubernetes with GPUs

April 19, 2018 by ML Blog Team 

This post is authored by Mathew Salvaris and Fidan Boylu Uz, Senior Data Scientists at Microsoft.

One of the major challenges that data scientists often face is closing the gap between training a deep learning model and deploying it at production scale. Training of these models is a resource intensive task that requires a lot of computational power and is typically done using GPUs. The resource requirement is less of a problem for deployment since inference tends not to pose as heavy a computational burden as training. However, for inference, other goals also become pertinent such as maximizing throughput and minimizing latency. When inference speed is a bottleneck, GPUs show considerable performance gains over CPUs. Coupled with containerized applications and container orchestrators like Kubernetes, it is now possible to go from training to deployment with GPUs faster and more easily while satisfying latency and throughput goals for production grade deployments.

In this tutorial, we provide step-by-step instructions to go from loading a pre-trained Convolutional Neural Network model to creating a containerized web application that is hosted on Kubernetes cluster with GPUs on Azure Container Service (AKS). AKS makes it quick and easy to deploy and manage containerized applications without much expertise in managing Kubernetes environment. It eliminates complexity and operational overhead of maintaining the cluster by provisioning, upgrading, and scaling resources on demand, without taking the applications offline. AKS reduces the cost and complexity of using a Kubernetes cluster by managing the master nodes for which the user does no incur a cost. Azure Container Service has been available for a while and similar approach was provided in aprevious tutorial to deploy a deep learning framework on Marathon cluster with CPUs . In this tutorial, we focus on two of the most popular deep learning frameworks and provide the step-by-step instructions to deploy pre- trained models on Kubernetes cluster with GPUs.

The tutorial is organized in two parts, one for each deep learning framework, specifically TensorFlow and Keras with TensorFlow backend. Under each framework, there are several notebooks that can be executed to perform the following steps:

  • Develop the model that will be used in the application.
  • Develop the API module that will initialize the model and make predictions.
  • Create Docker image of the application with Flask and Nginx.
  • Test the application locally.
  • Create an AKS cluster with GPUs and deploy the web app.
  • Test the web app hosted on AKS.
  • Perform speed tests to understand latency of the web app.

Below, you will find short descriptions of the steps above.

Develop the Model

As the first step of the tutorial, we load the pre-trained ResNet152 model, pre-process an example image to the required format and call the model to find the top predictions. The code developed in this step will be used in the next step when we develop the API module that initializes the model and makes predictions.

Develop the API

In this step, we develop the API that will call the model. This driver module initializes the model, transforms the input so that it is in the appropriate format and defines the scoring method that will produce the predictions. The API will expect the input to be in JSON format. Once a request is received, the API will convert the JSON encoded request into the image format. The first function of the API loads the model and returns a scoring function. The second function processes the images and uses the first function to score them.

Create Docker Image

In this step, we create the Docker image that has three main parts, the web application, the pretrained model and the driver module for executing the model based on the requests made to the web application. The Docker image is based on a Nvidia image to which we only add the necessary Python dependencies and install the deep learning framework to keep the image as lightweight as possible. The Flask web app will be running on the default port 80 which is exposed on the docker image and Nginx is used to create a proxy from port 80 to port 5000. Once the container is built, we push it to a public Docker hub account for AKS cluster to pull it in later steps.

Test the Application Locally

In this step, we test our docker image by pulling it and running it locally. This step is especially important to make sure the image performs as expected before we go through the entire process of deploying to AKS. This will reduce the debugging time substantially by checking if we can send requests to the Docker container and receive predictions back properly.

Create and AKS Cluster and Deploy

In this step, we use Azure CLI to login to Azure, create a resource group for AKS and create the cluster. We create an AKS cluster with 1 node using Standard NC6series with 1 GPU. After the AKS cluster is created, we connect to the cluster and deploy the application by defining the Kubernetes manifest where we provide the image name, map port 80 and specify Nvidia library locations. We set the number of Kubernetes replicas to 1 which can later be scaled up to meet certain throughput requirements (the latter is out of scope for this tutorial). Kubernetes also has a dashboard that can simply be accessed through a web browser.

Test the Web App

In this step, we test the web application that is deployed on AKS to quickly check if it can produce predictions against images that are sent to the service.

Perform Speed Tests

In this step, we use the deployed service to measure the average response time by sending 100 asynchronous requests with only four concurrent requests at any time. These types of tests are particularly important to perform, especially for deployments with low latency requirements to make sure the cluster is scaled to meet the demand. The result of the tests suggest that the average response times are less than a second for both frameworks with TensorFlow (~20 images/sec) being much faster than its Keras (~12 images/sec) counterpart on a single K80 GPU.

As a last step, to delete the AKS and free up the Azure resources, we use the commands provided at the end of the notebook where AKS was created.

We hope you give this tutorial a try! Reach out to us with any comments or questions below.

Mathew & Fidan


We would like to thank William Buchwalter for helping us craft the Kubernetes manifest files, Daniel Grecoe for testing the throughput of the models and lastly Danielle Dean for the useful discussions and proofreading of the blog post.


粉丝 325
博文 1140
码字总数 689435
作品 1
私信 提问

来自于【http://openai.org】 By Vicki Cheung, Jonas Schneider, Ilya Sutskever, and Greg Brockman August 29, 2016 Deep learning is an empirical science, and the quality of a grou......


Apache MXNet (incubating) for Deep Learning Apache MXNet (incubating) is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic an......

A Vision for Making Deep Learning Simple

A Vision for Making Deep Learning Simple When MapReduce was introduced 15 years ago, it showed the world a glimpse into the future. For the first time, engineers at Silicon Vall......

Adaptable DL with nGraph™ Compiler and ONNX*

Adaptable Deep Learning Solutions with nGraph™ Compiler and ONNX* Artificial intelligence methods and deep learning techniques based on neural networks continue to gain adopti......

开源后 5 个月,Google 的深度学习系统都有哪些改变?

【导读】2016年4月14日,Google发布了分布式TensorFlow。Google的博文介绍了TensorFlow在图像分类的任务 中,在100个GPUs和不到65小时的训练时间下,达到了78%的正确率。在激烈的商业竞争中,...





VMware vSphere ESXi主机的访问控制

在vShpere中,访问ESXi主机的途径很多,如下: ESXi DCUI ESXi Shell ESXi SSH ESXi Host Client vCenter --> vSphere web client / vSphere Client VMware vSphere ESXi主机的访问控制,除了......


参考资料 概念了解:CGI,FastCGI,PHP-CGI与PHP-FPM:http://www.nowamagic.net/librarys/veda/detail/1319 php中fastcgi和php-fpm是什么东西:https://www.zybuluo.com/phper/note/50231 ......

《DNS攻击防范科普系列3》 -如何保障 DNS 操作安全

引言 前两讲我们介绍了 DNS 相关的攻击类型,以及针对 DDoS 攻击的防范措施。这些都是更底层的知识,有同学就来问能否讲讲和我们的日常操作相关的知识点,今天我们就来说说和我们日常 DNS 操...


实现接口Stats, Watcher 内部类 DisconnectReason CloseRequestException EndOfStreamException(流关闭) 属性 方法 getSessionTimeout 获取session失效时间 sendResponse 发送回复数据 se......

如何将 Redis 用于微服务通信的事件存储

来源:Redislabs 作者:Martin Forstner 翻译:Kevin (公众号:中间件小哥) 以我的经验,将某些应用拆分成更小的、松耦合的、可协同工作的独立逻辑业务服务会更易于构建和维护。这些服务(也...