文档章节

Kubeflow-机器学习工作流框架

openthings
 openthings
发布于 2018/05/06 15:59
字数 1111
阅读 718
收藏 0

Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.

This repository contains the manifests for creating:

  • A JupyterHub to create and manage interactive Jupyter notebooks. Project Jupyter is a non-profit, open-source project to support interactive data science and scientific computing across all programming languages.
  • A TensorFlow Training Controller that can be configured to use either CPUs or GPUs and dynamically adjusted to the size of a cluster with a single setting
  • A TensorFlow Serving container to export trained TensorFlow models to Kubernetes

This document details the steps needed to run the Kubeflow project in any environment in which Kubernetes runs.

Quick Links

The Kubeflow Mission

Our goal is to make scaling machine learning (ML) models and deploying them to production as simple as possible, by letting Kubernetes do what it's great at:

  • Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster <-> production cluster)
  • Deploying and managing loosely-coupled microservices
  • Scaling based on demand

Because ML practitioners use a diverse set of tools, one of the key goals is to customize the stack based on user requirements (within reason) and let the system take care of the "boring stuff". While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.

Ultimately, we want to have a set of simple manifests that give you an easy to use ML stack anywhere Kubernetes is already running, and can self configure based on the cluster it deploys into.

Who should consider using Kubeflow?

Based on the current functionality you should consider using Kubeflow if:

  • You want to train/serve TensorFlow models in different environments (e.g. local, on prem, and cloud)
  • You want to use Jupyter notebooks to manage TensorFlow training jobs
  • You want to launch training jobs that use resources -- such as additional CPUs or GPUs -- that aren't available on your personal computer
  • You want to combine TensorFlow with other processes
    • For example, you may want to use tensorflow/agents to run simulations to generate data for training reinforcement learning models.

This list is based ONLY on current capabilities. We are investing significant resources to expand the functionality and actively soliciting help from companies and individuals interested in contributing (see below).

Setup

This documentation assumes you have a Kubernetes cluster already available. If you need help setting up a Kubernetes cluster please refer to Kubernetes Setup. Minikube users please check these instructions. If you want to use GPUs, be sure to follow the Kubernetes instructions for enabling GPUs.

Quick Start

Requirements

Github Tokens

It is HIGHLY likely you'll overload Github's API if you are unauthenticated. To get around this, do the following steps:

  • Go to https://github.com/settings/tokens and generate a new token. You don't have to give it any access at all as you are simply authenticating.
  • Make sure you save that token someplace because you can't see it again. If you lose it you'll have to delete and create a new one.
  • Set an environment variable in your shell: export GITHUB_TOKEN=. You may want to do this as part of your shell startup scripts (i.e. .profile).
echo "export GITHUB_TOKEN=${GITHUB_TOKEN}" >> ~/.bashrc

Steps

In order to quickly set up all components, execute the following commands:

# Create a namespace for kubeflow deployment
NAMESPACE=kubeflow
kubectl create namespace ${NAMESPACE}

# Which version of Kubeflow to use
# For a list of releases refer to:
# https://github.com/kubeflow/kubeflow/releases
VERSION=v0.1.2

# Initialize a ksonnet app. Set the namespace for it's default environment.
APP_NAME=my-kubeflow
ks init ${APP_NAME}
cd ${APP_NAME}
ks env set default --namespace ${NAMESPACE}

# Install Kubeflow components
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow

ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}

# Create templates for core components
ks generate kubeflow-core kubeflow-core

# If your cluster is running on Azure you will need to set the cloud parameter.
# If the cluster was created with AKS or ACS choose aks, it if was created
# with acs-engine, choose acsengine
# PLATFORM=<aks|acsengine>
# ks param set kubeflow-core cloud ${PLATFORM}

# Enable collection of anonymous usage metrics
# Skip this step if you don't want to enable collection.
ks param set kubeflow-core reportUsage true
ks param set kubeflow-core usageId $(uuidgen)

# Deploy Kubeflow
ks apply default -c kubeflow-core

The above commands are used to setup JupyterHub and a custom resource for running TensorFlow training jobs. Furthermore, the ksonnet packages provide prototypes that can be used to configure TensorFlow jobs and deploy TensorFlow models. Used together, these make it easy for a user to transition from training to model serving using Tensorflow with minimal effort, and in a portable fashion across different environments.

For more detailed instructions about how to use Kubeflow, including testing the Jupyter Notebook, please refer to the user guide.

Important: The commands above will enable collection of anonymous user data to help us improve Kubeflow; for more information including instructions for explictly disabling it please refer to the Usage Reporting section of the user guide.

Troubleshooting

For detailed troubleshooting instructions, please refer to this section of the user guide.

Resources

Get Involved

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

The Kubeflow community is guided by our Code of Conduct, which we encourage everybody to read before participating.

Who should consider contributing to Kubeflow?

  • Folks who want to add support for other ML frameworks (e.g. PyTorch, XGBoost, scikit-learn, etc...)
  • Folks who want to bring more Kubernetes magic to ML (e.g. ISTIO integration for prediction)
  • Folks who want to make Kubeflow a richer ML platform (e.g. support for ML pipelines, hyperparameter tuning)
  • Folks who want to tune Kubeflow for their particular Kubernetes distribution or Cloud
  • Folks who want to write tutorials/blog posts showing how to use Kubeflow to solve ML problems

本文转载自:https://github.com/kubeflow/kubeflow

openthings
粉丝 322
博文 1133
码字总数 685064
作品 1
东城
架构师
私信 提问
AirFlow/NiFi/MLFlow/KubeFlow进展

大数据分析中,进行流程化的批处理是必不可少的。传统的大数据处理大部分是基于关系数据库系统,难以实现大规模扩展;主流的基于Hadoop/Spark体系总体性能较强,但使用复杂、扩展能力弱。大数...

openthings
06/21
326
0
Kubeflow镜像的快速下载(V0.3.3)

Kubeflow是一个面向Kubernetes集群运行的机器学习框架。要想使用得先想办法把镜像搬到自己的环境里来。 目前版本0.3.3的容器镜像已经搬回来,可以使用下面的脚本来从Aliyun的镜像服务站下载:...

openthings
2018/11/28
645
0
像Google一样构建机器学习系统3 - 利用MPIJob运行ResNet101

本系列将利用阿里云容器服务,帮助您上手Kubeflow Pipelines. 第一篇:在阿里云上搭建Kubeflow Pipelines 第二篇:开发你的机器学习工作流 第三篇:利用MPIJob运行ResNet101 从上篇文章中,我...

必嘫
05/17
0
0
Kubeflow 0.1 发布,基于 Kubernetes 的机器学习工具库

Google 发布了 Kubeflow 开源工具 0.1 版本,该工具旨在将机器学习带入 Kubernetes 容器的世界。该项目背后的想法是让数据科学家充分利用在 Kubernetes 集群上运行机器学习任务的优势。Kubef...

局长
2018/05/07
1K
1
谷歌发布Kubeflow 0.1版本,基于Kubernetes的机器学习工具包

自从 Google 发布开源容器编排工具——Kubernetes 以来,我们已经见证了其以各种方式遍地开花的景象。随着 Kubernetes 越来越受欢迎,许多辅助项目也已经发展起来。现在,Google 发布了Kubef...

Docker
2018/05/06
0
0

没有更多内容

加载失败,请刷新页面

加载更多

精华帖

第一章 jQuery简介 jQuery是一个JavaScript库 jQuery具备简洁的语法和跨平台的兼容性 简化了JavaScript的操作。 在页面中引入jQuery jQuery是一个JavaScript脚本库,不需要特别的安装,只需要...

流川偑
13分钟前
3
0
语音对话英语翻译在线翻译成中文哪个方法好用

想要进行将中文翻译成英文,或者将英文翻译成中文的操作,其实有一个非常简单的工具就能够帮助完成将语音进行翻译转换的软件。 在应用市场或者百度手机助手等各大应用渠道里面就能够找到一款...

401恶户
24分钟前
1
0
jenkins 插件下载加速最终方案

推荐做法 1、告诉jenkins 我哪些插件需要更新 jenkins插件清华大学镜像地址 https://mirrors.tuna.tsinghua.edu.cn/jenkins/updates/update-center.json 1.进入jenkins系统管理 2.进入插件管...

vasks
30分钟前
3
0
composer爆错:zlib_decode():data error

解决办法:先用 composer diagnose 命令检测 然后 composer self-update 更新composer版本 最后执行 composer update 或者 composer install composer 切换阿里云镜像 用起来还快 composer c...

koothon
37分钟前
3
0
shangcheng-my

1.数据库主键、外键类型为bigint,那么在后台应该用什么类型的变量定义? 后台用string接收,因为前段传过来的一般都是json字符串,后台直接接收,mysql是可以吧数字类型的字符串转换为对应的...

榴莲黑芝麻糊
昨天
4
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部