Jupyter Hub on Kubernetes Part II: NFS

September 10, 2016

This is the second part of my JupyterHub-on-Kubernetes deployment experiment; be sure to read Part I first.

Last time we got JupyterHub authenticating against LDAP and creating the single-user notebook servers in Kubernetes containers. As I mentioned in that post, one big problem with that deployment was that the notebook files were gone when the pod was deleted, so this time I add an NFS volume to the JupyterHub single-user containers to persist the notebook data.

I also improved the deployment and code a little, so it is no longer necessary to build a custom image: you can just pull two images from my Docker Hub registry and configure them using a Kubernetes ConfigMap.

All the code in this post is at danielfrg/jupyterhub-kubernetes_spawner, specifically in the example directory.

NFS

There are multiple options for persisting data in Kubernetes containers. I chose NFS because it is one of the few volume types that allows multiple containers read and write access (ReadWriteMany). Most persistent volume types allow read and write access for only one container (ReadWriteOnce) and/or read-only access for multiple containers (ReadOnlyMany).

I wanted all JupyterHub single-user containers to write their notebooks to the same location so it is easier to back up the data. Since the notebook servers are single user, each of those containers could instead use its own disk, but in that case backup and maintenance are more complicated.

The following is heavily based on the Kubernetes NFS documentation.

In order to mount an NFS volume into the Kubernetes containers we need an NFS server, but even before that we need a volume that the NFS server can use to store the data.

On GCE we can use a PersistentVolumeClaim to ask for a disk resource:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-storage
  annotations:
    volume.alpha.kubernetes.io/storage-class: any
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 200Gi

When you apply this with kubectl create -f file.yaml, GCE will create a new disk that you can see in the GCE cloud console.
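You can also check from the command line that the claim was created and bound (the output below is illustrative, not from the original post; the volume name will differ):

$ kubectl get pvc jupyterhub-storage
NAME                 STATUS    VOLUME         CAPACITY   ACCESSMODES   AGE
jupyterhub-storage   Bound     pvc-6b59a0e1   200Gi      RWO           1m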

Now that we have a disk the NFS server can use to store its data, we create a Deployment of the NFS server and a Service to expose it.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jupyterhub-nfs
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyterhub-nfs
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google-samples/nfs-server:1.1
        ports:
          - name: nfs
            containerPort: 2049
          - name: mountd
            containerPort: 20048
          - name: rpcbind
            containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
          - name: mypvc
            mountPath: /exports
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: jupyterhub-storage
---
kind: Service
apiVersion: v1
metadata:
  name: jupyterhub-nfs
spec:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    app: jupyterhub-nfs

All of this can be found in the example/nfs.yml file; once it is applied there should be a jupyterhub-nfs service.

$ kubectl create -f example/nfs.yml
...

$ kubectl get services
NAME             CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
jupyterhub-nfs   10.103.253.185   <none>        2049/TCP,20048/TCP,111/TCP   15h

Now with the NFS service ready we need to create a PersistentVolume and another PersistentVolumeClaim for the containers to use. A small manual step is needed here, since at the moment it is not possible to reference a Service IP from a PersistentVolume object: open example/nfs2.yml and change {{ X.X.X.X }} to the jupyterhub-nfs service IP, in this case 10.103.253.185.
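That part of nfs2.yml is not reproduced above; based on the Kubernetes NFS example this setup follows, it should look roughly like this (a sketch: the PV name, export path, and sizes are my assumptions, while the claim name jupyterhub-nfs matches what the deployments below reference):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: jupyterhub-nfs
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # replace with the jupyterhub-nfs service IP, e.g. 10.103.253.185
    server: {{ X.X.X.X }}
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi

Note the ReadWriteMany access mode here: that is what lets every single-user pod mount the same volume read-write.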

The nfs2.yml file also includes a small NGINX deployment with autoindex on; to see the files on the server. It is the base nginx image with a ConfigMap that replaces the default NGINX conf file.

apiVersion: v1
kind: ConfigMap
metadata:
  name: jupyterhub-nfs-web-config
data:
  default.conf: |-
    server {
        listen       80;
        server_name  localhost;
        root         /usr/share/nginx/html;
        location / {
            index none;
            autoindex on;
            autoindex_exact_size off;
            autoindex_localtime on;
        }
    }
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jupyterhub-nfs-web
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyterhub-nfs-web
    spec:
      containers:
      - name: web
        image: nginx
        ports:
          - containerPort: 80
        volumeMounts:
          - name: nfs
            mountPath: "/usr/share/nginx/html"
          - name: config-volume
            mountPath: "/etc/nginx/conf.d/"
      volumes:
        - name: nfs
          persistentVolumeClaim:
            claimName: jupyterhub-nfs
        - name: config-volume
          configMap:
            name: jupyterhub-nfs-web-config

Start the deployment and service and you should see a new jupyterhub-nfs-web service:

$ kubectl create -f example/nfs2.yml
...

$ kubectl get services
NAME                     CLUSTER-IP       EXTERNAL-IP       PORT(S)                      AGE
jupyterhub-nfs-web       10.103.241.248   104.197.178.53    80/TCP                       15h

Going to that External IP you should see an empty NGINX file listing.

That's it! You have an NFS server, and the kubernetes_spawner will take care of mounting it into the containers based on a few settings shown below.

JupyterHub

Last time we had JupyterHub creating containers in Kubernetes; now we have the same functionality, but it should be easier to get running and configured, based on a Kubernetes ConfigMap and two example Docker images I uploaded to my Docker Hub registry.

In the last post it was necessary to create a custom JupyterHub container image for each deployment; this image carried the jupyterhub_config.py file with some values (credentials and IP addresses). I changed that deployment to use a Kubernetes ConfigMap that creates the config file, so the base container images can be reused and you just need to fill in the missing values in the ConfigMap (example/hub.yml).

  1. danielfrg/jupyterhub-kube-ldap: based on jupyterhub/jupyterhub; includes the jupyterhub/ldapauthenticator and my jupyterhub-kubernetes_spawner
  2. danielfrg/jupyterhub-kube-ldap-singleuser: based on jupyterhub/singleuser; just has a different startup script that creates the working notebook directory before starting the jupyterhub-singleuser server
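If you want to pull them directly (the 0.1 tag on the hub image is my assumption; the single-user tag matches the spawner config below):

$ docker pull danielfrg/jupyterhub-kube-ldap:0.1
$ docker pull danielfrg/jupyterhub-kube-ldap-singleuser:0.1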

There are some missing values that need to be filled in before starting the deployment, but they are easy to find.

apiVersion: v1
kind: ConfigMap
metadata:
  name: jupyterhub-config-py
data:
  jupyterhub-config.py: |-
    c.JupyterHub.confirm_no_ssl = True
    c.JupyterHub.db_url = 'sqlite:////tmp/jupyterhub.sqlite'
    c.JupyterHub.cookie_secret_file = '/tmp/jupyterhub_cookie_secret'

    c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
    c.LDAPAuthenticator.bind_dn_template = 'cn={username},cn=jupyterhub,dc=example,dc=org'
    # c.LDAPAuthenticator.server_address = '{{ LDAP_SERVICE_IP }}'
    c.LDAPAuthenticator.use_ssl = False

    c.JupyterHub.spawner_class = 'kubernetes_spawner.KubernetesSpawner'
    # c.KubernetesSpawner.host = '{{ KUBE_HOST }}'
    # c.KubernetesSpawner.username = '{{ KUBE_USER }}'
    # c.KubernetesSpawner.password = '{{ KUBE_PASS }}'
    c.KubernetesSpawner.verify_ssl = False
    c.KubernetesSpawner.hub_ip_from_service = 'jupyterhub'
    c.KubernetesSpawner.container_image = "danielfrg/jupyterhub-kube-ldap-singleuser:0.1"
    c.Spawner.notebook_dir = '/mnt/notebooks/%U'
    c.KubernetesSpawner.persistent_volume_claim_name = 'jupyterhub-nfs'
    c.KubernetesSpawner.persistent_volume_claim_path = '/mnt'

With these settings the Kubernetes spawner will use the Kubernetes PersistentVolumeClaim and mount it under /mnt; then I just have to tell the Spawner which directory under /mnt to use for the user notebooks (%U expands to the username), for example /mnt/notebooks/danielfrg.
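Putting those settings together, the relevant part of a spawned single-user pod should end up looking roughly like this (a sketch of the effect of the settings, not the spawner's literal output; the pod and volume names are invented):

apiVersion: v1
kind: Pod
metadata:
  name: jupyter-danielfrg           # invented; the spawner picks its own name
spec:
  containers:
  - name: notebook
    image: danielfrg/jupyterhub-kube-ldap-singleuser:0.1
    volumeMounts:
    - name: notebooks
      mountPath: /mnt               # persistent_volume_claim_path
  volumes:
  - name: notebooks
    persistentVolumeClaim:
      claimName: jupyterhub-nfs     # persistent_volume_claim_name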

Start the JupyterHub service the same as before:

$ kubectl create -f hub.yml

Wait for the service to give you an External IP and log in as any LDAP user (see Part I).
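As before, you can watch the service until the External IP shows up (illustrative output; the service name jupyterhub comes from hub_ip_from_service above, and the port is an assumption):

$ kubectl get service jupyterhub
NAME         CLUSTER-IP       EXTERNAL-IP       PORT(S)    AGE
jupyterhub   10.103.xxx.xxx   104.xxx.xxx.xxx   8000/TCP   5m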

After logging in, a new pod will be created that mounts the PersistentVolumeClaim backed by the NFS server. Create a couple of notebooks and you should see them listed in the jupyterhub-nfs-web service.

Now from the JupyterHub admin interface you should be able to terminate the server and start a new one, and the notebooks persist even though they are served from a different pod.

Future

I think persisting the notebooks was the biggest issue with the previous deployment, and that is now fixed, but there are still some security issues.

All the JupyterHub single-user containers run as the same user (root in this example) and the whole NFS share is mounted into all the containers, which means every user has access to all the other users' notebooks. Even though by default the UI only shows the user's own notebooks, it is possible to access the other files under /mnt.

There are some possible solutions on the NFS side, but it could also be handled on the Kubernetes/Docker side, maybe with a custom PersistentVolumeClaim per user(?). More work is needed here.

The second big (also security-related) issue right now is how JupyterHub accesses the Kubernetes API. Currently the credentials have to be placed in the ConfigMap, and I was planning on using Kubernetes Secrets to make that a little more secure, but it turns out that is not needed.

While I was writing this post I found (as I should have expected) that Kubernetes has already thought about this issue: it has the concept of Service Accounts.

With a Service Account it is possible to access the Kubernetes API without using User Account credentials. The Service Account credentials can even be mounted into a container (and are by default), so it should be very easy to change the code to use them.
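For example, inside any pod the default Service Account credentials are mounted under a well-known path (<pod-name> is a placeholder for any running pod):

$ kubectl exec <pod-name> -- ls /var/run/secrets/kubernetes.io/serviceaccount
ca.crt
namespace
token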

This removes the need to have a secondary User Account for JupyterHub. Thanks, Kubernetes!

This is probably the last (big) post about this experiment, but I plan to keep working on the kubernetes_spawner to make it better. At a minimum I am definitely going to test and use Service Accounts to make the deployment more secure.

Update: I have added support for Kubernetes Service Accounts to the kubernetes_spawner, so User Account credentials are no longer needed. For updated docs and information take a look at the example folder here: examples/ldap_nfs.

Reposted from: http://danielfrg.com/blog/2016/09/10/jupyterhub-kubernetes-nfs/
