JupyterHub on Kubernetes-容器存储
For the purposes of this guide, we’ll describe “storage” as a “volume” - a location on a disk where a user’s data resides.
Kubernetes handles the creation and allocation of persistent volumes, under-the-hood it uses the cloud provider’s API to issue the proper commands. To that extent most of our discussion around volumes will describe Kubernetes objects.
JupyterHub uses Kubernetes to manage user storage. There are two primary Kubernetes objects involved in allocating storage to pods:
PVC) specifies what kind of storage is required. Its configuration is specified in your
PV) is the actual volume where the user’s data resides. It is created by Kubernetes using details in a
As Kubernetes objects, they can be queried with the standard
kubectl commands (e.g.,
kubectl --namespace=<your-namespace> get pvc)
In JupyterHub, each user gets their own
PersistentVolumeClaim object, representing the data attached to their account. When a new user starts their JupyterHub server, a
PersistentVolumeClaim is created for that user. This claim tells Kubernetes what kind of storage (e.g., ssd vs. hd) as well as how much storage is needed. Kubernetes checks to see whether a
PersistentVolume object for that user exists (since this is a new user, none will exist). If no
PV object exists, then Kubernetes will use the
PVC to create a new
PV object for the user.
Now that a
PV exists for the user, Kubernetes next must attach (or “mount”) that
PV to the user’s pod (which runs user code). Once this is accomplished, the user will have access to their
PV within JupyterHub. Note that this all happens under-the-hood and automatically when a user logs in.
PersistentVolumes are not deleted unless the
PersistentVolumeClaim is explicitly deleted by the JupyterHub administrator. When a user shuts down their server, their user pod is deleted and their volume is detached from the pod, but the
PV objects still exist. In the future, when the user logs back in, JupyterHub will detect that the user has a pre-existing
PVC and will simply attach it to their new pod, rather than creating a new
How can this process break down?
When Kubernetes uses the
PVC to create a new user
PV, it is sending a command to the underlying API of whatever cloud provider Kubernetes is running on. Occasionally, the request for a specific
PV might fail - for example, if your account has reached the limit in the amount of disk space available.
Another common issue is limits on the number of volumes that may be simultaneously attached to a node in your cluster. Check your cloud provider for details on the limits of storage resources you request.
Some cloud providers have a limited number of disks that can be attached to each node. Since JupyterHub allocates one disk per user for persistent storage, this limits the number of users that can be running in a node at any point of time. If you need users to have persistent storage, and you end up hitting this limit, you must use more nodes in order to accommodate the disk for each user. In this case, we recommend allocating fewer resources per node (e.g. RAM) since you’ll have fewer users packed onto a single node.
Most configuration for storage is done at the cluster level and is not unique to JupyterHub. However, some bits are, and we will demonstrate here how to configure those.
Note that new
PVCs for pre-existing users will not be created unless the old ones are destroyed. If you update your users’
PVC config via
config.yaml, then any new users will have the new
PVC created for them, but old users will not. To force an upgrade of the storage type for old users, you will need to manually delete their
kubectl --namespace=<your-namespace> delete pvc <pvc-name>). This will delete all of the user’s data so we recommend backing up their filesystem first if you want to retain their data.
After you delete the user’s
PVC, upon their next log-in a new
PVC will be created for them according to your updated
Type of storage provisioned
A StorageClass object is used to determine what kind of
PersistentVolumes are provisioned for your users. Most popular cloud providers have a
StorageClass marked as default. You can find out your default
StorageClass by doing:
kubectl get storageclass
and looking for the object with
(default) next to its name.
To change the kind of
PersistentVolumes provisioned for your users,
Create a new
StorageClassobject following the kubernetes documentation
Specify the name of the
StorageClassyou just created in
singleuser: storage: dynamic: storageClass: <storageclass-name>
Note that this will only affect new users who are logging in. We recommend you do this before users start heavily using your cluster.
We will provide examples for popular cloud providers here, but will generally defer to the Kubernetes documentation.
On Google Cloud, the default
StorageClass will provision Standard Google Persistent Disks. These run on Hard Disks. For more performance, you may want to use SSDs. To use SSDs, you can create a new
StorageClass by first putting the following
yaml into a new file. We recommend a descriptive name such as
storageclass.yaml, which we’ll use below:
kind: StorageClass apiVersion: storage.k8s.io/v1 metadata: name: jupyterhub-user-ssd provisioner: kubernetes.io/gce-pd parameters: type: pd-ssd zones: <your-cluster-zone>
<your-cluster-zone> with the Zone in which you created your cluster (you can find this with
gcloud container clusters list).
Next, create this object by running
kubectl apply -f storageclass.yaml from the commandline. The Kubernetes Docs have more information on what the various fields mean. The most important field is
parameters.type, which specifies the type of storage you wish to use. The two options are:
pd-standardwill provision non-SSD disks.
Once you have created this
StorageClass, you can configure your JupyterHub’s
PVC template with the following in your
singleuser: storage: dynamic: storageClass: jupyterhub-user-ssd
Note that for
storageClass: we use the name that we specified above in
Size of storage provisioned
You can set the size of storage requested by JupyterHub in the
PVC in your
storage: capacity: 2Gi
This will request a
2Gi volume per user. The default requests a
10Gi volume per user.
We recommend you use the IEC Prefixes (Ki, Mi, Gi, etc) for specifying how much storage you want.
2Gi (IEC Prefix) is
(2 * 1024 * 1024 * 1024) bytes, while
2G (SI Prefix) is
(2 * 1000 * 1000 * 1000) bytes.
Turn off per-user persistent storage
If you do not wish for users to have any persistent storage, it can be turned off. Edit the
config.yaml file and set the storage type to
singleuser: storage: type: none
Next apply the changes.
After the changes are applied, new users will no longer be allocated a persistent
$HOME directory. Any currently running users will still have access to their storage until their server is restarted. You might have to manually delete current users’
kubectl to reclaim any cloud disks that might have allocated. You can get a current list of
kubectl --namespace=<your-namespace> get pvc
You can then delete the
PVCs you do not want with:
kubectl --namespace=<your-namespace> delete pvc <pvc-name>
Remember that deleting someone’s
PVCs will delete all their data, so do so with caution!