文档章节

Kubernetes jobs:使用队列进行并行化处理

openthings
 openthings
发布于 2018/09/07 18:30
字数 1381
阅读 155
收藏 1

本例子中,将运行 Kubernetes Job进行多个并行工作处理,在给定的pod里执行。

每一个 pod一旦创建,就讲从队列中取出一个工作单元,然后重复,知道队列的结束。

Here is an overview of the steps in this example:

  1. Start a storage service to hold the work queue. In this example, we use Redis to store our work items. In the previous example, we used RabbitMQ. In this example, we use Redis and a custom work-queue client library because AMQP does not provide a good way for clients to detect when a finite-length work queue is empty. In practice you would set up a store such as Redis once and reuse it for the work queues of many jobs, and other things.
  2. Create a queue, and fill it with messages. Each message represents one task to be done. In this example, a message is just an integer that we will do a lengthy computation on.
  3. Start a Job that works on tasks from the queue. The Job starts several pods. Each pod takes one task from the message queue, processes it, and repeats until the end of the queue is reached.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. If you do not already have a cluster, you can create one by using Minikube, or you can use one of these Kubernetes playgrounds:

To check the version, enter kubectl version.

Starting Redis

For this example, for simplicity, we will start a single instance of Redis. See the Redis Example for an example of deploying Redis scalably and redundantly.

If you are working from the website source tree, you can go to the following directory and start a temporary Pod running Redis and a service so we can find it.

$ cd content/en/examples/application/job/redis
$ kubectl create -f ./redis-pod.yaml
pod/redis-master created
$ kubectl create -f ./redis-service.yaml
service/redis created

If you’re not working from the source tree, you could also download the following files directly:

Filling the Queue with tasks

Now let’s fill the queue with some “tasks”. In our example, our tasks are just strings to be printed.

Start a temporary interactive pod for running the Redis CLI.

$ kubectl run -i --tty temp --image redis --command "/bin/sh"
Waiting for pod default/redis2-c7h78 to be running, status is Pending, pod ready: false
Hit enter for command prompt

Now hit enter, start the redis CLI, and create a list with some work items in it.

# redis-cli -h redis
redis:6379> rpush job2 "apple"
(integer) 1
redis:6379> rpush job2 "banana"
(integer) 2
redis:6379> rpush job2 "cherry"
(integer) 3
redis:6379> rpush job2 "date"
(integer) 4
redis:6379> rpush job2 "fig"
(integer) 5
redis:6379> rpush job2 "grape"
(integer) 6
redis:6379> rpush job2 "lemon"
(integer) 7
redis:6379> rpush job2 "melon"
(integer) 8
redis:6379> rpush job2 "orange"
(integer) 9
redis:6379> lrange job2 0 -1
1) "apple"
2) "banana"
3) "cherry"
4) "date"
5) "fig"
6) "grape"
7) "lemon"
8) "melon"
9) "orange"

So, the list with key job2 will be our work queue.

Note: if you do not have Kube DNS setup correctly, you may need to change the first step of the above block to redis-cli -h $REDIS_SERVICE_HOST.

Create an Image

Now we are ready to create an image that we will run.

We will use a python worker program with a redis client to read the messages from the message queue.

A simple Redis work queue client library is provided, called rediswq.py (Download).

The “worker” program in each Pod of the Job uses the work queue client library to get work. Here it is:

application/job/redis/worker.py
#!/usr/bin/env python

import time
import rediswq

host="redis"
# Uncomment next two lines if you do not have Kube-DNS working.
# import os
# host = os.getenv("REDIS_SERVICE_HOST")

q = rediswq.RedisWQ(name="job2", host="redis")
print("Worker with sessionID: " +  q.sessionID())
print("Initial queue state: empty=" + str(q.empty()))
while not q.empty():
  item = q.lease(lease_secs=10, block=True, timeout=2) 
  if item is not None:
    itemstr = item.decode("utf=8")
    print("Working on " + itemstr)
    time.sleep(10) # Put your actual work here instead of sleep.
    q.complete(item)
  else:
    print("Waiting for work")
print("Queue empty, exiting")

If you are working from the source tree, change directory to the content/en/examples/application/job/redis/ directory. Otherwise, download worker.py, rediswq.py, and Dockerfile files, then build the image:

docker build -t job-wq-2 .

Push the image

For the Docker Hub, tag your app image with your username and push to the Hub with the below commands. Replace <username> with your Hub username.

docker tag job-wq-2 <username>/job-wq-2
docker push <username>/job-wq-2

You need to push to a public repository or configure your cluster to be able to access your private repository.

If you are using Google Container Registry, tag your app image with your project ID, and push to GCR. Replace <project> with your project ID.

docker tag job-wq-2 gcr.io/<project>/job-wq-2
gcloud docker -- push gcr.io/<project>/job-wq-2

Defining a Job

Here is the job definition:

application/job/redis/job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-wq-2
spec:
  parallelism: 2
  template:
    metadata:
      name: job-wq-2
    spec:
      containers:
      - name: c
        image: gcr.io/myproject/job-wq-2
      restartPolicy: OnFailure

Be sure to edit the job template to change gcr.io/myproject to your own path.

In this example, each pod works on several items from the queue and then exits when there are no more items. Since the workers themselves detect when the workqueue is empty, and the Job controller does not know about the workqueue, it relies on the workers to signal when they are done working. The workers signal that the queue is empty by exiting with success. So, as soon as any worker exits with success, the controller knows the work is done, and the Pods will exit soon. So, we set the completion count of the Job to 1. The job controller will wait for the other pods to complete too.

Running the Job

So, now run the Job:

kubectl create -f ./job.yaml

Now wait a bit, then check on the job.

$ kubectl describe jobs/job-wq-2
Name:             job-wq-2
Namespace:        default
Selector:         controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
Labels:           controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
                  job-name=job-wq-2
Annotations:      <none>
Parallelism:      2
Completions:      <unset>
Start Time:       Mon, 11 Jan 2016 17:07:59 -0800
Pods Statuses:    1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=b1c7e4e3-92e1-11e7-b85e-fa163ee3c11f
                job-name=job-wq-2
  Containers:
   c:
    Image:              gcr.io/exampleproject/job-wq-2
    Port:
    Environment:        <none>
    Mounts:             <none>
  Volumes:              <none>
Events:
  FirstSeen    LastSeen    Count    From            SubobjectPath    Type        Reason            Message
  ---------    --------    -----    ----            -------------    --------    ------            -------
  33s          33s         1        {job-controller }                Normal      SuccessfulCreate  Created pod: job-wq-2-lglf8


$ kubectl logs pods/job-wq-2-7r7b2
Worker with sessionID: bbd72d0a-9e5c-4dd6-abf6-416cc267991f
Initial queue state: empty=False
Working on banana
Working on date
Working on lemon

As you can see, one of our pods worked on several work units.

Alternatives

If running a queue service or modifying your containers to use a work queue is inconvenient, you may want to consider one of the other job patterns.

If you have a continuous stream of background processing work to run, then consider running your background workers with a replicationController instead, and consider running a background processing library such as https://github.com/resque/resque.

© 著作权归作者所有

openthings
粉丝 324
博文 1140
码字总数 689435
作品 1
东城
架构师
私信 提问
Kubernetes jobs:使用rsync建立定期备份任务

Kubernetes中的job和cronjob可用于批处理和定时任务。这里,我们使用其建立集群级别的文件备份机制。 关于Kubernetes中的job和cronjob使用,请参考: Kubernetes Jobs - 运行处理任务指南 Ku...

openthings
2018/09/09
80
0
Kubernetes jobs:使用模版进行并行处理

Kubernetes job并行处理,使用模版展开 在这个例子里,我们将运行多个Kubernetes Jobs,配置文件从同一个模版展开,然后创建多个jobs任务。 你需要首先了解 Jobs 的用法。 Kubernetes中的Job...

openthings
2018/09/07
103
0
谷歌生产级别开源容器调度和管理工具Kubernetes学习二

Batch Jobs Jobs 一个job创建一个或者更多的pods并确保指定数量的pods成功终止。当所有的pods成功结束,job会追踪这个过程。当成功结束数量达到指定数量,job本身结束。删除一个Job会清理由它...

自由linux
2017/01/27
0
0
基于 Kubernetes 的 Jenkins 构建集群实践

在大型团队的 CI 构建里具有丰富最佳实践的经验。今天我给大家分享的更多是聚焦在 Jenkins 本身,结合我在 Jenkins 实际使用过程中和整个 Jenkins Slave 管理演化的过程的案例,这样能给大家...

店家小二
2018/12/14
0
0
MySQL · 功能分析 · 5.6 并行复制实现分析

背景 我们知道MySQL的主备同步是通过binlog在备库重放进行的,IO线程把主库binlog拉过去存入relaylog,然后SQL线程重放 relaylog 中的event,然而这种模式有一个问题就是SQL线程只有一个,在...

阿里云RDS-数据库内核组
2015/08/10
0
0

没有更多内容

加载失败,请刷新页面

加载更多

c++ 虚基类

c++ 虚基类 p556

天王盖地虎626
14分钟前
10
0
Java中的面向对象

一、面向对象 面向对象和面向过程的区别 过程就是函数,就是写方法,就是方法的一种实现。 对象就是将函数,属性的一种封装。用人们思考习惯的方式思考问题。 如何自定义类 修饰符 类名{ //成...

zhiruochujian
22分钟前
3
0
k8s删除Terminating状态的命名空间

背景: 我们都知道在k8s中namespace有两种常见的状态,即Active和Terminating状态,其中后者一般会比较少见,只有当对应的命名空间下还存在运行的资源,但是该命名空间被删除时才会出现所谓的...

Andy-xu
25分钟前
15
0
seata源码阅读笔记

seata源码阅读笔记 本文没有seata的使用方法,怎么使用seata可以参考官方示例,详细的很。 本文基于v0.8.0版本,本文没贴代码。 seata中的三个重要部分: TC:事务协调器,维护全局事务和分支...

东都大狼狗
38分钟前
5
0
Rust:最小化窗口后 CPU占用率高 (winit,glutin,imgui-rust)

最近试着用 imgui-rust 绘制界面,发现窗口最小化后CPU占用会增大。 查询的资料如下: https://github.com/rust-windowing/winit/issues/783 https://github.com/ocornut/imgui/issues/1151 ...

reter
42分钟前
19
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部