文档章节

Kubernetes Resource QoS机制解读

WaltonWang
 WaltonWang
发布于 2017/02/13 20:34
字数 1764
阅读 833
收藏 4

Kubernetes Resource QoS Classes介绍

Kubernetes根据Pod中Containers Resource的requestlimit的值来定义Pod的QoS Class。

对于每一种Resource都可以将容器分为3中QoS Classes: Guaranteed, Burstable, and Best-Effort,它们的QoS级别依次递减。

  • Guaranteed 如果Pod中所有Container的所有Resource的limitrequest都相等且不为0,则这个Pod的QoS Class就是Guaranteed。

注意,如果一个容器只指明了limit,而未指明request,则表明request的值等于limit的值。

Examples:

containers:
    name: foo
        resources:
            limits:
                cpu: 10m
                memory: 1Gi
    name: bar
        resources:
            limits:
                cpu: 100m
                memory: 100Mi
containers:
    name: foo
        resources:
            limits:
                cpu: 10m
                memory: 1Gi
            requests:
                cpu: 10m
                memory: 1Gi

    name: bar
        resources:
            limits:
                cpu: 100m
                memory: 100Mi
            requests:
                cpu: 100m
                memory: 100Mi
  • Best-Effort 如果Pod中所有容器的所有Resource的request和limit都没有赋值,则这个Pod的QoS Class就是Best-Effort.

Examples:

containers:
    name: foo
        resources:
    name: bar
        resources:
  • Burstable 除了符合Guaranteed和Best-Effort的场景,其他场景的Pod QoS Class都属于Burstable。

当limit值未指定时,其有效值其实是对应Node Resource的Capacity。

Examples:

容器bar没有对Resource进行指定。

containers:
    name: foo
        resources:
            limits:
                cpu: 10m
                memory: 1Gi
            requests:
                cpu: 10m
                memory: 1Gi

    name: bar

容器foobar对不同的Resource进行了指定。

containers:
    name: foo
        resources:
            limits:
                memory: 1Gi

    name: bar
        resources:
            limits:
                cpu: 100m

容器foo未指定limit,容器bar未指定request和limit。

containers:
    name: foo
        resources:
            requests:
                cpu: 10m
                memory: 1Gi

    name: bar

可压缩/不可压缩资源的区别

kube-scheduler调度时,是基于Pod的request值进行Node Select完成调度的。Pod和它的所有Container都不允许Consume limit指定的有效值(if have)。

How the request and limit are enforced depends on whether the resource is compressible or incompressible.

Compressible Resource Guarantees

  • For now, we are only supporting CPU.
  • Pods are guaranteed to get the amount of CPU they request, they may or may not get additional CPU time (depending on the other jobs running). This isn't fully guaranteed today because cpu isolation is at the container level. Pod level cgroups will be introduced soon to achieve this goal.
  • Excess CPU resources will be distributed based on the amount of CPU requested. For example, suppose container A requests for 600 milli CPUs, and container B requests for 300 milli CPUs. Suppose that both containers are trying to use as much CPU as they can. Then the extra 10 milli CPUs will be distributed to A and B in a 2:1 ratio (implementation discussed in later sections).
  • Pods will be throttled if they exceed their limit. If limit is unspecified, then the pods can use excess CPU when available.

Incompressible Resource Guarantees

  • For now, we are only supporting memory.
  • Pods will get the amount of memory they request, if they exceed their memory request, they could be killed (if some other pod needs memory), but if pods consume less memory than requested, they will not be killed (except in cases where system tasks or daemons need more memory).
  • When Pods use more memory than their limit, a process that is using the most amount of memory, inside one of the pod's containers, will be killed by the kernel.

Admission/Scheduling Policy

  • Pods will be admitted by Kubelet & scheduled by the scheduler based on the sum of requests of its containers. The scheduler & kubelet will ensure that sum of requests of all containers is within the node's allocatable capacity (for both memory and CPU).

如何根据不同的QoS回收Resources

  • CPU Pods will not be killed if CPU guarantees cannot be met (for example if system tasks or daemons take up lots of CPU), they will be temporarily throttled.

  • Memory Memory is an incompressible resource and so let's discuss the semantics of memory management a bit.

    • Best-Effort pods will be treated as lowest priority. Processes in these pods are the first to get killed if the system runs out of memory. These containers can use any amount of free memory in the node though.

    • Guaranteed pods are considered top-priority and are guaranteed to not be killed until they exceed their limits, or if the system is under memory pressure and there are no lower priority containers that can be evicted.

    • Burstable pods have some form of minimal resource guarantee, but can use more resources when available. Under system memory pressure, these containers are more likely to be killed once they exceed their requests and no Best-Effort pods exist.

OOM Score configuration at the Nodes

Pod OOM score configuration

  • Note that the OOM score of a process is 10 times the % of memory the process consumes, adjusted by OOM_SCORE_ADJ, barring exceptions (e.g. process is launched by root). Processes with higher OOM scores are killed.
  • The base OOM score is between 0 and 1000, so if process A’s OOM_SCORE_ADJ - process B’s OOM_SCORE_ADJ is over a 1000, then process A will always be OOM killed before B.
  • The final OOM score of a process is also between 0 and 1000

Best-effort

  • Set OOM_SCORE_ADJ: 1000
  • So processes in best-effort containers will have an OOM_SCORE of 1000

Guaranteed

  • Set OOM_SCORE_ADJ: -998
  • So processes in guaranteed containers will have an OOM_SCORE of 0 or 1

Burstable

  • If total memory request > 99.8% of available memory, OOM_SCORE_ADJ: 2
  • Otherwise, set OOM_SCORE_ADJ to 1000 - 10 * (% of memory requested)
  • This ensures that the OOM_SCORE of burstable pod is > 1
  • If memory request is 0, OOM_SCORE_ADJ is set to 999.
  • So burstable pods will be killed if they conflict with guaranteed pods
  • If a burstable pod uses less memory than requested, its OOM_SCORE < 1000
  • So best-effort pods will be killed if they conflict with burstable pods using less than requested memory
  • If a process in burstable pod's container uses more memory than what the container had requested, its OOM_SCORE will be 1000, if not its OOM_SCORE will be < 1000
  • Assuming that a container typically has a single big process, if a burstable pod's container that uses more memory than requested conflicts with another burstable pod's container using less memory than requested, the former will be killed
  • If burstable pod's containers with multiple processes conflict, then the formula for OOM scores is a heuristic, it will not ensure "Request and Limit" guarantees.

Pod infra containers or Special Pod init process

  • OOM_SCORE_ADJ: -998

Kubelet, Docker

  • OOM_SCORE_ADJ: -999 (won’t be OOM killed)
  • Hack, because these critical tasks might die if they conflict with guaranteed containers. In the future, we should place all user-pods into a separate cgroup, and set a limit on the memory they can consume.

源码分析

QoS的源码位于:pkg/kubelet/qos,代码非常简单,主要就两个文件pkg/kubelet/qos/policy.go,pkg/kubelet/qos/qos.go

上面讨论的各个QoS Class对应的OOM_SCORE_ADJ定义在:

pkg/kubelet/qos/policy.go:21

const (
	PodInfraOOMAdj        int = -998
	KubeletOOMScoreAdj    int = -999
	DockerOOMScoreAdj     int = -999
	KubeProxyOOMScoreAdj  int = -999
	guaranteedOOMScoreAdj int = -998
	besteffortOOMScoreAdj int = 1000
)

容器的OOM_SCORE_ADJ的计算方法定义在:

pkg/kubelet/qos/policy.go:40

func GetContainerOOMScoreAdjust(pod *v1.Pod, container *v1.Container, memoryCapacity int64) int {
	switch GetPodQOS(pod) {
	case Guaranteed:
		// Guaranteed containers should be the last to get killed.
		return guaranteedOOMScoreAdj
	case BestEffort:
		return besteffortOOMScoreAdj
	}

	// Burstable containers are a middle tier, between Guaranteed and Best-Effort. Ideally,
	// we want to protect Burstable containers that consume less memory than requested.
	// The formula below is a heuristic. A container requesting for 10% of a system's
	// memory will have an OOM score adjust of 900. If a process in container Y
	// uses over 10% of memory, its OOM score will be 1000. The idea is that containers
	// which use more than their request will have an OOM score of 1000 and will be prime
	// targets for OOM kills.
	// Note that this is a heuristic, it won't work if a container has many small processes.
	memoryRequest := container.Resources.Requests.Memory().Value()
	oomScoreAdjust := 1000 - (1000*memoryRequest)/memoryCapacity
	// A guaranteed pod using 100% of memory can have an OOM score of 10. Ensure
	// that burstable pods have a higher OOM score adjustment.
	if int(oomScoreAdjust) < (1000 + guaranteedOOMScoreAdj) {
		return (1000 + guaranteedOOMScoreAdj)
	}
	// Give burstable pods a higher chance of survival over besteffort pods.
	if int(oomScoreAdjust) == besteffortOOMScoreAdj {
		return int(oomScoreAdjust - 1)
	}
	return int(oomScoreAdjust)
}

获取Pod的QoS Class的方法为:

pkg/kubelet/qos/qos.go:50

// GetPodQOS returns the QoS class of a pod.
// A pod is besteffort if none of its containers have specified any requests or limits.
// A pod is guaranteed only when requests and limits are specified for all the containers and they are equal.
// A pod is burstable if limits and requests do not match across all containers.
func GetPodQOS(pod *v1.Pod) QOSClass {
	requests := v1.ResourceList{}
	limits := v1.ResourceList{}
	zeroQuantity := resource.MustParse("0")
	isGuaranteed := true
	for _, container := range pod.Spec.Containers {
		// process requests
		for name, quantity := range container.Resources.Requests {
			if !supportedQoSComputeResources.Has(string(name)) {
				continue
			}
			if quantity.Cmp(zeroQuantity) == 1 {
				delta := quantity.Copy()
				if _, exists := requests[name]; !exists {
					requests[name] = *delta
				} else {
					delta.Add(requests[name])
					requests[name] = *delta
				}
			}
		}
		// process limits
		qosLimitsFound := sets.NewString()
		for name, quantity := range container.Resources.Limits {
			if !supportedQoSComputeResources.Has(string(name)) {
				continue
			}
			if quantity.Cmp(zeroQuantity) == 1 {
				qosLimitsFound.Insert(string(name))
				delta := quantity.Copy()
				if _, exists := limits[name]; !exists {
					limits[name] = *delta
				} else {
					delta.Add(limits[name])
					limits[name] = *delta
				}
			}
		}

		if len(qosLimitsFound) != len(supportedQoSComputeResources) {
			isGuaranteed = false
		}
	}
	if len(requests) == 0 && len(limits) == 0 {
		return BestEffort
	}
	// Check is requests match limits for all resources.
	if isGuaranteed {
		for name, req := range requests {
			if lim, exists := limits[name]; !exists || lim.Cmp(req) != 0 {
				isGuaranteed = false
				break
			}
		}
	}
	if isGuaranteed &&
		len(requests) == len(limits) {
		return Guaranteed
	}
	return Burstable
}

PodQoS会在eviction_manager和scheduler的Predicates阶段被调用,也就说会在k8s处理超配和调度预选阶段中被使用。

输入图片说明

© 著作权归作者所有

共有 人打赏支持
WaltonWang
粉丝 196
博文 100
码字总数 207940
作品 0
深圳
程序员
私信 提问
深入解析 kubernetes 资源管理,容器云牛人有话说

资源,无论是计算资源、存储资源、网络资源,对于容器管理调度平台来说都是需要关注的一个核心问题。 • 如何对资源进行抽象、定义? • 如何确认实际可以使用的资源量? • 如何为容器分配它...

七牛云
06/25
0
0
Kubernetes1.3:QoS服务质量管理

Kubernetes1.3:QoS服务质量管理 在kubernetes中,每个POD都有个QoS标记,通过这个Qos标记来对POD进行服务质量管理。QoS的英文全称为"Quality of Service",中文名为"服务质量",它取决于用户...

xiaomin0322
04/18
0
0
如何设置 Kubernetes 资源限制

Kubernetes 作为当下最流行的的容器集群管理平台,需要统筹集群整体的资源使用情况,将合适的资源分配给pod容器使用,既要保证充分利用资源,提高资源利用率,又要保证重要容器在运行周期内能...

萧元
07/10
0
0
文档解读 | K8S中的Pod和容器配置(二)

接上篇,如何给运行在Kubernetes Pod中的容器分配CPU和RAM资源? 给运行在Kubernetes Pod中的容器分配CPU和RAM资源 开始之前 必须有一个Kubernetes集群,和一个能和集群沟通的kubectl命令行工...

店家小二
昨天
0
0
正确配置Kubelet可一定程度防止K8S集群雪崩

Kubelet Node Allocatable Kubelet Node Allocatable用来为Kube组件和System进程预留资源,从而保证当节点出现满负荷时也能保证Kube和System进程有足够的资源。 目前支持cpu, memory, epheme...

清侠
07/03
0
0

没有更多内容

加载失败,请刷新页面

加载更多

jquery通过id显示隐藏

var $div3 = $('#div3'); 显示 $div3.show(); 隐藏 $div3.hide();

yan_liu
今天
3
0
《乱世佳人》读书笔记及相关感悟3900字

《乱世佳人》读书笔记及相关感悟3900字: 之前一直听「荔枝」,后来不知怎的转向了「喜马拉雅」,一听就是三年。上班的时候听房产,买房了以后听装修,兴之所至时听旅行,分手后听亲密关系,...

原创小博客
今天
3
0
大数据教程(9.6)map端join实现

上一篇文章讲了mapreduce配合实现join,本节博主将讲述在map端的join实现; 一、需求 实现两个“表”的join操作,其中一个表数据量小,一个表很大,这种场景在实际中非常常见,比如“订单日志...

em_aaron
今天
3
0
cookie与session详解

session与cookie是什么? session与cookie属于一种会话控制技术.常用在身份识别,登录验证,数据传输等.举个例子,就像我们去超市买东西结账的时候,我们要拿出我们的会员卡才会获取优惠.这时...

士兵7
今天
3
0
十万个为什么之为什么大家都说dubbo

Dubbo是什么? 使用背景 dubbo为什么这么流行, 为什么大家都这么喜欢用dubbo; 通过了解分布式开发了解到, 为适应访问量暴增,业务拆分后, 子应用部署在多台服务器上,而多台服务器通过可以通过d...

尾生
今天
5
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部