Kubernetes: specifying CPUs for cpumanager

Tags: kubernetes numa

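Is it possible to give the Kubernetes cpumanager a list of CPU IDs to use? The goal is to make sure pods get their CPUs from a single socket (socket 0). I took all CPUs on the peer socket offline, as described here, for example: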

$ echo 0 > /sys/devices/system/cpu/cpu5/online
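Taking a whole socket offline means repeating that command for every CPU on the socket. A minimal sketch of how the CPU IDs for one socket could be collected from sysfs, assuming the kernel exposes `topology/physical_package_id` for each CPU. The function name `cpus_on_socket` and the `SYSFS_CPU_DIR` override are hypothetical and exist only for this illustration:

```shell
#!/bin/sh
# List the logical CPU IDs that belong to one physical package (socket).
# SYSFS_CPU_DIR is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real sysfs location.
cpus_on_socket() {
    socket="$1"
    base="${SYSFS_CPU_DIR:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*; do
        id_file="$dir/topology/physical_package_id"
        # CPUs that are already offline may not expose topology data; skip them.
        [ -r "$id_file" ] || continue
        if [ "$(cat "$id_file")" = "$socket" ]; then
            # Strip everything up to the final "/cpu" to leave the numeric ID.
            echo "${dir##*/cpu}"
        fi
    done | sort -n
}

# Example (as root): take every CPU on socket 1 offline.
# for n in $(cpus_on_socket 1); do
#     echo 0 > "/sys/devices/system/cpu/cpu$n/online"
# done
```

The loop at the bottom is left commented out because it changes system state; it is the same write as the single-CPU command above, applied to each ID the function prints.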

After doing this, the Kubernetes master does see only the remaining online CPUs:
$ kubectl describe node foo
Capacity:
 cpu:                56   <<< socket 0 CPU count
 ephemeral-storage:  958774760Ki
 hugepages-1Gi:      120Gi
 memory:             197524872Ki
 pods:               110
Allocatable:
 cpu:                54    <<< 2 system reserved CPUs
 ephemeral-storage:  958774760Ki
 hugepages-1Gi:      120Gi
 memory:             71490952Ki
 pods:               110
System Info:
 Machine ID:                 1155420082478559980231ba5bc0f6f2
 System UUID:                4C4C4544-0044-4210-8031-C8C04F584B32
 Boot ID:                    7fa18227-748f-496c-968c-9fc82e21ecd5
 Kernel Version:             4.4.13
 OS Image:                   Ubuntu 16.04.4 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.3
 Kubelet Version:            v1.11.1
 Kube-Proxy Version:         v1.11.1

However, cpumanager still seems to think there are 112 CPUs (socket 0 + socket 1):
$ cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
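The mismatch can be made explicit by comparing the `defaultCpuSet` in the state file with what the kernel reports as online. The following is a rough sketch, not a kubelet tool: the helper names `expand_cpuset`, `default_cpuset`, and `stale_cpus` are made up for this example, and the `sed` pattern assumes the single-line JSON layout shown above.

```shell
#!/bin/sh
# Expand a kernel-style CPU list such as "0-3,8" into one ID per line.
expand_cpuset() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        i="$first"
        # A single ID like "8" has no upper bound, so reuse the lower one.
        while [ "$i" -le "${last:-$first}" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Pull the defaultCpuSet value out of a cpu_manager_state file.
# Assumes the value sits on one line as "defaultCpuSet":"...".
default_cpuset() {
    sed -n 's/.*"defaultCpuSet":"\([^"]*\)".*/\1/p' "$1"
}

# Print CPUs that the state file lists but the kernel does not report online.
# The second argument is only there so the function can be tried on sample data.
stale_cpus() {
    state_file="$1"
    online_file="${2:-/sys/devices/system/cpu/online}"
    want=$(mktemp)
    have=$(mktemp)
    expand_cpuset "$(default_cpuset "$state_file")" | sort > "$want"
    expand_cpuset "$(cat "$online_file")" | sort > "$have"
    # comm -23 keeps lines that appear only in the first (state file) list.
    comm -23 "$want" "$have" | sort -n
    rm -f "$want" "$have"
}

# Example: stale_cpus /var/lib/kubelet/cpu_manager_state
```

On the node described here, the state file lists 0-111 while only the even-numbered CPUs are online, so this check would print the odd-numbered IDs.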

As a result, the kube-system pods throw the following error:
kube-system     kube-proxy-nk7gc                       0/1       rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110   762        36d       <IP address>   foo      <none>

Best answer

I was able to get this working. Posting it as an answer so that anyone who needs it can benefit.

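It appears that the CPU set is read from the /var/lib/kubelet/cpu_manager_state file, and that this file is not updated when the kubelet restarts. The file therefore has to be removed before restarting the kubelet.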

The following worked for me:

# On a running worker node, bring desired CPUs offline. (run as root)

$ cpu_list=$(lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}')
$ chcpu -d "$cpu_list"
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service

# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state

# Try creating pods and check the syslog:
Dec  3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466    8070 state_mem.go:84]     [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec  3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643    8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec  3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660    8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
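The state-file removal in the steps above could also be wrapped so that the old file is kept for inspection. This is only a sketch: moving the file to a timestamped backup instead of running `rm -f` is an addition of this example, not something the original steps do, and the function name `reset_cpu_manager_state` is hypothetical.

```shell
#!/bin/sh
# Move a kubelet CPU manager state file aside so that the kubelet rebuilds
# it on its next start. Keeping a timestamped copy (rather than rm -f) is
# an assumption of this sketch, made so the old contents can be inspected.
reset_cpu_manager_state() {
    state_file="${1:-/var/lib/kubelet/cpu_manager_state}"
    if [ ! -f "$state_file" ]; then
        echo "no state file at $state_file, nothing to do"
        return 0
    fi
    backup="$state_file.$(date +%Y%m%d%H%M%S).bak"
    mv "$state_file" "$backup" && echo "moved $state_file to $backup"
}

# Example (as root), after taking CPUs offline:
# reset_cpu_manager_state && systemctl restart kubelet.service
```

If the backup is not wanted, the plain `rm -f` from the steps above has the same effect on the kubelet.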

I reported a bug here, because I think the CPU set should be updated after the kubelet restarts.

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/53564738/
