Background
Simultaneous multithreading (SMT) allows multiple threads of execution to run on a single physical CPU core. The technology goes by different names, such as Hyper-Threading, but the principle of operation is similar.
The following articles introduce Intel Hyper-Threading (HT), one implementation of SMT:
Intel® Hyper-Threading Technology
https://access.redhat.com/bounce/?externalURL=https%3A%2F%2Fwww.intel.com%2Fcontent%2Fwww%2Fus%2Fen%2Farchitecture-and-technology%2Fhyper-threading%2Fhyper-threading-technology.html
How to Determine the Effectiveness of Hyper-Threading Technology with an Application | Intel® Software
https://access.redhat.com/bounce/?externalURL=https%3A%2F%2Fsoftware.intel.com%2Fen-us%2Farticles%2Fhow-to-determine-the-effectiveness-of-hyper-threading-technology-with-an-application
Checking SMT on the system
# lscpu | grep -e Socket -e Core -e Thread
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
The output above shows a system with two sockets, six cores per socket, and two threads per core. The topology looks like this:
+-------------------------------------------------------+-------------------------------------------------------+
|                        Socket 1                       |                        Socket 2                       |
| +-------------------------+-------------------------+ | +-------------------------+-------------------------+ |
| |          Core 1         |          Core 2         | | |          Core 1         |          Core 2         | |
| +------------+------------+------------+------------+ | +------------+------------+------------+------------+ |
| |            |            |            |            | | |            |            |            |            | |
| |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | | |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | |
| |            |            |            |            | | |            |            |            |            | |
| +------------+------------+------------+------------+ | +------------+------------+------------+------------+ |
| |          Core 3         |          Core 4         | | |          Core 3         |          Core 4         | |
| +------------+------------+------------+------------+ | +------------+------------+------------+------------+ |
| |            |            |            |            | | |            |            |            |            | |
| |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | | |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | |
| |            |            |            |            | | |            |            |            |            | |
| +------------+------------+------------+------------+ | +------------+------------+------------+------------+ |
| |          Core 5         |          Core 6         | | |          Core 5         |          Core 6         | |
| +------------+------------+------------+------------+ | +------------+------------+------------+------------+ |
| |            |            |            |            | | |            |            |            |            | |
| |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | | |  Thread 1  |  Thread 2  |  Thread 1  |  Thread 2  | |
| |            |            |            |            | | |            |            |            |            | |
| +---------------------------------------------------+ | +---------------------------------------------------+ |
+-------------------------------------------------------+-------------------------------------------------------+
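The same core/thread mapping can also be read directly from sysfs, without lscpu. A minimal sketch (the `smt_siblings` helper name is ours; the sysfs paths are the standard Linux topology files):

```shell
# Print each logical CPU together with its SMT siblings, i.e. the logical
# CPUs that share the same physical core.
smt_siblings() {
  for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    [ -r "$cpu/topology/thread_siblings_list" ] || continue
    printf '%s: %s\n' "${cpu##*/}" "$(cat "$cpu/topology/thread_siblings_list")"
  done
}
smt_siblings
```

On a machine like the one described later (80 logical CPUs), cpu0 would typically report a sibling pair such as `0,40`, one pair per physical core.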
Sometimes we want to disable SMT, either to mitigate security issues or to meet the requirements of specific high-performance scenarios.
For more information, see:
Simultaneous Multithreading in Red Hat Enterprise Linux
https://access.redhat.com/solutions/rhel-smt
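Since Linux 4.19 the kernel exposes a runtime SMT switch under sysfs, which is what the rest of this article toggles. A small sketch for querying it (the `smt_state` wrapper is ours):

```shell
# Report the current SMT state from the kernel's runtime control interface.
smt_state() {
  if [ -r /sys/devices/system/cpu/smt/control ]; then
    # Possible values include: on, off, forceoff, notsupported, notimplemented
    cat /sys/devices/system/cpu/smt/control
  else
    echo notsupported   # older kernels do not expose this interface
  fi
}
smt_state
```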
Symptom
During a round of performance testing, we disabled SMT by hand, and re-enabled it after the tests finished:
echo "off" > /sys/devices/system/cpu/smt/control
...
echo "on" > /sys/devices/system/cpu/smt/control
Afterwards we noticed that all container processes were effectively running only on CPU cores 0-39, even though the machine actually has cores 0-79.
taskset -c -p 710238
pid 710238's current affinity list: 0-39
Inspecting the cpuset cgroups:
find /sys/fs/cgroup/cpuset/ -name cpuset.cpus -exec sh -c 'echo "{}: $(cat {})"' \;
/sys/fs/cgroup/cpuset/kubepods/burstable/pod505efed2-1c9a-4d33-9a38-f69bf39c6764/81bced60d6fadd6d3841b9d96fd91bb11e39b3c8cb77bad113d328d871435bf8/cpuset.cpus: 0-39
/sys/fs/cgroup/cpuset/kubepods/burstable/pod505efed2-1c9a-4d33-9a38-f69bf39c6764/cpuset.cpus: 0-39
/sys/fs/cgroup/cpuset/kubepods/burstable/pod505efed2-1c9a-4d33-9a38-f69bf39c6764/7ec08faa73ad4650c2d3d0fcf19cbbbcb6353667c13e39638e4e2e1c1a23080f/cpuset.cpus: 0-39
/sys/fs/cgroup/cpuset/kubepods/burstable/pod94dfc345-861f-4208-a973-b48a756a0cba/486e202b7d861498dcfc350950fb3d9bf4ca9df08a5a59dd08055544f23b6160/cpuset.cpus: 0-39
/sys/fs/cgroup/cpuset/kubepods/burstable/pod94dfc345-861f-4208-a973-b48a756a0cba/9590f537207ae2316e6c2ad876372f15e83fe8e2fead2a1d084d2839b7d71f96/cpuset.cpus: 0-39
/sys/fs/cgroup/cpuset/kubepods/burstable/pod94dfc345-861f-4208-a973-b48a756a0cba/cpuset.cpus: 0-39
...
Troubleshooting
Checking basic CPU information
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
node 0 size: 128358 MB
node 0 free: 91406 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 1 size: 128977 MB
node 1 free: 83435 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
Checking the kubelet and Docker configuration
Using the approach from "Kubernetes tips: fetching any node's kubelet configuration through the API", we check the CPU management policy:
jq . configz | grep -i cpu
  "cpuManagerPolicy": "none",
...
# No CPU Manager policy is set
Checking Docker's configuration:
docker info | grep -i cpu
CPUs: 80
# The CPU count is also normal
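The configz query itself can be issued through the API server's node proxy. A sketch of the approach (the `fetch_cpu_policy` helper and the example node name are assumptions, not taken from the original environment):

```shell
# Fetch a node's live kubelet configuration via the API server's node proxy
# endpoint and extract the CPU Manager policy.
fetch_cpu_policy() {
  node="$1"   # e.g. "worker-1" (hypothetical node name)
  kubectl get --raw "/api/v1/nodes/${node}/proxy/configz" \
    | jq -r '.kubeletconfig.cpuManagerPolicy'
}
# fetch_cpu_policy worker-1
```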
Reproducing the problem
On an environment with a similar kernel version, toggling SMT off and back on reproduced the problem:
cat /sys/devices/system/cpu/smt/control
on

find /sys/fs/cgroup/cpuset/ -name cpuset.cpus -exec sudo sh -c 'echo "{}: $(cat {})"' \;
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod5842b6be206acfeb7e57271393bfcd63/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kubepods/besteffort/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kubepods/burstable/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kubepods/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kube.slice/kubelet/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kube.slice/runtime/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/kube.slice/cpuset.cpus: 0-47
/sys/fs/cgroup/cpuset/system.slice/cpuset.cpus:
/sys/fs/cgroup/cpuset/cpuset.cpus: 0-47

# Disable SMT
echo "off" > /sys/devices/system/cpu/smt/control

find /sys/fs/cgroup/cpuset/ -name cpuset.cpus -exec sudo sh -c 'echo "{}: $(cat {})"' \;
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod5842b6be206acfeb7e57271393bfcd63/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/besteffort/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/burstable/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/kubelet/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/runtime/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/system.slice/cpuset.cpus:
/sys/fs/cgroup/cpuset/cpuset.cpus: 0-23

# Enable SMT
echo "on" > /sys/devices/system/cpu/smt/control

find /sys/fs/cgroup/cpuset/ -name cpuset.cpus -exec sudo sh -c 'echo "{}: $(cat {})"' \;
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod5842b6be206acfeb7e57271393bfcd63/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/besteffort/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/burstable/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kubepods/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/kubelet/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/runtime/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/kube.slice/cpuset.cpus: 0-23
/sys/fs/cgroup/cpuset/system.slice/cpuset.cpus:
/sys/fs/cgroup/cpuset/cpuset.cpus: 0-47
# Only /sys/fs/cgroup/cpuset/cpuset.cpus was restored
Root cause analysis
As the output shows, when SMT is disabled every cpuset is updated; when SMT is re-enabled, only /sys/fs/cgroup/cpuset/cpuset.cpus is updated, and all the other cpusets are left untouched.
Related discussion: https://lore.kernel.org/all/20200326201649.GQ162390@mtj.duckdns.org/ — this appears to be the expected behavior of cgroup v1; cgroup v2 should not have this problem.
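Disabling SMT is just a form of CPU hotplug, so the same v1 behavior can be demonstrated by offlining and re-onlining a single CPU. A sketch (requires root and a cgroup v1 cpuset hierarchy; the `/sys/fs/cgroup/cpuset/kubepods` path is taken from the listings above):

```shell
# Offline one CPU and bring it back: cpusets below the root shrink on
# offline but do NOT regain the CPU on online; only the root cpuset does.
demo_cpu_hotplug() {
  echo 0 > /sys/devices/system/cpu/cpu1/online   # offline cpu1
  cat /sys/fs/cgroup/cpuset/kubepods/cpuset.cpus # cpu1 removed here
  echo 1 > /sys/devices/system/cpu/cpu1/online   # online cpu1 again
  cat /sys/fs/cgroup/cpuset/cpuset.cpus          # root cpuset regains cpu1
  cat /sys/fs/cgroup/cpuset/kubepods/cpuset.cpus # child is still missing cpu1
}
```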
Workaround
After toggling SMT, run a script such as the following:
# "0-47" is the full CPU range of this machine; adjust it to match yours
sudo find /sys/fs/cgroup/cpuset/* -name cpuset.cpus -exec sudo sh -c 'echo "0-47" > {}' \;
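Rather than hardcoding the range, the script can take it from the root cpuset, which the kernel does keep up to date. A sketch along those lines (the `restore_cpusets` name is ours; assumes cgroup v1 mounted at /sys/fs/cgroup/cpuset and root privileges):

```shell
# Re-propagate the root cpuset's CPU list to every descendant cpuset.
# find walks the tree top-down, so a parent is widened before its children,
# satisfying the v1 rule that a child's cpus must be a subset of its parent's.
restore_cpusets() {
  root=/sys/fs/cgroup/cpuset
  all=$(cat "$root/cpuset.cpus")
  find "$root"/* -name cpuset.cpus \
    -exec sh -c 'echo "$1" > "$2"' _ "$all" {} \;
}
```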