1. Problem description
Several pods of a service in the development environment were in the Evicted state instead of Running, as shown below:
[root@dev-k8s-master ~]# kubectl -n xxx-dev get pods | grep Evicted
xxx-server-84985cd4fc-f2q7s 0/1 Evicted 0 3d8h
xxx-server-84985cd4fc-gdb6f 0/1 Evicted 0 3d8h
xxx-server-84985cd4fc-mwbjt 0/1 Evicted 0 3d8h
xxx-server-84985cd4fc-nvllw 0/1 Evicted 0 3d8h
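A plain grep for Evicted matches the string anywhere on the line, including inside a pod name. The sketch below is a slightly stricter variant that compares only the STATUS column. The function name evicted_pods is made up for illustration, and it assumes the default column layout of kubectl get pods (NAME READY STATUS RESTARTS AGE).

```shell
# Print the names of pods whose STATUS column is exactly "Evicted".
# Reads `kubectl get pods` output on stdin; assumes the default columns
# NAME READY STATUS RESTARTS AGE, so STATUS is the third field.
evicted_pods() {
  awk '$3 == "Evicted" { print $1 }'
}

# Usage (assumes kubectl is already configured for the cluster):
#   kubectl -n xxx-dev get pods --no-headers | evicted_pods
```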
2. Problem analysis
Use kubectl describe to find out why a pod was evicted:
[root@dev-k8s-master ~]# kubectl -n xxx-dev describe pod xxx-server-84985cd4fc-f2q7s
...
Message: Pod The node had condition: [DiskPressure].
...
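The Message shows that the pods were evicted because the node was under disk pressure, that is, it was running low on free disk space.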
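When many pods have been evicted, it can help to pull just the condition name out of each Message line. This is a minimal sketch that assumes the message wording shown above ("The node had condition: [...]"). The function name eviction_condition is hypothetical.

```shell
# Extract the node condition(s) named in an eviction message such as
#   Message: Pod The node had condition: [DiskPressure].
# Lines that do not contain that phrase produce no output.
eviction_condition() {
  sed -n 's/.*The node had condition: \[\([^]]*\)].*/\1/p'
}

# Usage (assumes kubectl is already configured for the cluster):
#   kubectl -n xxx-dev describe pod xxx-server-84985cd4fc-f2q7s | eviction_condition
```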
Next, check disk usage on the node the service was running on:
[root@dev-k8s-worker01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 11G 27G 30% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 1.7M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
...
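The disk here is not full. That is because once the pods were evicted, the disk space used by the service's containers was released.
After several failed attempts to start on worker01, the service was automatically rescheduled onto worker02, so check disk usage on worker02 next:
[root@dev-k8s-worker02 ~]# df -h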
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 24G 14G 64% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 1.9M 7.8G 1% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
...
Disk usage on this node is indeed much higher than on the other nodes.
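Comparing df output by eye across several nodes is error-prone. The sketch below flags filesystems at or above a usage percentage. The function name fs_over and its default of 85 are arbitrary choices for illustration, not a kubelet setting. As far as the Kubernetes documentation states, the kubelet's default hard eviction threshold for the node filesystem is nodefs.available<10%, but the cluster here may be configured differently.

```shell
# Print "<mount point> <use%>" for filesystems at or above a usage limit.
# Reads `df -P` output on stdin; -P keeps every filesystem on one line.
# The limit is a percentage passed as $1 (hypothetical default: 85).
fs_over() {
  awk -v limit="${1:-85}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)            # "64%" -> "64"
      if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage on a node:
#   df -P | fs_over 60
```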
The following command shows how much disk space each container's log file uses:
[root@dev-k8s-worker02 ~]# for name in $(docker ps -a | awk '{print $1}' | grep -v CONTAINER); do docker inspect $name | grep LogPath | awk '{print $NF}' | tr -d '",' | xargs du -sh; done
...
5.0M /var/lib/docker/containers/732326284c8a976b46284abf8ccbf1c28947824087d92b1e0d30c893415a95eb/732326284c8a976b46284abf8ccbf1c28947824087d92b1e0d30c893415a95eb-json.log
0 /var/lib/docker/containers/8e0ff02afa97851f330b31c08668f94c775a509360ebca6a200fa7c8b0fffb41/8e0ff02afa97851f330b31c08668f94c775a509360ebca6a200fa7c8b0fffb41-json.log
18G /var/lib/docker/containers/7812d610353cf7aa96a8e1f0266a47160af0258dfd410807f9adc44d0548bfae/7812d610353cf7aa96a8e1f0266a47160af0258dfd410807f9adc44d0548bfae-json.log
0 /var/lib/docker/containers/4f17f7fb5b6e2a983b62408de26843877c5a4107ff792fe85ba10d0ed6103eef/4f17f7fb5b6e2a983b62408de26843877c5a4107ff792fe85ba10d0ed6103eef-json.log
12K /var/lib/docker/containers/12d603e49d4b6470b286cf060f6fec8124c89b00d993e88f82c9af701b76f5fa/12d603e49d4b6470b286cf060f6fec8124c89b00d993e88f82c9af701b76f5fa-json.log
...
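The listing above is unsorted, so the largest log is easy to miss on a node with many containers. This sketch takes log paths on stdin and prints them largest first. The function name rank_logs is made up. It reports file size in bytes from wc -c rather than disk usage from du, so the numbers can differ slightly from the listing above.

```shell
# Read file paths on stdin and print "<bytes> <path>", largest first.
# Paths that are empty, missing, or unreadable are skipped.
rank_logs() {
  while IFS= read -r path; do
    if [ -r "$path" ]; then
      printf '%s %s\n' "$(wc -c < "$path" | tr -d ' ')" "$path"
    fi
  done | sort -rn
}

# Usage on a Docker node (run as root, since the logs live under /var/lib/docker):
#   docker ps -aq | xargs docker inspect --format '{{.LogPath}}' | rank_logs | head -n 5
```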
One log file has already reached 18G, far larger than any other container's log. Find the container it belongs to:
[root@dev-k8s-worker02 ~]# docker ps | grep 7812
7812d610353c registry.xxx.com/lib-server/xxx-server "/entrypoint.sh" 3 days ago Up 3 days k8s_xxx-server_xxx-server-84985cd4fc-sh9m6_xxx-dev_f50e00e6-f536-4e3e-9a45-1358e20f1f9a_0
It is indeed the service in question.
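The container name in the docker ps output encodes the pod and namespace. The sketch below splits it apart, assuming the layout k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt> seen in the output above. The function name pod_of_container is hypothetical, and the layout may differ on other container runtimes.

```shell
# Split a Kubernetes-managed Docker container name into
# "<namespace> <pod> <container>".
# Assumed layout: k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>
# Names that do not follow this layout produce no output.
pod_of_container() {
  printf '%s\n' "$1" | awk -F_ '$1 == "k8s" && NF >= 6 { print $4, $3, $2 }'
}

# Usage on a Docker node:
#   docker ps --format '{{.Names}}' | while read -r n; do pod_of_container "$n"; done
```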
3. Solution
Delete the pods in the Evicted state:
kubectl -n xxx-dev get pods | grep Evicted | awk '{print $1}' | xargs kubectl -n xxx-dev delete pod
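Quick fix (treats the symptom)
Empty the log file directly, replacing CONTAINER_ID with the full container ID:
cat /dev/null > /var/lib/docker/containers/CONTAINER_ID/CONTAINER_ID-json.log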
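The log is emptied in place rather than deleted because Docker keeps the file open. Removing it with rm would not free the space until the container stops. A minimal sketch of the same step that also reports how much was cleared follows. The function name truncate_log is made up.

```shell
# Empty a log file in place and print how many bytes it held beforehand.
# The file is truncated, not deleted, so the writer's open handle stays valid.
truncate_log() {
  before=$(wc -c < "$1" | tr -d ' ')
  : > "$1"
  echo "$before"
}

# Usage on a Docker node (run as root; CONTAINER_ID is a placeholder):
#   truncate_log /var/lib/docker/containers/CONTAINER_ID/CONTAINER_ID-json.log
```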
Permanent fix (treats the root cause)
On every worker node, edit /etc/docker/daemon.json as follows, then restart Docker:
[root@dev-k8s-worker01 ~]# vim /etc/docker/daemon.json
{
"log-driver":"json-file",
"log-opts": {"max-size":"500m", "max-file":"3"}
}
[root@dev-k8s-worker01 ~]# systemctl daemon-reload
[root@dev-k8s-worker01 ~]# systemctl restart docker
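max-size: 500m caps each log file of a container at 500M; when a file reaches that size, it is rotated.
max-file: 3 keeps at most three log files per container: id-json.log, id-json.log.1 and id-json.log.2.
These settings apply only to containers created after the restart. Existing containers keep their previous logging configuration until they are recreated.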
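Taken together, the two options bound the log footprint of each container at max-size times max-file. The sketch below does that arithmetic on the values as written in daemon.json. The function name log_budget is hypothetical, and the result keeps whatever size suffix was passed in rather than converting to bytes.

```shell
# Worst-case json-file log footprint per container: max-size times max-file.
# Takes the two values as written in daemon.json and keeps the size suffix.
log_budget() {
  size=$1
  files=$2
  unit=${size##*[0-9]}      # trailing suffix, e.g. "m" (may be empty)
  num=${size%"$unit"}       # numeric part, e.g. "500"
  echo "$((num * files))$unit"
}

# Usage with the configuration above:
#   log_budget 500m 3
```

With the values used here, each container can hold at most about 1500m of logs, compared with the 18G the single unrotated file had reached.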