Creating a Kubernetes Cluster with kubeadm

2019.05.19

1. Prepare the Installation Environment

Installation requirements:

  • Every host must have at least 2 CPU cores and at least 2 GB of RAM (a quick check sketch follows this list);
  • All hosts must be able to reach one another over the network;
  • Every host must have a unique hostname, MAC address, and product_uuid;
  • kubelet requires swap to be disabled in order to work properly.
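
A quick, optional way to sanity-check the CPU and memory requirements on each host (a sketch with plain commands, not captured output from these VMs):

nproc     # number of CPU cores, should be >= 2
free -m   # total memory in MiB, should be >= 2048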

Here Vagrant is used to create three CentOS 7 virtual machines with 2 cores and 4 GB of RAM each: one master node and two worker nodes:

[wedot@dx142 kubeadm]$ cat > Vagrantfile <<EOF
Vagrant.configure("2") do |config|
  config.vm.define "master01" do |node|
    node.vm.box = "centos/7"
    node.vm.box_check_update = false
    node.vm.hostname = "master01"
    node.vm.network "private_network", ip: "192.168.81.101"
    node.vm.provision "shell", path: "post-deploy.sh" ,run: "always"
    node.vm.provider "virtualbox" do |vbox|
      vbox.cpus = 2
      vbox.memory = 4096
    end
  end
  (1..2).each do |i|
    config.vm.define "node0#{i}" do |node|
      node.vm.box = "centos/7"
      node.vm.box_check_update = false
      node.vm.hostname = "node#{i}"
      node.vm.network "private_network", ip: "192.168.81.20#{i}"
      node.vm.provision "shell", path: "post-deploy.sh" ,run: "always"
      node.vm.provider "virtualbox" do |vbox|
        vbox.cpus = 2
        vbox.memory = 4096
      end
    end
  end
end
EOF
[wedot@dx142 kubeadm]$ cat > post-deploy.sh <<"EOF"
#!/bin/bash
value=$( grep -ic "entry" /etc/hosts )
if [ $value -eq 0 ]
then
echo "
################ kubernetes host entry ############
192.168.81.101  master01
192.168.81.201  node1
192.168.81.202  node2
######################################################
" >> /etc/hosts
fi
if [ -e /etc/redhat-release ]
then
  nmcli connection up System\ enp0s8
fi
EOF
[wedot@dx142 kubeadm]$ vagrant up
  • The private network connection "System enp0s8" configured by Vagrant keeps getting dropped by CentOS 7 for no apparent reason (root cause not yet found), so the provisioning script brings it up manually.

Check that the installation requirements are met: every host must have a unique hostname, MAC address, and product_uuid. Taking master01 as an example:

[wedot@dx142 kubeadm]$ vagrant ssh master01
[vagrant@master01 ~]$ su - root
Password: 
Last login: Sun Apr  5 19:05:17 EEST 2015 on tty1
[root@master01 ~]# hostname
master01
[root@master01 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:c5:46:4e brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
       valid_lft 86064sec preferred_lft 86064sec
    inet6 fe80::a00:27ff:fec5:464e/64 scope link 
       valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:ac:d6:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.81.101/24 brd 192.168.81.255 scope global enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feac:d608/64 scope link 
       valid_lft forever preferred_lft forever
[root@master01 ~]# cat /sys/class/dmi/id/product_uuid
0F1AF63D-DBAD-4EAC-A600-BDC7F5955267

Disable swap; this must be done on every host:

[root@master01 ~]# swapoff -a
[root@master01 ~]# sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
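
To confirm that swap is really off (optional check, not captured output):

free -m     # the Swap line should show 0 total
swapon -s   # should list no active swap devices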

2. Install the Container Runtime (Docker)

A container runtime must be installed on every host; master01 is used as the example:

Download and install docker-ce

Configure a Yum repository for Docker, using the Aliyun Docker mirror here:

[root@master01 ~]# cat > /etc/yum.repos.d/docker.repo <<EOF
[docker-main]
name=Docker Repository
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/7/x86_64/stable/
enabled=1
gpgcheck=0
EOF

Install Docker 18.06, the version recommended for Kubernetes v1.14:

[root@master01 ~]# yum install docker-ce-18.06.3.ce-3.el7.x86_64 -y
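
If you want to see which docker-ce versions the repository actually provides before picking one, something like the following works (optional):

yum list docker-ce --showduplicates | sort -r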

Start docker.service

[root@master01 ~]# systemctl start docker

Problem encountered: Error while creating filesystem xfs on device docker-253:1-17291963-base: exit status 1

[root@master01 ~]# systemctl start docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@master01 ~]# journalctl -u docker -n 20|more
-- Logs begin at Sun 2019-05-19 16:10:43 EEST, end at Sun 2019-05-19 17:00:07 EEST. --
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092825521+03:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=g
rpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092883548+03:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/do
cker/containerd/docker-containerd.sock 0  <nil>}]" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092899441+03:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.092935503+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201f37e0, CONNECTIN
G" module=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.093091109+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201f37e0, READY" mo
dule=grpc
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.101868083+03:00" level=warning msg="Usage of loopback devices is strongly discouraged for production 
use. Please use `--storage-opt dm.thinpooldev` or use `man dockerd` to refer to dm.thinpooldev section." storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.338689606+03:00" level=info msg="Creating filesystem xfs on device docker-253:1-17291963-base, mkfs a
rgs: [-m crc=0,finobt=0 /dev/mapper/docker-253:1-17291963-base]" storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.339831716+03:00" level=info msg="Error while creating filesystem xfs on device docker-253:1-17291963-
base: exit status 1" storage-driver=devicemapper
May 19 17:00:07 master01 dockerd[7057]: time="2019-05-19T17:00:07.339856098+03:00" level=error msg="[graphdriver] prior storage driver devicemapper failed: exit status
 1"
May 19 17:00:07 master01 dockerd[7057]: Error starting daemon: error initializing graphdriver: exit status 1
May 19 17:00:07 master01 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
May 19 17:00:07 master01 systemd[1]: Failed to start Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: Unit docker.service entered failed state.
May 19 17:00:07 master01 systemd[1]: docker.service failed.
May 19 17:00:07 master01 systemd[1]: docker.service holdoff time over, scheduling restart.
May 19 17:00:07 master01 systemd[1]: Stopped Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: start request repeated too quickly for docker.service
May 19 17:00:07 master01 systemd[1]: Failed to start Docker Application Container Engine.
May 19 17:00:07 master01 systemd[1]: Unit docker.service entered failed state.
May 19 17:00:07 master01 systemd[1]: docker.service failed.

Run mkfs.xfs manually to see what the actual error is:

[root@master01 ~]# mkfs.xfs -m crc=0,finobt=0 /dev/mapper/docker-253:1-17291963-base
unknown option -m finobt=0
Usage: mkfs.xfs
/* blocksize */		[-b log=n|size=num]
/* metadata */		[-m crc=[0|1]
/* data subvol */	[-d agcount=n,agsize=n,file,name=xxx,size=num,
			    (sunit=value,swidth=value|su=num,sw=num|noalign),
			    sectlog=n|sectsize=num
/* force overwrite */	[-f]
/* inode size */	[-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
			    projid32bit=0|1]
/* no discard */	[-K]
/* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n
			    sunit=value|su=num,sectlog=n|sectsize=num,
			    lazy-count=0|1]
/* label */		[-L label (maximum 12 characters)]
/* naming */		[-n log=n|size=num,version=2|ci,ftype=0|1]
/* no-op info only */	[-N]
/* prototype file */	[-p fname]
/* quiet */		[-q]
/* realtime subvol */	[-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */	[-s log=n|size=num]
/* version */		[-V]
			devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).

mkfs.xfs does not recognize the -m finobt=0 option, most likely because the xfsprogs package is too old. Upgrade xfsprogs:

[root@master01 ~]# rpm -qa|grep -i xfs
xfsprogs-3.2.0-0.10.alpha2.el7.x86_64
[root@master01 ~]# yum update xfsprogs -y
[root@master01 ~]# rpm -qa|grep -i xfs
xfsprogs-4.5.0-19.el7_6.x86_64

After upgrading xfsprogs, restart docker.service; the problem is resolved:

[root@master01 ~]# systemctl restart docker
[root@master01 ~]# systemctl status docker -l
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2019-05-19 17:03:25 EEST; 6s ago
     Docs: https://docs.docker.com
 Main PID: 7139 (dockerd)
   Memory: 46.0M
   CGroup: /system.slice/docker.service
           ├─7139 /usr/bin/dockerd
           └─7146 docker-containerd --config /var/run/docker/containerd/containerd.toml

May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.217188797+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4202090d0, CONNECTING" module=grpc
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.217999669+03:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4202090d0, READY" module=grpc
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.218018674+03:00" level=info msg="Loading containers: start."
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.590251077+03:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.871944746+03:00" level=info msg="Loading containers: done."
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.885618616+03:00" level=info msg="Docker daemon" commit=d7080c1 graphdriver(s)=devicemapper version=18.06.3-ce
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.885797435+03:00" level=info msg="Daemon has completed initialization"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.894076881+03:00" level=warning msg="Could not register builder git source: failed to find git binary: exec: \"git\": executable file not found in $PATH"
May 19 17:03:25 master01 dockerd[7139]: time="2019-05-19T17:03:25.908769784+03:00" level=info msg="API listen on /var/run/docker.sock"
May 19 17:03:25 master01 systemd[1]: Started Docker Application Container Engine.

Check the Docker setup with docker info

[root@master01 ~]# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 18.06.3-ce
Storage Driver: devicemapper
 Pool Name: docker-253:1-17291963-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 10.94MB
 Data Space Total: 107.4GB
 Data Space Available: 5.577GB
 Metadata Space Used: 581.6kB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.147GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.149-RHEL7 (2018-07-20)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-123.20.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.704GiB
Name: master01
ID: P4CM:J5T3:Z7X7:EUN4:J3MW:QPGL:SWY2:KGCZ:EXMD:ACOE:HDHF:4VAW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
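
docker info reports Cgroup Driver: cgroupfs, which kubeadm later flags with a warning because it recommends the systemd driver. This walkthrough keeps cgroupfs (the warning is harmless here), but if you prefer to follow the recommendation, a hedged sketch of the switch (do this before kubeadm init; the kubelet flag file written by kubeadm picks up Docker's cgroup driver, as can be seen later in the kubelet command line):

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker
docker info | grep -i "cgroup driver"   # should now report: Cgroup Driver: systemd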

3. Install the Kubernetes Packages

The Kubernetes packages must be installed on every host; master01 is used as the example:

Configure a Yum repository for Kubernetes, using the Aliyun Kubernetes mirror here:

[root@master01 ~]# cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
  http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install the kubeadm, kubectl, kubelet, and kubernetes-cni packages with Yum:

[root@master01 ~]# yum install kubeadm kubectl kubelet kubernetes-cni -y
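
This installs the newest packages available in the repository (v1.14.2 at the time of writing). If you need to pin a specific version, yum accepts a version suffix; a sketch, assuming the 1.14.2-0 builds are present in the mirror:

yum list kubeadm --showduplicates | sort -r    # list available versions
yum install -y kubeadm-1.14.2 kubectl-1.14.2 kubelet-1.14.2 kubernetes-cni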

4. Initialize the master Node with kubeadm init

Preparation before kubeadm init

1) List the container images that kubeadm init needs to download:

[root@master01 ~]# kubeadm config images list
I0519 17:24:16.082322   17456 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:24:16.082637   17456 version.go:97] falling back to the local client version: v1.14.2
k8s.gcr.io/kube-apiserver:v1.14.2
k8s.gcr.io/kube-controller-manager:v1.14.2
k8s.gcr.io/kube-scheduler:v1.14.2
k8s.gcr.io/kube-proxy:v1.14.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1
  • k8s.gcr.io is not reachable from this environment. In Kubernetes v1.14 the --image-repository flag can be used to point kubeadm at a custom image registry, so there is no need to prepare the images manually with docker pull and docker tag; a very useful flag (a pre-pull sketch follows this list).

2) If the flannel network plugin is going to be used, the --pod-network-cidr=10.244.0.0/16 flag must be passed; otherwise deploying flannel fails with Error registering network: failed to acquire lease: node "master01" pod cidr not assigned.

3) If a host has multiple network interfaces, kubeadm deploys the cluster on the interface used by the default route:

[root@master01 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 enp0s3
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.81.0    0.0.0.0         255.255.255.0   U     101    0        0 enp0s8
  • Here kubeadm would pick interface enp0s3; the --apiserver-advertise-address=192.168.81.101 flag tells kubeadm to use the VM's private_network interface enp0s8 instead.

4) If kubeadm init fails to initialize the master node, find the root cause, reset the node with kubeadm reset, and then run kubeadm init again.
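
As mentioned in 1), the control-plane images can also be pre-pulled from the custom registry before running kubeadm init (optional sketch):

kubeadm config images pull --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers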

Run kubeadm init to initialize the master node

[root@master01 ~]# kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.81.101
I0520 02:15:19.678613   13717 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0520 02:15:19.678845   13717 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.81.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master01 localhost] and IPs [192.168.81.101 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master01 localhost] and IPs [192.168.81.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 16.503470 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node master01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 2yt3kq.ewm19tnly4rrym1a
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.81.101:6443 --token 2yt3kq.ewm19tnly4rrym1a \
    --discovery-token-ca-cert-hash sha256:d644d599cd27612c1a2fbd62aa686ad00a6fe298946f229a0668c5ef637176a4

Problems encountered and how they were solved

WARNINGs from the kubeadm init preflight checks

[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
I0519 17:24:56.462966   17462 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:24:56.463116   17462 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
	[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
	[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
	[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Solution:

[root@master01 ~]# systemctl stop firewalld && systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@master01 ~]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@master01 ~]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
[root@master01 ~]# cat > /etc/sysctl.d/k8s.conf <<EOF
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@master01 ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
  • These steps are required on all nodes.
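
If /proc/sys/net/bridge/bridge-nf-call-iptables does not exist at all, the br_netfilter kernel module is probably not loaded; loading it and making it persistent looks roughly like this (the file name under modules-load.d is arbitrary):

modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/k8s.conf
sysctl --system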

kubeadm init fails with "error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"

[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
I0519 17:29:30.515120   17679 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0519 17:29:30.515257   17679 version.go:97] falling back to the local client version: v1.14.2
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master01 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master01 localhost] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

Check the kubelet logs:

[root@master01 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2019-05-19 17:30:13 EEST; 1min 43s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 18200 (kubelet)
   Memory: 35.6M
   CGroup: /system.slice/kubelet.service
           └─18200 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1

May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.775342   18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://10.0.2.15:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster01&limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.872568   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.978461   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:56 master01 kubelet[18200]: E0519 17:31:56.985968   18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.2.15:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster01&limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.079359   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.176252   18200 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://10.0.2.15:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.179819   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.286579   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.398565   18200 kubelet.go:2244] node "master01" not found
May 19 17:31:57 master01 kubelet[18200]: E0519 17:31:57.398945   18200 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://10.0.2.15:6443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourceVersion=0: dial tcp 10.0.2.15:6443: connect: connection refused

Check the Docker containers; no kube-apiserver container has been created:

[root@master01 ~]# docker ps -a|grep apiserver|grep -v pause

The multiple network interfaces are the suspected cause. Run kubeadm reset to undo all of kubeadm's changes, then run kubeadm init again with the --apiserver-advertise-address flag to pin the interface:

[root@master01 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0519 17:37:43.865009     960 reset.go:73] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://10.0.2.15:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 10.0.2.15:6443: connect: connection refused
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0519 17:37:48.273587     960 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
E0519 17:38:15.554250     960 reset.go:192] [reset] Failed to remove containers: failed to remove running container a11fe6947c5e: output: Error: No such container: a11fe6947c5e
, error: exit status 1
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

[root@master01 ~]# iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

kubelet reports "Failed to start ContainerManager failed to initialize top level QOS containers"

The error output from kubeadm init:

[root@master01 ~]# kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --apiserver-advertise-address 192.168.81.101
...
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

The kubelet logs:

[root@master01 ~]# journalctl -u kubelet -n 10|more
-- Logs begin at Sun 2019-05-19 16:10:43 EEST, end at Sun 2019-05-19 17:45:16 EEST. --
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.865421    5240 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.866543    5240 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 17:45:16 master01 kubelet[5240]: E0519 17:45:16.876942    5240 kubelet.go:2244] node "master01" not found
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877319    5240 cpu_manager.go:155] [cpumanager] starting with none policy
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877331    5240 cpu_manager.go:156] [cpumanager] reconciling every 10s
May 19 17:45:16 master01 kubelet[5240]: I0519 17:45:16.877344    5240 policy_none.go:42] [cpumanager] none policy: Start
May 19 17:45:16 master01 kubelet[5240]: F0519 17:45:16.877925    5240 kubelet.go:1359] Failed to start ContainerManager failed to initialize top level QOS containers: 
failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: Failed to find subsystem mount for require
d subsystem: pids
May 19 17:45:16 master01 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
May 19 17:45:16 master01 systemd[1]: Unit kubelet.service entered failed state.
May 19 17:45:16 master01 systemd[1]: kubelet.service failed.

Cause: leftover kubelet state from the previous attempt was not fully cleaned up. A GitHub issue suggests that systemctl stop kubepods-burstable.slice is enough to fix it, but that had no effect in this test. Running kubeadm reset and then rebooting the system resolved the problem.
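
Roughly, the recovery sequence that worked here:

kubeadm reset
reboot
# after the reboot, run kubeadm init again with the desired flags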

kubelet reports "Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container"

kubeadm init fails with the same error output as above, but this time kubelet reports Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container:

[root@master01 ~]# journalctl -u kubelet
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.185184    3532 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get http
s://192.168.81.101:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster01&limit=500&resourceVersion=0: dial tcp 192.168.81.101:6443: connect: connection refused
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.272912    3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.373281    3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.384604    3532 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: G
et https://192.168.81.101:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster01&limit=500&resourceVersion=0: dial tcp 192.168.81.101:6443: connect: connection refuse
d
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.421942    3532 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc 
= failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting containe
r process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\"": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422032    3532 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "kube-apiserver-master01_kube-system(49ec77e4
78f52e3dbb19dec81a7aab04)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: OC
I runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\"
": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422057    3532 kuberuntime_manager.go:693] createPodSandbox for pod "kube-apiserver-master01_kube-system(49ec77e
478f52e3dbb19dec81a7aab04)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-master01": Error response from daemon: O
CI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 23\
"": unknown
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.422121    3532 pod_workers.go:190] Error syncing pod 49ec77e478f52e3dbb19dec81a7aab04 ("kube-apiserver-master01_
kube-system(49ec77e478f52e3dbb19dec81a7aab04)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-master01_kube-system(49ec77e478f52e3dbb19dec81a7aab04)" wit
h CreatePodSandboxError: "CreatePodSandbox for pod \"kube-apiserver-master01_kube-system(49ec77e478f52e3dbb19dec81a7aab04)\" failed: rpc error: code = Unknown desc = f
ailed to start sandbox container for pod \"kube-apiserver-master01\": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container
 process caused \"process_linux.go:301: running exec setns process for init caused \\\"exit status 23\\\"\": unknown"
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.422465    3532 container.go:409] Failed to create summary reader for "/kubepods/burstable/pod49ec77e478f52e3dbb1
9dec81a7aab04/be115e01e71406d455c7e050251e9f2dc4f1517d989a3848c1c23900c1eb1573": none of the resources are being tracked.
May 19 18:01:07 master01 kubelet[3532]: I0519 18:01:07.427259    3532 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 18:01:07 master01 kubelet[3532]: I0519 18:01:07.427730    3532 kubelet_node_status.go:283] Setting node annotation to enable volume controller attach/detach
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.430606    3532 pod_container_deletor.go:75] Container "db801ac4e304a3bbd160b8c15db0d09e0fe036b7b32cfbc958b802aec
f7c8064" not found in pod's containers
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.450680    3532 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.473438    3532 kubelet.go:2244] node "master01" not found
May 19 18:01:07 master01 kubelet[3532]: W0519 18:01:07.552663    3532 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/kubepods/burstable/pod1353086c450
cf89683ca588f417f9971/a51c38366ae5007b72bedb0b974cd2109d8defd158ccf07faf84a32e126dc3ed": 0x40000100 == IN_CREATE|IN_ISDIR): readdirent: no such file or directory
May 19 18:01:07 master01 kubelet[3532]: E0519 18:01:07.573647    3532 kubelet.go:2244] node "master01" not found
  • Docker is failing to create the containers.

Check the docker.service logs:

May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04+03:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6/shim.sock" debug=false pid=31559
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04+03:00" level=info msg="shim reaped" id=2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.225092029+03:00" level=error msg="stream copy error: reading from a closed fifo"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.226160661+03:00" level=error msg="stream copy error: reading from a closed fifo"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.655337215+03:00" level=error msg="2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6 cleanup: failed to delete container from containerd: no such container"
May 19 18:02:04 master01 dockerd[1854]: time="2019-05-19T18:02:04.655390191+03:00" level=error msg="Handler for POST /v1.38/containers/2737cf3ce004839f0e630ea2e1cc1492d342ccb78d38f7f0ef122f691e8600a6/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:297: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\": unknown"

Check the Docker and kernel versions:

[root@master01 ~]# docker --version
Docker version 18.06.3-ce, build d7080c1           
[root@master01 ~]# docker-runc --version
runc version 1.0.0-rc5+dev.docker-18.06
commit: a592beb5bc4c4092b1b1bac971afed27687340c5
spec: 1.0.0
[root@master01 ~]# uname -r
3.10.0-123.20.1.el7.x86_64

The likely cause is a kernel that is too old for this relatively new Docker version. Upgrading the kernel to the latest version with yum update resolved the problem:

[root@master01 ~]# yum update -y && reboot
[root@master01 ~]# uname -r
3.10.0-957.12.2.el7.x86_64

Copy the kubectl configuration file

Copy the kubectl configuration file /etc/kubernetes/admin.conf to $HOME/.kube/config:

[root@master01 ~]# mkdir -p $HOME/.kube
[root@master01 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master01 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
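
Since these commands are run as root anyway, an equivalent alternative is to point KUBECONFIG at admin.conf for the current shell instead of copying the file:

export KUBECONFIG=/etc/kubernetes/admin.conf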

Now kubectl can be used:

[root@master01 ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE    VERSION
master01   NotReady   master   3m4s   v1.14.2
[root@master01 ~]# kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-d5947d4b-2s9fm             0/1     Pending   0          2m53s
coredns-d5947d4b-hjbcx             0/1     Pending   0          2m53s
etcd-master01                      1/1     Running   0          110s
kube-apiserver-master01            1/1     Running   0          2m13s
kube-controller-manager-master01   1/1     Running   0          2m14s
kube-proxy-jmdnj                   1/1     Running   0          2m53s
kube-scheduler-master01            1/1     Running   0          105s

If kubectl reports Unable to connect to the server: x509: certificate signed by unknown authority, it usually means that, after a kubeadm reset, ~/.kube/config is still the admin.conf generated by the previous kubeadm init. Re-run the commands above to overwrite ~/.kube/config.

[root@master01 ~]# kubectl get pods -n kube-system
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

5. Install the CNI Network Plugin

The flannel network plugin is used here:

[root@master01 ~]# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
[root@master01 ~]# kubectl apply -f kube-flannel.yml 
podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created

Check whether flannel started successfully; once flannel is running, the CoreDNS Pods move from Pending to Running:

[root@master01 ~]# kubectl get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
coredns-d5947d4b-2s9fm             1/1     Running   0          11m
coredns-d5947d4b-hjbcx             1/1     Running   0          11m
etcd-master01                      1/1     Running   0          10m
kube-apiserver-master01            1/1     Running   0          10m
kube-controller-manager-master01   1/1     Running   0          10m
kube-flannel-ds-amd64-w2cvr        1/1     Running   0          3m17s
kube-proxy-jmdnj                   1/1     Running   0          11m
kube-scheduler-master01            1/1     Running   0          9m58s

If kubeadm init was run without the --pod-network-cidr=10.244.0.0/16 flag, the flannel Pods fail to start with the following error:

[root@master01 ~]# kubectl logs -n kube-system kube-flannel-ds-amd64-tj4wq 
I0519 15:58:36.994408       1 main.go:514] Determining IP address of default interface
I0519 15:58:36.995315       1 main.go:527] Using interface with name enp0s3 and address 10.0.2.15
I0519 15:58:36.995395       1 main.go:544] Defaulting external address to interface address (10.0.2.15)
I0519 15:58:37.095446       1 kube.go:126] Waiting 10m0s for node controller to sync
I0519 15:58:37.095869       1 kube.go:309] Starting kube subnet manager
I0519 15:58:38.095922       1 kube.go:133] Node controller sync successful
I0519 15:58:38.095967       1 main.go:244] Created subnet manager: Kubernetes Subnet Manager - master01
I0519 15:58:38.095975       1 main.go:247] Installing signal handlers
I0519 15:58:38.097679       1 main.go:386] Found network config - Backend type: vxlan
I0519 15:58:38.097722       1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0519 15:58:38.099008       1 main.go:289] Error registering network: failed to acquire lease: node "master01" pod cidr not assigned
I0519 15:58:38.099040       1 main.go:366] Stopping shutdownHandler...

In that case, inspect the kubeadm-config ConfigMap that kubeadm created during initialization:

[root@master01 ~]# kubectl get configmaps -n kube-system kubeadm-config -o yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: ""
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: v1.14.2
    networking:
      dnsDomain: cluster.local
      podSubnet: ""
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      master01:
        advertiseAddress: 192.168.81.101
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-19T15:34:57Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "157"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: a7f40841-7a4b-11e9-b726-080027c5464e
  • Note that podSubnet is empty here. The fix: run kubeadm reset, then run kubeadm init on the master node again with the --pod-network-cidr=10.244.0.0/16 flag.

6. Join the worker Nodes to the Cluster

Joining worker nodes to the cluster is very simple: just run the kubeadm join command with the arguments that kubeadm init printed at the end of the master initialization, copied onto each worker node:

[root@node1 ~]# kubeadm join 192.168.81.101:6443 --token 2yt3kq.ewm19tnly4rrym1a \
    --discovery-token-ca-cert-hash sha256:d644d599cd27612c1a2fbd62aa686ad00a6fe298946f229a0668c5ef637176a4
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
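
Bootstrap tokens expire after 24 hours by default; if the token printed by kubeadm init is no longer valid when a node is added, a new join command can be generated on the master:

kubeadm token create --print-join-command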

After joining all the worker nodes, run kubectl get nodes on the master to verify that the nodes have successfully joined the Kubernetes cluster:

[root@master01 ~]# kubectl get nodes -o wide
NAME       STATUS     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
master01   Ready      master   18m     v1.14.2   10.0.2.15     <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
node1      Ready      <none>   3m43s   v1.14.2   10.0.2.15     <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
node2      NotReady   <none>   10s     v1.14.2   10.0.2.15     <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6

7. Summary

At this point a complete Kubernetes cluster has been deployed. To recap, there are six main steps:

  • Prepare the operating system environment;
  • Install the container runtime (Docker);
  • Install the Kubernetes packages;
  • Initialize the master node with kubeadm init;
  • Install the flannel CNI network plugin;
  • Join the worker nodes with kubeadm join.

Creating a cluster with kubeadm is quite simple, isn't it? However, this cluster has only a single master node, which is a single point of failure; a follow-up post will cover deploying a highly available Kubernetes cluster with multiple master nodes.
