Kubernetes High Availability in Practice

Posted by 彭楷淳 on 2021-02-18

Unified Environment Configuration


Node Configuration

Hostname               IP               Role    OS                    CPU/Memory      Disk
kubernetes-master-01   192.168.141.150  Master  Ubuntu Server 18.04   2 cores / 2 GB  20 GB
kubernetes-master-02   192.168.141.151  Master  Ubuntu Server 18.04   2 cores / 2 GB  20 GB
kubernetes-master-03   192.168.141.152  Master  Ubuntu Server 18.04   2 cores / 2 GB  20 GB
kubernetes-node-01     192.168.141.160  Node    Ubuntu Server 18.04   2 cores / 4 GB  20 GB
kubernetes-node-02     192.168.141.161  Node    Ubuntu Server 18.04   2 cores / 4 GB  20 GB
kubernetes-node-03     192.168.141.162  Node    Ubuntu Server 18.04   2 cores / 4 GB  20 GB
Kubernetes VIP         192.168.141.200  -       -                     -               -

Operating System Configuration

Special note: complete the following steps while building the VMware image, so you do not have to repeat the setup painfully on every machine.

Disable Swap

$ swapoff -a

Prevent swap from being enabled at boot

Comment out the lines that reference swap

$ vi /etc/fstab
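
If you prefer a non-interactive approach, a rough sketch with sed (assuming GNU sed) comments out any uncommented line that mentions swap; double-check /etc/fstab afterwards, since the pattern is intentionally broad:

# Back up fstab, then comment out uncommented lines containing "swap"
$ cp /etc/fstab /etc/fstab.bak
$ sed -i '/^[^#].*swap/s/^/#/' /etc/fstab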

Disable the Firewall

$ ufw disable

Configure DNS

Uncomment the DNS line and add a DNS server such as 114.114.114.114, then reboot the machine after making the change.

$ vi /etc/systemd/resolved.conf
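
For reference, the [Resolve] section of /etc/systemd/resolved.conf ends up containing a line like the following (114.114.114.114 is just an example; any reachable DNS server works):

[Resolve]
DNS=114.114.114.114

After editing, restarting the resolver is an alternative to a full reboot:

$ systemctl restart systemd-resolved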

Install Docker

# Update package sources
$ sudo apt-get update
# Install required dependencies
$ sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common
# Install the GPG key
$ curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Add the package repository
$ sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
# Update package sources again
$ sudo apt-get -y update
# Install Docker CE
$ sudo apt-get -y install docker-ce

Configure a Docker Registry Mirror

Public registry mirrors in China can be slow; replace the mirror below with your own Alibaba Cloud registry mirror, e.g. https://yourself.mirror.aliyuncs.com, which you can find in the Alibaba Cloud console under Container Registry > Registry Mirror.

Write the following into /etc/docker/daemon.json (create the file if it does not exist)

{
  "registry-mirrors": [
    "https://registry.docker-cn.com"
  ]
}
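
If Docker is already running, restart it so the mirror configuration takes effect, then confirm the mirror is listed (a quick sketch; adjust for your service manager):

$ sudo systemctl restart docker
$ docker info | grep -A 1 "Registry Mirrors"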

Install kubeadm, kubelet, and kubectl

# Install system tools
$ apt-get update && apt-get install -y apt-transport-https

# Install the GPG key
$ curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -

# Add the package repository; note: the Ubuntu 18.04 codename is bionic, but the Alibaba Cloud mirror
# does not provide it yet, so we reuse the 16.04 (xenial) repository
$ cat << EOF >/etc/apt/sources.list.d/kubernetes.list
> deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
> EOF

# Install
$ apt-get update && apt-get install -y kubelet kubeadm kubectl
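
Optionally, pin these packages so a routine apt upgrade does not move the cluster components to an unexpected version (the hold can be released later with apt-mark unhold):

$ apt-mark hold kubelet kubeadm kubectl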

Synchronize Time

Set the time zone

$ dpkg-reconfigure tzdata

Select Asia, then Shanghai.
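
Alternatively, the time zone can be set non-interactively, which is convenient when baking the VM image:

$ timedatectl set-timezone Asia/Shanghai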

Synchronize the clock

# Install ntpdate
$ apt-get install ntpdate

# Sync the system clock against a network time source (cn.pool.ntp.org is a public NTP pool located in China)
$ ntpdate cn.pool.ntp.org

# Write the system time to the hardware clock
$ hwclock --systohc

Verify the time

$ date

# Output like the following (check that it matches the expected system time)
Sun Jun 2 22:02:35 CST 2019

Configure IPVS

# Install system tools
$ apt-get install -y ipset ipvsadm

# Create and load the IPVS modules
$ mkdir -p /etc/sysconfig/modules/
$ vi /etc/sysconfig/modules/ipvs.modules

# Enter the following content
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4

# Run the script; note: it must be re-run after every reboot
$ chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

# The script output looks like
ip_vs_sh               16384  0
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 147456  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack_ipv4      16384  3
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_conntrack          131072  8 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink,ip_vs
libcrc32c              16384  4 nf_conntrack,nf_nat,raid456,ip_vs

Configure Kernel Parameters

# Edit the parameter file
$ vi /etc/sysctl.d/k8s.conf

# Enter the following content
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.swappiness=0

# Apply the parameters
$ sysctl --system

# Output like the following (look for the lines starting with "Applying /etc/sysctl.d/k8s.conf")
* Applying /etc/sysctl.d/10-console-messages.conf ...
kernel.printk = 4 4 1 7
* Applying /etc/sysctl.d/10-ipv6-privacy.conf ...
* Applying /etc/sysctl.d/10-kernel-hardening.conf ...
kernel.kptr_restrict = 1
* Applying /etc/sysctl.d/10-link-restrictions.conf ...
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/10-lxd-inotify.conf ...
fs.inotify.max_user_instances = 1024
* Applying /etc/sysctl.d/10-magic-sysrq.conf ...
kernel.sysrq = 176
* Applying /etc/sysctl.d/10-network-security.conf ...
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.tcp_syncookies = 1
* Applying /etc/sysctl.d/10-ptrace.conf ...
kernel.yama.ptrace_scope = 1
* Applying /etc/sysctl.d/10-zeropage.conf ...
vm.mmap_min_addr = 65536
* Applying /usr/lib/sysctl.d/50-default.conf ...
net.ipv4.conf.all.promote_secondaries = 1
net.core.default_qdisc = fq_codel
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
* Applying /etc/sysctl.conf ...

Modify cloud.cfg

$ vi /etc/cloud/cloud.cfg

# This setting defaults to false; change it to true
preserve_hostname: true
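
A non-interactive equivalent (a sketch assuming the default "preserve_hostname: false" line is present; verify with grep afterwards):

$ sed -i 's/^preserve_hostname: false/preserve_hostname: true/' /etc/cloud/cloud.cfg
$ grep preserve_hostname /etc/cloud/cloud.cfg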

Per-Node Configuration


Special note: configure each Master and Node with its own IP address and hostname.

Configure the IP Address

Edit the configuration file

$ vi /etc/netplan/50-cloud-init.yaml

Change the content as follows

network:
    ethernets:
        ens33:
            # My Masters are 150 - 152 and Nodes are 160 - 162
            addresses: [192.168.141.150/24]
            gateway4: 192.168.141.2
            nameservers:
                addresses: [192.168.141.2]
    version: 2

Run netplan apply to make the configuration take effect.
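
If you are working over SSH, netplan try is a safer first step: it applies the configuration but rolls it back automatically unless you confirm it, so a typo will not lock you out:

$ netplan try
$ netplan apply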

Configure the Hostname

# Set the hostname
$ hostnamectl set-hostname kubernetes-master-01

# Configure hosts
$ cat >> /etc/hosts << EOF
192.168.141.150 kubernetes-master-01
EOF
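
Optionally, give every machine the full host list from the table above so all nodes can resolve each other by name (run once on each node):

$ cat >> /etc/hosts << EOF
192.168.141.150 kubernetes-master-01
192.168.141.151 kubernetes-master-02
192.168.141.152 kubernetes-master-03
192.168.141.160 kubernetes-node-01
192.168.141.161 kubernetes-node-02
192.168.141.162 kubernetes-node-03
EOF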

Install HAProxy + Keepalived


Overview

A Kubernetes Master node runs the following components:

  • kube-apiserver: the single entry point for resource operations; provides authentication, authorization, access control, API registration, and discovery
  • kube-scheduler: responsible for resource scheduling; places Pods onto the appropriate machines according to the configured scheduling policy
  • kube-controller-manager: responsible for maintaining cluster state, e.g. failure detection, auto scaling, and rolling updates
  • etcd: a distributed key-value store built by CoreOS on Raft; used for service discovery, shared configuration, and consistency guarantees (such as leader election and distributed locks)

kube-scheduler and kube-controller-manager can run in clustered mode: leader election produces one working instance while the other instances stay on standby.

kube-apiserver can run as multiple instances, but the other components need a single, unified address to reach it. Deploying a highly available Kubernetes cluster in this chapter essentially means putting HAProxy + Keepalived in front of this component.

The idea is to use HAProxy + Keepalived to expose kube-apiserver behind a virtual IP, which provides both high availability and load balancing. Broken down:

  • Keepalived provides the virtual IP (VIP) through which kube-apiserver is reached
  • HAProxy listens behind the Keepalived VIP
  • Nodes running Keepalived and HAProxy are called LB (load balancer) nodes
  • Keepalived runs in a one-master, multiple-backup mode, so at least two LB nodes are required
  • While running, Keepalived periodically checks the local HAProxy process; if HAProxy is detected as unhealthy, a new master is elected and the VIP floats to it, keeping the VIP highly available
  • All components (kubectl, apiserver, controller-manager, scheduler, etc.) access kube-apiserver through the VIP and the port 6444 that HAProxy listens on (note: kube-apiserver listens on 6443 by default; HAProxy uses 6444 to avoid a conflict, and every component goes through this port to reach the apiserver)

Create the HAProxy Startup Script

Run this step on kubernetes-master-01.

$ mkdir -p /usr/local/kubernetes/lb
$ vi /usr/local/kubernetes/lb/start-haproxy.sh

# Enter the following content
#!/bin/bash
# Change these to your own Master addresses
MasterIP1=192.168.141.150
MasterIP2=192.168.141.151
MasterIP3=192.168.141.152
# This is the default kube-apiserver port; no need to change it
MasterPort=6443

# The container exposes HAProxy's port 6444
docker run -d --restart=always --name HAProxy-K8S -p 6444:6444 \
        -e MasterIP1=$MasterIP1 \
        -e MasterIP2=$MasterIP2 \
        -e MasterIP3=$MasterIP3 \
        -e MasterPort=$MasterPort \
        wise2c/haproxy-k8s

# Make the script executable
$ chmod +x start-haproxy.sh

Create the Keepalived Startup Script

Run this step on kubernetes-master-01.

$ mkdir -p /usr/local/kubernetes/lb
$ vi /usr/local/kubernetes/lb/start-keepalived.sh

# Enter the following content
#!/bin/bash
# Change this to your own virtual IP address
VIRTUAL_IP=192.168.141.200
# Network interface to bind the VIP to
INTERFACE=ens33
# Netmask length of the virtual IP
NETMASK_BIT=24
# Port exposed by HAProxy, which forwards to kube-apiserver's 6443 internally
CHECK_PORT=6444
# Router ID
RID=10
# Virtual router ID
VRID=160
# IPv4 multicast address, default 224.0.0.18
MCAST_GROUP=224.0.0.18

docker run -itd --restart=always --name=Keepalived-K8S \
        --net=host --cap-add=NET_ADMIN \
        -e VIRTUAL_IP=$VIRTUAL_IP \
        -e INTERFACE=$INTERFACE \
        -e CHECK_PORT=$CHECK_PORT \
        -e RID=$RID \
        -e VRID=$VRID \
        -e NETMASK_BIT=$NETMASK_BIT \
        -e MCAST_GROUP=$MCAST_GROUP \
        wise2c/keepalived-k8s

# Make the script executable
$ chmod +x start-keepalived.sh

Copy the Scripts to the Other Masters

On kubernetes-master-02 and kubernetes-master-03, create the working directory:

$ mkdir -p /usr/local/kubernetes/lb

Copy the scripts from kubernetes-master-01 to the other Masters

$ scp start-haproxy.sh start-keepalived.sh 192.168.141.151:/usr/local/kubernetes/lb
$ scp start-haproxy.sh start-keepalived.sh 192.168.141.152:/usr/local/kubernetes/lb

Start the containers on all three Masters (run the scripts):

$ sh /usr/local/kubernetes/lb/start-haproxy.sh && sh /usr/local/kubernetes/lb/start-keepalived.sh

Verify the Setup

Check the containers

$ docker ps

# Output like
CONTAINER ID   IMAGE                   COMMAND                  CREATED             STATUS             PORTS                    NAMES
f50df479ecae   wise2c/keepalived-k8s   "/usr/bin/keepalived…"   About an hour ago   Up About an hour                            Keepalived-K8S
75066a7ed2fb   wise2c/haproxy-k8s      "/docker-entrypoint.…"   About an hour ago   Up About an hour   0.0.0.0:6444->6444/tcp   HAProxy-K8S

Check the virtual IP bound to the interface

$ ip a | grep ens33

# Output like
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.141.151/24 brd 192.168.141.255 scope global ens33
    inet 192.168.141.200/24 scope global secondary ens33

Special note: Keepalived health-checks the 6444 port that HAProxy listens on. If the check fails, the local HAProxy process is considered unhealthy and the VIP floats to another node. In other words, a failure of either the local Keepalived container or the local HAProxy container will cause the VIP to move to another node.
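
You can also rehearse a failover without shutting a machine down: on the node that currently holds the VIP, stop the HAProxy container and watch the VIP move to another Master (a sketch; the container names are the ones created by the scripts above):

# On the current VIP holder, simulate an HAProxy failure
$ docker stop HAProxy-K8S

# On another Master, wait a few seconds for the VIP to appear
$ ip a | grep 192.168.141.200

# Restore the stopped container afterwards
$ docker start HAProxy-K8S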

Deploy the Kubernetes Cluster


Initialize the Master

Create the working directory and export the configuration file

# Create the working directory
$ mkdir -p /usr/local/kubernetes/cluster

# Export the default configuration into the working directory
$ kubeadm config print init-defaults --kubeconfig ClusterConfiguration > kubeadm.yml

Edit the configuration file

apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  # Change this to the primary Master's IP
  advertiseAddress: 192.168.141.150
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
# Point this at the Keepalived VIP and the HAProxy port
controlPlaneEndpoint: "192.168.141.200:6444"
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
# Google's registry is not reachable from mainland China; use the Alibaba Cloud mirror instead
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
# Set the version number
kubernetesVersion: v1.14.2
networking:
  dnsDomain: cluster.local
  # Use Calico's default Pod network CIDR
  podSubnet: "192.168.0.0/16"
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
# Enable IPVS mode
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs

Initialize with kubeadm

# Run kubeadm init
$ kubeadm init --config=kubeadm.yml --experimental-upload-certs | tee kubeadm-init.log

# Configure kubectl
$ mkdir -p $HOME/.kube
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ chown $(id -u):$(id -g) $HOME/.kube/config

# Verify
$ kubectl get node
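
Once the first Master is up, a quick way to confirm that the VIP + HAProxy path actually reaches an apiserver is to hit the /version endpoint through the VIP (depending on the cluster's anonymous-access settings you will get either version JSON or a 403; either response shows the VIP -> HAProxy -> kube-apiserver chain is working):

$ curl -k https://192.168.141.200:6444/version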

Install the Network Plugin

# Install Calico
$ kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml

# Verify the installation
$ watch kubectl get pods --all-namespaces

Join the Master Nodes

Take the join command from kubeadm-init.log and run it on kubernetes-master-02 and kubernetes-master-03 to join them as Masters

# Example command
$ kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:56d53268517c132ae81c868ce99c44be797148fb2923e59b49d73c99782ff21f \
    --experimental-control-plane --certificate-key c4d1525b6cce4b69c11c18919328c826f92e660e040a46f5159431d5ff0545bd

Join the Worker Nodes

Take the join command from kubeadm-init.log and run it on kubernetes-node-01 through kubernetes-node-03 to join them as worker Nodes

# Example command
$ kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:56d53268517c132ae81c868ce99c44be797148fb2923e59b49d73c99782ff21f

Verify Cluster State

Check the Nodes

$ kubectl get nodes -o wide

Check the Pods

$ kubectl -n kube-system get pod -o wide

Check the Services

$ kubectl -n kube-system get svc

Verify IPVS by checking the kube-proxy logs for a line like server_others.go:176] Using ipvs Proxier.

$ kubectl -n kube-system logs -f <kube-proxy Pod name>
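
To find a concrete Pod name to plug in, list the kube-proxy Pods by label (a sketch; kube-proxy Pods created by kubeadm carry the k8s-app=kube-proxy label), then grep one of them for the IPVS line:

$ kubectl -n kube-system get pods -l k8s-app=kube-proxy
$ kubectl -n kube-system logs <one of the Pod names above> | grep "Using ipvs Proxier"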

Check the proxy rules

$ ipvsadm -ln

Check the effective configuration

$ kubectl -n kube-system get cm kubeadm-config -oyaml

Check the etcd cluster

$ kubectl -n kube-system exec etcd-kubernetes-master-01 -- etcdctl \
    --endpoints=https://192.168.141.150:2379 \
    --ca-file=/etc/kubernetes/pki/etcd/ca.crt \
    --cert-file=/etc/kubernetes/pki/etcd/server.crt \
    --key-file=/etc/kubernetes/pki/etcd/server.key cluster-health

# Output like
member 1dfaf07371bb0cb6 is healthy: got healthy result from https://192.168.141.152:2379
member 2da85730b52fbeb2 is healthy: got healthy result from https://192.168.141.150:2379
member 6a3153eb4faaaffa is healthy: got healthy result from https://192.168.141.151:2379
cluster is healthy

Verify High Availability

Special note: Keepalived requires at least two backup nodes, so testing high availability needs at least a one-master, two-backup setup; with fewer nodes you may see unexpected behavior.

Shut down any one of the Master machines

$ shutdown -h now

On any remaining Master node, check the Node status

$ kubectl get node

# Output like the following; if only the powered-off machine shows NotReady and the rest are Ready, the test succeeded
NAME                   STATUS     ROLES    AGE   VERSION
kubernetes-master-01   NotReady   master   18m   v1.14.2
kubernetes-master-02   Ready      master   17m   v1.14.2
kubernetes-master-03   Ready      master   16m   v1.14.2

Check the VIP failover

$ ip a | grep ens33

# Output like
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.141.151/24 brd 192.168.141.255 scope global ens33
    inet 192.168.141.200/24 scope global secondary ens33

Fixing the Problem of Nodes Failing to Join


Problem Description

After using kubeadm join to add a Node to the cluster, you may find that every kubectl command stops working (it blocks and never returns a response). Run kubeadm reset on the Node to take it offline, then go back to a Master node and run watch kubectl get pods --all-namespaces again: you will see that the coredns-xxx-xxx Pods are in CrashLoopBackOff state.

Solution

From the error above it is easy to tell this is a network problem, and the only network plugin we installed is Calico. Could the error be caused by Calico? With that question in mind, let's revisit the Calico documentation: https://docs.projectcalico.org/v3.7/getting-started/kubernetes/

Its Quickstart contains two passages that are explicitly called out as important notes.

The first note basically says: after kubeadm finishes, do not shut the machine down; carry on with the remaining installation steps. In other words, the Kubernetes installation should be completed in one uninterrupted pass (though that is not the key point here) (* ̄rǒ ̄)

The second note basically says: if your host network falls within 192.168.0.0/16, you must choose a different Pod network. As it happens, our network (my VMs use 192.168.141.0/24) overlaps with exactly that range (ノへ ̄、). I had ignored this when building a single-node cluster, because it caused no problems there ♪(^∇^*)

So, the root cause is that the VM IP range happens to overlap with Calico's default network. To fix it we need to change Calico's network (changing the VMs' network would also work), which, in other words, means reinstalling o (一︿一 +) o

Just reinstall by following the standard steps below.

Reset Kubernetes

$ kubeadm reset

# Output like
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0604 01:55:28.517280 22688 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

Delete the kubectl Configuration

$ rm -fr ~/.kube/

Enable IPVS

$ modprobe -- ip_vs
$ modprobe -- ip_vs_rr
$ modprobe -- ip_vs_wrr
$ modprobe -- ip_vs_sh
$ modprobe -- nf_conntrack_ipv4

Export and Edit the Configuration File

$ kubeadm config print init-defaults --kubeconfig ClusterConfiguration > kubeadm.yml

Edit the configuration file as follows

apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.141.150
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kubernetes-master-01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: "192.168.141.200:6444"
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.14.2
networking:
  dnsDomain: cluster.local
  # The key change: replace the Calico default with a Pod network that does not overlap
  # with our VMs (here we use the Flannel default range)
  podSubnet: "10.244.0.0/16"
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs

Initialize with kubeadm

$ kubeadm init --config=kubeadm.yml --experimental-upload-certs | tee kubeadm-init.log

# Output like
[init] Using Kubernetes version: v1.14.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master-01 localhost] and IPs [192.168.141.150 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master-01 localhost] and IPs [192.168.141.150 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes-master-01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.141.150 192.168.141.200]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 24.507568 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in ConfigMap "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
a662b8364666f82c93cc5cd4fb4fabb623bbe9afdb182da353ac40f1752dfa4a
[mark-control-plane] Marking the node kubernetes-master-01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kubernetes-master-01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:2ea8c138021fb1e184a24ed2a81c16c92f9f25c635c73918b1402df98f9c8aad \
    --experimental-control-plane --certificate-key a662b8364666f82c93cc5cd4fb4fabb623bbe9afdb182da353ac40f1752dfa4a

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --experimental-upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

  kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:2ea8c138021fb1e184a24ed2a81c16c92f9f25c635c73918b1402df98f9c8aad

Configure kubectl

# Configure kubectl
$ mkdir -p $HOME/.kube
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ chown $(id -u):$(id -g) $HOME/.kube/config

# Verify
$ kubectl get node

Download and Edit the Calico Manifest

$ wget https://docs.projectcalico.org/v3.7/manifests/calico.yaml
$ vi calico.yaml

Change line 611: replace 192.168.0.0/16 with 10.244.0.0/16. You can locate it quickly with the following vi commands (or use the sed sketch after this list):

  • Show line numbers: :set number
  • Search: /<text to find>; press lowercase n for the next match and uppercase N for the previous match
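
If you prefer not to edit the file by hand, a one-line sketch with sed does the same replacement; verify the result with grep before applying:

$ sed -i 's#192.168.0.0/16#10.244.0.0/16#g' calico.yaml
$ grep -n "10.244.0.0/16" calico.yaml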

Install Calico

$ kubectl apply -f calico.yaml

# Output like
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
deployment.extensions/calico-kube-controllers created
serviceaccount/calico-kube-controllers created

Join the Master Nodes

# Example below; do not forget to join both of the other Masters
$ kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:2ea8c138021fb1e184a24ed2a81c16c92f9f25c635c73918b1402df98f9c8aad \
    --experimental-control-plane --certificate-key a662b8364666f82c93cc5cd4fb4fabb623bbe9afdb182da353ac40f1752dfa4a

# Output like
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes-master-02 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.141.151 192.168.141.200]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master-02 localhost] and IPs [192.168.141.151 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master-02 localhost] and IPs [192.168.141.151 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node kubernetes-master-02 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kubernetes-master-02 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

Join the Worker Nodes

# Example
$ kubeadm join 192.168.141.200:6444 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:2ea8c138021fb1e184a24ed2a81c16c92f9f25c635c73918b1402df98f9c8aad

# Output like
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Verify Everything Works

$ kubectl get node

# Output like the following; the Node has come online successfully ━━( ̄ー ̄*|||━━
NAME                   STATUS   ROLES    AGE     VERSION
kubernetes-master-01   Ready    master   19m     v1.14.2
kubernetes-master-02   Ready    master   4m46s   v1.14.2
kubernetes-master-03   Ready    master   3m23s   v1.14.2
kubernetes-node-01     Ready    <none>   74s     v1.14.2
$ watch kubectl get pods --all-namespaces

# Output like the following; coredns is now running normally as well
Every 2.0s: kubectl get pods --all-namespaces    kubernetes-master-01: Tue Jun 4 02:31:43 2019

NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-8646dd497f-hz5xp        1/1     Running   0          9m9s
kube-system   calico-node-2z892                               1/1     Running   0          9m9s
kube-system   calico-node-fljxv                               1/1     Running   0          6m39s
kube-system   calico-node-vprlw                               1/1     Running   0          5m16s
kube-system   calico-node-xvqcx                               1/1     Running   0          3m7s
kube-system   coredns-8686dcc4fd-5ndjm                        1/1     Running   0          21m
kube-system   coredns-8686dcc4fd-zxtql                        1/1     Running   0          21m
kube-system   etcd-kubernetes-master-01                       1/1     Running   0          20m
kube-system   etcd-kubernetes-master-02                       1/1     Running   0          6m37s
kube-system   etcd-kubernetes-master-03                       1/1     Running   0          5m14s
kube-system   kube-apiserver-kubernetes-master-01             1/1     Running   0          20m
kube-system   kube-apiserver-kubernetes-master-02             1/1     Running   0          6m37s
kube-system   kube-apiserver-kubernetes-master-03             1/1     Running   0          5m14s
kube-system   kube-controller-manager-kubernetes-master-01    1/1     Running   1          20m
kube-system   kube-controller-manager-kubernetes-master-02    1/1     Running   0          6m37s
kube-system   kube-controller-manager-kubernetes-master-03    1/1     Running   0          5m14s
kube-system   kube-proxy-68jqr                                1/1     Running   0          3m7s
kube-system   kube-proxy-69bnn                                1/1     Running   0          6m39s
kube-system   kube-proxy-vvhp5                                1/1     Running   0          5m16s
kube-system   kube-proxy-ws6wx                                1/1     Running   0          21m
kube-system   kube-scheduler-kubernetes-master-01             1/1     Running   1          20m
kube-system   kube-scheduler-kubernetes-master-02             1/1     Running   0          6m37s
kube-system   kube-scheduler-kubernetes-master-03             1/1     Running   0          5m14s

With that, the highly available Kubernetes cluster is fully deployed.

