ETCD Cluster Backup and Restore

1. Check cluster status

### Check cluster health (v2 API)
/opt/etcd/bin/etcdctl --ca-file=/opt/etcd/ssl/ca.pem --cert-file=/opt/etcd/ssl/server.pem --key-file=/opt/etcd/ssl/server-key.pem --endpoints="https://172.17.100.252:2379,https://172.17.100.253:2379,https://172.17.100.254:2379" cluster-health

### Check the health of each endpoint (v3 API)
ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://172.17.100.252:2379,https://172.17.100.253:2379,https://172.17.100.254:2379" endpoint health
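Besides health, it is often useful to see which member is the leader and how large each member's database is; etcdctl v3's endpoint status subcommand can print this as a table:

ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://172.17.100.252:2379,https://172.17.100.253:2379,https://172.17.100.254:2379" endpoint status --write-out=table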

2. Backup

The data on the three nodes is kept in sync, so backing up a single node is enough: connect to any one member and take the snapshot there.

Automated backup script (shown below):

  • Back up to the /opt/etcd_backup directory every 8 hours, via the cron entry 0 */8 * * * sh /usr/local/sbin/backupetcd.sh
  • Once there are more than 6 backups, delete the oldest ones.
#!/bin/bash
# Snapshot one etcd member and rotate old backups.

DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/opt/etcd_backup"

mkdir -p ${BACKUP_DIR}

ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot save ${BACKUP_DIR}/snap-${DATE}.db --endpoints=https://172.17.100.253:2379 --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem > /dev/null

sleep 3

# ls -lt prints a "total" line first, so NR>7 keeps the 6 newest backups.
cd ${BACKUP_DIR} && ls -lt | awk '{if(NR>7){print "rm -rf "$9}}' | sh
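After a snapshot is taken, it is worth confirming the file is actually readable. etcdctl's snapshot status subcommand reports its hash, revision, key count and size; the file name below is just an example of what the script produces:

ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot status /opt/etcd_backup/snap-20201220-120000.db --write-out=table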

3. Restore

Stop etcd on all nodes. If there are multiple masters, kube-apiserver must be stopped on each of them as well.

3.1 Stop kube-apiserver and etcd first

[root@k8s-master ~]# systemctl stop kube-apiserver
[root@k8s-master ~]# systemctl stop etcd
[root@k8s-node1 ~]# systemctl stop etcd
[root@k8s-node2 ~]# systemctl stop etcd
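Before restoring, it can help to confirm etcd is really down on every node (systemctl is-active should print inactive):

systemctl is-active etcd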

3.2 Restore on each node

First, take a look at the etcd configuration:

[root@k8s-master2 sbin]#  cat /opt/etcd/cfg/etcd.conf

#[Member]
ETCD_NAME="etcd-2"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://172.17.100.253:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.17.100.253:2379"

#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.17.100.253:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://172.17.100.253:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
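The only per-node values in the restore command below are the member name and the peer URL, and both can be read straight out of this file:

grep -E '^ETCD_NAME|^ETCD_INITIAL_ADVERTISE_PEER_URLS' /opt/etcd/cfg/etcd.conf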

Restore on the first node:
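Note that snapshot restore refuses to write into a data directory that already exists, so move the current (possibly damaged) one out of the way first; this is where the default.etcd.bak directories seen below come from:

mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak

Then run the restore: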

ETCDCTL_API=3 etcdctl snapshot restore /root/snap.db \
--name etcd-2 \
--initial-cluster="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://172.17.100.253:2380 \
--data-dir=/var/lib/etcd/default.etcd
 
 
--name etcd-2 \   # change this to the current node's member name
--initial-advertise-peer-urls=https://172.17.100.253:2380 \  # use the current node's IP
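Since only those two flags change from node to node, a small wrapper script helps avoid copy-paste mistakes. This is just a sketch (the helper name restore-etcd.sh is hypothetical); the cluster map matches the config shown above:

#!/bin/bash
# Usage: restore-etcd.sh <member-name> <node-ip>
# e.g.:  restore-etcd.sh etcd-2 172.17.100.253
NAME=$1
IP=$2

ETCDCTL_API=3 etcdctl snapshot restore /root/snap.db \
  --name ${NAME} \
  --initial-cluster="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380" \
  --initial-cluster-token=etcd-cluster \
  --initial-advertise-peer-urls=https://${IP}:2380 \
  --data-dir=/var/lib/etcd/default.etcd

The actual run on the first node: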
 
 
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl snapshot restore /root/snap.db \
> --name etcd-2 \
> --initial-cluster="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380" \
> --initial-cluster-token=etcd-cluster \
> --initial-advertise-peer-urls=https://172.17.100.253:2380 \
> --data-dir=/var/lib/etcd/default.etcd
{"level":"info","ts":1608453271.6452653,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"/root/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
{"level":"info","ts":1608453271.7769744,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":93208}
{"level":"info","ts":1608453271.8183022,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"1b21d5d68d61885a","local-member-id":"0","added-peer-id":"1cd5f52adf869d89","added-peer-peer-urls":["https://192.168.179.99:2380"]}
{"level":"info","ts":1608453271.8184474,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"1b21d5d68d61885a","local-member-id":"0","added-peer-id":"55857deef69d787b","added-peer-peer-urls":["https://192.168.179.100:2380"]}
{"level":"info","ts":1608453271.818473,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"1b21d5d68d61885a","local-member-id":"0","added-peer-id":"8bcf42695ccd8d89","added-peer-peer-urls":["https://192.168.179.101:2380"]}
{"level":"info","ts":1608453271.8290143,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"/root/snap.db","wal-dir":"/var/lib/etcd/default.etcd/member/wal","data-dir":"/var/lib/etcd/default.etcd","snap-dir":"/var/lib/etcd/default.etcd/member/snap"}
 
 
[root@k8s-master ~]# ls /var/lib/etcd/
default.etcd  default.etcd.bak

Copy the snapshot to the other nodes, then restore there as well:

[root@k8s-master ~]# scp snap.db root@172.17.100.252:~
root@172.17.100.252's password: 
snap.db                                                                                           100% 3296KB  15.4MB/s   00:00    
[root@k8s-master ~]# scp snap.db root@172.17.100.254:~
root@172.17.100.254's password: 
snap.db

Restore on the second node:

[root@k8s-node1 ~]# ls /var/lib/etcd/
default.etcd.bak
 
# ETCDCTL_API=3 etcdctl snapshot restore /root/snap.db \
--name etcd-1 \
--initial-cluster="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://172.17.100.252:2380 \
--data-dir=/var/lib/etcd/default.etcd
 
[root@k8s-node1 ~]# ls /var/lib/etcd/
default.etcd  default.etcd.bak

Restore on the third node:

[root@k8s-node2 ~]# ls /var/lib/etcd/
default.etcd.bak

# ETCDCTL_API=3 etcdctl snapshot restore /root/snap.db \
--name etcd-3 \
--initial-cluster="etcd-1=https://172.17.100.252:2380,etcd-2=https://172.17.100.253:2380,etcd-3=https://172.17.100.254:2380" \
--initial-cluster-token=etcd-cluster \
--initial-advertise-peer-urls=https://172.17.100.254:2380 \
--data-dir=/var/lib/etcd/default.etcd

The restore has succeeded; now start the services back up:

[root@k8s-master ~]# systemctl start kube-apiserver
[root@k8s-master ~]# systemctl start etcd
[root@k8s-node1 ~]# systemctl start etcd
[root@k8s-node2 ~]# systemctl start etcd

Once everything is up, check that the cluster is healthy:

[root@k8s-master2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://172.17.100.252:2379,https://172.17.100.253:2379,https://172.17.100.254:2379" endpoint health
https://172.17.100.254:2379 is healthy: successfully committed proposal: took = 947.021µs
https://172.17.100.253:2379 is healthy: successfully committed proposal: took = 1.416622ms
https://172.17.100.252:2379 is healthy: successfully committed proposal: took = 1.099595ms

[root@k8s-master2 sbin]# kubectl get node
NAME          STATUS   ROLES    AGE    VERSION
k8s-master1   Ready    <none>   5d1h   v1.16.0
k8s-master2   Ready    <none>   5d1h   v1.16.0
k8s-node1     Ready    <none>   5d1h   v1.16.0
k8s-node2     Ready    <none>   5d1h   v1.16.0
