K8S ETCD Cluster Backup and Restore

Reference: https://www.cnblogs.com/xishuai/p/docker-etcd.html

Data backup

Query the endpoints

ETCD_ENDPOINT=`kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl member list -w table \
| awk '/http/ {print $8}' \
| sed 's/:2380/:2379/' \
| paste -sd ','`
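
To make the pipeline above easier to follow, here is a minimal sketch that runs the same awk/sed/paste steps over a saved copy of `etcdctl member list -w table` output. The function name `peers_to_endpoints` is a hypothetical helper, and the sample table is borrowed from the rke01 to rke03 cluster shown later in this post. It assumes peers listen on 2380 and clients on 2379.

```shell
#!/bin/sh
# Convert `etcdctl member list -w table` output (on stdin) into a
# comma-separated list of client endpoints. In each data row, the 8th
# whitespace-separated field is the peer URL; its port 2380 is swapped
# for the client port 2379.
peers_to_endpoints() {
  awk '/http/ {print $8}' | sed 's/:2380/:2379/' | paste -sd ',' -
}

# Sample input, for illustration only.
sample_table='| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
| 37499ff739d6c21 | started | rke03 | http://192.168.1.15:2380 | http://192.168.1.15:2379 | false |
| 79e7c26cb0fc149 | started | rke02 | http://192.168.1.16:2380 | http://192.168.1.16:2379 | false |
| b4773de1c1f38771 | started | rke01 | http://192.168.1.11:2380 | http://192.168.1.11:2379 | false |'

printf '%s\n' "$sample_table" | peers_to_endpoints
```

The header row is skipped because it contains no `http`, so only member rows contribute to the list.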

Find the leader node

kubectl -n milvus-gpu exec my-release-etcd-0  -- etcdctl --endpoints=$ETCD_ENDPOINT endpoint status -w table

Count the records (this counts every output line, keys and values alike)

kubectl -n milvus-gpu exec my-release-etcd-0  -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379  get --prefix "" | grep -c '^'

Back up the etcd data to a snapshot file (written to /tmp inside the pod)

kubectl -n milvus-gpu exec my-release-etcd-0  -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379  snapshot save /tmp/etcd_backup_202603180943.tar.gz
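
The snapshot above is written inside the pod, so it still has to be copied out before it can be shipped to other nodes. The following is a rough sketch of one way to do that by streaming the file through `kubectl exec ... cat`. The helper name `fetch_snapshot` and the local destination path are hypothetical, and the sketch assumes the etcd container image includes `cat`.

```shell
#!/bin/sh
# Copy a snapshot file out of the etcd pod by streaming it through
# `kubectl exec ... cat`. Assumes the container image ships `cat`.
# Arguments: namespace, pod name, path inside the pod, local destination.
fetch_snapshot() {
  ns=$1; pod=$2; remote=$3; dest=$4
  kubectl -n "$ns" exec "$pod" -- cat "$remote" > "$dest" || return 1
  # Refuse to treat an empty file as a valid backup.
  [ -s "$dest" ]
}

# Usage (not run here; the local file name is only an example):
#   fetch_snapshot milvus-gpu my-release-etcd-0 \
#     /tmp/etcd_backup_202603180943.tar.gz ./etcd_backup_202603180943.tar.gz
```

Running `etcdctl snapshot status` on the local copy afterwards is a cheap way to confirm the transfer was complete.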

Inspect the backup file

kubectl -n milvus-gpu exec my-release-etcd-0  -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379 snapshot status /tmp/etcd_backup_202603180943.tar.gz -w table
Deprecated: Use `etcdutl snapshot status` instead.

+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 6f82a4f2 | 19198020 | 26607 | 11 MB |
+----------+----------+------------+------------+

Copy the file to the nodes to be restored

scp etcd_backup_20240808.tar.gz rke01:/root/
scp etcd_backup_20240808.tar.gz rke02:/root/
scp etcd_backup_20240808.tar.gz rke03:/root/

Data restore

Confirm that the uploaded snapshot matches the backup (hash, revision, and key count should be identical):

etcdctl  snapshot status etcd_backup_20240809.tar.gz -w table
Deprecated: Use `etcdutl snapshot status` instead.

+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 8c54a0ac | 15687011 | 10141 | 4.8 MB |
+----------+----------+------------+------------+
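
Comparing the tables by eye is error-prone, so the check above can be sketched as a small script that pulls the HASH column out of the `snapshot status -w table` output and compares two values. The helper names `snapshot_hash` and `same_snapshot` are hypothetical; the sample text is the table printed above.

```shell
#!/bin/sh
# Extract the HASH column from `etcdctl snapshot status -w table` output
# (read from stdin). Data rows contain "|" and are not the header row.
snapshot_hash() {
  awk -F'|' 'NF > 1 && $2 !~ /HASH/ { gsub(/ /, "", $2); print $2 }'
}

# Succeed only when both hashes are non-empty and equal.
same_snapshot() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Sample output copied from the restore-node check above.
restore_status='+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 8c54a0ac | 15687011 | 10141 | 4.8 MB |
+----------+----------+------------+------------+'

printf '%s\n' "$restore_status" | snapshot_hash
```

In practice the first argument to `same_snapshot` would be the hash recorded when the snapshot was taken, and the second the hash reported on each restore node.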

Restore rke01:

etcdctl snapshot restore etcd_backup_20240812.tar.gz \
--data-dir=/opt/etcd --name rke01 \
--initial-advertise-peer-urls http://192.168.1.11:2380 \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380

Restore rke02:

etcdctl snapshot restore etcd_backup_20240812.tar.gz \
--data-dir=/opt/etcd --name rke02 \
--initial-advertise-peer-urls http://192.168.1.16:2380 \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380

Restore rke03:

etcdctl snapshot restore etcd_backup_20240812.tar.gz \
--data-dir=/opt/etcd --name rke03 \
--initial-advertise-peer-urls http://192.168.1.15:2380 \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
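
As far as I know, `etcdctl snapshot restore` refuses to write into a `--data-dir` that already holds data, so on a node that ran etcd before, the old directory has to be moved out of the way first. Below is a minimal sketch of that step, using the same rename-with-timestamp approach this post uses later for /data/etcd. The helper name `set_aside_data_dir` is hypothetical.

```shell
#!/bin/sh
# Move an existing etcd data directory aside so that a restore can write
# into a fresh path. Does nothing when the directory is absent.
# Prints the new location of the old data, if any.
set_aside_data_dir() {
  dir=${1%/}                      # tolerate a trailing slash
  [ -e "$dir" ] || return 0
  backup="${dir}.$(date +%Y%m%d%H%M%S)"
  mv "$dir" "$backup" && printf '%s\n' "$backup"
}

# Usage on each node before the restore command (not run here):
#   set_aside_data_dir /opt/etcd
```

Renaming rather than deleting keeps the previous state available in case the restore has to be rolled back.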

Recover data by removing and re-adding a member

Set the endpoint environment variable

export ETCDCTL_ENDPOINTS="`etcdctl member list | awk -F ',' '{print $5}' | tr -d ' ' | paste -sd, -`"

View the cluster members

etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.97.235:2379 | aa82fb5b4c259753 | 3.5.18 | 273 MB | false | false | 9 | 17075349 | 17075349 | |
| http://192.168.97.57:2379 | 7b1c5ca31530f6ee | 3.5.18 | 2.6 GB | false | false | 9 | 17075349 | 17075349 | |
| http://192.168.97.151:2379 | f4aa7f67ae39563f | 3.5.18 | 273 MB | true | false | 9 | 17075349 | 17075349 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
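
In the table above, the member at 192.168.97.57 reports a DB SIZE of 2.6 GB while the other two report 273 MB, which is why it is the one removed in the next step. The sketch below looks up the member ID for a given endpoint so it does not have to be copied by hand. The helper name `member_id_for` is hypothetical; the sample rows are taken from the table above with the border lines omitted.

```shell
#!/bin/sh
# Print the member ID for a given endpoint, reading
# `etcdctl endpoint status -w table` output on stdin.
member_id_for() {
  awk -F'|' -v ep="$1" '
    NF > 1 {
      gsub(/ /, "", $2); gsub(/ /, "", $3)   # trim ENDPOINT and ID cells
      if ($2 == ep) print $3
    }'
}

# Rows copied from the table above (borders omitted).
status_rows='| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
| http://192.168.97.235:2379 | aa82fb5b4c259753 | 3.5.18 | 273 MB | false | false | 9 | 17075349 | 17075349 | |
| http://192.168.97.57:2379 | 7b1c5ca31530f6ee | 3.5.18 | 2.6 GB | false | false | 9 | 17075349 | 17075349 | |
| http://192.168.97.151:2379 | f4aa7f67ae39563f | 3.5.18 | 273 MB | true | false | 9 | 17075349 | 17075349 | |'

printf '%s\n' "$status_rows" | member_id_for http://192.168.97.57:2379
```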

Remove the member

etcdctl member remove 7b1c5ca31530f6ee

Stop the etcd service on the node of the removed member

systemctl stop etcd

Move the old data directory aside

cd /data/etcd/
mv data data.202510091837

Add the member

etcdctl --endpoints=http://192.168.97.151:2379 member add etcd1 --peer-urls=http://192.168.97.57:2380
Member 2808db15300b50d3 added to cluster 4cd24e62559e720c

ETCD_NAME="etcd1"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.97.57:2380,etcd2=http://192.168.97.235:2380,etcd0=http://192.168.97.151:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.97.57:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Edit the configuration file

initial-cluster-state: existing

Start the etcd service

systemctl start etcd

Check the cluster status

etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.97.57:2379 | 205a21841c52fa4d | 3.5.18 | 273 MB | false | false | 9 | 17078416 | 17078416 | |
| http://192.168.97.235:2379 | aa82fb5b4c259753 | 3.5.18 | 273 MB | false | false | 9 | 17078416 | 17078416 | |
| http://192.168.97.151:2379 | f4aa7f67ae39563f | 3.5.18 | 273 MB | true | false | 9 | 17078416 | 17078416 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
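
The status check above can also be scripted. This is a minimal sketch, not an official health check: it reads `etcdctl endpoint status -w table` output and succeeds only if exactly one member is leader and every member reports the same RAFT INDEX. The helper name `cluster_converged` is hypothetical, and on a busy cluster the indexes may differ briefly between members, so a retry may be needed.

```shell
#!/bin/sh
# Read `etcdctl endpoint status -w table` output on stdin and succeed only if
# exactly one member is leader and all members share one RAFT INDEX value.
# With "|" as separator: field 6 is IS LEADER, field 9 is RAFT INDEX.
cluster_converged() {
  awk -F'|' '
    NF > 1 && $2 !~ /ENDPOINT/ {
      gsub(/ /, "", $6); gsub(/ /, "", $9)
      if ($6 == "true") leaders++
      if (!($9 in seen)) { seen[$9] = 1; distinct++ }
    }
    END { exit !(leaders == 1 && distinct == 1) }'
}

# Rows copied from the table above (borders omitted).
after_status='| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
| http://192.168.97.57:2379 | 205a21841c52fa4d | 3.5.18 | 273 MB | false | false | 9 | 17078416 | 17078416 | |
| http://192.168.97.235:2379 | aa82fb5b4c259753 | 3.5.18 | 273 MB | false | false | 9 | 17078416 | 17078416 | |
| http://192.168.97.151:2379 | f4aa7f67ae39563f | 3.5.18 | 273 MB | true | false | 9 | 17078416 | 17078416 | |'

if printf '%s\n' "$after_status" | cluster_converged; then
  echo "cluster converged"
fi
```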

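Deploying the ETCD cluster

Startup parameters

Option                         Description
--name                         Node name.
--data-dir                     Path where the service stores its runtime data.
--snapshot-count               Number of committed transactions that triggers a snapshot to disk.
--heartbeat-interval           Interval at which the leader sends heartbeats to followers (milliseconds).
--election-timeout             Time a follower waits without receiving a heartbeat before it starts a new election (milliseconds).
--listen-peer-urls             URLs to listen on for peer traffic.
--listen-client-urls           URLs to listen on for client traffic.
--advertise-client-urls        Client URLs of this node advertised to clients.
--initial-advertise-peer-urls  Peer URLs of this node advertised to the rest of the cluster.
--initial-cluster              Names and peer URLs of all nodes in the cluster.
--initial-cluster-state        new when bootstrapping a new cluster; existing when joining an existing one.
--initial-cluster-token        Token used when creating the cluster.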

Create the containers

rke01:

docker run -d --name etcd --net host -v /opt/etcd:/etcd \
--restart always quay.io/coreos/etcd:v3.5.15 \
/usr/local/bin/etcd \
--data-dir=/etcd --name rke01 \
--initial-advertise-peer-urls http://192.168.1.11:2380 --listen-peer-urls http://0.0.0.0:2380 \
--advertise-client-urls http://192.168.1.11:2379 --listen-client-urls http://0.0.0.0:2379 \
--initial-cluster-state new \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380

rke02:

docker run -d --name etcd --net host -v /opt/etcd:/etcd \
--restart always quay.io/coreos/etcd:v3.5.15 \
/usr/local/bin/etcd \
--data-dir=/etcd --name rke02 \
--initial-advertise-peer-urls http://192.168.1.16:2380 --listen-peer-urls http://0.0.0.0:2380 \
--advertise-client-urls http://192.168.1.16:2379 --listen-client-urls http://0.0.0.0:2379 \
--initial-cluster-state new \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380

rke03:

docker run -d --name etcd --net host -v /opt/etcd:/etcd \
--restart always quay.io/coreos/etcd:v3.5.15 \
/usr/local/bin/etcd \
--data-dir=/etcd --name rke03 \
--initial-advertise-peer-urls http://192.168.1.15:2380 --listen-peer-urls http://0.0.0.0:2380 \
--advertise-client-urls http://192.168.1.15:2379 --listen-client-urls http://0.0.0.0:2379 \
--initial-cluster-state new \
--initial-cluster-token docker-etcd \
--initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
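
The `--initial-cluster` value must be identical on all three nodes, and typing it three times invites mistakes. One possible way to avoid that is to generate it from a single node list, as in the sketch below. The helper name `initial_cluster` is hypothetical, and the sketch assumes plain-HTTP peer URLs on port 2380, as in the commands above.

```shell
#!/bin/sh
# Build the --initial-cluster value from "name ip" pairs on stdin.
# Assumes plain-HTTP peer URLs on port 2380.
initial_cluster() {
  awk '{ printf "%s%s=http://%s:2380", sep, $1, $2; sep = "," }
       END { print "" }'
}

# Node names and addresses used throughout this post.
nodes='rke01 192.168.1.11
rke02 192.168.1.16
rke03 192.168.1.15'

printf '%s\n' "$nodes" | initial_cluster
```

The printed string can then be passed unchanged to `--initial-cluster` in each `docker run` or `etcdctl snapshot restore` command.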

etcdctl commands

Configure the client

docker cp etcd:/usr/local/bin/etcdctl /usr/local/bin/
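
Command                   Description
etcdctl snapshot save     Save an etcd snapshot to the given file.
etcdctl snapshot restore  Restore an etcd snapshot from the given file.
etcdctl endpoint health   Check the health of the etcd endpoints.
etcdctl endpoint status   Show status information for the etcd endpoints.
etcdctl put               Set a key-value pair.
etcdctl get               Get the value of a key.
etcdctl del               Delete a key.
etcdctl member list       List the cluster members.
etcdctl member add        Add a cluster member.
etcdctl member remove     Remove a cluster member.
etcdctl member update     Update the peer URLs of a cluster member.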
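
As a quick illustration of the `put`, `get`, and `del` commands listed above, here is a small round-trip sketch that writes a probe key, reads it back, and deletes it. The helper name `etcd_round_trip` and the key `/smoke-test/probe` are hypothetical placeholders; pick a key that cannot collide with real data.

```shell
#!/bin/sh
# Write a probe key, read it back, delete it, and report whether the value
# round-tripped. The default key name is a hypothetical placeholder.
etcd_round_trip() {
  key=${1:-/smoke-test/probe}
  value="probe-$$"
  etcdctl put "$key" "$value" >/dev/null || return 1
  # --print-value-only suppresses the key line in the output.
  got=$(etcdctl get "$key" --print-value-only) || return 1
  etcdctl del "$key" >/dev/null || return 1
  [ "$got" = "$value" ]
}

# Usage (not run here; relies on ETCDCTL_ENDPOINTS or the default endpoint):
#   etcd_round_trip && echo "etcd read/write OK"
```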

View the cluster status

etcdctl member list -w table
+------------------+---------+-------+--------------------------+--------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+--------------------------+--------------------------+------------+
| 37499ff739d6c21 | started | rke03 | http://192.168.1.15:2380 | http://192.168.1.15:2379 | false |
| 79e7c26cb0fc149 | started | rke02 | http://192.168.1.16:2380 | http://192.168.1.16:2379 | false |
| b4773de1c1f38771 | started | rke01 | http://192.168.1.11:2380 | http://192.168.1.11:2379 | false |
+------------------+---------+-------+--------------------------+--------------------------+------------+

View all restored data

etcdctl get --prefix ""

Count the data to judge whether the migration succeeded

etcdctl get --prefix "" | grep -c '^'
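
The command above counts output lines, which includes both keys and values. If a count of keys alone is preferred, a variant along the following lines may help. It relies on `etcdctl get --keys-only`, which prints keys separated by blank lines, so only non-empty lines are counted. The helper name `count_keys` and the sample keys are hypothetical, and keys that themselves contain newlines would be miscounted.

```shell
#!/bin/sh
# Count keys from `etcdctl get --prefix "" --keys-only` output on stdin.
# Only non-empty lines are counted; `|| true` keeps the exit status zero
# when the store is empty and grep matches nothing.
count_keys() {
  grep -c . || true
}

# Hypothetical sample of --keys-only output, for illustration only.
printf '/demo/a\n\n/demo/b\n\n' | count_keys

# Usage against a live cluster (not run here):
#   etcdctl get --prefix "" --keys-only | count_keys
```

Running it against the source cluster before the backup and against the restored cluster afterwards gives two numbers that should match.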
Author: 慕容峻才
Link: https://www.acaiblog.top/K8S-ETCD集群备份与恢复/
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 阿才的博客 when reposting.