Reference: https://www.cnblogs.com/xishuai/p/docker-etcd.html
Data backup
Query the endpoints
ETCD_ENDPOINT=$(kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl member list -w table \
  | awk '/http/ {print $8}' \
  | sed 's/:2380/:2379/' \
  | paste -sd ',' -)
Find the leader node
kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl --endpoints=$ETCD_ENDPOINT endpoint status -w table
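The leader can also be picked out of the table with a small filter instead of by eye. The sketch below is one possible way to do it with awk, assuming the ENDPOINT and IS LEADER column headers that etcdctl 3.5 prints in table mode. The function name leader_endpoint is hypothetical and is not part of etcdctl.

```shell
# leader_endpoint (hypothetical helper): read `etcdctl endpoint status -w table`
# output on stdin and print the ENDPOINT of the row whose IS LEADER is "true".
leader_endpoint() {
  awk -F'|' '
    # Header row: remember which columns hold ENDPOINT and IS LEADER.
    /IS LEADER/ {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/^ +| +$/, "", name)
        if (name == "ENDPOINT")  ep = i
        if (name == "IS LEADER") ld = i
      }
      next
    }
    # Data rows: print the endpoint when the leader flag is true.
    ld && NF > 1 {
      flag = $ld
      gsub(/ /, "", flag)
      if (flag == "true") {
        e = $ep
        gsub(/ /, "", e)
        print e
      }
    }
  '
}
```

Piping the status command above into the function (append `| leader_endpoint`) should print a single URL that can then be passed to --endpoints for the backup commands.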
Count how many records the data holds
kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379 get --prefix "" | grep -c '^'
Back up the etcd data to a local file (inside the my-release-etcd-0 pod)
kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379 snapshot save /tmp/etcd_backup_202603180943.tar.gz
Inspect the data in the backup file
kubectl -n milvus-gpu exec my-release-etcd-0 -- etcdctl --endpoints=http://my-release-etcd-2.my-release-etcd-headless.milvus-gpu.svc.cluster.local:2379 snapshot status /tmp/etcd_backup_202603180943.tar.gz -w table
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 6f82a4f2 | 19198020 |      26607 |      11 MB |
+----------+----------+------------+------------+
Copy the file to the nodes to be restored
scp etcd_backup_20240808.tar.gz rke01:/root/
scp etcd_backup_20240808.tar.gz rke02:/root/
scp etcd_backup_20240808.tar.gz rke03:/root/
Data restore
Confirm that the uploaded restore data is identical to the backup data:
etcdctl snapshot status etcd_backup_20240809.tar.gz -w table
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 8c54a0ac | 15687011 |      10141 |     4.8 MB |
+----------+----------+------------+------------+
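To compare the two files without reading the tables by eye, the data row can be reduced to a single line on each side. This is a minimal sketch, assuming the four-column table layout shown above. The helper name snapshot_summary is hypothetical.

```shell
# snapshot_summary (hypothetical helper): read `etcdctl snapshot status -w table`
# output on stdin and print "HASH REVISION TOTAL_KEYS" for the data row.
snapshot_summary() {
  awk -F'|' '
    # Border lines and the deprecation notice contain no "|" and are skipped,
    # as is the header row; only the data row is printed.
    NF >= 5 && $2 !~ /HASH/ {
      for (i = 2; i <= 4; i++) gsub(/ /, "", $i)
      print $2, $3, $4
    }
  '
}
```

Running `etcdctl snapshot status <file> -w table | snapshot_summary` where the backup was taken and again on each restore node should give the same line everywhere if the copies are intact.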
Restore rke01:
etcdctl snapshot restore etcd_backup_20240812.tar.gz \
  --data-dir=/opt/etcd --name rke01 \
  --initial-advertise-peer-urls http://192.168.1.11:2380 \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
Restore rke02:
etcdctl snapshot restore etcd_backup_20240812.tar.gz \
  --data-dir=/opt/etcd --name rke02 \
  --initial-advertise-peer-urls http://192.168.1.16:2380 \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
Restore rke03:
etcdctl snapshot restore etcd_backup_20240812.tar.gz \
  --data-dir=/opt/etcd --name rke03 \
  --initial-advertise-peer-urls http://192.168.1.15:2380 \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
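The three restore commands differ only in --name and --initial-advertise-peer-urls, so they can be generated from one member list to avoid copy-paste mistakes. The following is a sketch under that assumption. CLUSTER and restore_cmd are hypothetical names introduced here, and the function only prints the command rather than running it.

```shell
# Member list shared by all nodes (values taken from the commands above).
CLUSTER="rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380"

# restore_cmd NAME SNAPSHOT (hypothetical helper): print the restore command
# for one member, looking up its peer URL in CLUSTER.
restore_cmd() {
  name=$1
  snapshot=$2
  peer=$(printf '%s\n' "$CLUSTER" | tr ',' '\n' \
    | awk -F= -v n="$name" '$1 == n {print $2}')
  [ -n "$peer" ] || { echo "unknown member: $name" >&2; return 1; }
  printf 'etcdctl snapshot restore %s --data-dir=/opt/etcd --name %s --initial-advertise-peer-urls %s --initial-cluster-token docker-etcd --initial-cluster %s\n' \
    "$snapshot" "$name" "$peer" "$CLUSTER"
}
```

For example, `restore_cmd rke02 etcd_backup_20240812.tar.gz` prints the rke02 command for review before it is run on that node.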
Recover data by removing a member
Set the endpoint environment variable
export ETCDCTL_ENDPOINTS="$(etcdctl member list | awk -F ',' '{print $5}' | tr -d ' ' | paste -sd, -)"
View the cluster members
etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.97.235:2379 | aa82fb5b4c259753 |  3.5.18 |  273 MB |     false |      false |         9 |   17075349 |           17075349 |        |
|  http://192.168.97.57:2379 | 7b1c5ca31530f6ee |  3.5.18 |  2.6 GB |     false |      false |         9 |   17075349 |           17075349 |        |
| http://192.168.97.151:2379 | f4aa7f67ae39563f |  3.5.18 |  273 MB |      true |      false |         9 |   17075349 |           17075349 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Remove the member
etcdctl member remove 7b1c5ca31530f6ee
Stop the etcd service of the removed member
Remove the data directory (renamed here so that the old data is kept)
cd /data/etcd/
mv data data.202510091837
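The suffix in the rename above is a timestamp typed by hand. A possible sketch that generates it and refuses to overwrite an existing archive is shown below. The function name archive_data_dir is hypothetical, and the YYYYmmddHHMM format is an assumption read off the example.

```shell
# archive_data_dir DIR (hypothetical helper): rename DIR to DIR.<YYYYmmddHHMM>
# and print the new path; fail if DIR is missing or the target already exists.
archive_data_dir() {
  dir=${1%/}
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  target="$dir.$(date +%Y%m%d%H%M)"
  [ ! -e "$target" ] || { echo "already exists: $target" >&2; return 1; }
  mv "$dir" "$target" && printf '%s\n' "$target"
}
```

With the paths used above this would be called as `archive_data_dir /data/etcd/data`, after the etcd service on that node has been stopped.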
Add the member
etcdctl --endpoints=http://192.168.97.151:2379 member add etcd1 --peer-urls=http://192.168.97.57:2380
Member 2808db15300b50d3 added to cluster 4cd24e62559e720c

ETCD_NAME="etcd2"
ETCD_INITIAL_CLUSTER="etcd2=http://192.168.97.57:2380,etcd2=http://192.168.97.235:2380,etcd0=http://192.168.97.151:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.97.57:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Edit the configuration file
initial-cluster-state: existing
Start the etcd service
Check the cluster status
etcdctl endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  http://192.168.97.57:2379 | 205a21841c52fa4d |  3.5.18 |  273 MB |     false |      false |         9 |   17078416 |           17078416 |        |
| http://192.168.97.235:2379 | aa82fb5b4c259753 |  3.5.18 |  273 MB |     false |      false |         9 |   17078416 |           17078416 |        |
| http://192.168.97.151:2379 | f4aa7f67ae39563f |  3.5.18 |  273 MB |      true |      false |         9 |   17078416 |           17078416 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
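In the table above, the re-added member has caught up when its RAFT INDEX equals that of the other members. One way to check this in a script is sketched below, assuming the RAFT INDEX column header printed by etcdctl 3.5. The function name raft_in_sync is hypothetical, and on a busy cluster the values may differ briefly between members, so a single failed check is not conclusive.

```shell
# raft_in_sync (hypothetical helper): read `etcdctl endpoint status -w table`
# output on stdin; succeed only if every member row has the same RAFT INDEX.
raft_in_sync() {
  awk -F'|' '
    # Header row: locate the RAFT INDEX column by exact name.
    /RAFT INDEX/ {
      for (i = 1; i <= NF; i++) {
        name = $i
        gsub(/^ +| +$/, "", name)
        if (name == "RAFT INDEX") col = i
      }
      next
    }
    # Member rows: count rows and distinct index values.
    col && NF > 1 {
      v = $col
      gsub(/ /, "", v)
      if (!(v in seen)) { seen[v] = 1; distinct++ }
      rows++
    }
    END {
      if (rows > 0 && distinct == 1) exit 0
      exit 1
    }
  '
}
```

A typical call would be `etcdctl endpoint status -w table | raft_in_sync && echo "members in sync"`.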
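Deploy the etcd cluster
Startup parameters
Option                          Description
--name                          Node name.
--data-dir                      Path where the service stores its runtime data.
--snapshot-count                Number of committed transactions that triggers a snapshot to disk.
--heartbeat-interval            Interval, in milliseconds, at which the leader sends heartbeats to the followers.
--election-timeout              Election timeout, in milliseconds: how long a follower waits without a heartbeat before it starts a new election.
--listen-peer-urls              Addresses to listen on for peer traffic.
--listen-client-urls            Addresses to listen on for client traffic.
--advertise-client-urls         Client listen addresses of this node, as advertised to the outside.
--initial-advertise-peer-urls   Peer listen addresses of this node, as advertised to the rest of the cluster.
--initial-cluster               Information about all nodes in the cluster (name=peer URL pairs).
--initial-cluster-state         new when creating a cluster; existing when joining a cluster that already exists.
--initial-cluster-token         Token used when creating the cluster.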
Create the containers
rke01:
docker run -d --name etcd --net host -v /opt/etcd:/etcd \
  --restart always quay.io/coreos/etcd:v3.5.15 \
  /usr/local/bin/etcd \
  --data-dir=/etcd --name rke01 \
  --initial-advertise-peer-urls http://192.168.1.11:2380 --listen-peer-urls http://0.0.0.0:2380 \
  --advertise-client-urls http://192.168.1.11:2379 --listen-client-urls http://0.0.0.0:2379 \
  --initial-cluster-state new \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
rke02:
docker run -d --name etcd --net host -v /opt/etcd:/etcd \
  --restart always quay.io/coreos/etcd:v3.5.15 \
  /usr/local/bin/etcd \
  --data-dir=/etcd --name rke02 \
  --initial-advertise-peer-urls http://192.168.1.16:2380 --listen-peer-urls http://0.0.0.0:2380 \
  --advertise-client-urls http://192.168.1.16:2379 --listen-client-urls http://0.0.0.0:2379 \
  --initial-cluster-state new \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
rke03:
docker run -d --name etcd --net host -v /opt/etcd:/etcd \
  --restart always quay.io/coreos/etcd:v3.5.15 \
  /usr/local/bin/etcd \
  --data-dir=/etcd --name rke03 \
  --initial-advertise-peer-urls http://192.168.1.15:2380 --listen-peer-urls http://0.0.0.0:2380 \
  --advertise-client-urls http://192.168.1.15:2379 --listen-client-urls http://0.0.0.0:2379 \
  --initial-cluster-state new \
  --initial-cluster-token docker-etcd \
  --initial-cluster rke01=http://192.168.1.11:2380,rke02=http://192.168.1.16:2380,rke03=http://192.168.1.15:2380
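The --initial-cluster value must be identical on all three nodes, so it helps to build it once from the node names and IPs. The snippet below is a small sketch of that idea, assuming http and peer port 2380 as in the commands above. The function name initial_cluster is hypothetical.

```shell
# initial_cluster NAME=IP... (hypothetical helper): print the --initial-cluster
# value for the given members, using http and peer port 2380.
initial_cluster() {
  out=
  for pair in "$@"; do
    # ${pair%%=*} is the node name, ${pair#*=} is its IP address.
    out="${out:+$out,}${pair%%=*}=http://${pair#*=}:2380"
  done
  printf '%s\n' "$out"
}
```

For instance, `initial_cluster rke01=192.168.1.11 rke02=192.168.1.16 rke03=192.168.1.15` produces the string used in the three docker run commands, and the result can be stored in a variable and reused on each node.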
etcdctl commands
Configure the client
docker cp etcd:/usr/local/bin/etcdctl /usr/local/bin/
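Command                     Description
etcdctl snapshot save       Save an etcd snapshot to the given file.
etcdctl snapshot restore    Restore an etcd snapshot from the given file.
etcdctl endpoint health     Check the health of the etcd endpoints.
etcdctl endpoint status     Show status information for the etcd endpoints.
etcdctl put                 Set a key-value pair.
etcdctl get                 Get the value of a key.
etcdctl del                 Delete a key.
etcdctl member list         List the cluster members.
etcdctl member add          Add a cluster member.
etcdctl member remove       Remove a cluster member.
etcdctl member update       Update the peer URLs of a cluster member.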
View the cluster status
etcdctl member list -w table
+------------------+---------+-------+--------------------------+--------------------------+------------+
|        ID        | STATUS  | NAME  |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+-------+--------------------------+--------------------------+------------+
|  37499ff739d6c21 | started | rke03 | http://192.168.1.15:2380 | http://192.168.1.15:2379 |      false |
|  79e7c26cb0fc149 | started | rke02 | http://192.168.1.16:2380 | http://192.168.1.16:2379 |      false |
| b4773de1c1f38771 | started | rke01 | http://192.168.1.11:2380 | http://192.168.1.11:2379 |      false |
+------------------+---------+-------+--------------------------+--------------------------+------------+
View all restored data
Count the data to judge whether the migration succeeded
etcdctl get --prefix "" | grep -c '^'
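The pipeline above counts every output line, keys and values alike, so a value that spans several lines inflates the number. If a per-key count is wanted, one option is the --keys-only flag of etcdctl get, which prints each key followed by an empty line. The sketch below counts the non-empty lines of that output. The function name count_keys is hypothetical, and it assumes that no key contains a newline.

```shell
# count_keys (hypothetical helper): read `etcdctl get --prefix "" --keys-only`
# output on stdin and print the number of non-empty lines, one per key.
count_keys() {
  # grep -c exits non-zero when the count is 0; keep the printed 0 and succeed.
  grep -c . || true
}
```

Used as `etcdctl get --prefix "" --keys-only | count_keys` on both the source and the restored cluster, the two numbers can be compared directly, provided the same method is used on both sides.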