Offline Deployment of a Milvus Cluster on k8s

Installation

Configure Helm

Install Helm

wget https://get.helm.sh/helm-v3.15.3-linux-amd64.tar.gz
tar xf helm-v3.15.3-linux-amd64.tar.gz
cp linux-amd64/helm /usr/local/bin/

Set a proxy

export https_proxy=http://192.168.2.1:7890;export http_proxy=http://192.168.2.1:7890;export all_proxy=socks5://192.168.2.1:7890

Add the Milvus Helm repository

helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update

Install Milvus offline

Helm chart repository: https://github.com/zilliztech/milvus-helm.git

List the available versions

helm search repo milvus --versions
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME CHART VERSION APP VERSION DESCRIPTION
milvus/milvus 4.2.4 2.4.7 Milvus is an open-source vector database built ...
milvus/minio 8.0.17 master High Performance, Kubernetes Native Object Storage

Generate milvus_manifest.yaml

helm template my-release milvus/milvus --version 4.2.4 > milvus_manifest.yaml
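
For an offline cluster, the container images referenced by the rendered manifest also have to be available on the nodes or in a private registry. The following is a minimal sketch for listing them, assuming the manifest writes each image on an `image:` line; `list_images` is a hypothetical helper name, not a Helm or kubectl command.

```shell
# Hypothetical helper: print the unique container images referenced in a
# rendered manifest file passed as $1.
list_images() {
  # Keep only the value after "image:" (with or without a leading "- "),
  # strip surrounding quotes, then de-duplicate.
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*image:[[:space:]]*//p' "$1" \
    | tr -d "\"'" \
    | LC_ALL=C sort -u
}
```

Usage: `list_images milvus_manifest.yaml > images.txt`. How the listed images are then pulled, saved, and loaded depends on the container runtime and registry in use.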

Specify a namespace

helm template my-release milvus/milvus --version 4.1.28 --namespace milvus -f custom-values.yaml > milvus_manifest.yaml
helm template my-release milvus/milvus \
--version 4.2.56 \
--namespace milvus \
--set nodeSelector.node=milvus \
--set serviceAccount.create=true \
--set metrics.serviceMonitor.enabled=true \
--set attu.enabled=true \
--set minio.enabled=false \
--set externalS3.enabled=true \
--set externalS3.host=x.x.x.x \
--set externalS3.port=8081 \
--set externalS3.accessKey=xxxxxxxxxxxxxxx \
--set externalS3.secretKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxx \
--set externalS3.bucketName=milvus-bucket-zx \
> milvus_manifest_external_etcd.yaml

External etcd

helm template my-release milvus/milvus \
--version 4.2.56 \
--namespace milvus \
--set nodeSelector.node=milvus \
--set serviceAccount.create=true \
--set metrics.serviceMonitor.enabled=true \
--set attu.enabled=true \
--set minio.enabled=false \
--set etcd.enabled=false \
--set externalEtcd.enabled=true \
--set 'externalEtcd.endpoints[0]=192.168.1.10:2379' \
--set 'externalEtcd.endpoints[1]=192.168.1.11:2379' \
--set 'externalEtcd.endpoints[2]=192.168.1.12:2379' \
--set externalS3.enabled=true \
--set externalS3.host=x.x.x.x \
--set externalS3.port=8081 \
--set externalS3.accessKey=xxxxxxxxxxxxxxx \
--set externalS3.secretKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxx \
--set externalS3.bucketName=milvus-bucket-zx \
> milvus_manifest_external_etcd.yaml
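
The indexed `externalEtcd.endpoints[N]` flags are easy to mistype when the etcd node list changes. The sketch below generates them from a list of endpoints; `etcd_endpoint_flags` is a hypothetical helper and the addresses are placeholders to be replaced with the real etcd client addresses.

```shell
# Hypothetical helper: print one "--set externalEtcd.endpoints[N]=..." flag
# per endpoint given as an argument, numbering them from 0.
etcd_endpoint_flags() {
  idx=0
  for ep in "$@"; do
    printf '%s externalEtcd.endpoints[%d]=%s\n' '--set' "$idx" "$ep"
    idx=$((idx + 1))
  done
}

# Placeholder addresses; substitute the etcd client endpoints of your cluster.
etcd_endpoint_flags 192.168.1.10:2379 192.168.1.11:2379 192.168.1.12:2379
```

The printed lines can be pasted into the helm template command. When pasting into a shell such as zsh, quote each `externalEtcd.endpoints[N]=...` argument so the brackets are not treated as a glob pattern.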

Install Milvus

kubectl apply -f milvus_manifest.yaml

Install an etcd cluster

Download etcd

https://github.com/etcd-io/etcd/releases?q=3.5.18&expanded=true

Extract and install the binaries

tar xf etcd-v3.5.18-linux-amd64.tar.gz
cp etcd-v3.5.18-linux-amd64/etcd* /usr/local/bin/

Create directories

mkdir -p /data/etcd/{conf,data,log}

Create the configuration file

cat > /data/etcd/conf/etcd.conf.yml << EOF
# /data/etcd/conf/etcd.conf.yml
# etcd production configuration (for use with Milvus)

# === Basic settings ===
name: etcd0 # set to etcd0, etcd1, etcd2 on the respective nodes
data-dir: /data/etcd/data
listen-peer-urls: http://0.0.0.0:2380
listen-client-urls: http://0.0.0.0:2379
advertise-client-urls: http://192.168.1.10:2379 # replace with this node's actual IP
initial-advertise-peer-urls: http://192.168.1.10:2380 # replace with this node's actual IP

# Initial cluster members (must be identical on every node)
initial-cluster: etcd0=http://192.168.1.10:2380,etcd1=http://192.168.1.11:2380,etcd2=http://192.168.1.12:2380
initial-cluster-state: new
initial-cluster-token: milvus-etcd-cluster

# === Key tuning for Milvus ===
quota-backend-bytes: 17179869184 # 16 GiB, to keep the backend from filling up
max-txn-ops: 131072 # Milvus needs >= 128K
max-request-bytes: 10485760 # 10 MiB, to allow large requests

# === Stability and performance ===
heartbeat-interval: 100 # 100 ms
election-timeout: 5000 # 5 s, to avoid false leader elections on network jitter
auto-compaction-mode: revision # compact by revision
auto-compaction-retention: "1000" # keep the most recent 1000 revisions
snapshot-count: 50000 # take a snapshot every 50,000 writes

# === Logging ===
log-outputs:
  - stdout
  - /data/etcd/log/etcd.log
enable-log-rotation: true
log-rotation-config-json: |
  {
    "maxsize": 100,
    "maxage": 7,
    "compress": true
  }
EOF
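
The size-related values in the configuration are plain byte or operation counts. As a quick check, the arithmetic behind them can be reproduced in the shell; the variable names below are only for illustration.

```shell
# Reproduce the numeric values used in etcd.conf.yml from readable sizes.
quota_backend_bytes=$((16 * 1024 * 1024 * 1024))   # 16 GiB
max_request_bytes=$((10 * 1024 * 1024))            # 10 MiB
max_txn_ops=$((128 * 1024))                        # 128K operations

printf 'quota-backend-bytes: %s\n' "$quota_backend_bytes"
printf 'max-request-bytes: %s\n' "$max_request_bytes"
printf 'max-txn-ops: %s\n' "$max_txn_ops"
```

Changing a multiplier (for example 8 instead of 16 for the quota) gives the value to put in the file for a different size.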

Start the service

cat > /etc/systemd/system/etcd.service << EOF
[Unit]
Description=etcd: distributed reliable key-value store
Documentation=https://etcd.io/docs
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/data/etcd/conf/etcd.conf.yml
Restart=always
RestartSec=10
LimitNOFILE=65536
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target
EOF
systemctl enable etcd
systemctl start etcd

Check the cluster status

etcdctl member list -w table
+------------------+---------+-------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+-------+----------------------------+----------------------------+------------+
| 6ae5c40407554081 | started | etcd2 | http://192.168.97.235:2380 | http://192.168.97.235:2379 | false |
| 7b1c5ca31530f6ee | started | etcd1 | http://192.168.97.57:2380 | http://192.168.97.57:2379 | false |
| f4aa7f67ae39563f | started | etcd0 | http://192.168.97.151:2380 | http://192.168.97.151:2379 | false |
+------------------+---------+-------+----------------------------+----------------------------+------------+

Additional configuration

Change the default MinIO password

Edit milvus_manifest.yaml

apiVersion: v1
kind: Secret
metadata:
  name: my-release-minio
  labels:
    app: minio
    chart: minio-8.0.17
    release: my-release
    heritage: Helm
type: Opaque
data:
  # values are base64-encoded (an encoding, not encryption)
  accesskey: "bWluaW9hZG1pbg=="
  secretkey: "MTIzNDU2Nzg="
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-release-milvus
data:
  default.yaml: |+
    minio:
      address: my-release-minio
      port: 9000
      accessKeyID: minioadmin
      # secretAccessKey must match the decoded secretkey in the Secret above
      secretAccessKey: 12345678
      useSSL: false
      bucketName: milvus-bucket
      rootPath: file
      useIAM: false
      cloudProvider: aws
      iamEndpoint:
      region:
      useVirtualHost: false
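
The Secret stores base64-encoded values while the Milvus ConfigMap stores the plain text, so the two have to be kept in sync by hand. A small sketch for producing and checking the encoded values with the standard `base64` tool (`b64` is just a local helper name):

```shell
# Encode a credential for a Kubernetes Secret "data" field.
# printf '%s' is used instead of echo so no trailing newline gets encoded.
b64() { printf '%s' "$1" | base64; }

b64 minioadmin   # bWluaW9hZG1pbg==
b64 12345678     # MTIzNDU2Nzg=

# Decode an existing value to double-check it.
printf '%s' 'MTIzNDU2Nzg=' | base64 -d; echo
```

Using `echo` without `-n` would encode a trailing newline and yield a different string, which is a common cause of credential mismatches.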

Use a StorageClass for MinIO

Edit milvus_manifest.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-release-minio
  labels:
    app: minio
    chart: minio-8.0.17
    release: my-release
    heritage: Helm
spec:
  volumeClaimTemplates:
    - metadata:
        name: export
      spec:
        storageClassName: nfs-client # name of the StorageClass to use (nfs-client here)
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 2560Gi

Use hostPath for etcd

etcd performs poorly when its data is persisted on NFS, so map the etcd data directory to the host with a hostPath volume instead.

        volumeMounts:
        - name: data
          mountPath: /bitnami/etcd
      volumes:
      - name: data
        hostPath:
          path: /opt/milvus/etcd
          type: DirectoryOrCreate
  #volumeClaimTemplates:
  #  - metadata:
  #      name: data
  #    spec:
  #      accessModes:
  #        - "ReadWriteOnce"
  #      resources:
  #        requests:
  #          storage: "10Gi"

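Enable Milvus authentication

Edit milvus_manifest.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-release-milvus
  namespace: default
data:
  user.yaml: |+
    common:
      security:
        authorizationEnabled: true
        defaultRootPassword: 123456

Default credentials once authentication is enabled: root/Milvus. If the Milvus version in use honors defaultRootPassword, expect the initial root password to be the value set above instead.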

Configure an Nginx proxy for MinIO

Edit /etc/nginx/nginx.conf

# Limit the request body size for all proxied requests to 10240M
client_max_body_size 10240M;

Edit /etc/nginx/conf.d/minio.conf

server {
    listen 9000;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;

        proxy_connect_timeout 300;
        # Default is HTTP/1, keepalive is only enabled in HTTP/1.1
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        chunked_transfer_encoding off;
        proxy_pass http://10.43.82.8:9000;
    }
}

Reference: https://minio.org.cn/docs/minio/linux/integrations/setup-nginx-proxy-with-minio.html

Nginx proxy for Milvus

server {
    listen 50498 http2; # explicitly enable HTTP/2, which gRPC requires
    location / {
        grpc_pass grpc://milvus_test; # use grpc_pass instead of proxy_pass
    }
}
upstream milvus_test {
    server 192.168.31.42:30717;
}

Known issues

my-release-milvus-rootcoord fails to start

my-release-milvus-rootcoord fails to start with the following error in its log:

[2024/10/18 11:13:51.532 +00:00] [ERROR] [msgstream/mq_msgstream.go:138] ["retry func failed"] ["retry time"=4] [error="no partitioned metadata for topic{public/milvus-test/by-dev-rootcoord-dml_0} in lookup response"]

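Troubleshooting: the my-release-pulsar-broker-0 pod reports "Policies not found for public/milvus-test namespace". The cause is that the Pulsar namespace public/milvus-test was never created in ZooKeeper. List the existing namespaces from the ZooKeeper pod: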

root@my-release-pulsar-zookeeper-0:/pulsar# bin/pulsar-admin --admin-url http://my-release-pulsar-broker:8080 namespaces list public
"public/default"

Create the namespace to restore the service:

bin/pulsar-admin --admin-url http://my-release-pulsar-broker:8080 namespaces create public/milvus-test

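my-release-pulsar-zookeeper fails to start

The ZooKeeper log reports:

java.net.UnknownHostException: my-release-pulsar-zookeeper-2.my-release-pulsar-zookeeper.milvus-gpu.svc.cluster.local

Only two pods had started and the my-release-pulsar-zookeeper-2 pod had never been created. Clearing the ZooKeeper data directories and redeploying resolved it.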
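
The hostname in the error follows the StatefulSet pod DNS pattern `<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local`, and such a name only resolves once the pod with that ordinal exists. The sketch below prints the names ZooKeeper expects to find; `sts_pod_fqdns` is a hypothetical helper, and the replica count of 3 is an assumption inferred from the ordinal 2 in the log.

```shell
# Hypothetical helper: print the per-pod DNS names of a StatefulSet.
# $1 = StatefulSet name, $2 = headless Service name, $3 = namespace, $4 = replicas
sts_pod_fqdns() {
  n=0
  while [ "$n" -lt "$4" ]; do
    printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$n" "$2" "$3"
    n=$((n + 1))
  done
}

# Assumed replica count: 3 (the log refers to ordinal 2).
sts_pod_fqdns my-release-pulsar-zookeeper my-release-pulsar-zookeeper milvus-gpu 3
```

Comparing this list with the pods that actually exist shows which ordinal is missing, here my-release-pulsar-zookeeper-2.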

Modify the ConfigMap

my-release-pulsar-bookie

PULSAR_MEM: |
  -Xms8192m -Xmx8192m -XX:MaxDirectMemorySize=16384m
dbStorage_readAheadCacheMaxSizeMb: "64"
dbStorage_rocksDB_blockCacheSize: "8388608"
dbStorage_rocksDB_writeBufferSizeMB: "16"
dbStorage_writeCacheMaxSizeMb: "64"
nettyMaxFrameSizeBytes: "104867840"
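
As a rough sanity check on the PULSAR_MEM values, the JVM heap and direct memory can be added up and compared with the memory available to the bookie container. This is only a sketch: it assumes the container limit has to cover at least heap plus direct memory and ignores other JVM overhead.

```shell
# Values taken from PULSAR_MEM above.
heap_mb=8192      # -Xmx8192m
direct_mb=16384   # -XX:MaxDirectMemorySize=16384m

total_mb=$((heap_mb + direct_mb))
printf 'heap + direct memory: %s MiB (%s GiB)\n' "$total_mb" "$((total_mb / 1024))"
```

If the bookie pod's memory limit is below this sum, lowering the two JVM settings or raising the limit is worth considering.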

Deploy Prometheus monitoring

Reference: https://milvus.io/docs/zh/monitor.md

Generate the ServiceMonitor manifest

helm template my-release milvus/milvus \
--version 4.2.4 \
--namespace milvus \
--set metrics.serviceMonitor.enabled=true \
--show-only templates/servicemonitor.yaml > servicemonitor.yaml

The generated file looks like this:

---
# Source: milvus/templates/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-release-milvus
  namespace: milvus
  labels:
    helm.sh/chart: milvus-4.2.4
    app.kubernetes.io/name: milvus
    app.kubernetes.io/instance: my-release
    app.kubernetes.io/version: "2.4.7"
    app.kubernetes.io/managed-by: Helm
spec:
  endpoints:
  - honorLabels: true
    interval: 30s
    scrapeTimeout: 10s
    path: /metrics
    port: metrics
  namespaceSelector:
    matchNames:
    - milvus
  selector:
    matchLabels:
      app.kubernetes.io/name: milvus
      app.kubernetes.io/instance: my-release
  targetLabels:
  - app.kubernetes.io/name
  - app.kubernetes.io/instance
  - component

See https://github.com/prometheus-operator/kube-prometheus and pick the branch that matches your k8s version.

Create the ServiceMonitor CRD

kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/release-0.9/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
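
`kubectl apply -f <URL>` needs a URL that serves the YAML file itself. A github.com link containing `/blob/` serves the HTML page view, so it has to be in raw.githubusercontent.com form. A small sketch of that rewrite, where `github_raw_url` is a hypothetical helper name:

```shell
# Hypothetical helper: rewrite a github.com ".../blob/<ref>/<path>" link into
# the raw.githubusercontent.com URL that serves the file contents.
github_raw_url() {
  printf '%s\n' "$1" \
    | sed -E 's#^https://github\.com/([^/]+)/([^/]+)/blob/(.+)$#https://raw.githubusercontent.com/\1/\2/\3#'
}

github_raw_url "https://github.com/prometheus-operator/kube-prometheus/blob/release-0.9/manifests/setup/prometheus-operator-0servicemonitorCustomResourceDefinition.yaml"
```

In an environment without internet access, the file can be downloaded elsewhere from the raw URL and applied from a local path instead.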

Create the ServiceMonitor

kubectl apply -f servicemonitor.yaml

Configure a milvus job on the Prometheus server

- job_name: milvus
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names: [milvus] # only discover targets in the milvus namespace
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    regex: milvus
    action: keep

Author: 慕容峻才
Article link: https://www.acaiblog.top/k8s离线部署mivils集群/
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit 阿才的博客 when reposting.