📋 文档概述
本文档详细说明如何在 Kubernetes 环境中部署 MySQL 监控系统,监控多个环境的 MySQL 实例(dev、uat 等)。
🎯 架构组件
Prometheus Operator: 管理 Prometheus 实例
mysqld_exporter: MySQL 指标导出器
Grafana: 数据可视化
Kubernetes: 容器编排平台
🔧 前置条件
Kubernetes 集群(已安装 KubeSphere)
kubectl 命令行工具
访问 MySQL 实例的权限
第一部分:MySQL 服务器配置
1.1 创建监控用户
在每个 MySQL 服务器上执行:
-- 创建专用监控用户
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'StrongPassword123!' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
GRANT SELECT ON performance_schema.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;1.2 安装和配置 mysqld_exporter
# 下载 mysqld_exporter
MYSQLD_EXPORTER_VERSION="0.17.2"
wget https://github.com/prometheus/mysqld_exporter/releases/download/v${MYSQLD_EXPORTER_VERSION}/mysqld_exporter-${MYSQLD_EXPORTER_VERSION}.linux-amd64.tar.gz
# 解压安装
tar xvf mysqld_exporter-${MYSQLD_EXPORTER_VERSION}.linux-amd64.tar.gz
sudo mv mysqld_exporter-${MYSQLD_EXPORTER_VERSION}.linux-amd64/mysqld_exporter /usr/local/bin/
sudo chmod +x /usr/local/bin/mysqld_exporter
# 创建系统用户
sudo useradd -rs /bin/false mysqld_exporter
# 创建配置文件(根据环境调整密码)
sudo tee /etc/mysqld_exporter.cnf > /dev/null <<EOF
[client]
user=exporter
password=StrongPassword123!
host=localhost
port=3306
EOF
sudo chown mysqld_exporter:mysqld_exporter /etc/mysqld_exporter.cnf
sudo chmod 600 /etc/mysqld_exporter.cnf1.3 创建 systemd 服务
sudo tee /etc/systemd/system/mysqld_exporter.service > /dev/null <<'EOF'
[Unit]
Description=Prometheus MySQL Exporter
After=network.target
[Service]
User=mysqld_exporter
Environment=DATA_SOURCE_NAME="exporter:StrongPassword123!@(localhost:3306)/"
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf=/etc/mysqld_exporter.cnf \
--collect.global_status \
--collect.info_schema.tables \
--collect.info_schema.tablestats \
--collect.info_schema.innodb_metrics \
--collect.perf_schema.tablelocks \
--collect.perf_schema.file_events \
--collect.perf_schema.eventswaits \
--collect.perf_schema.indexiowaits \
--collect.perf_schema.tableiowaits \
--collect.slave_status \
--collect.auto_increment.columns \
--collect.info_schema.processlist \
--collect.binlog_size \
--collect.info_schema.query_response_time \
--collect.info_schema.userstats \
--web.listen-address=0.0.0.0:9104
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF
# 启动服务
sudo systemctl daemon-reload
sudo systemctl enable mysqld_exporter
sudo systemctl start mysqld_exporter
# 验证服务
sudo systemctl status mysqld_exporter
curl http://localhost:9104/metrics | grep mysql_up1.4 防火墙配置
# 开放 9104 端口
sudo firewall-cmd --add-port=9104/tcp --permanent
sudo firewall-cmd --reload
# 验证端口可访问性
curl http://<服务器IP>:9104/metrics | grep mysql_up第二部分:Kubernetes 配置
2.1 创建抓取配置 Secret
创建 mysql-scrape-config.yaml:
apiVersion: v1
kind: Secret
metadata:
name: additional-scrape-configs
namespace: kubesphere-monitoring-system
type: Opaque
stringData:
mysql-scrape.yaml: |
- job_name: 'mysql-dev'
static_configs:
- targets: ['192.168.181.153:9104']
labels:
instance: 'mysql-dev'
environment: 'dev'
account: 'dev-account'
group: 'dev-group'
vendor: 'internal'
region: 'cn-east-1'
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /metrics
- job_name: 'mysql-uat'
static_configs:
- targets: ['192.168.181.121:9104']
labels:
instance: 'mysql-uat'
environment: 'uat'
account: 'uat-account'
group: 'uat-group'
vendor: 'internal'
region: 'cn-east-1'
scrape_interval: 15s
scrape_timeout: 10s
metrics_path: /metricskubectl apply -f mysql-scrape-config.yaml2.2 更新 Prometheus CRD
获取当前 Prometheus 配置:
kubectl -n kubesphere-monitoring-system get prometheus k8s -o yaml > prometheus-crd.yaml编辑 prometheus-crd.yaml,在 spec部分添加:
spec:
additionalScrapeConfigs:
name: additional-scrape-configs
key: mysql-scrape.yaml
# ... 其他现有配置保持不变应用更新:
kubectl apply -f prometheus-crd.yaml2.3 验证配置
# 检查 Secret
kubectl -n kubesphere-monitoring-system get secret additional-scrape-configs
# 检查 Prometheus 配置
kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090
# 访问 http://localhost:9090/config 查看配置
# 检查目标状态
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.scrapePool | startswith("mysql"))'第三部分:Grafana 配置
3.1 添加 Prometheus 数据源
如果尚未添加,在 Grafana 中:
导航到 Configuration → Data Sources
点击 Add data source
选择 Prometheus
配置:
Name:
Prometheus-MySQLURL:
http://prometheus-k8s.kubesphere-monitoring-system.svc.cluster.local:9090
点击 Save & Test
3.2 导入 MySQL 监控仪表盘
方法一:使用官方仪表盘 ID
导航到 Dashboards → New → Import
输入仪表盘 ID:
7362(MySQL Overview)选择数据源:
Prometheus-MySQL点击 Import
方法二:使用自定义 JSON
创建 mysql-dashboard.json:
{
"dashboard": {
"title": "MySQL 多环境监控",
"tags": ["mysql", "database"],
"timezone": "browser",
"panels": [...],
"templating": {
"list": [
{
"name": "environment",
"type": "query",
"query": "label_values(mysql_up, environment)",
"refresh": 1,
"includeAll": true,
"multi": true,
"allValue": ".*"
},
{
"name": "instance",
"type": "query",
"query": "label_values(mysql_up{environment=~\"$environment\"}, instance)",
"refresh": 1,
"includeAll": true,
"multi": true
}
]
}
}
}导入自定义仪表盘。
3.3 配置告警规则(可选)
创建 mysql-alerts.yaml:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: mysql-alerts
namespace: kubesphere-monitoring-system
labels:
prometheus: k8s
role: alert-rules
spec:
groups:
- name: mysql.rules
rules:
- alert: MySQLDown
expr: mysql_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MySQL instance {{ $labels.instance }} is down"
description: "MySQL instance {{ $labels.instance }} has been down for more than 1 minute."
- alert: MySQLHighConnections
expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
for: 2m
labels:
severity: warning
annotations:
summary: "High MySQL connections on {{ $labels.instance }}"
description: "MySQL connections are at {{ $value | humanizePercentage }} of maximum."
- alert: MySQLSlowQueries
expr: rate(mysql_global_status_slow_queries[5m]) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "High slow query rate on {{ $labels.instance }}"
description: "Slow query rate is {{ $value }} queries per second."应用告警规则:
kubectl apply -f mysql-alerts.yaml第四部分:故障排查和验证
4.1 常见问题排查
问题 1:mysqld_exporter 无法连接 MySQL
# 测试 MySQL 连接
mysql -u exporter -p'StrongPassword123!' -h localhost -e "SHOW STATUS LIKE 'Uptime';"
# 检查 exporter 日志
journalctl -u mysqld_exporter -f问题 2:Prometheus 无法抓取指标
# 检查网络连通性
kubectl run -it --rm --image=curlimages/curl test-curl -- curl http://192.168.181.153:9104/metrics
# 检查防火墙
sudo firewall-cmd --list-ports | grep 9104问题 3:Grafana 无数据
# 检查 Prometheus 数据
curl http://localhost:9090/api/v1/query?query=mysql_up
# 检查标签匹配
curl http://localhost:9090/api/v1/series?match[]=mysql_up4.2 验证步骤
# 1. 验证 mysqld_exporter 工作
curl http://192.168.181.153:9104/metrics | grep mysql_up
# 2. 验证 Prometheus 抓取
kubectl -n kubesphere-monitoring-system exec -it prometheus-k8s-0 -c prometheus -- \
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.scrapePool | startswith("mysql"))'
# 3. 验证指标存在
kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090
curl http://localhost:9090/api/v1/query?query=mysql_up
# 4. 验证 Grafana 仪表盘
# 访问 Grafana,检查 MySQL 仪表盘是否显示数据4.3 性能优化建议
调整抓取间隔:
开发环境:30-60秒
生产环境:15-30秒
限制收集的指标:
# 只收集必要的指标 --collect.global_status \ --collect.info_schema.tables \ --collect.slave_status设置数据保留策略:
spec: retention: 7d # 根据需求调整
第五部分:维护和监控
5.1 日常监控检查清单
[ ] MySQL 实例状态 (
mysql_up)[ ] 连接数监控 (
mysql_global_status_threads_connected)[ ] 查询性能 (
mysql_global_status_queries)[ ] 复制状态 (
mysql_global_status_slave_status)[ ] 缓冲池使用率 (
mysql_global_status_innodb_buffer_pool_pages_total)
5.2 备份和恢复
备份配置:
# 备份 Prometheus 配置
kubectl -n kubesphere-monitoring-system get secret additional-scrape-configs -o yaml > additional-scrape-configs-backup.yaml
# 备份仪表盘配置
# 在 Grafana 中导出仪表盘 JSON恢复配置:
# 恢复 Secret
kubectl apply -f additional-scrape-configs-backup.yaml
# 重启 Prometheus 使配置生效
kubectl -n kubesphere-monitoring-system delete pods -l app.kubernetes.io/name=prometheus5.3 扩展监控
添加新的 MySQL 实例:
在新服务器上部署 mysqld_exporter
更新
additional-scrape-configsSecret添加新的 job 配置
监控其他数据库指标:
慢查询日志分析
表空间使用情况
锁等待统计
复制延迟监控
📞 支持信息
日志位置
mysqld_exporter:
/var/log/mysqld_exporter.log或journalctl -u mysqld_exporterPrometheus:
kubectl logs -n kubesphere-monitoring-system prometheus-k8s-0Grafana:
kubectl logs -n kubesphere-monitoring-system grafana-pod-name
有用的命令
# 检查所有 MySQL 目标状态
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.scrapePool | startswith("mysql")) | {job: .scrapePool, instance: .labels.instance, health: .health}'
# 检查特定指标
curl http://localhost:9090/api/v1/query?query=mysql_global_status_connections
# 强制重载配置
curl -X POST http://localhost:9090/-/reload✅ 完成检查清单
[ ] 所有 MySQL 实例已部署 mysqld_exporter
[ ] Kubernetes Secret 配置正确
[ ] Prometheus 能够抓取所有 MySQL 指标
[ ] Grafana 仪表盘正常显示数据
[ ] 告警规则配置正确
[ ] 防火墙端口已开放
[ ] 监控用户权限正确配置