Prometheus on Kubernetes: Deployment and Configuration
1. Deploying Prometheus on the Kubernetes Platform
IP | Role
---|---
192.168.73.136 | nfs
192.168.73.138 | k8s-master
192.168.73.139 | k8s-node1
192.168.73.140 | k8s-node2
You can either git clone the project code directly or write the files yourself; the implementation code for every step appears below.
[root@k8s-master src]# git clone https://gitee.com/zhdya/K8S-Monitor.git
Cloning into 'k8s-prometheus'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
[root@k8s-master src]# cd k8s-prometheus/
[root@k8s-master k8s-prometheus]# ls
alertmanager-configmap.yaml kube-state-metrics-rbac.yaml prometheus-rbac.yaml
alertmanager-deployment.yaml kube-state-metrics-service.yaml prometheus-rules.yaml
alertmanager-pvc.yaml node_exporter-0.17.0.linux-amd64.tar.gz prometheus-service.yaml
alertmanager-service.yaml node_exporter.sh prometheus-statefulset-static-pv.yaml
grafana.yaml OWNERS prometheus-statefulset.yaml
kube-state-metrics-deployment.yaml prometheus-configmap.yaml README.md
These files fall into four groups. The first is the Prometheus deployment itself, four files covering the ConfigMap, RBAC, Service, and StatefulSet. The second is kube-state-metrics, which collects metrics about cluster resource objects. The third is the node_exporter deployment, and the fourth is the Alertmanager deployment, which handles alerting. Let's start by deploying Prometheus.
1.1 Authorizing with RBAC
First, deploy the RBAC objects and the ConfigMap.
RBAC (Role-Based Access Control) is responsible for the authorization step: it defines what the Prometheus ServiceAccount is allowed to read from the API server.
[root@k8s-master prometheus-k8s]# vim prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
Apply it:
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-rbac.yaml
1.2 Configuration Management
A ConfigMap stores configuration that does not need to be encrypted. Change the IP addresses under the kubernetes-nodes job to match your own environment; nothing else needs to be modified.
[root@k8s-master prometheus-k8s]# vim prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    rule_files:
    - /etc/config/rules/*.rules

    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: kubernetes-nodes
      scrape_interval: 30s
      static_configs:
      - targets:
        - 192.168.73.135:9100  # change these IPs to match your own nodes
        - 192.168.73.138:9100
        - 192.168.73.139:9100
        - 192.168.73.140:9100

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name

    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:80"]
Key points:
1. The rule files Prometheus loads:
rule_files:
- /etc/config/rules/*.rules
2. Prometheus scraping itself:
scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-configmap.yaml
The alerting rules split into two groups. The first is a general group that applies to every instance: if an instance (the agent on a monitored endpoint) goes down, an alert is sent. The second is a node group that watches CPU, memory, and disk utilization on each node.
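These rule groups live in the repo's prometheus-rules.yaml ConfigMap, which the StatefulSet below mounts at /etc/config/rules. As a rough sketch (the exact expressions in the repo may differ), a general instance-down rule looks something like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: kube-system
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      # Fires when any scraped instance has been unreachable for one minute
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Instance {{ $labels.instance }} stopped working"
```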
1.3 Deploying Prometheus as a StatefulSet
Prometheus is deployed as a StatefulSet, i.e. a stateful service, and it uses dynamic PV provisioning: its default volume claim template specifies the storageClassName managed-nfs-storage, the storage class defined earlier for the NFS server.
[root@k8s-master prometheus-k8s]# vim prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
      - name: prometheus-server-configmap-reload
        image: "zhdya/configmap-reload:v0.1"
        imagePullPolicy: "IfNotPresent"
        args:
        - --volume-dir=/etc/config
        - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - name: prometheus-server
        image: "prom/prometheus:v2.2.1"
        imagePullPolicy: "IfNotPresent"
        args:
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        # based on 10 running nodes with 30 pods each
        resources:
          limits:
            cpu: 200m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 1000Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: prometheus-data
          mountPath: /data
          subPath: ""
        - name: prometheus-rules
          mountPath: /etc/config/rules
      terminationGracePeriodSeconds: 300
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      storageClassName: managed-nfs-storage  # change to your own StorageClass name
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: "16Gi"
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-statefulset.yaml
Check the status:
[root@k8s-master prometheus-k8s]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
prometheus-0 2/2 Running 6 14d
1.4 Creating a Service to Expose the Access Port
A fixed nodePort is used here rather than a random one, which makes access easier.
[root@k8s-master prometheus-k8s]# vim prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  type: NodePort
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
    nodePort: 30090  # fixed externally accessible port
  selector:
    k8s-app: prometheus
[root@k8s-master prometheus-k8s]# kubectl apply -f prometheus-service.yaml
[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-5bd5f9dbd9-wv45t 1/1 Running 1 8d
pod/kubernetes-dashboard-7d77666777-d5ng4 1/1 Running 5 14d
pod/prometheus-0 2/2 Running 6 14d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.0.0.2 <none> 53/UDP,53/TCP 13d
service/kubernetes-dashboard NodePort 10.0.0.127 <none> 443:30001/TCP 16d
service/prometheus NodePort 10.0.0.33 <none> 9090:30090/TCP 13d
Web access
Use any node IP plus the NodePort, i.e. http://NodeIP:Port; in this example that is http://192.168.73.139:30090

2. Understanding the Kubernetes Service-Discovery Configuration
2.1 Monitoring Cluster Resources
Look again at the Prometheus deployment file: it starts two containers, one of which is this:
- name: prometheus-server-configmap-reload
  image: "zhdya/configmap-reload:v0.1"
  imagePullPolicy: "IfNotPresent"
This container watches the ConfigMap and, when it detects an update, automatically triggers a live reload of Prometheus. Now exec into the prometheus-server container and walk through its configuration file, starting with this block:
# kubectl exec -it -n kube-system prometheus-0 -c prometheus-server sh
/prometheus $ cat /etc/config/prometheus.yml
- job_name: kubernetes-apiservers
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

This job relabels targets with the keep action, meaning only targets whose source label values match the regex are kept. Since the joined source values must equal default;kubernetes;https, the only thing scraped here is the kube-apiserver, over HTTPS. Finally, the CA certificate and bearer token Prometheus uses to reach the apiserver are the two default ServiceAccount files that every pod gets mounted automatically; here they carry the administrator-level permissions granted above.
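The keep action can be modeled in a few lines of Python (a simplified sketch of the relabeling rule, not Prometheus's actual implementation):

```python
import re

def keep(source_values, regex, separator=";"):
    """Mimic a Prometheus 'keep' relabel action: the target survives only
    if the joined source label values fully match the anchored regex."""
    joined = separator.join(source_values)
    return re.fullmatch(regex, joined) is not None

# Only the apiserver endpoint in the default namespace is scraped:
assert keep(["default", "kubernetes", "https"], "default;kubernetes;https")
# Every other endpoint is dropped:
assert not keep(["kube-system", "kube-dns", "dns"], "default;kubernetes;https")
```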
- job_name: kubernetes-nodes-kubelet
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

This job scrapes each node's port 10250 at /metrics; 10250 is the kubelet port, which also separately exposes the integrated cAdvisor. This endpoint collects the metrics the kubelet itself exposes. The role is node, and access is again over HTTPS, in the same way as above.
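The labelmap action used by this job can likewise be sketched in Python (a simplified model; the real relabeling happens inside Prometheus):

```python
import re

def labelmap(labels, regex):
    """Mimic a Prometheus 'labelmap' action: every label whose *name*
    matches the regex is copied to a new label named by capture group 1."""
    out = dict(labels)
    for name, value in labels.items():
        m = re.fullmatch(regex, name)
        if m:
            out[m.group(1)] = value
    return out

# A node label discovered by service discovery becomes a plain target label:
discovered = {"__meta_kubernetes_node_label_kubernetes_io_hostname": "k8s-node1"}
result = labelmap(discovered, "__meta_kubernetes_node_label_(.+)")
assert result["kubernetes_io_hostname"] == "k8s-node1"
```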
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
This job discovers endpoints. Labels beginning with __ are Prometheus-internal and are normally not used directly; if you want to keep their values you must relabel them. Looking at the targets page, this currently picks up only the coredns endpoints (I am running two coredns replicas):

So for your endpoints to be scraped by Prometheus, the Service must carry these annotations: whether to scrape, which port to scrape, and which protocol to use all have to be declared explicitly.
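For example, a hypothetical Service named my-app that opts in to scraping would carry annotations like these (everything except the prometheus.io/* keys is made up for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service name
  namespace: default
  annotations:
    prometheus.io/scrape: "true"    # opt in to being scraped
    prometheus.io/scheme: "http"    # protocol to use
    prometheus.io/path: "/metrics"  # metrics path
    prometheus.io/port: "8080"      # port to scrape
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
```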
- job_name: kubernetes-services
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module:
    - http_2xx
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_probe
  - source_labels:
    - __address__
    target_label: __param_target
  - replacement: blackbox
    target_label: __address__
  - source_labels:
    - __param_target
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name
This job black-box probes whether Services are reachable, which has not come up before: it relies on the blackbox exporter component and probes over HTTP at the /probe path. The remaining jobs are variations on the same ideas, so they are not repeated here. Note that when Prometheus scrapes your resources, the values it uses are obtained dynamically from the objects being monitored, which means you have to define them yourself.
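The __address__ rewrite used by the endpoints and pods jobs, regex ([^:]+)(?::\d+)?;(\d+) with replacement $1:$2, can also be sketched in Python (a simplified model of the replace action):

```python
import re

def relabel_replace(source_values, regex, replacement, separator=";"):
    """Mimic a Prometheus 'replace' relabel action: join the source label
    values with the separator, fully match the regex against the result,
    and expand $1, $2, ... capture groups into the target label's value."""
    joined = separator.join(source_values)
    m = re.fullmatch(regex, joined)
    if m is None:
        return None  # no match: the target label is left unchanged
    # Convert Prometheus-style $N references to Python backreferences
    return m.expand(re.sub(r"\$(\d+)", r"\\\1", replacement))

# A target discovered at 10.244.1.5:8080 whose prometheus.io/port
# annotation is 9153 gets its scrape address rewritten:
addr = relabel_replace(["10.244.1.5:8080", "9153"],
                       r"([^:]+)(?::\d+)?;(\d+)", "$1:$2")
assert addr == "10.244.1.5:9153"
```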
2.2 Monitoring Pods in the Cluster
Now on to monitoring the resources themselves. The most important targets are the pods, and monitoring them is straightforward: as mentioned above, pod monitoring relies on cAdvisor, which is already integrated into the kubelet. cAdvisor collects metrics for every pod, as well as host CPU and memory. It is exposed at two addresses, NodeIP:{10250,10255}/metrics/cadvisor, which are set in the kubelet configuration file, i.e. here:
port: 10250          # port exposing the kubelet's own metrics
readOnlyPort: 10255  # separate read-only cAdvisor port
This is what the figure below illustrates:

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!