Monitoring Pods, Nodes, and Resource Objects in K8S

1. Monitoring the K8S cluster nodes

1.1 Installing and configuring node_exporter

Pod

The kubelet on each node exposes performance metrics for all Pods and containers on that node through the metrics interface provided by cAdvisor. This is enabled by default when kubelet is installed, at the following endpoints:

http://NodeIP:10255/metrics/cadvisor   (read-only HTTP port)
https://NodeIP:10250/metrics/cadvisor  (authenticated HTTPS port)
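
To confirm the endpoints respond, they can be curled directly from a node; a minimal sketch, assuming a service-account token is available for the authenticated port (the token-retrieval commands below are an assumption and depend on how service-account secrets are set up in your cluster):

## The read-only port needs no authentication:
curl -s http://NodeIP:10255/metrics/cadvisor | head

## The secure port requires a bearer token:
TOKEN=$(kubectl -n kube-system get secret \
  $(kubectl -n kube-system get sa default -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d)
curl -sk -H "Authorization: Bearer ${TOKEN}" https://NodeIP:10250/metrics/cadvisor | head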
Node

Node resource utilization is collected with the node_exporter collector.

https://github.com/prometheus/node_exporter
Documentation: https://prometheus.io/docs/guides/node-exporter/

Use the node_exporter.sh script to deploy the node_exporter collector on every server; it can be run as-is, with no modification needed:

[root@k8s-master prometheus-k8s]# cat node_exporter.sh
#!/bin/bash

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz

## Unpack and rename
tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter

## Add a systemd service
cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service

[Install]
WantedBy=multi-user.target
EOF

## Start node_exporter
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter

[root@k8s-master prometheus-k8s]# ./node_exporter.sh

Of course, if your company already runs Ansible or SaltStack, this is even easier to solve (no per-node download, faster and more efficient):

# ansible node -m copy -a "src=./node_exporter-0.18.1.linux-amd64.tar.gz dest=/tmp/"

# ansible node -m script -a "./node_exporter.sh"

In that case, the corresponding lines of the script above need to change to:

## Unpack and rename
tar zxf /tmp/node_exporter-0.18.1.linux-amd64.tar.gz -C /usr/local/ 
mv /usr/local/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
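
Either way, once deployed it is worth checking that the service is up and serving metrics on its default port 9100; a quick verification sketch:

systemctl status node_exporter --no-pager

## node_load1 is a standard node_exporter metric; any output means the exporter works:
curl -s http://localhost:9100/metrics | grep ^node_load1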

1.2 Monitoring node component status

Modify the ConfigMap a bit. Auto-discovery can no longer find my nodes, because I have split the nodes and masters into separate static jobs (the master job conveniently also monitors the running state of the master components). The scrape interval also needs to change from the default of one minute to 30s. So the following goes into the ConfigMap:

    global:
      scrape_interval:     30s

    - job_name: k8s-nodes
      static_configs:
      - targets:
        - 192.168.1.202:9100
        - 192.168.1.203:9100
        - 192.168.1.204:9100
        - 192.168.1.165:9100
        - 192.168.1.169:9100
        - 192.168.1.172:9100

    - job_name: k8s-master
      static_configs:
      - targets:
        - 192.168.1.200:9100
        - 192.168.1.201:9100
Update the ConfigMap:

# kubectl apply -f prometheus-configmap.yaml 
configmap/prometheus-config configured

This automatically reloads Prometheus. One more problem to solve: the time inside the Prometheus container is off by eight hours (UTC instead of CST), which skews query results. Mounting the host's timezone file into the container as /etc/localtime fixes it, so the StatefulSet needs a little extra configuration:

    volumeMounts:
    - name: localtime
      mountPath: /etc/localtime
       
volumes:
- name: localtime
  hostPath:
    path: /usr/share/zoneinfo/Asia/Shanghai
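
After the update rolls out, the container time can be spot-checked; the container name prometheus below is an assumption based on the StatefulSet name:

kubectl -n kube-system exec prometheus-0 -c prometheus -- date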

Apply the update; once the restart finishes, open the web UI and the new targets should all be visible.

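The same check can be done from the command line through the Prometheus HTTP API; a sketch, assuming the NodePort 30090 shown in the svc listing at the end of this post:

curl -s http://192.168.1.200:30090/api/v1/targets \
  | grep -o '"job":"[^"]*"' | sort | uniq -c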

Then import a dashboard template in Grafana, ID 9276; after a short wait the data shows up.

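The node-level queries behind such dashboards can also be run by hand against the query API; a sketch of a typical per-instance CPU utilization expression (the exact expressions used by dashboard 9276 may differ):

## 100 minus the idle percentage = CPU utilization per instance:
curl -s 'http://192.168.1.200:30090/api/v1/query' \
  --data-urlencode 'query=100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100'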

2. Monitoring K8S resource objects

This part monitors the state of the resource objects created in K8S, such as service/deployment/replicaset/endpoints. As mentioned earlier, this requires the kube-state-metrics component, which is dedicated to collecting information on the various K8S resource objects and covers them quite comprehensively.

2.1 Creating the RBAC YAML to authorize kube-state-metrics

[root@k8s-master prometheus-k8s]# vim kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-rbac.yaml
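
A quick way to confirm the RBAC objects were all created:

kubectl -n kube-system get sa,role,rolebinding | grep kube-state-metrics
kubectl get clusterrole,clusterrolebinding | grep kube-state-metrics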

Write the Deployment and ConfigMap YAML to deploy the kube-state-metrics pod; no modification is needed:

[root@k8s-master prometheus-k8s]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.3.0
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v1.3.0
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v1.3.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: zhdya/kube-state-metrics:v1.3.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: zhdya/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
      volumes:
        - name: config-volume
          configMap:
            name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration

[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-deployment.yaml
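
The two containers can take a moment to come up; waiting on the rollout is a simple way to confirm the Deployment is healthy:

kubectl -n kube-system rollout status deployment/kube-state-metrics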

Write the Service YAML to expose the kube-state-metrics ports:

[root@k8s-master prometheus-k8s]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics

[root@k8s-master prometheus-k8s]# kubectl apply -f kube-state-metrics-service.yaml

Check the pod and svc status: pod/kube-state-metrics-7c76bdbf68-kqqgd is running normally, and ports 8080 and 8081 are exposed:

[root@k8s-master prometheus-k8s]# kubectl get pod,svc -n kube-system
NAME                                        READY   STATUS    RESTARTS   AGE
pod/alertmanager-5d75d5688f-fmlq6           2/2     Running   0          9d
pod/coredns-5bd5f9dbd9-wv45t                1/1     Running   1          9d
pod/grafana-0                               1/1     Running   2          15d
pod/kube-state-metrics-7c76bdbf68-kqqgd     2/2     Running   6          14d
pod/kubernetes-dashboard-7d77666777-d5ng4   1/1     Running   5          16d
pod/prometheus-0                            2/2     Running   6          15d
NAME                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
service/alertmanager           ClusterIP   10.0.0.207   <none>        80/TCP              13d
service/grafana                NodePort    10.0.0.74    <none>        80:30091/TCP        15d
service/kube-dns               ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP       14d
service/kube-state-metrics     ClusterIP   10.0.0.194   <none>        8080/TCP,8081/TCP   14d
service/kubernetes-dashboard   NodePort    10.0.0.127   <none>        443:30001/TCP       17d
service/prometheus             NodePort    10.0.0.33    <none>        9090:30090/TCP      14d
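
With the Service up, the metrics endpoint can be spot-checked from any cluster node through the ClusterIP shown above (kube_deployment_status_replicas is a standard kube-state-metrics series):

curl -s http://10.0.0.194:8080/metrics | grep ^kube_deployment_status_replicas | head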

