Kubernetes Autoscaling: HPA Based on a Prometheus QPS Metric

Scaling on Prometheus custom metrics

Resource metrics cover only CPU and memory, which is usually enough. But if you want to drive HPA from custom metrics, such as request QPS or the number of 5xx errors, you need the custom metrics pipeline; the most mature implementation today is Prometheus custom metrics. The custom metrics are served by Prometheus and then aggregated into the apiserver by k8s-prometheus-adapter, giving the same effect as the core metrics pipeline (metrics-server).
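
As a quick sanity check once the adapter described later in this article is in place, you can confirm that the custom metrics API group has been registered with the aggregation layer:

# The adapter registers the custom.metrics.k8s.io group with the aggregation layer:
[root@k8s-master1 ~]# kubectl api-versions | grep custom.metrics
custom.metrics.k8s.io/v1beta1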

1. Deploy Prometheus

Prometheus is a monitoring system originally built at SoundCloud. It has been a community open-source project since 2012 and has a very active developer and user community. To emphasize open governance and independent maintenance, Prometheus joined the Cloud Native Computing Foundation (CNCF) in 2016 as its second hosted project, after Kubernetes.

Prometheus features:

  • Automatic metric collection and service discovery;

  • A multi-dimensional data model: time series identified by a metric name and key/value pairs;

  • PromQL: a flexible query language that leverages the multi-dimensional data for complex queries;

  • No reliance on distributed storage; a single server node works on its own;

  • Time-series data is collected over HTTP using a pull model (see the minimal scrape_configs sketch after this list);

  • Pushing time-series data is supported via the PushGateway component;

  • Targets are discovered via service discovery or static configuration;

  • Multiple graphing and dashboard options (Grafana);
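
To illustrate the pull model and static target configuration mentioned above, a minimal scrape_configs snippet might look like the following (an illustrative sketch; the job name and target address are placeholders, not part of the deployment in this article):

scrape_configs:
  - job_name: 'example-app'              # placeholder job name
    scrape_interval: 15s                 # how often Prometheus pulls /metrics
    metrics_path: /metrics
    static_configs:
      - targets: ['192.168.171.11:80']   # placeholder target host:port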

Prometheus components and architecture:

  • Prometheus Server: collects metrics, stores the time-series data, and provides the query interface

  • Client Library: client libraries for instrumenting application code

  • Push Gateway: short-term metric storage, mainly for ephemeral jobs

  • Exporters: collect metrics from existing third-party services and expose them as /metrics

  • Alertmanager: alerting

  • Web UI: a simple web console

Deployment:

Prometheus will need persistent storage, which we provide here via NFS with dynamic provisioning. First, install NFS on the node:

[root@k8s-node1 ~]# yum install -y nfs-utils

NFS configuration and usage
On the server side we create a shared directory, /opt/sharedata, as the remote mount point for clients, and set its permissions.

$ mkdir -p /opt/sharedata/
$ chmod 777 /opt/sharedata/    # directories need the execute bit; 666 would prevent clients from entering the directory

Then edit the NFS configuration file /etc/exports:

[root@k8s-node1 ~]# cat /etc/exports
/opt/sharedata 192.168.171.0/24(rw,sync,insecure,no_subtree_check,no_root_squash)

A note on the options: each export line can carry a number of parameters, each with its own meaning. Here, /opt/sharedata is exported to clients whose IPs fall within 192.168.171.0/24. If a client outside that range also needs to mount the share, you can widen the range or use * to allow all clients; for example, /home *(ro,sync,insecure,no_root_squash) exports /home read-only to every client.
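
After editing /etc/exports, the export table can be refreshed without restarting the service, which is handy when you tweak options later:

$ exportfs -ra    # re-export everything listed in /etc/exports
$ exportfs -v     # show the active exports and their options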

# Start the NFS service
$ service nfs start
# or, equivalently:
$ /bin/systemctl start nfs.service

[root@k8s-node1 ~]# showmount -e localhost
Export list for localhost:
/opt/sharedata 192.168.171.0/24

Example:
Mount the remote directory onto the local /share directory (on a client machine).

$ mkdir -p /share          # the mount point must exist first
$ mount 192.168.171.11:/opt/sharedata /share
$ df -h | grep 192.168.171.11
Filesystem                 Size  Used  Avail Use% Mounted on
192.168.171.11:/opt/sharedata   27G   11G   17G   40%  /share

To unmount the NFS share on the client, run:
$ umount /share
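
If a client should re-mount the share automatically after a reboot, an /etc/fstab entry along these lines does the job (a sketch using the same server and paths as above):

192.168.171.11:/opt/sharedata  /share  nfs  defaults,_netdev  0  0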

Now, on the master, deploy the NFS client provisioner (it dynamically creates PersistentVolumes on the NFS share):

Link: https://pan.baidu.com/s/1b4Fu8j4Flf2Lzd0naT_iRg  (extraction code: 7l3z)
Import nfs-client.zip from the shared package.

# cd nfs-client
[root@k8s-master1 nfs-client]# cat deployment.yaml
...(omitted)
serviceAccountName: nfs-client-provisioner
containers:
  - name: nfs-client-provisioner
    image: quay.io/external_storage/nfs-client-provisioner:latest
    volumeMounts:
      - name: nfs-client-root
        mountPath: /persistentvolumes
    env:
      - name: PROVISIONER_NAME
        value: fuseim.pri/ifs
      - name: NFS_SERVER
        value: 192.168.171.12   ## NFS server address
      - name: NFS_PATH
        value: /opt/sharedata  ## exported directory
volumes:
  - name: nfs-client-root
    nfs:
      server: 192.168.171.12
      path: /opt/sharedata
...(omitted)
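
The same directory also contains the StorageClass that ties everything together; conceptually it looks roughly like this (a sketch: only the class name managed-nfs-storage and the provisioner name are confirmed by the apply output below, the rest follows the usual nfs-client-provisioner layout):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: fuseim.pri/ifs          # must match PROVISIONER_NAME in the deployment above
parameters:
  archiveOnDelete: "false"           # delete (rather than archive) data when a PVC is removed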

[root@k8s-master1 nfs-client]# kubectl apply -f .
storageclass.storage.k8s.io/managed-nfs-storage created
serviceaccount/nfs-client-provisioner created
deployment.apps/nfs-client-provisioner created
serviceaccount/nfs-client-provisioner unchanged
clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created

[root@k8s-master1 nfs-client]# kubectl get po
NAME                                    READY   STATUS              RESTARTS   AGE
nfs-client-provisioner-9c784f97-cqzhb   1/1     Running   0          2m16s

Next, deploy Prometheus from the same shared package (link: https://pan.baidu.com/s/1b4Fu8j4Flf2Lzd0naT_iRg, extraction code: 7l3z):
# cd prometheus
# kubectl apply -f .
[root@k8s-master1 nfs-client]# kubectl get po -o wide -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
coredns-6d8cfdd59d-pbbbc         1/1     Running   2          2d1h    10.244.2.15      k8s-node2     <none>           <none>
kube-flannel-ds-amd64-q8g25      1/1     Running   3          2d1h    192.168.171.13   k8s-node2     <none>           <none>
metrics-server-7dbbcf4c7-v5zpm   1/1     Running   3          47h     10.244.2.14      k8s-node2     <none>           <none>
prometheus-0                     2/2     Running   0          6m48s   10.244.3.15      k8s-node3     <none>           <none>

[root@k8s-master1 nfs-client]# kubectl get svc -n kube-system
NAME             TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
kube-dns         ClusterIP   10.0.0.2     <none>        53/UDP,53/TCP    2d1h
metrics-server   ClusterIP   10.0.0.5     <none>        443/TCP          47h
prometheus       NodePort    10.0.0.147   <none>        9090:30090/TCP   7m46s

Access the Prometheus UI at http://NodeIP:30090.

2. Deploy the Custom Metrics Adapter

However, the metrics Prometheus collects cannot be consumed by Kubernetes directly, because the data formats are incompatible. Another component, k8s-prometheus-adapter, converts Prometheus metrics into a format the Kubernetes API can serve. Since this is a custom API, it also has to be registered with the main APIServer via the Kubernetes aggregation layer so that it can be accessed directly under /apis/.
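
The Helm chart used below takes care of that registration; under the hood it boils down to an APIService object roughly like this (a sketch based on the registration shown later with kubectl get apiservices):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    name: prometheus-adapter         # the adapter's Service
    namespace: kube-system
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100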

https://github.com/DirectXMan12/k8s-prometheus-adapter

The Prometheus Adapter has a stable Helm chart, which we will use directly.

First, prepare the Helm environment:

wget https://get.helm.sh/helm-v3.0.0-linux-amd64.tar.gz
tar zxvf helm-v3.0.0-linux-amd64.tar.gz 
mv linux-amd64/helm /usr/bin/
helm repo add stable http://mirror.azure.cn/kubernetes/charts
helm repo update
helm repo list

Deploy prometheus-adapter, pointing it at the Prometheus Service:

# helm install prometheus-adapter stable/prometheus-adapter --namespace kube-system --set prometheus.url=http://prometheus.kube-system,prometheus.port=9090
# helm list -n kube-system
# kubectl get pods -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
prometheus-adapter-77b7b4dd8b-ktsvx   1/1     Running   0          9m

Make sure the adapter has registered with the APIServer:

[root@k8s-master1 ~]# kubectl get apiservices |grep custom
v1beta1.custom.metrics.k8s.io          kube-system/prometheus-adapter   True        87s

[root@k8s-master1 ~]# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"custom.metrics.k8s.io/v1beta1","resources":[{"name":"namespaces/kubelet_volume_stats_inodes_used","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_volume_stats_used_bytes","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_capacity_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_volume_stats_inodes_free","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_inodes_free","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_inodes_used","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_used_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_container_log_filesystem_used_bytes","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_volume_stats_inodes","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_inodes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_available_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_capacity_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_volume_stats_capacity_bytes","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_inodes_free","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"persistentvolumeclaims/kubelet_volume_stats_used_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"pods/kubelet_container_log_filesystem_used_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_container_log_filesystem_used_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_available_bytes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"namespaces/kubelet_volume_stats_available_bytes","singularName":"","namespaced":false,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_inodes","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]},{"name":"jobs.batch/kubelet_volume_stats_inodes_used","singularName":"","namespaced":true,"kind":"MetricValueList","verbs":["get"]}]}

3. Autoscaling on a QPS metric

Deploy a sample application:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: metrics-app
  name: metrics-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metrics-app
  template:
    metadata:
      labels:
        app: metrics-app
      annotations:
        prometheus.io/scrape: "true"    ## allow Prometheus to scrape this Pod
        prometheus.io/port: "80"        ## port to scrape
        prometheus.io/path: "/metrics"  ## metrics path to scrape
    spec:
      containers:
      - image: zhdya/metrics-app
        name: metrics-app
        ports:
        - name: web
          containerPort: 80
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 3
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-app
  labels:
    app: metrics-app
spec:
  ports:
  - name: web
    port: 80
    targetPort: 80
  selector:
    app: metrics-app

Apply the manifest, then check the Pods and Service:

[root@k8s-master1 hpa]# kubectl get po -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
metrics-app-7674cfb699-5l72f             1/1     Running   0          19s   10.244.1.13   k8s-node1     <none>           <none>
metrics-app-7674cfb699-btch5             0/1     Running   0          19s   10.244.2.16   k8s-node2     <none>           <none>
metrics-app-7674cfb699-kksjr             0/1     Running   0          19s   10.244.0.15   k8s-master1   <none>           <none>

[root@k8s-master1 hpa]# kubectl get svc
NAME          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes    ClusterIP   10.0.0.1     <none>        443/TCP   2d1h
metrics-app   ClusterIP   10.0.0.163   <none>        80/TCP    39s

The metrics-app exposes a Prometheus-format metrics endpoint, which you can see by curling the Service:

[root@k8s-master1 hpa]# curl 10.0.0.163/metrics
# HELP http_requests_total The amount of requests in total
# TYPE http_requests_total counter
http_requests_total 20
# HELP http_requests_per_second The amount of requests per second the latest ten seconds
# TYPE http_requests_per_second gauge
http_requests_per_second 0.5

## and, while we are at it, test load balancing:
[root@k8s-master1 hpa]# curl 10.0.0.163
Hello! My name is metrics-app-7674cfb699-btch5. The last 10 seconds, the average QPS has been 0.5. Total requests served: 35
[root@k8s-master1 hpa]# curl 10.0.0.163
Hello! My name is metrics-app-7674cfb699-5l72f. The last 10 seconds, the average QPS has been 0.5. Total requests served: 38
[root@k8s-master1 hpa]# curl 10.0.0.163
Hello! My name is metrics-app-7674cfb699-kksjr. The last 10 seconds, the average QPS has been 0.5. Total requests served: 37

The number of requests each container has served, as seen in the Prometheus UI:
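
The same counters can be queried directly in Prometheus, for example (the label names assume the kubernetes_namespace / kubernetes_pod_name relabeling used by this setup, the same labels the adapter rule below relies on):

http_requests_total{kubernetes_namespace="default"}
sum(rate(http_requests_total{kubernetes_namespace="default"}[2m])) by (kubernetes_pod_name)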


Create the HPA policy:

# vi app-hpa-v2.yml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: metrics-app-hpa 
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metrics-app
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 800m   # 800m means 0.8 requests per second

After applying the manifest, check the HPA:

[root@k8s-master1 hpa]# kubectl get hpa
NAME              REFERENCE                TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
metrics-app-hpa   Deployment/metrics-app   <unknown>/800m   1         8         3          36s

The TARGETS column shows <unknown>/800m because nothing is serving the metric yet; we will now use this Prometheus-provided metric to exercise autoscaling on a custom metric (QPS).

4. Configure the adapter to collect the specific metric

Creating the HPA is not the end of it: the adapter does not yet know which metric you want (http_requests_per_second), so the HPA cannot obtain the metric for the Pods.
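
You can see this for yourself: before the rule below is added, the custom metrics API returns NotFound for the metric (the same error that shows up in the HPA events later; output approximate):

[root@k8s-master1 hpa]# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"
Error from server (NotFound): the server could not find the metric http_requests_per_second for pods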

Edit the prometheus-adapter ConfigMap (it lives in the kube-system namespace here) and add a new seriesQuery at the top of its rules: section:

# kubectl edit cm prometheus-adapter -n kube-system
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus-adapter
    chart: prometheus-adapter-v0.1.2
    heritage: Tiller
    release: prometheus-adapter
  name: prometheus-adapter
data:
  config.yaml: |
    rules:      ## add the following rule:
    - seriesQuery: 'http_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'      ## this series can be queried directly in Prometheus
      resources:
        overrides:
          kubernetes_namespace: {resource: "namespace"}
          kubernetes_pod_name: {resource: "pod"}
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
...

This rule takes http_requests_total from all of the service's Pods and turns it into an average per-second rate over a 2-minute window.
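
After template substitution, the metricsQuery above expands to a query along these lines (roughly; the adapter fills in the exact label matchers per request):

sum(rate(http_requests_total{kubernetes_namespace="default",kubernetes_pod_name=~"metrics-app-7674cfb699-5l72f|metrics-app-7674cfb699-btch5|metrics-app-7674cfb699-kksjr"}[2m])) by (kubernetes_pod_name)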

Test the API:

[root@k8s-master1 hpa]# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests_per_second"},"items":[{"describedObject":{"kind":"Pod","namespace":"default","name":"metrics-app-7674cfb699-5l72f","apiVersion":"/v1"},"metricName":"http_requests_per_second","timestamp":"2019-12-12T15:52:47Z","value":"416m"},{"describedObject":{"kind":"Pod","namespace":"default","name":"metrics-app-7674cfb699-btch5","apiVersion":"/v1"},"metricName":"http_requests_per_second","timestamp":"2019-12-12T15:52:47Z","value":"416m"},{"describedObject":{"kind":"Pod","namespace":"default","name":"metrics-app-7674cfb699-kksjr","apiVersion":"/v1"},"metricName":"http_requests_per_second","timestamp":"2019-12-12T15:52:47Z","value":"416m"}]}
[root@k8s-master1 hpa]# kubectl get hpa
NAME              REFERENCE                TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
metrics-app-hpa   Deployment/metrics-app   416m/800m   1         8         2          20m
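
For reference, the replica count the HPA settles on follows Kubernetes' standard formula, which matches the output above:

# desiredReplicas = ceil( currentReplicas * currentMetricValue / targetMetricValue )
# With 3 replicas averaging 416m against a target of 800m:
#   ceil( 3 * 416 / 800 ) = ceil(1.56) = 2   -> the HPA scales the Deployment down to 2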

Load test:

ab -n 100000 -c 100  http://10.0.0.163/metrics
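
While ab is running, it is convenient to watch the HPA and the Pods react from a second terminal:

[root@k8s-master1 ~]# kubectl get hpa metrics-app-hpa -w
[root@k8s-master1 ~]# kubectl get po -l app=metrics-app -w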

Watch the Pods scale out:

[root@k8s-master1 ~]# kubectl get po -o wide
NAME                                     READY   STATUS              RESTARTS   AGE    IP            NODE          NOMINATED NODE   READINESS GATES
metrics-app-7674cfb699-5l72f             1/1     Running             0          48m    10.244.1.13   k8s-node1     <none>           <none>
metrics-app-7674cfb699-6rht6             1/1     Running             0          16s    10.244.0.16   k8s-master1   <none>           <none>
metrics-app-7674cfb699-9ltvr             0/1     ContainerCreating   0          1s     <none>        k8s-master1   <none>           <none>
metrics-app-7674cfb699-btch5             1/1     Running             0          48m    10.244.2.16   k8s-node2     <none>           <none>
metrics-app-7674cfb699-kft7p             1/1     Running             0          16s    10.244.3.16   k8s-node3     <none>           <none>
metrics-app-7674cfb699-plhrp             0/1     ContainerCreating   0          1s     <none>        k8s-node2     <none>           <none>
metrics-app-7674cfb699-sgvln             0/1     ContainerCreating   0          1s     <none>        k8s-node1     <none>           <none>
metrics-app-7674cfb699-wr56r             0/1     ContainerCreating   0          1s     <none>        k8s-node1     <none>           <none>
nfs-client-provisioner-f9fdd5cc9-ffzbd   1/1     Running             0          8m7s   10.244.2.17   k8s-node2     <none>           <none>
[root@k8s-master1 ~]# kubectl get po -o wide
NAME                                     READY   STATUS    RESTARTS   AGE    IP            NODE          NOMINATED NODE   READINESS GATES
metrics-app-7674cfb699-5l72f             1/1     Running   0          48m    10.244.1.13   k8s-node1     <none>           <none>
metrics-app-7674cfb699-6rht6             1/1     Running   0          18s    10.244.0.16   k8s-master1   <none>           <none>
metrics-app-7674cfb699-9ltvr             0/1     Running   0          3s     10.244.0.17   k8s-master1   <none>           <none>
metrics-app-7674cfb699-btch5             1/1     Running   0          48m    10.244.2.16   k8s-node2     <none>           <none>
metrics-app-7674cfb699-kft7p             1/1     Running   0          18s    10.244.3.16   k8s-node3     <none>           <none>
metrics-app-7674cfb699-plhrp             0/1     Running   0          3s     10.244.2.18   k8s-node2     <none>           <none>
metrics-app-7674cfb699-sgvln             0/1     Running   0          3s     10.244.1.16   k8s-node1     <none>           <none>
metrics-app-7674cfb699-wr56r             0/1     Running   0          3s     10.244.1.17   k8s-node1     <none>           <none>
nfs-client-provisioner-f9fdd5cc9-ffzbd   1/1     Running   0          8m9s   10.244.2.17   k8s-node2     <none>           <none>

Check the HPA status:

[root@k8s-master1 ~]# kubectl get hpa
NAME              REFERENCE                TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
metrics-app-hpa   Deployment/metrics-app   414345m/800m   1         8         8          21m

[root@k8s-master1 ~]# kubectl describe hpa metrics-app-hpa
...(omitted)
Metrics:                               ( current / target )
  "http_requests_per_second" on pods:  818994m / 800m
Min replicas:                          1
Max replicas:                          8
Deployment pods:                       8 current / 8 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric http_requests_per_second
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:
  Type     Reason                        Age                   From                       Message
  ----     ------                        ----                  ----                       -------
  Warning  FailedComputeMetricsReplicas  19m (x12 over 22m)    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get pods metric value: unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests_per_second for pods
  Warning  FailedGetPodsMetric           7m18s (x61 over 22m)  horizontal-pod-autoscaler  unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests_per_second for pods
  Normal   SuccessfulRescale             88s                   horizontal-pod-autoscaler  New size: 4; reason: pods metric http_requests_per_second above target

Summary

1. The application exposes a /metrics endpoint in Prometheus format;

2. Prometheus scrapes each Pod's http_requests_total metric via /metrics;

3. Prometheus aggregates the collected data;

4. The custom metrics API (prometheus-adapter, aggregated into the APIServer) periodically queries Prometheus for the http_requests_per_second data;

5. The HPA periodically queries the APIServer to determine whether its autoscaling rule is met;

6. If it is, the HPA adjusts the Deployment's replica count (through its ReplicaSet) to scale out or in.