Fixing Abnormal Container Startup Order in a k8s Cluster

I. Background

After a server reboot, the business services can finish starting before the base component CoreDNS is back up. At that point the Services that the business containers depend on cannot be resolved, so the applications fail to start. Controlling the container startup order is therefore critical.

II. Solution

Before the application in a Pod starts, curl the Kubernetes apiserver Service a fixed number of times. If the probes keep failing, stop and exit immediately; once curl succeeds, continue with the business startup.

Implementation:

1. Add a check script for the k8s apiserver Service

a. Use curl to check whether the Service endpoint kubernetes.default:443 responds

b. Probe up to 3 times; if every probe fails, exit with error code 4; once a probe succeeds, go straight to the next step

#!/bin/bash
### Probe the kubernetes apiserver Service; exit with code 4 if it stays unreachable
count=0
while [ $count -lt 3 ]
do
        ### %s appends the Unix timestamp, hence the epoch value at the end of each log line
        date=$(date "+%Y-%m-%d-%H:%M:%s")
        sleep 2
        ### Check whether the kubernetes apiserver Service resolves and answers
        curl -s "kubernetes.default:443"
        if [ $? -eq 0 ];then
                ### Write the log line to stdout
                echo "$date Apiserver Service Is Ready" >> /dev/stdout
                break
        else
                ((count=$count + 1))
                ### Write the log line to stdout
                echo "$date Apiserver Service Is NotReady" >> /dev/stdout
        fi
        if [ $count -ge 3 ];then
                ### All probes failed: return error code 4
                exit 4
        fi
done
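
The exit code can be sanity-checked by running the script by hand inside a container built from the image (a minimal check, assuming bash and curl are present in the image):

sh /check-coredns.sh
echo $?          ### 0 when the apiserver Service answered, 4 after three failed probes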

2. Add the apiserver Service check script to the service image at build time

1. Dockerfile used to build the image
Example:
FROM docker.XXX.com:15000/XXX-backend:2.3.0
ADD check-coredns.sh /
ADD start.sh /
CMD ["/start.sh"]
 
 
2. start.sh, the container startup script
Example:
#!/bin/bash
 
rm -rf /log
mkdir /log
 
####  Run the kubernetes Service check
sh /check-coredns.sh
if [ $? -eq 4 ];then
    ### If the check script returned error code 4, exit so the container is restarted
    exit 4
fi
 
mkdir -p /root/log/${HOSTNAME}_${PROJECT_NS}_${PROJECT_NAME}
ln -s  /root/log/${HOSTNAME}_${PROJECT_NS}_${PROJECT_NAME}  /log/user
nohup /app_home/bin/config_generate &
/usr/local/tomcat/bin/catalina.sh run

check-coredns.sh (see step 1, the k8s apiserver Service check script)

3. Build and push the Docker image
Example:
docker build -t="docker.XXX.com:15000/XXX-backend:2.3.3" .
docker push docker.XXX.com:15000/XXX-backend:2.3.3

3. CoreDNS version upgrade

CoreDNS is upgraded from version 1.2.6 to 1.6.2; the YAML used for the upgrade is as follows.

coredns_deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-03-24T12:09:31Z"
  generation: 1
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "9078"
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
  uid: 5f31329d-1ef3-465e-8ffb-453d9fa29e5c
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: docker.XXX.com:15000/coredns:1.6.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      nodeSelector:
        beta.kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2020-03-24T13:48:23Z"
    lastUpdateTime: "2020-03-24T13:48:23Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-03-24T13:48:23Z"
    lastUpdateTime: "2020-03-24T13:48:25Z"
    message: ReplicaSet "coredns-67c766df46" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

coredns_svc.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  creationTimestamp: "2020-03-24T12:09:31Z"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: KubeDNS
  name: kube-dns
  namespace: kube-system
  resourceVersion: "190"
  selfLink: /api/v1/namespaces/kube-system/services/kube-dns
  uid: f280fae8-cb9c-4ff4-ac3f-13103ec9fd38
spec:
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  - name: metrics
    port: 9153
    protocol: TCP
    targetPort: 9153
  selector:
    k8s-app: kube-dns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

coredns_config.yaml

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30           
        }
        hosts {
           10.68.7.129 mon01
           10.68.7.128 mon02
           10.68.7.127 mon03
           ttl 60
           fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-24T12:09:31Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "184"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 76518f5f-6e94-4be6-b307-dc3e753ee373

]# kubectl  delete  -f   coredns_config.yaml;kubectl  delete -f  coredns_svc.yaml ;kubectl  delete  -f  coredns_deployment.yaml
]# kubectl  create  -f   coredns_config.yaml;kubectl  create -f  coredns_svc.yaml ;kubectl  create  -f  coredns_deployment.yaml
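
After recreating the objects, a quick sanity check (commands shown for illustration; the busybox tag is an assumption) confirms that the new CoreDNS pods are Ready and that in-cluster names resolve:

]# kubectl -n kube-system get pods -l k8s-app=kube-dns
]# kubectl run dns-test -it --rm --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default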
 
 
What changed:
cat coredns_deployment.yaml
..........
..........
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
...........
...........
A readiness probe was added.
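
The readiness probe targets the ready plugin enabled in the Corefile, which serves a plain HTTP endpoint on port 8181. It can also be queried by hand against a CoreDNS pod IP (a sketch; run from a node that can reach pod IPs):

POD_IP=$(kubectl -n kube-system get pod -l k8s-app=kube-dns -o jsonpath='{.items[0].status.podIP}')
curl -s -o /dev/null -w '%{http_code}\n' http://${POD_IP}:8181/ready    ### 200 once CoreDNS reports ready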

4. Test procedure

1. Dump the CoreDNS Deployment YAML
kubectl -n kube-system get deploy coredns -o yaml > coredns.yaml
2. Stop the CoreDNS service
kubectl delete -f coredns.yaml
3. Recreate the business service with the new image
cat XXX-backend.yaml
......
        image: docker.XXX.com:15000/XXX-backend:2.3.3
        imagePullPolicy: IfNotPresent
......
 
 
kubectl  delete -f XXX-backend.yaml
kubectl  create -f XXX-backend.yaml
 
 
4. Check the logs and observe the behavior
[root@k8s-10 ops]# kubectl  -n XXX-project-namespace logs -f XXX-backend-7957bd486d-k9mwt
2020-04-14-23:55:1586879711 Apiserver Service Is NotReady
2020-04-14-23:55:1586879733 Apiserver Service Is NotReady
2020-04-14-23:55:1586879756 Apiserver Service Is NotReady
2020-04-14-23:56:1586879778 Apiserver Service Is NotReady
2020-04-14-23:56:1586879801 Apiserver Service Is NotReady
2020-04-14-23:57:1586879824 Apiserver Service Is NotReady
2020-04-14-23:57:1586879846 Apiserver Service Is NotReady
2020-04-14-23:57:1586879869 Apiserver Service Is NotReady
2020-04-14-23:58:1586879891 Apiserver Service Is NotReady
2020-04-14-23:58:1586879914 Apiserver Service Is NotReady
Because CoreDNS has not been started, XXX-backend cannot start normally.
After a while the container exits and is restarted.
 
 
[root@k8s-10 ops]# kubectl get pods --all-namespaces |grep XXX-backend
XXX-project-namespace   XXX-backend-7957bd486d-k9mwt                0/1     Running     1          4m40s
XXX-project-namespace   XXX-backend-7957bd486d-mwt9k                0/1     Running     1          4m40s
XXX-project-namespace   XXX-backend-7957bd486d-zzq9b                0/1     Running     1          4m40s
 
 
The RESTARTS column has now changed to 1.
 
 
5. Start the CoreDNS component again
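CoreDNS is brought back with the YAML dumped in step 1 (file name from step 1):

kubectl create -f coredns.yaml
kubectl -n kube-system get pods -l k8s-app=kube-dns    ### wait until the pods show 1/1 Ready
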
[root@k8s-10 ops]# kubectl  -n XXX-project-namespace logs -f XXX-backend-7957bd486d-k9mwt
2020-04-14-23:59:1586879945 Apiserver Service Is NotReady
2020-04-14-23:59:1586879967 Apiserver Service Is NotReady
2020-04-14-23:59:1586879990 Apiserver Service Is NotReady
2020-04-15-00:00:1586880012 Apiserver Service Is NotReady
2020-04-15-00:00:1586880035 Apiserver Service Is NotReady
2020-04-15-00:00:1586880057 Apiserver Service Is NotReady
2020-04-15-00:01:1586880080 Apiserver Service Is NotReady
2020-04-15-00:01:1586880102 Apiserver Service Is NotReady
 
#### At this point the Service is reachable and the business service starts normally
2020-04-15-00:02:1586880120 Apiserver Service Is Ready                    
/usr/local/tomcat/webapps/dops/WEB-INF/classes/
Java HotSpot(TM) 64-Bit Server VM warning: Cannot open file /log/jvm/gcdetail.log due to No such file or directory
 
15-Apr-2020 00:02:03.228 WARNING [main] org.apache.catalina.startup.SetAllPropertiesRule.begin [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'maxPostSize' to '10737418240' did not find a matching property.
15-Apr-2020 00:02:03.275 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version:        Apache Tomcat/8.5.27
15-Apr-2020 00:02:03.275 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Jan 18 2018 20:12:40 UTC
15-Apr-2020 00:02:03.275 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server number:         8.5.27.0
15-Apr-2020 00:02:03.276 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux

Conclusion

With the image built as described above, the business service starts only after it can resolve in-cluster Services.

