Generic Status Checks

Get all clusters managed by kops (for a specific state store):

KOPS_STATE_STORE=s3://foo-bar kops get clusters

kops validate cluster checks the following:

  • All k8s masters are running and have “Ready” status.
  • All k8s nodes are running and have “Ready” status.
  • Component status returns healthy for all components.
  • All pods in the kube-system namespace are running and healthy.

# uses state store from KOPS_STATE_STORE var
# and cluster from kubectl context
KOPS_STATE_STORE=s3://foo-bar kops validate cluster
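
Validation commonly fails for the first few minutes after a cluster is created or updated, so it can help to retry it. The following is a minimal sketch of a retry wrapper; the function name retry, the attempt count, and the delay are hypothetical choices and are not part of kops.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of kops): re-runs COMMAND until it exits 0
# or ATTEMPTS runs have failed. Returns 0 on success, 1 otherwise.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumed values: 10 attempts, 30 seconds apart):
#   export KOPS_STATE_STORE=s3://foo-bar
#   retry 10 30 kops validate cluster
```

The wrapper relies only on the exit code of the wrapped command, so it works with any check that exits non-zero on failure.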

Using kubectl:

kubectl get componentstatuses
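
For scripted checks, the table printed by this command can be filtered down to the components that need attention. This is a sketch only: the helper name unhealthy_components is hypothetical, and it assumes the default column layout (NAME, STATUS, MESSAGE, ERROR).

```shell
# unhealthy_components: reads `kubectl get componentstatuses` output on stdin
# and prints the NAME of every component whose STATUS column is not "Healthy".
# Hypothetical helper; assumes the default table layout with a header row.
unhealthy_components() {
  awk 'NR > 1 && $2 != "Healthy" { print $1 }'
}

# Example:
#   kubectl get componentstatuses | unhealthy_components
# No output means every listed component reported Healthy.
```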

Smoke Test

SSH into a controller node and run:

$ sudo ETCDCTL_API=3 etcdctl \
  --endpoints= \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  member list
3a57933972cb5131, started, controller-2,,
3b182e98e51cb92f, started, controller-0,,
ffed16798470cab5, started, controller-1,,

$ sudo ETCDCTL_API=3 etcdctl \
  --endpoints= \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  endpoint --cluster health
is healthy: successfully committed proposal: took = 10.029513ms
is healthy: successfully committed proposal: took = 2.125683ms
is healthy: successfully committed proposal: took = 2.92381ms
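
To turn the member list into a pass/fail check, the number of started members can be counted and compared with the expected cluster size (three controllers in the output above). A minimal sketch, assuming the comma-separated member list format shown above; the helper name count_started_members is hypothetical.

```shell
# count_started_members: reads `etcdctl member list` output on stdin
# (comma-separated: ID, status, name, ...) and prints how many members
# report the status "started". Hypothetical helper name.
count_started_members() {
  awk -F',' '
    { gsub(/^[ \t]+|[ \t]+$/, "", $2) }   # trim whitespace around the status field
    $2 == "started" { n++ }
    END { print n + 0 }                   # print 0 when nothing matched
  '
}

# Example: pipe the member list command into the helper and compare the
# result with the expected member count:
#   ... member list | count_started_members
```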

Control Plane

The control plane consists of master nodes running three Kubernetes components:

  • API Server (port 6443)
  • Scheduler
  • Controller Manager

Check service status with systemctl from within the server:

systemctl status kube-apiserver.service
systemctl status kube-scheduler.service
systemctl status kube-controller-manager.service


Follow the system log for service errors:

tail -f /var/log/syslog

Health endpoints:

KUBERNETES_PUBLIC_ADDRESS=kubernetes-e2ad1363d9177b35.elb.us-west-2.amazonaws.com
curl -k https://$KUBERNETES_PUBLIC_ADDRESS:6443/healthz
curl -k https://$KUBERNETES_PUBLIC_ADDRESS:6443/version
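
A healthy API server answers /healthz with the plain-text body ok, which makes the endpoint easy to script against. The sketch below is one possible wrapper; the helper name healthz_ok and the 5-second timeout are assumptions, and the request is sent only when KUBERNETES_PUBLIC_ADDRESS is set.

```shell
# healthz_ok BODY: succeeds only when BODY is exactly "ok".
# Hypothetical helper name.
healthz_ok() {
  [ "$1" = "ok" ]
}

# Query the endpoint only when the address is set, so the snippet is safe to source.
if [ -n "${KUBERNETES_PUBLIC_ADDRESS:-}" ]; then
  # -s silences progress output, -k skips TLS verification (as in the commands above),
  # --max-time 5 is an assumed timeout.
  body="$(curl -sk --max-time 5 "https://${KUBERNETES_PUBLIC_ADDRESS}:6443/healthz" || true)"
  if healthz_ok "$body"; then
    echo "API server healthy"
  else
    echo "API server unhealthy: ${body}" >&2
  fi
fi
```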

Kubectl component status:

$ kubectl get componentstatus
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok
etcd-1               Healthy   {"health":"true"}
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
scheduler            Healthy   ok

Worker Nodes

Worker nodes are running the following services:

  • Runc
  • gVisor
  • Container networking plugins
  • Containerd
  • Kubelet
  • Kube-proxy

Verify nodes:

$ kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
ip-10-240-0-20   Ready    <none>   14s   v1.12.0
ip-10-240-0-21   Ready    <none>   14s   v1.12.0
ip-10-240-0-22   Ready    <none>   14s   v1.12.0
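
When the expected number of workers is known, the node listing can be reduced to a single number and compared against it. A minimal sketch, assuming the default kubectl get nodes table layout; the helper name ready_node_count is hypothetical.

```shell
# ready_node_count: reads `kubectl get nodes` output on stdin and prints the
# number of nodes whose STATUS column is exactly "Ready". A cordoned node
# ("Ready,SchedulingDisabled") is deliberately not counted.
# Hypothetical helper name.
ready_node_count() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (assumes three worker nodes are expected):
#   [ "$(kubectl get nodes | ready_node_count)" -eq 3 ] && echo "all workers Ready"
```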

Data Encryption Smoke Test

kubectl create secret generic my-secret \
  --from-literal="mykey=mydata"

# ssh onto a controller node:
$ sudo ETCDCTL_API=3 etcdctl \
  --endpoints= \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  get /registry/secrets/default/my-secret | hexdump -C
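
When encryption at rest is configured on the API server, Kubernetes stores the value with a k8s:enc:<provider>:v1:<key name>: prefix, which should be visible at the start of the hexdump. Instead of reading the dump by eye, the raw value can be checked for that marker. This is a sketch only; the helper name secret_is_encrypted is hypothetical, and the provider and key name depend on the cluster's encryption configuration.

```shell
# secret_is_encrypted: reads the raw etcd value on stdin and succeeds when it
# contains the "k8s:enc:" marker that Kubernetes prepends to values that are
# encrypted at rest. Hypothetical helper name.
secret_is_encrypted() {
  # LC_ALL=C makes grep treat the binary payload as plain bytes.
  LC_ALL=C grep -q 'k8s:enc:'
}

# Example: replace `hexdump -C` in the etcdctl command with the helper:
#   ... get /registry/secrets/default/my-secret | secret_is_encrypted \
#     && echo "encrypted at rest" || echo "stored in plaintext"
```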