After a while using Kubernetes on AWS with persistent volumes backed by EBS, I ran into a problem with evicted pods that were rescheduled to another node. EBS volumes are bound to a single availability zone, so a pod rescheduled onto a node in a different zone cannot attach its volume. This makes sense: the volumes are attached over the network, and keeping them within one datacenter (availability zone) keeps network latency low.
The limitations are described in the official Kubernetes documentation:
This post builds on the infrastructure from this post:
The idea works with any cloud provider whose volumes are bound to a zone; in my case I use AWS. The idea is to use a
nodeSelector on the pods that use a persistent volume (EBS) to pin them to a fixed availability zone, so that if the pods are rescheduled to other nodes, they land in the same availability zone as the volume.
1. Storage class
One important setting in the storage class is the following parameter:
volumeBindingMode: WaitForFirstConsumer. It delays the binding and provisioning of a persistent volume until a pod that uses the PVC is created, so the volume is provisioned in the availability zone where that pod is scheduled.
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2
  fsType: "ext4"
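The in-tree kubernetes.io/aws-ebs provisioner used above is deprecated in favour of the Amazon EBS CSI driver (ebs.csi.aws.com). A minimal sketch of an equivalent StorageClass follows. It assumes the EBS CSI driver is installed in the cluster, and the name ebs-csi is hypothetical. The important part, WaitForFirstConsumer, stays the same.

```yaml
---
# Sketch: same idea as the "ebs" StorageClass, but using the EBS CSI driver.
# Assumes aws-ebs-csi-driver is installed; "ebs-csi" is a hypothetical name.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-csi
provisioner: ebs.csi.aws.com
# Delay provisioning until a pod is scheduled, so the volume is created
# in that pod's availability zone.
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
```

A PVC would then reference storageClassName: ebs-csi instead of ebs.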
2. Persistent Volume Claim
The PVC references the storage class and is mounted by the deployment.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-nginx
spec:
  storageClassName: ebs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
3. Deployment
The deployment has a
nodeSelector that defines the availability zone the pod is deployed to. The
nodeSelector is only needed for deployments that use a persistent volume.
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        zone: eu-west-1a
      containers:
        - image: nginx:latest
          name: nginx
          volumeMounts:
            - name: vol-nginx
              mountPath: /mnt/
      volumes:
        - name: vol-nginx
          persistentVolumeClaim:
            claimName: pvc-nginx
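The nodeSelector above relies on a custom zone label on the nodes, which is presumably set by the infrastructure from the earlier post. On AWS, nodes also carry the well-known label topology.kubernetes.io/zone. The following sketch expresses the same constraint as a required node affinity on that well-known label. It is a fragment of the pod template spec and could replace the nodeSelector.

```yaml
# Fragment of spec.template.spec in the Deployment; replaces nodeSelector.
# Uses the well-known zone label instead of the custom "zone" label.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-west-1a
```

The In operator accepts several zones. However, a pod that mounts a single EBS volume can only run in that volume's zone, so one value is the sensible choice here.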
Solution for multiple replicas
If you want to run multiple replicas in different availability zones and also use persistent volumes, you can use
podAntiAffinity to tell the Kubernetes scheduler to place each replica on a node in a different availability zone.
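The following example runs Nginx with persistent storage and 3 replicas spread across 3 different availability zones. An EBS volume can only be attached to a node in its own availability zone, so the replicas cannot share a single ReadWriteOnce claim. A StatefulSet with volumeClaimTemplates gives each replica its own PVC. Thanks to WaitForFirstConsumer, each volume is then provisioned in the zone where its replica is scheduled. There is one limitation in this example: the anti-affinity rule allows at most one replica per zone, so the number of replicas cannot exceed the number of availability zones. Any extra replicas stay Pending.
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 3
  # Assumes a headless Service named "nginx" governs this StatefulSet.
  serviceName: nginx
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two nginx pods on nodes with the same "zone" label.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: zone
      containers:
        - image: nginx:latest
          name: nginx
          volumeMounts:
            - name: vol-nginx
              mountPath: /mnt/
  # One PVC per replica (vol-nginx-nginx-0, -1, -2), each in its pod's zone.
  volumeClaimTemplates:
    - metadata:
        name: vol-nginx
      spec:
        storageClassName: ebs
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi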
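A different technique, topology spread constraints, can relax the one-replica-per-zone limit of the example above. Instead of forbidding two replicas in the same zone, it only bounds how uneven the spread may be. The following is a minimal sketch of a fragment that could replace the affinity block in the pod template. It assumes the well-known topology.kubernetes.io/zone node label, although the custom zone label would work the same way.

```yaml
# Fragment of spec.template.spec; an alternative to the podAntiAffinity block.
topologySpreadConstraints:
  - maxSkew: 1                          # zones may differ by at most one replica
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule    # hard rule, like "required" anti-affinity
    labelSelector:
      matchLabels:
        app: nginx
```

With maxSkew: 1, for example, 6 replicas across 3 zones would be spread two per zone. Each replica still needs its own volume in its own zone.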