After a while using Kubernetes on AWS with persistent volumes backed by EBS, I faced a problem with evicted
pods after they were rescheduled to another node. The issue is that EBS volumes are dedicated to a single availability zone; this makes sense because the volumes work over the network and are kept in the same datacenter to reduce latency.
The limitations are described in the official Kubernetes documentation.
This post builds on the infrastructure from a previous post.
Solution
The solution works with any cloud provider; in my case I use AWS. The idea is to use a nodeSelector
for the pods that use a persistent volume (EBS) and pin them to a fixed availability zone, so that if the pods are rescheduled to other nodes they land in the same availability zone as the volume.
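The manifests below match on a custom zone label, so the nodes need to carry it. A quick way to check and, if needed, set it (the node name here is just a placeholder for one of your workers):

```shell
# List nodes with their zone labels; on AWS the cloud provider usually
# sets topology.kubernetes.io/zone automatically.
kubectl get nodes -L topology.kubernetes.io/zone

# Add the custom "zone" label used by the manifests in this post
# (node name is a placeholder).
kubectl label node ip-10-0-1-20.eu-west-1.compute.internal zone=eu-west-1a
```

Alternatively, you could use the well-known topology.kubernetes.io/zone label directly in the nodeSelector instead of a custom one.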
1. Storage class
One important parameter in the storage class is volumeBindingMode: WaitForFirstConsumer,
which delays the binding and provisioning of a persistent volume until a pod that uses it is created. This lets the scheduler pick the zone first and provision the volume there.
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp2
  fsType: "ext4"
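To create the class and confirm the binding mode (the file name is illustrative):

```shell
kubectl apply -f storageclass-ebs.yaml
kubectl get storageclass ebs
# The VOLUMEBINDINGMODE column should show WaitForFirstConsumer.
```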
2. Persistent Volume Claim
The PVC references the storage class and will be mounted by the deployment.
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-nginx
spec:
  storageClassName: ebs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
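Because of WaitForFirstConsumer, the claim stays unbound until a pod uses it; this is expected (file name is illustrative):

```shell
kubectl apply -f pvc-nginx.yaml
kubectl get pvc pvc-nginx
# STATUS shows Pending until the deployment below is created;
# the EBS volume is only provisioned once a pod is scheduled.
```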
3. Deployment
The deployment has a nodeSelector
that defines the zone where the pod will be scheduled. The nodeSelector
is only needed for deployments that use a persistent volume.
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        zone: eu-west-1a
      containers:
        - image: nginx:latest
          name: nginx
          volumeMounts:
            - name: vol-nginx
              mountPath: /mnt/
      volumes:
        - name: vol-nginx
          persistentVolumeClaim:
            claimName: pvc-nginx
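After applying the deployment, you can verify that the pod and the provisioned volume ended up in the same zone (file name is illustrative):

```shell
kubectl apply -f deployment-nginx.yaml
# The pod should land on a node labeled zone=eu-west-1a.
kubectl get pods -l app=nginx -o wide
# The PV created for the claim records the zone in its node affinity.
kubectl get pv
```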
Solution for multiple replicas
If you want to run multiple replicas in different availability zones while also using persistent volumes, you can use podAntiAffinity
to tell the Kubernetes scheduler to place each replica on a node in a different availability zone.
The following deployment runs Nginx with a persistent volume, with 3 replicas in 3 different availability zones. There is a limitation in this example: the number of replicas cannot exceed the number of availability zones.
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: zone
      containers:
        - image: nginx:latest
          name: nginx
          volumeMounts:
            - name: vol-nginx
              mountPath: /mnt/
      volumes:
        - name: vol-nginx
          persistentVolumeClaim:
            claimName: pvc-nginx
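To confirm the anti-affinity spread the replicas, list the pods with their nodes; each should be running in a different zone:

```shell
kubectl get pods -l app=nginx -o wide
# Each replica should run on a node in a different availability zone;
# any replica beyond the number of zones would stay Pending, since the
# required anti-affinity cannot be satisfied.
kubectl get nodes -L zone
```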