Kubernetes storage
Storage is critical to most real-world production applications. Fortunately, Kubernetes has a mature and feature-rich storage subsystem called the persistent volume subsystem.
We’ll divide this chapter as follows:
- The Container Storage Interface (CSI)
- The Kubernetes persistent volume subsystem
- Storage Classes and Dynamic Provisioning
The big picture
First things first, Kubernetes supports lots of types of storage from lots of different places. For example, iSCSI, SMB, NFS, and object storage blobs, all from a variety of external storage systems that can be in the cloud or in your on-premises data center. However, no matter what type of storage you have, or where it comes from, when it’s exposed on your Kubernetes cluster it’s called a volume. For example, Azure File resources surfaced in Kubernetes are called volumes, as are block devices from AWS Elastic Block Store. All storage on a Kubernetes cluster is called a volume.
Figure 8.1 shows the high-level architecture.
Figure 8.1
On the left, you’ve got storage providers. They can be your traditional enterprise storage arrays from vendors like EMC and NetApp, or they can be cloud storage services such as AWS Elastic Block Store (EBS) and GCE Persistent Disks (PD). All you need is a plugin that allows their storage resources to be surfaced as volumes in Kubernetes.
In the middle of the diagram is the plugin layer. In the simplest terms, this is the glue that connects external storage with Kubernetes. Going forward, plugins will be based on the Container Storage Interface (CSI) which is an open standard aimed at providing a clean interface for plugins. If you’re a developer writing storage plugins, the CSI abstracts the internal Kubernetes storage detail and lets you develop out-of-tree.
Note: Prior to the CSI, all storage plugins were implemented as part of the main Kubernetes code tree (in-tree). This meant they all had to be open-source, and all updates and bug-fixes were tied to the main Kubernetes release-cycle. This was a nightmare for plugin developers as well as the Kubernetes maintainers. However, now that we have the CSI, storage vendors no longer need to open-source their code, and they can release updates and bug-fixes against their own timeframes.
On the right of Figure 8.1 is the Kubernetes persistent volume subsystem. This is a set of API objects that allow applications to consume storage. At a high-level, Persistent Volumes (PV) are how you map external storage onto the cluster, and Persistent Volume Claims (PVC) are like tickets that authorize applications (Pods) to use a PV.
Let’s assume the quick example shown in Figure 8.2.
A Kubernetes cluster is running on AWS and the AWS administrator has created a 25GB EBS volume called “ebs-vol”. The Kubernetes administrator creates a PV called “k8s-vol” that links back to the “ebs-vol” via the kubernetes.io/aws-ebs plugin. While that might sound complicated, it’s not. The PV is simply a way of representing the external storage on the Kubernetes cluster. Finally, the Pod uses a PVC to claim access to the PV and start using it.
Figure 8.2
A couple of points worth noting.
- There are rules safeguarding access to a single volume from multiple Pods (more on this later).
- A single external storage volume can only be used by a single PV. For example, you cannot have a 50GB external volume that has two 25GB Kubernetes PVs each using half of it.
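To make the Figure 8.2 example concrete, the following is a minimal sketch of what the “k8s-vol” PV might look like using the in-tree kubernetes.io/aws-ebs plugin. The volumeID is a made-up placeholder; you’d use the real ID of the pre-created “ebs-vol”.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: k8s-vol
spec:
  capacity:
    storage: 25Gi                       # matches the 25GB "ebs-vol" created by the AWS administrator
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:                 # in-tree kubernetes.io/aws-ebs plugin
    volumeID: vol-0123456789abcdef0     # placeholder: the real EBS volume ID of "ebs-vol"
    fsType: ext4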
Now that you have an idea of the fundamentals, let’s dig a bit deeper.
Storage Providers
Kubernetes can use storage from a wide range of external systems. These will often be native cloud services such as AWSElasticBlockStore or AzureDisk, but they can also be traditional on-premises storage arrays providing iSCSI or NFS volumes. Other options exist, but the take-home point is that Kubernetes gets its storage from a wide range of external systems.
Some obvious restrictions apply. For example, you cannot use the AWSElasticBlockStore provisioner if your Kubernetes cluster is running in Microsoft Azure.
The Container Storage Interface (CSI)
The CSI is an important piece of the Kubernetes storage jigsaw. However, unless you’re a developer writing storage plugins, you’re unlikely to interact with it very often.
It’s an open-source project that defines a standards-based interface so that storage can be leveraged in a uniform way across multiple container orchestrators. In other words, a storage vendor should be able to write a single CSI plugin that works across multiple orchestrators like Kubernetes and Docker Swarm. In reality, Kubernetes is the focus.
In the Kubernetes world, the CSI is the preferred way to write drivers (plugins) and means that plugin code no longer needs to exist in the main Kubernetes code tree. It also provides a clean and simple interface that abstracts all the complex internal Kubernetes storage machinery. Basically, the CSI exposes a clean interface and hides all the ugly volume machinery inside of the Kubernetes code (no offense intended).
From a day-to-day management perspective, your only real interaction with the CSI will be referencing the appropriate plugin in your YAML manifest files. Also, it may take a while for existing in-tree plugins to be replaced by CSI plugins.
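For example, a PV backed by a CSI plugin references the driver by name in its csi: block. The following is only a sketch; the driver name and volumeHandle are hypothetical placeholders, not a real vendor’s values.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: csi-example-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  csi:
    driver: csi.example.vendor.com      # hypothetical CSI driver name supplied by the storage vendor
    volumeHandle: vol-12345             # hypothetical ID of the volume on the external storage system
    fsType: ext4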
Sometimes we call plugins “provisioners”, especially when we talk about Storage Classes later in the chapter.
The Kubernetes persistent volume subsystem
From a day-to-day perspective, this is where you’ll spend most of your time configuring and interacting with Kubernetes storage.
You start out with raw storage on the left of Figure 8.3. This plugs into Kubernetes via a CSI plugin. You then use the resources provided by the persistent volume subsystem to make that storage available to your apps.
Figure 8.3
The three main resources in the persistent volume subsystem are:
- Persistent Volumes (PV)
- Persistent Volume Claims (PVC)
- Storage Classes (SC)
At a high level, PVs are how you represent storage in Kubernetes. PVCs are like tickets that grant a Pod access to a PV. SCs make it all dynamic.
Let’s walk through a quick example
Assume you have a Kubernetes cluster and an external storage system. The storage vendor provides a CSI plugin so that you can leverage its storage assets inside of your Kubernetes cluster. You provision 3 x 10GB volumes on the storage system and create 3 Kubernetes PV objects to make them available on your cluster. Each PV references one of the volumes on the storage array via the CSI plugin. At this point, the three volumes are visible and available for use on the Kubernetes cluster.
Now assume you’re about to deploy an application that requires 10GB of storage. That’s great, you already have three 10GB PVs. In order for the app to use one of them, it needs a PVC. As previously mentioned, a PVC is like a ticket that lets a Pod (application) use a PV. Once the app has the PVC, it can mount the respective PV into its Pod as a volume. Refer back to Figure 8.2 if you need a visual representation.
That was a high-level example. Let’s do it.
This example is for a Kubernetes cluster running on Google Cloud. I’m using a cloud option because it’s the easiest to follow along with, and you may be able to use the cloud’s free tier or initial free credit. It’s also possible to follow along on other clouds by changing a few values.
The example assumes a 10GB SSD volume called “uber-disk” has been pre-created in the same Google Cloud region or zone as the cluster. The Kubernetes steps will be:
- Create the PV
- Create the PVC
- Define the volume in the Pod spec
- Mount it into a container
The following YAML file creates a PV object that maps back to the pre-created Google Persistent Disk called “uber-disk”. The YAML file is available in the storage folder of the book’s GitHub repo called gke-pv.yml.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: test
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: uber-disk
Let’s step through the file.
PersistentVolume (PV) resources are defined in v1 of the core API group. You’re naming this PV “pv1”, setting its access mode to ReadWriteOnce, and making it part of a class of storage called “test”. You’re defining it as a 10GB volume, setting a reclaim policy, and mapping it back to a pre-created GCE persistent disk called “uber-disk”.
The following command will create the PV. It assumes the file is called gke-pv.yml and is in your current working directory. The operation will fail if you have not pre-created “uber-disk” on the back-end storage system (in this example the back-end storage is provided by Google Compute Engine).
$ kubectl apply -f gke-pv.yml
persistentvolume/pv1 created
Check the PV exists.
$ kubectl get pv pv1
NAME   CAPACITY   MODES   RECLAIM POLICY   STATUS      STORAGECLASS   ...
pv1    10Gi       RWO     Retain           Available   test
If you want, you can see more detailed information with kubectl describe pv pv1, but at the moment you have what is shown in Figure 8.4.
Figure 8.4
Let’s quickly explain some of the PV properties set out in the YAML file.
.spec.accessModes defines how the PV can be mounted. Three options exist:
ReadWriteOnce defines a PV that can only be mounted/bound as R/W by a single PVC. Attempts from multiple PVCs to bind (claim) it will fail.
ReadWriteMany defines a PV that can be bound as R/W by multiple PVCs. This mode is usually only supported by file and object storage such as NFS. Block storage usually only supports RWO.
ReadOnlyMany defines a PV that can be bound by multiple PVCs as R/O.
A couple of things are worth noting. First up, a PV can only be opened in one mode – it is not possible for a single PV to have a PVC bound to it in ROM mode and another PVC bound to it in RWM mode. Second up, Pods do not act directly on PVs, they always act on the PVC object that is bound to the PV.
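For example, file-based storage such as NFS can usually be surfaced as a ReadWriteMany PV. The following is a minimal sketch, assuming an NFS server at 10.0.0.10 exporting /exports/data; both values are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany        # file storage such as NFS typically supports RWM
  nfs:
    server: 10.0.0.10    # placeholder NFS server address
    path: /exports/data  # placeholder export path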
.spec.storageClassName tells Kubernetes to group this PV in a storage class called “test”. You’ll learn more about storage classes later in the chapter, but you need this here to make sure the PV will correctly bind with a PVC in a later step.
Another property is .spec.persistentVolumeReclaimPolicy. This tells Kubernetes what to do with a PV when its PVC has been released. Two policies currently exist:
Delete is the most dangerous, and is the default for PVs that are created dynamically via storage classes (more on these later). This policy deletes the PV and associated storage resource on the external storage system, so will result in data loss! You should obviously use this policy with caution.
Retain will keep the associated PV object on the cluster, as well as any data stored on the associated external asset. However, it will prevent another PVC from using the PV in the future.
If you want to re-use a retained PV, you need to perform the following three steps:
- Manually delete the PV on Kubernetes
- Re-format the associated storage asset on the external storage system to wipe any data
- Recreate the PV
Tip: If you’re experimenting in a lab and re-using PVs, it’s easy to forget that you’ll have to perform the previous three steps before an old PV with the Retain policy can be re-used.
.spec.capacity tells Kubernetes how big the PV should be. This value can be less than the actual physical storage asset but cannot be more. For example, you cannot create a 100GB PV that maps back to a 50GB device on the external storage system. But you can create a 50GB PV that maps back to a 100GB external volume (but that would be wasteful).
Finally, the last line of the YAML file links the PV to the name of the pre-created device on the back-end.
You can also specify vendor-specific attributes. These go in the .parameters section of a storage class rather than in the PV YAML itself. You’ll see more of this later when you look at storage classes, but for now, if your storage system supports pink fluffy NVMe devices, this is where you’d specify them.
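As a taster of what’s coming later in the chapter, the following sketch shows where those vendor-specific attributes live on a StorageClass, in this case asking the GCE persistent disk provisioner for SSD-backed volumes. The class name “fast” is just an example.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd   # in-tree GCE PD provisioner
parameters:
  type: pd-ssd                      # vendor-specific attribute selecting SSD-backed disks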
Now that you’ve got a PV, let’s create a PVC so that a Pod can claim access to the storage.
The following YAML defines a PVC that can be used by a Pod to gain access to the pv1 PV you created earlier.
The file is available in the storage folder in the book’s GitHub repo called gke-pvc.yml.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: test
  resources:
    requests:
      storage: 10Gi
As with the PV, PVCs are a stable v1 resource in the core API group.
The most important thing to note about a PVC object is that the values in the .spec section must match with the PV you are binding it with. In this example, access modes, storage class, and capacity must match with the PV.
Note: It’s possible for a PV to have more capacity than a PVC. For example, a 10GB PVC can be bound to a 15GB PV (obviously this will waste 5GB of the PV). However, a 15GB PVC cannot be bound to a 10GB PV.
Figure 8.5 shows a side-by-side comparison of the example PV and PVC YAML files and highlights the properties that need to match.
Figure 8.5
Deploy the PVC with the following command. It assumes the YAML file is called “gke-pvc.yml” and is in your current working directory.
$ kubectl apply -f gke-pvc.yml
persistentvolumeclaim/pvc1 created
Check that the PVC is created and bound to the PV.
$ kubectl get pvc pvc1
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
pvc1   Bound    pv1      10Gi       RWO            test
OK, you’ve got a PV called pv1 representing 10GB of external storage on your Kubernetes cluster, and you’ve bound a PVC called pvc1 to it. Let’s find out how a Pod can leverage that PVC and use the actual storage.
More often than not, you’ll deploy your applications via higher-level controllers like Deployments and StatefulSets, but to keep the example simple, you’ll deploy a single Pod. Pods deployed like this are often referred to as “singletons” and are not recommended for production as they do not provide high availability and cannot self-heal.
The following YAML defines a single-container Pod with a volume called “data” that leverages the PVC and PV objects you already created. The file is available in the storage folder of the book’s GitHub repo called volpod.yml.