Using Ceph FS Persistent Volumes with Kubernetes Containers
Containers are great, right up until you need to persist storage beyond the lifetime of the container. In a single Dockerised setup this is easily solved with host volumes, but that isn’t really feasible on larger orchestrated setups such as Kubernetes. Luckily, the problem can be solved with remote persistent volumes. Kubernetes supports a number of persistent volume types out of the box; this post will focus specifically on Ceph FS persistent volumes.
As a quick aside, in the Kubernetes documentation you may see references to both “Ceph RBD” and “Ceph FS”. Both are types of storage that a Ceph cluster can provide, but they differ in kind: Ceph RBD is a block device, analogous to an iSCSI block device, whereas Ceph FS is a file system, analogous to an NFS or Samba share. Both can be used to provide persistent storage to Kubernetes pods, but Ceph FS is my personal preference as the same volume can be mounted by multiple pods at the same time.
Overall this post will cover the steps needed to set up Ceph FS on an existing Ceph Cluster and securely mounting the Ceph FS storage on a Kubernetes pod.
Pre-Requisites
Before we begin, I’ll assume that you have the following set up and running;
- A Kubernetes cluster running 1.5 or higher, on hosts running kernel 4.9 or higher (earlier kernels have a bug in the Ceph FS kernel module that prevents mounting secured subpaths).
- A Ceph cluster running Jewel or higher, with OSDs and Monitors configured.
- ceph-deploy has been used to set up the existing Ceph cluster, or has been configured to communicate with an existing cluster.
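If you’re not sure whether your hosts meet the kernel and Ceph version requirements, a quick check on each host looks something like the following (exact output will vary by distribution and Ceph release);
# On each Kubernetes host: the kernel must be 4.9 or higher
uname -r

# On a Ceph node: the release must be Jewel (10.2.x) or higher
ceph --version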
Add a Ceph Metadata Server
Ceph FS requires a Metadata Server (MDS) in order to run. Only a single node needs to run as a Metadata server, and an existing OSD or Monitor node can be used for it. Be aware though that the MDS is one of the most CPU intensive roles, so if you do add it to an existing OSD or Monitor node ensure that the system has sufficient resources. Ceph’s Minimum Hardware Recommendations may be a useful resource for sizing your nodes appropriately.
To add a Ceph Metadata Server;
- Run ceph-deploy mds create NODE, replacing NODE with the hostname of the node you wish to use as a Metadata Server.
- Verify that the Metadata Server is online by running ceph mds stat. The output should be similar to the following, confirming that a single Metadata Server is up (it will show as standby until a filesystem has been created).
$ ceph mds stat
e2:, 1 up:standby
Although only one Metadata Server is required, you can add additional standby Metadata Servers. Be aware, however, that running multiple active Metadata Servers is considered experimental and is strongly discouraged by the Ceph project.
Create the Ceph Pools and Ceph FS
The Ceph FS itself relies on two underlying Ceph Pools, one for the metadata and one for the actual data, so these will need to be created first. Once the pools are created, the Ceph FS can be created on top of them.
- Create the Metadata and Data pools using the following commands, replacing PLACEMENTGROUPS with the number of Placement Groups for each pool. Size this carefully: once set, the value can only ever be increased, never decreased, without destroying the pool. A suitable value for a small 3 node cluster would be 32 – Ceph’s own suggested value of 128 is too high for smaller clusters. A worked example for a small cluster follows this list.
ceph osd pool create cephfs_data PLACEMENTGROUPS
ceph osd pool create cephfs_metadata PLACEMENTGROUPS
- Create a new Ceph FS on top of the two pools using the command ceph fs new cephfs cephfs_metadata cephfs_data.
- Verify that the Ceph FS was created successfully using the command ceph fs ls. The output should be similar to the following;
$ sudo ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
- Verify that the Metadata Server is now in the active state using the command ceph mds stat. The output should be similar to the following;
$ ceph mds stat
e12: 1/1/1 up {0=ceph-01=up:active}
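Putting those steps together, a worked example for a small 3 node cluster (using the 32 Placement Groups suggested above) would look something like this;
# Create the two underlying pools with 32 Placement Groups each
ceph osd pool create cephfs_data 32
ceph osd pool create cephfs_metadata 32

# Create the filesystem on top of them, then verify
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls
ceph mds stat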
Test Mount the Ceph FS Volume
Before we try mounting the volume into a Kubernetes pod, it’s a good idea to try mounting it from a regular Linux host (one that isn’t already a Ceph node!). By default, access to Ceph FS is authenticated, so first grab your admin key using the command ceph auth get client.admin. Save the resulting base64-encoded key to a file, copy it to your regular Linux host and secure access to it appropriately – there is no need to copy the [client.admin] header or the key = prefix, only the base64-encoded key itself is needed!
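As a shortcut, Ceph can also print just the key on its own, which saves stripping out the surrounding lines by hand. A minimal sketch (the destination path is only an example);
# Write only the base64-encoded admin key to a file and lock down access to it
ceph auth get-key client.admin > /path/to/client.admin
chmod 600 /path/to/client.admin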
Mount the Ceph FS volume using the following command, replacing CEPH-MONITOR with the hostname or IP address of a Ceph monitor node and replacing /path/to/client.admin with the path where your admin client key is saved. If /mnt/ceph doesn’t already exist, either create it or replace it with the path to an empty directory.
mount -t ceph CEPH-MONITOR:/ /mnt/ceph -o name=admin,secretfile=/path/to/client.admin
If the mount was successful, it should now show up in the output of the mount command and as a mounted filesystem in the output of the df command. Try creating, writing to and reading from a few files to make sure everything is working. If you’re not able to mount Ceph FS successfully (especially if you’re receiving mount error 22 = Invalid argument), try looking at the output of dmesg – it often contains more detail as to why the mount failed.
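For example, a quick read/write smoke test might look like this (assuming the volume is mounted at /mnt/ceph);
echo "hello from cephfs" > /mnt/ceph/test.txt
cat /mnt/ceph/test.txt
rm /mnt/ceph/test.txt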
Generating a new Client Key
At this point we could use the admin client key with our Kubernetes pods too. However, from a security perspective every pod should have its own restricted key and only be able to access its own part of the Ceph filesystem. Luckily, this is a fairly easy process.
Run ceph auth get-or-create client.NAME mon 'allow r' mds 'allow rw path=/PATH' osd 'allow rw pool=cephfs_data', replacing NAME with the name of the pod (e.g. nginx) and PATH with the path you’d like the key to grant access to (e.g. /containers/nginx). Like the ceph auth get command from before, this will return the key for the new client. Make sure to save the base64-encoded key somewhere safe as we’ll need to provide it to the pod.
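For example, using the nginx names above, the command would look like this (the pod name and path are purely illustrative);
ceph auth get-or-create client.nginx \
  mon 'allow r' \
  mds 'allow rw path=/containers/nginx' \
  osd 'allow rw pool=cephfs_data'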
To quickly explain the privileges that this key has been granted;
- Read permissions to the Ceph Metadata Server. This is required so that the key can look up the appropriate metadata for where files are actually stored.
- Read and write permissions to the /PATH subdirectory. Note that we’ve only granted it permissions to this path – for any other path (e.g. /other/path ) the key doesn’t have permission to read or write.
- Read and write permission to the underlying data storage pool.
Create the Client’s Directory
In the previous step, a new Client Key was created and restricted to a path. However, if that path doesn’t exist the mount will fail with mount error 2 = No such file or directory. From the host that you previously test-mounted the Ceph FS root volume on, create the path inside the mounted root (e.g. mkdir /mnt/ceph/PATH if the root is mounted at /mnt/ceph), replacing PATH with the path you limited the Client Key to.
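Continuing the nginx example from earlier, this would look something like the following (assuming the Ceph FS root is mounted at /mnt/ceph);
# The client key was limited to /containers/nginx, so create that path inside the Ceph FS root
mkdir -p /mnt/ceph/containers/nginx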
Mounting Ceph FS inside a Kubernetes Pod
And finally we’re at the point where we’re able to mount our shiny new Ceph FS storage inside a Kubernetes Pod! To do this, we’ll need to update an existing Kubernetes pod configuration to include information on how to mount Ceph FS.
As a quick recap, mounting storage in Kubernetes requires you to define the volume at the Pod level and where that volume is mounted at the Container level.
Pod-Level Configuration
At the Pod level, add the following configuration;
volumes:
  - name: ceph-fs
    cephfs:
      monitors:
        - CEPH-MONITOR-1:6789
        - CEPH-MONITOR-2:6789
        - CEPH-MONITOR-3:6789
      user: NAME
      secretRef:
        name: ceph-secret
      path: "/PATH"
- Replace CEPH-MONITOR-1 through CEPH-MONITOR-3 with the actual hostnames or IPs of your Ceph monitor nodes. You can list as many or as few monitors as you wish, but you should list all of them to ensure that your pod can always reach the Ceph FS storage even if one or more of your monitor nodes is down.
- Replace NAME with the name of the client key you created earlier. There’s no need to use client.NAME here – the client. part is automatically prepended.
- Replace PATH with the path that you granted the client key access to, e.g. /containers/nginx . If you don’t specify this, Kubernetes will attempt to mount the Ceph FS’s root directory and this will fail as our client key was only granted access to the /PATH subpath!
Container-Level Configuration
At the Container level, add the following configuration;
volumeMounts:
  - name: ceph-fs
    mountPath: "/MOUNT-PATH"
- Replace MOUNT-PATH with the path that you would like the Ceph FS volume mounted at within the container. There’s no need to re-specify the Ceph FS path here – the path from the pod-level configuration has already been taken into account. For example, if you had a file at /containers/nginx/nginx.conf on your Ceph FS, had set PATH to /containers/nginx in the pod-level configuration and then set MOUNT-PATH to /etc/nginx, your container would see the file at /etc/nginx/nginx.conf.
Secret Configuration
Remember the client key that we created earlier? We need to provide it to Kubernetes so that it can be used when mounting the volume. To do so, add the following configuration (note: if you’re adding this to the end of an existing pod configuration, don’t forget to add a --- separator between them!).
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: "SECRET-KEY-B64"
- Replace SECRET-KEY-B64 with the base64 encoded secret key from earlier. Be careful here – the key that the ceph auth get-or-create command gave us is already base64 encoded, but Kubernetes requires the values in a Secret’s data field to be base64 encoded too, so we need to base64 encode the already-encoded key a second time. To verify that you’ve done this correctly, after creating the secret you can run kubectl get secret ceph-secret -o yaml to output its contents. If the key value looks like a base64 string, you’ve done it correctly; if you get a string of weird characters (e.g. question marks, squares, accented letters) you haven’t base64 encoded the key the second time.
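Continuing the nginx example, one way to produce the doubly-encoded value is to pipe the client key straight through base64 – a sketch, assuming ceph auth get-key prints only the key itself on your release;
# Prints the Secret-ready value: Ceph's (already base64) key, base64 encoded once more
ceph auth get-key client.nginx | base64

Alternatively, creating the secret with kubectl create secret generic ceph-secret --from-literal=key=... performs that extra layer of encoding for you.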
Create the Pod
That should be it! Finish configuring the rest of your pod and then create it. If done correctly, Kubernetes should mount your new Ceph FS storage inside the pod, and your pod’s containers will be able to read and write to Ceph FS.
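For example, if your pod and secret configuration is saved in a single file, creating it and checking on the mount might look like this (the file name is just an example);
kubectl apply -f pod.yaml

# If the pod is stuck in ContainerCreating, the Events section usually explains why the Ceph FS mount failed
kubectl describe pod POD-NAME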
Conclusions
Hopefully by following all of the steps above, you’ve been able to set up Ceph FS on top of your existing Ceph cluster and expose that storage securely to your Kubernetes pods and containers.
Appendix
Full Example Pod Configuration
Wrapping your head around Kubernetes configurations can be difficult when starting out, so if you’re not sure where the bits of configuration above should fit in, the following full example from one of my pods should help. A few things to note;
- This sets up a pod running a single container containing the Beets music tagger. Beets needs storage for its local database and configuration, which I am using Ceph FS for. I also have my Music share mounted into the container over NFS.
- There are loads of extra bits above and beyond the minimum required to mount Ceph FS. You will likely not require these, or they will vary depending on your setup.
- I am using a Deployment to schedule the pod. If you are just using a pod, your configuration can omit parts such as the strategy and replicas settings.
- I am using a Service and Ingress to expose Beets’ web interface locally.
- I have set the UID and GID of the process running inside the container so that it can access my NFS share properly. This isn’t Ceph specific and can be ignored.
- Various paths, settings and keys have been changed from their actual values – sorry, you won’t find anything of value here if you’re attempting to perform reconnaissance on my setup!
If you’re looking for a simpler example, the Kubernetes source repository has a few examples as well.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: beets
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: beets
    spec:
      volumes:
        - name: music-nfs
          nfs:
            server: nas
            path: "/mnt/POOL/Music/"
        - name: music-input-nfs
          nfs:
            server: nas
            path: "/mnt/POOL/input/music/"
        - name: beets-config
          cephfs:
            monitors:
              - ceph-1:6789
              - ceph-2:6789
              - ceph-3:6789
            user: beets
            secretRef:
              name: ceph-secret-beets
            path: "/containers/beets"
      containers:
        - name: beets
          image: kingj/beets-docker
          ports:
            - containerPort: 8337
          volumeMounts:
            - name: music-nfs
              mountPath: "/music"
            - name: music-input-nfs
              mountPath: "/input"
            - name: beets-config
              mountPath: "/etc/beets"
          env:
            - name: PGID
              value: "12053"
            - name: PUID
              value: "12053"
      securityContext:
        runAsUser: 12053
        supplementalGroups: [12053]
---
apiVersion: v1
kind: Service
metadata:
  name: beets-svc
  labels:
    app: beets
spec:
  ports:
    - port: 8337
      targetPort: 8337
      protocol: TCP
      name: http
  selector:
    app: beets
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: beets-ingress
spec:
  rules:
    - host: beets
      http:
        paths:
          - path: /
            backend:
              serviceName: beets-svc
              servicePort: 8337
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-beets
data:
  key: "bm8sIGknbSBub3QgZHVtYiBlbm91Z2ggdG8gbGVhdmUgdGhlIGFjdHVhbCBjZXBoIGNsaWVudCBrZXkgaGVyZS4gTmljZSB0cnkgdGhvdWdoIQ=="