Using Ceph FS Persistent Volumes with Kubernetes Containers

Containers are great, up until you need to persist storage beyond the potential lifetime of the container. In a single Dockerised setup this might be easily solved with host volumes, but that isn’t really feasible on larger orchestrated container setups such as Kubernetes. Luckily, this is easily solved through the use of remote persistent volumes. Kubernetes supports a number of persistent volume types out of the box; this post will focus specifically on Ceph FS persistent volumes. As a quick aside, in the Kubernetes documentation you may see references to both “Ceph RBD” and “Ceph FS”. Both are volume types that Ceph can host, but they differ in kind: Ceph RBD is a block device, analogous to an iSCSI block device, while Ceph FS is a file system, analogous to an NFS or Samba share. Both can be used to provide persistent storage to Kubernetes pods, but Ceph FS is my personal preference as the same volume can be mounted by multiple pods at the same time.

Overall this post will cover the steps needed to set up Ceph FS on an existing Ceph Cluster and securely mounting the Ceph FS storage on a Kubernetes pod.

Pre-Requisites

Before we begin, I’ll assume that you have the following set up and running:

  • A Kubernetes cluster running 1.5 or higher, on hosts running Kernel 4.9 or higher (earlier kernels have a bug in the Ceph FS kernel module that prevents mounting secured subpaths).
  • A Ceph cluster running Jewel or higher, with OSDs and Monitors configured.
  • ceph-deploy has been used to set up the existing Ceph cluster, or has been configured to communicate with an existing cluster.

Add a Ceph Metadata Server

Ceph FS requires a Metadata Server (MDS) in order to run. Only a single node needs to run as a Metadata server, and an existing OSD or Monitor node can be used for it. Be aware though that the MDS is one of the most CPU intensive roles, so if you do add it to an existing OSD or Monitor node ensure that the system has sufficient resources. Ceph’s Minimum Hardware Recommendations may be a useful resource for sizing your nodes appropriately.

To add a Ceph Metadata Server:

  1. Run ceph-deploy mds create NODE, replacing NODE with the hostname of the node you wish to use as a Metadata Server.
  2. Verify that the Metadata Server is online by running ceph mds stat. The output should be similar to the following, confirming that you have a single Metadata Server ‘up’ (it will show as standby until a file system is created).
$ ceph mds stat
e2:, 1 up:standby

Although only one Metadata Server is required, you can add additional standby Metadata Servers for redundancy. Be aware, however, that running multiple active Metadata Servers is considered experimental and is strongly discouraged by the Ceph project.
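If you do decide to add a standby, it is just a repeat of the earlier step against another node. A minimal sketch, assuming a second node with the hostname ceph-02:

ceph-deploy mds create ceph-02
ceph mds stat    # the output should now report an additional standby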

Create the Ceph Pools and Ceph FS

The Ceph FS itself relies on two underlying Ceph Pools, one for the metadata and one for the actual data, so these will need to be created first. Once the pools are created, the Ceph FS can be created on top of them.

  1. Create the Metadata and Data pools using the following commands, replacing PLACEMENTGROUPS with the number of Placement Groups for the pool. Be sure to size this carefully, as once set it can be increased but never decreased without destroying the pool. A suitable value for a small 3 node cluster would be 32 – Ceph’s own suggested value of 128 is too high for smaller clusters (see the sizing sketch after this list).
ceph osd pool create cephfs_data PLACEMENTGROUPS
ceph osd pool create cephfs_metadata PLACEMENTGROUPS
  2. Create a new Ceph FS on top of the two pools using the command ceph fs new cephfs cephfs_metadata cephfs_data.
  3. Verify that the Ceph FS was created successfully using the command ceph fs ls. The output should be similar to the following:
$ sudo ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
  4. Verify that the Metadata Server is now in the active state using the command ceph mds stat. The output should be similar to the following:
$ ceph mds stat
e12: 1/1/1 up {0=ceph-01=up:active}
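As mentioned in step 1 above, here is a rough sketch of the usual placement group sizing rule of thumb from the Ceph documentation: aim for roughly 100 placement groups per OSD in total, divided by the replica count, rounded to a power of two and then shared out between your pools. The OSD and replica counts below are assumptions, so substitute your own:

OSDS=3        # total number of OSDs in the cluster
REPLICAS=3    # replica count (the pools' "size" setting)
TOTAL=$(( OSDS * 100 / REPLICAS ))    # target PGs across all pools, 100 here
# round down to the nearest power of two
PGS=1; while [ $(( PGS * 2 )) -le $TOTAL ]; do PGS=$(( PGS * 2 )); done
echo $PGS     # 64 in this example, to be shared between cephfs_data and cephfs_metadata

Splitting that total between the two pools, with the data pool taking the lion’s share, lands close to the 32 suggested above for a small cluster.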

Test Mount the Ceph FS Volume

Before we try mounting the volume into a Kubernetes pod, it’s a good idea to try mounting it from a regular Linux host (one that isn’t already a Ceph node!). By default, access to Ceph FS is authenticated, so first grab your admin key using the command ceph auth get client.admin. Save the resulting base64-encoded key to a file, copy it to your regular Linux host and secure access to it appropriately – there is no need to copy the [client.admin] header or the key = prefix, only the base64-encoded key itself is needed!
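If you’d rather not copy the key out of the keyring entry by hand, ceph auth get-key prints only the key itself, without the [client.admin] header or key = prefix. A minimal sketch, run from a Ceph node (the client.admin filename is just an example):

sudo ceph auth get client.admin                       # shows the full keyring entry
sudo ceph auth get-key client.admin > client.admin    # writes only the base64-encoded key to a file
chmod 600 client.admin                                # lock the file down before copying it to the test host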

Mount the Ceph FS volume using the following command, replacing CEPH-MONITOR with the hostname or IP address of a Ceph monitor node and replacing /path/to/client.admin with the path where your admin client key is saved. If /mnt/ceph doesn’t already exist, either create it or replace it with the path to an empty directory.

mount -t ceph CEPH-MONITOR:/ /mnt/ceph -o name=admin,secretfile=/path/to/client.admin

If the mount was successful, it should now show up in the output of the mount command and as a filesystem in the output of the df command. Try creating, writing to and reading from a few files to make sure everything is working. If you’re not able to mount Ceph FS successfully (especially if you’re receiving mount error 22 = Invalid argument), try looking at the output of dmesg – it often contains more detail on why the mount failed.
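A quick smoke test along those lines, assuming the volume is mounted at /mnt/ceph:

df -h /mnt/ceph                               # the Ceph FS should appear as a mounted filesystem
echo "hello from cephfs" > /mnt/ceph/test.txt
cat /mnt/ceph/test.txt                        # should print the line straight back
rm /mnt/ceph/test.txt                         # clean up the test file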

Generating a new Client Key

At this point we could use the admin client key with our Kubernetes pods too. However, from a security perspective every pod should have its own restricted key and should only be able to access its own part of the Ceph filesystem. Luckily, this is a fairly easy process.

Run ceph auth get-or-create client.NAME mon 'allow r' mds 'allow rw path=/PATH' osd 'allow rw pool=cephfs_data', replacing NAME with the name of the pod (e.g. nginx) and PATH with the path you’d like the key to grant access to (e.g. /containers/nginx). Like the ceph auth get command from before, this will return the key for this new client. Make sure to save the base64-encoded key somewhere safe as we’ll need to provide it to the pod.
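As a concrete (and entirely hypothetical) example, a key for an nginx pod restricted to /containers/nginx would be created like this, with the key in the output replaced by a placeholder:

ceph auth get-or-create client.nginx mon 'allow r' mds 'allow rw path=/containers/nginx' osd 'allow rw pool=cephfs_data'
# returns something like:
# [client.nginx]
#     key = AQxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==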

To quickly explain the privileges that this key has been granted:

  • Read permissions on the Ceph Monitors. This is required so that the client can retrieve the cluster maps it needs to locate the Metadata Servers and OSDs where files are actually stored.
  • Read and write permissions on the Metadata Server, limited to the /PATH subdirectory. Note that we’ve only granted it permissions to this path – for any other path (e.g. /other/path) the key doesn’t have permission to read or write.
  • Read and write permission to the underlying data storage pool.

Create the Client’s Directory

In the previous step, a new Client Key was created and restricted to a path. However, if that path doesn’t exist on the Ceph FS the mount will fail with mount error 2 = No such file or directory. From the host that you previously test-mounted the Ceph FS root volume on, create the directory beneath the mount point with mkdir -p /mnt/ceph/PATH, replacing PATH with the path you limited the Client Key to.
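Continuing the hypothetical nginx example, and assuming the Ceph FS root is still mounted at /mnt/ceph from the test mount earlier:

sudo mkdir -p /mnt/ceph/containers/nginx
# if the container's process runs as a non-root user, give that UID/GID ownership of the directory
sudo chown 12053:12053 /mnt/ceph/containers/nginx   # 12053 is just an example UID/GID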

Mounting Ceph FS inside a Kubernetes Pod

And finally we’re at the point where we’re able to mount our shiny new Ceph FS storage inside a Kubernetes Pod! To do this, we’ll need to update an existing Kubernetes pod configuration to include information on how to mount Ceph FS.

As a quick recap, mounting storage in Kubernetes requires you to define the volume at the Pod level, and where that volume is mounted at the Container level.

Pod-Level Configuration

At the Pod level, add the following configuration:

volumes:
  - name: ceph-fs
    cephfs:
      monitors:
        - CEPH-MONITOR-1:6789
        - CEPH-MONITOR-2:6789
        - CEPH-MONITOR-3:6789
      user: NAME
      secretRef:
        name: ceph-secret
      path: "/PATH"
  • Replace CEPH-MONITOR-1 through CEPH-MONITOR-3 with the actual hostnames or IPs of your Ceph monitor nodes. You can list as many or as few of these as you wish; however, you should list all of your monitors here so that your pod can still reach the Ceph FS storage even if one or more of your monitor nodes is down.
  • Replace NAME with the name of the client key you created earlier. There’s no need to use client.NAME here – the client. part is automatically prepended.
  • Replace PATH with the path that you granted the client key access to, e.g. /containers/nginx. If you don’t specify this, Kubernetes will attempt to mount the Ceph FS’s root directory, which will fail as our client key was only granted access to the /PATH subpath!

Container Level Configuration

At the Container level, add the following configuration:

volumeMounts:
  - name: ceph-fs
    mountPath: "/MOUNT-PATH"
  • Replace MOUNT-PATH with the path that you would like to mount the Ceph FS volume at within the container. There’s no need to re-specify the Ceph FS path here; the path from the pod-level configuration has already been taken into account. For example, if you had a file at /containers/nginx/nginx.conf on your Ceph FS, had set PATH to /containers/nginx in the pod-level configuration and then set MOUNT-PATH to /etc/nginx, then your container would have a file located at /etc/nginx/nginx.conf.

Secret Configuration

Remember the client key that we created earlier? We need to provide this to Kubernetes so it can use it when mounting the volume. To do so, add the following configuration (note: if you’re adding this to the end of an existing pod configuration, don’t forget to add a --- separator between them!).

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: "SECRET-KEY-B64"
  • Replace SECRET-KEY-B64 with the base64 encoded secret key from earlier. Be careful here – the key that the ceph auth get-or-create command gave us is already a base64 string, but because Kubernetes requires secret values to be supplied in base64, we need to base64 encode the key that the command gave us a second time so that Kubernetes decodes it back to the original key. To verify that you’ve done this correctly, after creating the secret you can run kubectl get secret ceph-secret -o yaml and base64-decode the key value it shows: if the decoded value matches the original Ceph client key you’ve done it correctly, whereas if it decodes to a string of weird characters (e.g. question marks, squares, accented letters) you’ve missed the extra round of base64 encoding.
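If you’d rather not juggle the encoding by hand, a minimal sketch of doing it from the shell, assuming the client.nginx key from earlier and a pod configuration that references a secret named ceph-secret:

KEY=$(sudo ceph auth get-key client.nginx)    # prints only the key, no header or "key =" prefix
printf '%s' "$KEY" | base64                   # value to paste into the Secret's data.key field
# or skip the YAML entirely and let kubectl handle the base64 encoding:
kubectl create secret generic ceph-secret --from-literal=key="$KEY"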

Create the Pod

That should be it! Finish configuring the rest of your pod and then create it. If done correctly, Kubernetes should mount your new Ceph FS storage inside the pod, and your pod’s containers will be able to read and write to Ceph FS.
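A minimal sketch of creating and checking the result, assuming the configuration above is saved in nginx-pod.yaml and the pod is called nginx:

kubectl apply -f nginx-pod.yaml               # create (or update) the secret and pod
kubectl describe pod nginx                    # the Events section will show any mount failures
kubectl exec nginx -- df -h /MOUNT-PATH       # the Ceph FS volume should show up as a mounted filesystem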

Conclusions

Hopefully by following all of the steps above, you’ve been able to set up Ceph FS on top of your existing Ceph cluster and expose that storage securely to your Kubernetes pods and containers.

Appendix

Full Example Pod Configuration

Wrapping your head around Kubernetes configurations can be difficult when starting out, so if you’re not sure where the bits of configuration above should fit in, the following full example from one of my pods should help. A few things to note:

  • This sets up a pod running a single container containing the Beets music tagger. Beets needs storage for its local database and configuration, which I am using Ceph FS for. I also have my Music share mounted into the container over NFS.
  • There are loads of extra bits above and beyond the minimum required to mount Ceph FS. You will likely not require these, or they will vary depending on your setup.
    • I am using a Deployment to schedule the pod. If you are just using a plain pod, your configuration may omit some of the parts on strategy, replicas etc.
    • I am using a Service and Ingress to expose Beets’ web interface locally.
    • I have set the UID and GID of the process running inside the container so that it can access my NFS share properly. This isn’t Ceph specific and can be ignored.
  • Various paths, settings and keys have been changed from their actual values – sorry, you won’t find any value here if you’re attempting to perform reconnaissance on my setup!

If you’re looking for a simpler example, the Kubernetes source repository has a few examples as well.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: beets
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: beets
    spec:
      volumes:
        - name: music-nfs
          nfs:
            server: nas
            path: "/mnt/POOL/Music/"
        - name: music-input-nfs
          nfs:
            server: nas
            path: "/mnt/POOL/input/music/"
        - name: beets-config
          cephfs:
            monitors:
              - ceph-1:6789
              - ceph-2:6789
              - ceph-3:6789
            user: beets
            secretRef:
              name: ceph-secret-beets
            path: "/containers/beets"
      containers:
      - name: beets
        image: kingj/beets-docker
        ports:
          - containerPort: 8337
        volumeMounts:
          - name: music-nfs
            mountPath: "/music"
          - name: music-input-nfs
            mountPath: "/input"
          - name: beets-config
            mountPath: "/etc/beets"
        env:
        - name: PGID
          value: "12053"
        - name: PUID
          value: "12053"
      securityContext:
        runAsUser: 12053
        supplementalGroups: [12053]
---
apiVersion: v1
kind: Service
metadata:
  name: beets-svc
  labels:
    app: beets
spec:
  ports:
    - port: 8337
      targetPort: 8337
      protocol: TCP
      name: http
  selector:
    app: beets
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: beets-ingress
spec:
  rules:
  - host: beets
    http:
      paths:
      - path: /
        backend:
          serviceName: beets-svc
          servicePort: 8337
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-beets
data:
  key: "bm8sIGknbSBub3QgZHVtYiBlbm91Z2ggdG8gbGVhdmUgdGhlIGFjdHVhbCBjZXBoIGNsaWVudCBrZXkgaGVyZS4gTmljZSB0cnkgdGhvdWdoIQ=="
