
How to Create, Troubleshoot and Use NFS type Persistent Storage Volume in Kubernetes

Whether you need to simply persist data or share data among pods, one option is to use Network File System (NFS) type Persistent Volumes (PV).
However, you may encounter multiple issues, and often the error message(s) you see in the pod's log are not detailed enough, or are even misleading. In this blog post, I'm going to show you the step-by-step process (with a real example) of creating a PV and a Persistent Volume Claim (PVC) and using them in a pod. We'll also discuss possible issues and how to resolve them.

Prerequisites for this exercise:

  1. Make sure you have a working Kubernetes cluster where you can create resources as needed. 
  2. Make sure you have a working Network File System (NFS) server that is accessible from all nodes in the Kubernetes cluster.

Process steps:

1) Allow Kubernetes pod/container to use NFS

1.1) Check whether SELinux is enabled on your Kubernetes cluster nodes/hosts (where the Kubernetes pod(s) will be created). If it is enabled, we need to make sure it lets containers/pods access the remote NFS share.

$> sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28

1.2) If it is enabled, find out the value of 'virt_use_nfs'. You can use either the 'getsebool' or the 'semanage' utility as shown below:

$> getsebool virt_use_nfs
virt_use_nfs --> off
or
$> sudo semanage boolean -l | grep virt_use_nfs
virt_use_nfs (off , off) Allow virt to use nfs

1.3) If the value of 'virt_use_nfs' is 'off', make sure to enable it; otherwise, any attempt by a Kubernetes pod to access the NFS share may be denied, and you may get a '403 Forbidden' error from your application. You can use the 'setsebool' tool to set the value to '1' or 'on':

$> sudo setsebool -P virt_use_nfs 1

$> sudo semanage boolean -l | grep virt_use_nfs
virt_use_nfs (on , on) Allow virt to use nfs

Note: the -P option sets the value permanently.
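Steps 1.2 and 1.3 can be combined into a small check. The sketch below parses a 'getsebool'-style string; on a real SELinux host you would capture the actual command output as noted in the comment:

```shell
# Sketch: decide from 'getsebool'-style output whether the boolean needs enabling.
# A sample string stands in for the real command here; on an SELinux host use:
#   status=$(getsebool virt_use_nfs)
status="virt_use_nfs --> off"
case "$status" in
  *" off") echo "run: sudo setsebool -P virt_use_nfs 1" ;;
  *" on")  echo "virt_use_nfs already enabled" ;;
esac
```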

2) Create NFS share on NFS server

2.1) Create a directory on the NFS server. My NFS server's IP is 192.168.56.101. Here I'm creating the directory '/var/rabbitmq' on the NFS server as an NFS share and assigning the ownership to 'osboxes:osboxes'. We'll discuss the ownership of the share and its relationship to the pod/container security context a little later in the post.

# Create directory to be shared.
$> sudo mkdir -p /var/rabbitmq

# Change the ownership
$> sudo chown osboxes:osboxes /var/rabbitmq


Important: The right ownership of the NFS share is crucial.

2.2) Add the NFS share to the /etc/exports file. Below, I'm adding all of my Kubernetes nodes; pods running on 192.168.56.101-103 will be able to access the NFS share. The 'root_squash' option "squashes" the power of the remote root user to the lowest local user, preventing unauthorized alterations.

/var/rabbitmq/ 192.168.56.101(rw,sync,root_squash)
/var/rabbitmq/ 192.168.56.102(rw,sync,root_squash)
/var/rabbitmq/ 192.168.56.103(rw,sync,root_squash)
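The three entries above differ only in the node IP, so for a larger cluster they can be generated with a loop (the IPs here are this demo's nodes):

```shell
# Sketch: emit one /etc/exports entry per Kubernetes node
share=/var/rabbitmq/
for ip in 192.168.56.101 192.168.56.102 192.168.56.103; do
  printf '%s %s(rw,sync,root_squash)\n' "$share" "$ip"
done
```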


2.3) Export the NFS share.

sudo exportfs -a


3) Provisioning of PV and PVC

Let's create a PersistentVolume (PV), PersistentVolumeClaim (PVC) for RabbitMQ.
Note: it's important that the PVC and the pod that uses it be in the same namespace. You can create them all in the default namespace; however, here I'm going to create a dedicated namespace for this purpose.

3.1) Create a new namespace, or use an existing one or the default namespace.
The yaml file below (shared-services-ns.yml) defines a namespace object called 'shared-services':

apiVersion: v1
kind: Namespace
metadata:
   name: shared-services

To create the “shared-services” namespace, run the following command:

# Create a new namespace:
$> kubectl create -f shared-services-ns.yml
namespace "shared-services" created

# Verify namespace is created successfully
$> kubectl get namespaces shared-services
NAME              STATUS    AGE
shared-services   Active    36s

3.2) Create a new service account, or use an existing one or the default:
If a service account is not set in the pod definition, the pod uses the default service account for the namespace. Here we are defining a new service account called 'shared-svc-accnt'. File: svcAccnt.yml

apiVersion: v1
kind: ServiceAccount
metadata:
   name: shared-svc-accnt
   namespace: shared-services

To create a new service account 'shared-svc-accnt', run the following command:

# Create service account
$> kubectl create -f svcAccnt.yml
serviceaccount "shared-svc-accnt" created

# Verify service account
$> kubectl describe serviceaccount shared-svc-accnt -n shared-services
Name:                shared-svc-accnt
Namespace:           shared-services
Labels:              <none>
Annotations:         <none>
Image pull secrets:  <none>
Mountable secrets:   shared-svc-accnt-token-mgk9w
Tokens:              shared-svc-accnt-token-mgk9w
Events:              <none>

3.3) Assign role/permission to service account:
Once the service account is created, make sure to grant it the necessary access permissions in the given namespace. Depending on your Kubernetes platform, you may do this differently. Since my Kubernetes is part of Docker Enterprise Edition (EE), I do it through Docker Universal Control Plane (UCP) as described in https://docs.docker.com/ee/ucp/authorization/grant-permissions/#kubernetes-grants. I'll assign the 'restricted control' role to my service account 'shared-svc-accnt' in namespace 'shared-services'. If you are using Minikube or another platform, you may want to refer to the generic Kubernetes documentation for RBAC and service account permissions. Basically, you need to create the role(s) and bind them to the service account. Here are some links to the corresponding documentation: https://v1-7.docs.kubernetes.io/docs/admin/authorization/rbac/#service-account-permissions and https://kubernetes.io/docs/reference/access-authn-authz/rbac/#role-and-clusterrole
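For the generic RBAC route, a minimal Role and RoleBinding for this service account might look like the sketch below (the role name and the verb/resource lists are illustrative assumptions, not taken from this demo):

```yaml
# Sketch: grant 'shared-svc-accnt' basic access within its namespace.
# Names 'shared-svc-role' / 'shared-svc-role-binding' are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: shared-svc-role
  namespace: shared-services
rules:
- apiGroups: [""]
  resources: ["pods", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: shared-svc-role-binding
  namespace: shared-services
subjects:
- kind: ServiceAccount
  name: shared-svc-accnt
  namespace: shared-services
roleRef:
  kind: Role
  name: shared-svc-role
  apiGroup: rbac.authorization.k8s.io
```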

3.4) Define PV object in a yaml file (rabbitmq-nfs-pv.yml):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: rabbitmq-nfs-pv
  namespace: shared-services
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  nfs:
    path: /var/rabbitmq/
    server: 192.168.56.101
  persistentVolumeReclaimPolicy: Retain

Note: currently a PV can have a "Retain", "Recycle", or "Delete" reclaim policy. For a dynamically provisioned PV, the default reclaim policy is "Delete". Kubernetes supports the following access modes:

  • ReadWriteOnce – the volume can be mounted as read-write by a single node
  • ReadOnlyMany – the volume can be mounted read-only by many nodes
  • ReadWriteMany – the volume can be mounted as read-write by many nodes

To create a new PV 'rabbitmq-nfs-pv', run the following command:

# Create PV
$> kubectl create -f rabbitmq-nfs-pv.yml
persistentvolume "rabbitmq-nfs-pv" created

# Verify PV
$> kubectl describe pv rabbitmq-nfs-pv
Name:            rabbitmq-nfs-pv
Labels:          <none>
Annotations:     <none>
Finalizers:      []
StorageClass:
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWX
Capacity:        5Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    192.168.56.101
    Path:      /var/rabbitmq/
    ReadOnly:  false
Events:          <none>

3.5) Define PVC object in a yaml file ( rabbitmq-nfs-pvc.yml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-nfs-pvc
  namespace: shared-services
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

Note: make sure to create PVC in the same namespace as your pod(s) that use it.

To create a new PVC 'rabbitmq-nfs-pvc', run the following command:

# Create PVC
$> kubectl create -f rabbitmq-nfs-pvc.yml
persistentvolumeclaim "rabbitmq-nfs-pvc" created

# Verify PVC
$> kubectl describe pvc rabbitmq-nfs-pvc -n shared-services
Name:          rabbitmq-nfs-pvc
Namespace:     shared-services
StorageClass:
Status:        Bound
Volume:        rabbitmq-nfs-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    []
Capacity:      5Gi
Access Modes:  RWX
Events:        <none>

Important: see the status above. It's "Bound", and it's bound to the volume "rabbitmq-nfs-pv" that we created in the previous step. If your PVC is not able to bind with a PV, then there's a problem; it could be a problem in the PV or PVC definition. Make sure your PV and PVC are of the same storage class (if you are using one; for details refer to https://kubernetes.io/docs/concepts/storage/storage-classes/), and that the PV can fully satisfy the specification defined in the PVC.
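If storage classes are involved, a mismatched or missing storageClassName is a common reason a PVC stays unbound. A minimal sketch (the class name 'nfs-manual' is illustrative, not from this demo):

```yaml
# Sketch: PV and PVC must agree on storageClassName for the claim to bind
kind: PersistentVolume
spec:
  storageClassName: nfs-manual   # must match the claim below
  capacity:
    storage: 5Gi
  accessModes: [ReadWriteMany]
---
kind: PersistentVolumeClaim
spec:
  storageClassName: nfs-manual   # must match the volume above
  accessModes: [ReadWriteMany]
  resources:
    requests:
      storage: 5Gi               # PV capacity must be >= this request
```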


3.6) Now let's put together a simple yaml file that defines the service and deployment objects for RabbitMQ (rabbitmq-nfs-pv-poc-depl.yml):

apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-nfs-poc-svc
  namespace: shared-services
  labels:
    app: rabbitmq-nfs-poc-svc
spec:
  type: NodePort
  ports:
  - name: http
    port: 15672
    targetPort: 15672
  - name: amqp
    protocol: TCP
    port: 5672
    targetPort: 5672
  selector:
    app: rabbitmq-app
---
apiVersion: apps/v1beta2 # for versions prior to 1.9.0
kind: Deployment
metadata:
  name: rabbitmq-depl
  namespace: shared-services
spec:
  selector:
    matchLabels:
      app: rabbitmq-app
  replicas: 1
  template:
    metadata:
      labels:
        app: rabbitmq-app
    spec:
      serviceAccountName: shared-svc-accnt
      securityContext:
        runAsUser: 1000
        supplementalGroups: [1000,65534]
      containers:
      - name: rabbitmq-cnt
        image: rabbitmq
        imagePullPolicy: IfNotPresent
        #privileged: false
        #securityContext:
          #runAsUser: 1000
        ports:
        - containerPort: 15672
          name: http-port
          protocol: TCP
        - containerPort: 5672
          name: amqp
          protocol: TCP
        volumeMounts:
          # 'name' must match the volume name below.
          - name: rabbitmq-mnt
            # Where to mount the volume.
            mountPath: "/var/lib/rabbitmq/"
      volumes:
      - name: rabbitmq-mnt
        persistentVolumeClaim:
          claimName: rabbitmq-nfs-pvc
 

Note:
As seen in the rabbitmq-nfs-pv-poc-depl.yml above, I'm defining the security context in the pod level as:

securityContext:
  runAsUser: 1000
  supplementalGroups: [1000,65534]

Here runAsUser's value '1000' and supplementalGroups' value '1000' belong to user 'osboxes' and group 'osboxes'. gid '65534' belongs to group 'nfsnobody'.

$> id osboxes
uid=1000(osboxes) gid=1000(osboxes) groups=1000(osboxes),10(wheel),983(docker)

$> id nfsnobody
uid=65534(nfsnobody) gid=65534(nfsnobody) groups=65534(nfsnobody)

My NFS share '/var/rabbitmq' is owned by 'osboxes:osboxes', so I'm specifying the values that belong to 'osboxes' in the securityContext.
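A quick way to sanity-check that mapping is to compare the share's owner uid against the pod's runAsUser. The sketch below uses a temp directory as a stand-in for the share so it can run anywhere; on the real NFS server you would point 'dir' at /var/rabbitmq and keep run_as_user=1000:

```shell
# Sketch: verify a directory's owner uid matches the pod's runAsUser
run_as_user=1000
dir=$(mktemp -d)                  # stand-in for /var/rabbitmq on the NFS server
owner_uid=$(stat -c '%u' "$dir")  # GNU stat; prints numeric owner uid
if [ "$owner_uid" -eq "$run_as_user" ]; then
  echo "ok: uid $owner_uid matches runAsUser"
else
  echo "mismatch: dir owned by uid $owner_uid, pod runs as uid $run_as_user"
fi
rm -rf "$dir"
```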

Security context can be defined both on pod level as well as container level. Security context defined in the pod level is applied to all containers in the pod. https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ has details about configuring security context for pod or container.
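As a sketch of the two levels (the pod-level values mirror this demo; the container-level override shown is illustrative, not part of the deployment above):

```yaml
# Sketch: container-level securityContext overrides the pod-level runAsUser
spec:
  securityContext:           # pod level: applies to all containers in the pod
    runAsUser: 1000
    supplementalGroups: [1000, 65534]
  containers:
  - name: rabbitmq-cnt
    image: rabbitmq
    securityContext:         # container level: wins over the pod-level value
      runAsUser: 0
```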


Following command creates rabbitmq deployment and service:

# Create objects 
$> kubectl create -f rabbitmq-nfs-pv-poc-depl.yml
service "rabbitmq-nfs-poc-svc" created
deployment.apps "rabbitmq-depl" created

# Get pods
$> kubectl get pods -n shared-services
NAME                            READY     STATUS    RESTARTS   AGE
rabbitmq-depl-775496b9b-d85l7   1/1       Running   0          7s


Let's check the rabbitmq processes inside the container and files under '/var/rabbitmq' share on NFS server.

# Check process inside the container
$> kubectl exec -it rabbitmq-depl-775496b9b-d85l7 /bin/bash -n shared-services
$> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000         1     0  0 12:38 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
1000       162     1  0 12:38 ?        00:00:00 /usr/lib/erlang/erts-9.3.3.2/bin/epmd -daemon
1000       321     1  5 12:38 ?        00:00:03 /usr/lib/erlang/erts-9.3.3.2/bin/beam.smp -W w -

# Connect to NFS server 


$> ssh osboxes@192.168.56.101
Last login: Sun Aug 26 14:48:19 2018 from centosddcclnt

Make sure rabbitmq successfully created the files, and review the file ownership:
$> cd /var/rabbitmq
$> ls -la
total 28
drwxr-xr-x.  5 osboxes osboxes   4096 Aug 26 13:40 .
drwxr-xr-x. 25 root    root      4096 Aug 26 13:34 ..
-rw-------.  1 osboxes nfsnobody   40 Aug 26 13:40 .bash_history
drwxr-xr-x.  3 osboxes nfsnobody 4096 Aug 26 13:38 config
-r--------.  1 osboxes nfsnobody   20 Aug 26 01:00 .erlang.cookie
drwxr-xr-x.  4 osboxes nfsnobody 4096 Aug 26 13:38 mnesia
drwxr-xr-x.  2 osboxes nfsnobody 4096 Aug 26 13:38 schema



4) Possible issues & troubleshooting

4.1) Pod remains in Pending state, and the pod description shows 'mount failed: exit status 32' as shown below:

$> kubectl describe pod rabbitmq-shared-app -n shared-services
Name:         rabbitmq-shared-app
Namespace:    shared-services
Node:         centosddcwrk01/192.168.56.103
Start Time:   Thu, 16 Aug 2018 17:03:19 +0100
Labels:       name=rabbitmq-shared-app
Annotations:  <none>
Status:       Pending
IP:
  ...
  ...
...
Events:
  Type     Reason                 Age   From                   Message
  ----     ------                 ----  ----                   -------
  ...
  Warning  FailedMount            50s   kubelet, centosddcucp  MountVolume.SetUp failed for volume .... : mount failed: exit status 32

If you try to run the mount manually from inside the container, you may see following:

$> kubectl exec -it rabbitmq-depl-bd9689c8-7md48 /bin/bash -n shared-services
root@rabbitmq-depl-bd9689c8-7md48:/# pwd
/


root@rabbitmq-depl-bd9689c8-7md48:/# mount -t nfs 192.168.56.101:/var/rabbitmq /tmp/test
mount: wrong fs type, bad option, bad superblock on 192.168.56.101:/var/rabbitmq,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
       need a /sbin/mount.<type> helper program)

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

In this case, review the '/etc/exports' file on the NFS server. This file controls which file systems are exported to remote hosts and specifies options. If your Kubernetes host/node is not listed in this file with appropriate option(s), a pod running on that node will not be able to mount. Make sure to run 'sudo exportfs -a' once you have updated /etc/exports. Also check that the NFS client utilities (e.g. the nfs-utils package) are installed on the node, since 'mount failed: exit status 32' generally means the mount command itself failed on the host. You can also try to mount manually from your host (instead of from within the container) in order to test whether that host/node is authorized to mount. Refer to https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/s1-nfs-server-config-exports for details.
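Before blaming the pod, it's worth checking whether the node's IP is covered by the exports file at all. A small sketch, using a throwaway file as a stand-in for /etc/exports on the server:

```shell
# Sketch: check whether a node IP appears in an exports file
exports_file=$(mktemp)            # stand-in for /etc/exports on the NFS server
cat > "$exports_file" <<'EOF'
/var/rabbitmq/ 192.168.56.101(rw,sync,root_squash)
/var/rabbitmq/ 192.168.56.102(rw,sync,root_squash)
EOF
node_ip=192.168.56.103
if grep -q "$node_ip(" "$exports_file"; then
  echo "$node_ip is exported"
else
  echo "$node_ip missing: add it to /etc/exports and run 'sudo exportfs -a'"
fi
rm -f "$exports_file"
```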


4.2) Pod fails to instantiate and you see 'chown: changing ownership of '/var/lib/rabbitmq': Operation not permitted' error in the log as shown below:

$> kubectl create -f rabbitmq-nfs-pv-poc-depl.yml
service "rabbitmq-nfs-poc-svc" created
deployment.apps "rabbitmq-depl" created

$> kubectl get pods -n shared-services
NAME                             READY     STATUS             RESTARTS   AGE
rabbitmq-depl-5fff645d95-429vd   0/1       CrashLoopBackOff   1          14s

$> kubectl logs rabbitmq-depl-5fff645d95-429vd -n shared-services
chown: changing ownership of '/var/lib/rabbitmq': Operation not permitted

This means that the pod is able to mount successfully; however, it's not able to change the ownership of the file/directory. The easiest way to resolve this issue is to have the same user own the NFS share on the NFS server and be the runAsUser of the Kubernetes pod. For example, for this demo, I have used the 'osboxes' user, which owns the NFS share, and I also use this user's uid '1000' in the pod-level security context.

$> ls -lZ /var/rabbitmq
drwxr-xr-x. osboxes nfsnobody system_u:object_r:var_t:s0       ...

$> id osboxes
uid=1000(osboxes) gid=1000(osboxes) groups=1000(osboxes),10(wheel),983(docker)

In reality, it may not be that easy. You may not have access to the remote NFS server, or the NFS server's administrator may be unwilling to change the ownership of the NFS share. In this case (as a work-around), you can use 'root' as the runAsUser at the container level, like below:

securityContext:
  runAsUser: 0

However, for this to work properly, the /etc/exports file on the NFS server should not squash root (i.e., it should use 'no_root_squash'). It should look something like this:

/var/rabbitmq/ 192.168.56.103(rw,sync,no_root_squash)

'no_root_squash' has its own security consequences. See details here: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/s1-nfs-server-config-exports

In summary, in order to grant a pod access to PVs, you need to take into consideration:

  • Finding the group ID and/or user ID assigned to the actual storage (on the NFS server)
  • SELinux considerations
  • Making sure that the IDs allowed to access the physical storage match the requirements of the particular pod

The group IDs, the user ID, and SELinux values can be defined in the pod's securityContext section. User IDs can also be defined for each container. So, in short, you can use the following user, group, and SELinux options to control access and find the right combination:
  • supplementalGroups
  • fsGroup
  • runAsUser
  • seLinuxOptions

Hope it helps you a little bit!

Note: the yaml files used in this post can be downloaded from GitHub: https://github.com/pppoudel/kube-pv-pvc-demo