Multi-Tenancy on Kubernetes
Problem
Client-specific workloads can be made to run on client-specific hardware. This can be achieved with Node Selectors or with Taints/Tolerations; both approaches have pros and cons, and this document lays them out.
Some clients have their own hardware on which their tasks, apps, and other long-running workloads run. The current Rancher master has reached EOL and we are planning to move to the current K8s platform. Because of better functionality and redundancy, resource requirements have increased, so we are looking for a better solution for the client clusters. The proposed solution is to merge those nodes into the multi-tenant (MT) K8s cluster while still keeping them segregated by VLANs plus a mechanism for workload separation.
The following methods can be used for workload segregation:
Node Selectors via Labels / Affinity : https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
Taints and Tolerations : https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
Solution
What is multi-tenant Kubernetes?
- Multi-tenant Kubernetes is a Kubernetes deployment where multiple applications or workloads run side-by-side.
- Multi-tenancy is a common architecture for organizations that have multiple applications running in the same environment, or where different teams (like developers and IT Ops) share the same Kubernetes environment.
- You can think of multi-tenant Kubernetes as being akin to an apartment building, whereas single-tenant Kubernetes is like a single-family house.
Option 1 : Taints & Tolerations
Description:
Taints are a Kubernetes feature that repels pods from nodes. When a node is tainted, all pods that do not have a matching toleration are repelled from that node.
This method works as DENY ALL UNLESS ALLOWED.
Example :
# Node Taint
>>> kubectl taint nodes test-node-1 my-taint=test:NoSchedule
>>> kubectl describe node test-node-1
Name:               test-node-1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-west-2
                    failure-domain.beta.kubernetes.io/zone=us-west-2a
                    kubernetes.io/hostname=ip-192-168-101-21.us-west-2.compute.internal
                    pipeline-nodepool-name=pool1
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Wed, 29 Aug 2018 11:31:53 +0200
Taints:             my-taint=test:NoSchedule
# Pod toleration
spec:
  tolerations:
  - key: "my-taint"
    operator: "Equal"
    value: "test"
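Putting it together, a minimal complete pod manifest that is allowed onto the tainted node could look as follows (the pod name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: client-task            # placeholder name
spec:
  containers:
  - name: client-task
    image: nginx               # placeholder image
  tolerations:
  - key: "my-taint"
    operator: "Equal"
    value: "test"
    effect: "NoSchedule"       # matches the taint applied above
Note that a toleration only allows the pod onto the tainted node; it does not pin the pod there. To guarantee client workloads land on client nodes, the taint is typically combined with a node label and a node selector (Option 2).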
Pros:
- Pods from other orgs/namespaces are guaranteed not to schedule on nodes that are owned by a given client. Best security and workload isolation between clients.
- This requires the least amount of work on the software side; only client pods need changes to their pod configuration. (It is roughly the same amount of work either way.)
Cons:
- If client nodes fail, spilling client work over onto the MT cluster will be extremely difficult.
- Debugging a failed container by starting a test container requires a toleration, i.e. additional configuration on every such pod (lots of work; a sketch follows this list).
- Starting client-specific workloads via Helm requires additional configuration (also sketched below).
- Every master/system-level component needs a toleration added for the given taints.
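To illustrate the extra configuration the debugging and Helm points refer to, here is a rough sketch; the pod name, image, and chart values are placeholders, and whether a chart exposes a tolerations value depends on the chart:
# One-off debug pod on a tainted client node (toleration injected via --overrides)
>>> kubectl run debug-shell --rm -it --restart=Never --image=busybox \
      --overrides='{"apiVersion":"v1","spec":{"tolerations":[{"key":"my-taint","operator":"Equal","value":"test","effect":"NoSchedule"}]}}' \
      -- sh
# values.yaml fragment for a Helm chart that exposes a tolerations value (chart-dependent)
tolerations:
- key: "my-taint"
  operator: "Equal"
  value: "test"
  effect: "NoSchedule"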
Option 2 : Node Selectors via Labels / Affinity
Description:
With this approach, each pod starts only on nodes that carry certain labels.
Example :
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
Simpler Config:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
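For the nodeSelector above to be satisfiable, at least one node must carry the matching label; the node name below is a placeholder:
# Label a node so that pods requesting "disktype: ssd" can schedule onto it
>>> kubectl label nodes test-node-1 disktype=ssd
# Verify the label
>>> kubectl get nodes --show-labels | grep disktype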
Pros:
- Better fault tolerance and greater capacity for all clients. (By contrast, with taints, if a tainted node fails, the other tainted nodes could have trouble starting all of the failed pods, which would increase recovery time.)
- Moving workloads around during a failure is a lot easier.
- Starting debug workloads is easier.
- Easier and more detailed configuration of scheduling rules, including weight-based scheduling.
- Future support for the Apps framework.
Cons:
- With node selectors, a workload can end up on another client's node if it does not explicitly request MT nodes (mostly an issue for manually started debugging pods; this should be reduced as much as possible, and one mitigation is sketched below).
- Node selector/affinity rules can grow into extremely detailed configuration (hard to grasp).
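One way to reduce the first risk is to give every node a pool label and have every workload, including manually started debug pods, state its pool explicitly. A minimal sketch, with placeholder node names and label values:
# Label shared MT nodes and client-owned nodes with a pool label (names/values are placeholders)
>>> kubectl label nodes mt-node-1 pool=shared
>>> kubectl label nodes client-a-node-1 pool=client-a
# Every workload then pins itself to its pool via nodeSelector
spec:
  nodeSelector:
    pool: shared        # or pool: client-a for client-specific workloads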