Kubernetes Architecture and Best Practices
Disclaimer: This document is a collection of best practices for Kubernetes users. It is highly opinionated and reflects my own experiences and preferences.
Namespaces
A Kubernetes namespace provides a way to divide cluster resources between multiple users. In effect, it acts as a virtual cluster inside a physical cluster.
Principles
- Namespaces are a way to divide cluster resources between multiple users logically.
- The resources in a namespace can be limited by using ResourceQuota and LimitRange.
- The permissions to access resources in a namespace can be controlled using role-based access control (RBAC).
- The network access to pods in a namespace can be controlled by using NetworkPolicy.
- The services in a namespace can be discovered by using the DNS name of the service.
Rules for Namespaces
The name of a namespace must be a valid RFC 1123 DNS label. The following rules apply:
- contains at most 63 characters
- contains only lowercase alphanumeric characters or '-'
- starts and ends with an alphanumeric character
NOTE: For more information about RFC 1123, see RFC 1123.
Initial Namespaces
These are the four initial namespaces that Kubernetes starts with:
- default: The default namespace set by the system. It's intended for objects that don't specify a namespace.
- kube-system: This namespace is assigned to resources that are created by the Kubernetes system.
- kube-public: This namespace is created by the system and is readable by all users, even users who aren't authenticated. It is mostly reserved for cluster use, for situations where some resources need to be publicly visible and readable across the entire cluster.
- kube-node-lease: This namespace holds the Lease objects associated with each node. These leases allow the kubelet to send heartbeats so that the control plane can determine node availability.
Use Convenient and Scalable Names
Naming is at the root of programming and is one of its basic building blocks. Names should be meaningful and provide context. Therefore, it's recommended to use names that are expressive and scalable.
For example, if you're working on a streaming application, you can name the namespace "stream". For the different development environments, you can scale this name by adding a suffix, for example, "stream-dev" for the development environment, "stream-test" for testing, and "stream-prod" for production.
Attach Labels to Namespaces
Labels in Kubernetes are not just a way to distinguish resources, but they're also a major source of metadata that can be used to log, analyze, and audit resources.
Though it's considered a best practice to use labels throughout Kubernetes, using them in a namespace is essential when you have a large team. Here is an example:
    # Underscores are not valid in namespace names, so the placeholders use hyphens.
    kubectl create namespace <namespace-name>
    kubectl label namespaces <namespace-name> <label-key>=<label-value> --overwrite=true
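The same result can be expressed declaratively, which makes the labels easy to review and keep in version control. This is a minimal sketch: the namespace name reuses the "stream-dev" example from earlier, and the label keys and values are hypothetical.

```yaml
# Declarative equivalent of creating and labeling a namespace with kubectl.
# The label keys and values below are illustrative assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: stream-dev
  labels:
    team: streaming       # hypothetical: owning team
    environment: dev      # hypothetical: deployment stage
```

Applying the file with `kubectl apply -f` creates the namespace if it is missing and updates its labels if they have drifted.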
Use RBAC to Allocate Resources
Using role-based access control (RBAC), you can authorize and limit users' access to certain resources. You can manage access within a single namespace or across the entire cluster.
To grant permissions within a specific namespace, use the Role resource type. To grant permissions cluster-wide, use the ClusterRole resource type.
Using RBAC helps you to secure clusters and manage resources by defining permissions based on roles.
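As an illustration, a namespaced Role and its RoleBinding might look like the sketch below. The namespace "stream-dev", the role name, and the user "jane" are assumptions made for the example, not names taken from a real cluster.

```yaml
# Role: namespaced permissions, here read-only access to pods in stream-dev.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: stream-dev
rules:
  - apiGroups: [""]                    # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: grants the Role to a subject inside the same namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: stream-dev
subjects:
  - kind: User
    name: jane                         # hypothetical user name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole uses the same `rules` structure but has no `metadata.namespace`, and it is bound with a ClusterRoleBinding when the permissions should apply cluster-wide.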
Use ResourceQuota and LimitRange
The namespaces in a cluster don't all need the same resources. Giving all namespaces equal resources can compromise the performance of key namespaces. Use a resource quota to limit the resource usage of particular namespaces.
Use a Kubernetes ResourceQuota to cap the total amount of resources, and the number of objects, that can be created in a namespace. Use a LimitRange to constrain the resources that individual pods and containers can request or consume.
Here's an example of how to use ResourceQuota:
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-high
    spec:
      hard:
        pods: "10"
        cpu: "10"
        memory: 10Gi
        requests.nvidia.com/gpu: "2"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values:
              - high
And here's an example of how to use LimitRange:
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-mem-limit-range
    spec:
      limits:
        - default:          # this section defines default limits
            cpu: 500m
            memory: 512Mi
          defaultRequest:   # this section defines default requests
            cpu: 500m
            memory: 512Mi
          max:              # max and min define the limit range
            cpu: "1"
            memory: 1Gi
          min:
            cpu: 100m
            memory: 64Mi
          type: Container
Use a NetworkPolicy
By default, Kubernetes allows all pods in a cluster to communicate with each other, including across namespaces. To secure pods and allow only the desired traffic from selected sources, define a NetworkPolicy in each namespace and use a CNI plugin that enforces it. A NetworkPolicy lets you restrict both ingress and egress, so unwanted traffic neither reaches nor leaves the pods in the namespace.
Here's an example of how to use a NetworkPolicy:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
      namespace: default
    spec:
      podSelector:
        matchLabels:
          role: db
      policyTypes:
        - Ingress
        - Egress
      ingress:
        - from:
            - ipBlock:
                cidr: 172.17.0.0/16
                except:
                  - 172.17.1.0/24
            - namespaceSelector:
                matchLabels:
                  project: myproject
            - podSelector:
                matchLabels:
                  role: frontend
          ports:
            - protocol: TCP
              port: 6379
      egress:
        - to:
            - ipBlock:
                cidr: 10.0.0.0/24
          ports:
            - protocol: TCP
              port: 5978
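A common starting point is a default-deny policy per namespace, with narrower allow rules layered on top. The sketch below assumes a namespace named "stream-dev" and a CNI plugin that actually enforces NetworkPolicy. Without such a plugin the object is accepted but has no effect.

```yaml
# Default-deny: the empty podSelector selects every pod in the namespace,
# and listing both policy types without any rules blocks all traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: stream-dev      # assumed namespace name
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Because this also blocks egress to the cluster DNS service, a policy like this is usually paired with an explicit rule that allows DNS lookups.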
Don't Create Too Many Namespaces
Even though Kubernetes doesn't impose a hard limit on how many namespaces you can create, it's best to avoid creating too many.
Namespaces without a definite function become difficult to manage, and too many of them can get in the way of efficient resource consumption.
Don't Shy Away From Creating a Cluster
Namespaces are used to create virtual clusters to segregate resources and reduce costs. However, it's important to understand that as your team grows, the better FinOps approach is to create additional clusters rather than more namespaces, so that you don't have to compromise on performance.
Don't Use the Default Namespace
All objects created without a specified namespace are placed in the Kubernetes "default" namespace. If you use the "default" namespace, it can become difficult to segregate objects in it or implement RBAC and NetworkPolicies.
Have an Idea of What's Inside
For better management of the Kubernetes cluster, it's important to understand which objects and resources are located in namespaces. This includes objects such as pods, replication controllers managed by the Kubernetes controller manager, and others.
However, some objects live outside of any namespace. Namespace objects themselves are not namespaced, and neither are low-level resources such as persistent volumes and nodes. Services like Release use dynamic provisioning in Kubernetes to provide on-demand environments that reduce the management overhead required to create persistent volumes.
Sync Secrets
Secrets in Kubernetes often need to exist in multiple namespaces in a cluster so pods can access them. Registry credentials, for example, need to exist in all namespaces in a cluster. If you have many namespaces, managing registry credentials manually can be tricky.
Syncing secrets lets you copy a secret such as "regcred" into every new namespace when it is created, and push later updates to the copies.
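The text above does not name a tool for this. One option, shown here purely as an assumption, is a Kyverno generate policy, which requires Kyverno to be installed in the cluster and permitted to manage Secrets. The source namespace "registry-credentials" is hypothetical.

```yaml
# Sketch of a Kyverno ClusterPolicy (assumes Kyverno is installed).
# Whenever a Namespace is created, clone the "regcred" Secret into it
# and keep the copy in sync with the source.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: sync-regcred
spec:
  rules:
    - name: clone-regcred-into-new-namespaces
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: v1
        kind: Secret
        name: regcred
        namespace: "{{request.object.metadata.name}}"
        synchronize: true                   # propagate later changes to the copies
        clone:
          namespace: registry-credentials   # hypothetical source namespace
          name: regcred
```

Other controllers can fill the same role. Whichever one you pick, the key design choice is to keep a single source secret and let the tool own the copies.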
General Practices
- Backups: Do have working backups.
- Containerization: Do make sure your program runs nicely in a container first.
- Image Optimization: Do make your container images small and devoid of build tools, debug tools, shells, etc. Include nothing that isn't required to run the system in production.
- Graceful Shutdown: Do make sure your program can shut down gracefully on SIGTERM.
- 12 Factor App: Do check out 12factor.net and figure out how closely your application matches its guidelines.
- Manifest Storage: Do store your manifests in git.
- Manifest Synchronization: Do have a system that syncs those manifests automatically, be it push (e.g., your CI/CD system) or pull-based (e.g., a GitOps controller like ArgoCD or Flux).
- Local Testing: Do test out your programs locally using kind/k3d/minikube etc. before deploying them to your actual clusters. Ideally, test this in CI.
- Monitoring and Logging: Do make sure you have monitoring systems and log aggregation systems in place.
- Migration Strategy: Do start by migrating stateless systems rather than stateful ones. Try to avoid dealing with storage within Kubernetes for as long as possible.
- Kubernetes Concepts: Do read up on the basic concepts in Kubernetes, for example the relationship between Ingress, services, and pods.
- Development Tools: Do at least start playing with systems that help you develop your programs directly in Kubernetes (skaffold, tilt, devspace...).
Practices to Avoid
- Pod and Node Access: Don't exec into pods to do stuff or SSH into nodes to have your daily ops done.
- Cluster Deployment: Don't deploy your own clusters when you're starting out, just use some managed service.
- Resource Limits: Don't set CPU limits.
- Namespace Usage: Don't deploy anything in the default namespace. If you do by accident, remove it immediately.
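The "Resource Limits" item above can be sketched as a pod that sets CPU and memory requests but only a memory limit. The pod name, namespace, image, and sizes are hypothetical.

```yaml
# Requests for both CPU and memory, but a limit for memory only.
apiVersion: v1
kind: Pod
metadata:
  name: stream-worker          # hypothetical name
  namespace: stream-dev        # a dedicated namespace, not "default"
spec:
  containers:
    - name: worker
      image: registry.example.com/stream/worker:1.0.0   # hypothetical image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 256Mi        # no cpu entry here, so no CPU limit is set
```

Note that if the namespace has a LimitRange with a default CPU limit, that default is applied to containers that omit one, so the two practices need to be reconciled.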
Security Guidelines
- Port Usage: Do not use privileged ports (1-1023) for pods/containers, unless absolutely required by the vendor.
- Kernel Capabilities: Try to run workloads with as few kernel capabilities as possible. Aim for zero.
- Non-root Workloads: Do not run workloads as root (UID 0).
- User ID Management: Aim to run each workload with a separate UID.
- Service Accounts: Run each workload with a different Service Account; do not use the namespace's default Service Account for all workloads.
- SA Secrets: Do not automount SA secrets (99% of pods do not need their SA's token for communicating with the K8s API server).
- Logging Practices: Do not write logs to disk; output to STDOUT/STDERR and use an external log aggregator.
- Secrets Management: Mount secrets as files rather than environment variables, as the former are backed by an in-memory (tmpfs) volume while the latter are readable from the host's /proc directory.
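Several of these guidelines can be combined in a single pod spec. The following is a minimal sketch, not a complete hardening profile. All names, the UID, the image, and the secret "stream-api-secrets" are assumptions made for illustration.

```yaml
# Dedicated ServiceAccount for this workload, with token automount disabled.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: stream-api
  namespace: stream-dev
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: stream-api
  namespace: stream-dev
spec:
  serviceAccountName: stream-api        # not the namespace's default account
  automountServiceAccountToken: false   # the pod does not call the API server
  securityContext:
    runAsNonRoot: true                  # refuse to start as UID 0
    runAsUser: 10001                    # hypothetical UID unique to this workload
    runAsGroup: 10001
  containers:
    - name: api
      image: registry.example.com/stream/api:1.0.0   # hypothetical image
      ports:
        - containerPort: 8080           # unprivileged port
      securityContext:
        capabilities:
          drop: ["ALL"]                 # aim for zero kernel capabilities
      volumeMounts:
        - name: app-secrets
          mountPath: /etc/secrets       # secrets exposed as files
          readOnly: true
  volumes:
    - name: app-secrets
      secret:
        secretName: stream-api-secrets  # hypothetical Secret in this namespace
```

The application then reads its credentials from files under /etc/secrets and writes its logs to STDOUT/STDERR instead of to disk.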