Build Kubernetes bare metal cluster with external access

Kubernetes has been around for a while now. Although you can find numerous guides for various applications, they often require you to be already familiar with Kubernetes or running a cluster in a cloud environment provided by major providers.

This makes Kubernetes unnecessarily hard to get into. All you need to follow this guide is a basic understanding of networking concepts and some basic knowledge about containers. Kubernetes being a container orchestration tool implies you'll want to run containerized applications.

I won't go into too much detail about the internal workings of the cluster or Kubernetes itself. My goal is to provide you with an easy-to-follow guide to get your first cluster and application up and running. A lot of the tutorial is based on the official documentation of the components I used.

Prerequisites:

We'll use kubeadm to build the cluster. It comes with the following minimum requirements:

- One or more machines running one of:
    - Ubuntu 16.04+
    - Debian 9+
    - CentOS 7+
    - Red Hat Enterprise Linux (RHEL) 7+
    - Fedora 25+
    - HypriotOS v1.0.1+
    - Flatcar Container Linux (tested with 2512.3.0)
- 2 GB or more of RAM per machine (any less will leave little room for your apps).
- 2 CPUs or more.
- Full network connectivity between all machines in the cluster (public or private network is fine).
- Unique hostname, MAC address, and product_uuid for every node. See here for more details.
- Certain ports are open on your machines. See here for more details.
- Swap disabled. You **MUST** disable swap in order for the kubelet to work properly.

These are the official requirements. I would add one more: sufficient disk space on each machine. By default, the kubelet starts evicting pods from a node once less than 10% of its disk space remains available. You can change this threshold, but let's not complicate things any further.
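If you'd like to check a machine against these numbers before going further, here is a minimal sketch. The function names are made up for this guide, and the 1700 MiB memory threshold is my assumption (a VM with 2 GB of RAM reports a bit less than 2048 MiB), so adjust both to taste.

```shell
#!/usr/bin/env bash
# Preflight sketch for the kubeadm minimums listed above (Linux only).

# at_least LABEL ACTUAL REQUIRED: prints a verdict, returns 1 when ACTUAL < REQUIRED.
at_least() {
  local label=$1 actual=$2 required=$3
  if [ "$actual" -ge "$required" ]; then
    echo "OK   $label: $actual (need >= $required)"
  else
    echo "FAIL $label: $actual (need >= $required)"
    return 1
  fi
}

# swap_devices [FILE]: counts active swap devices in a /proc/swaps-style file.
# The first line of that file is a header, so it is skipped.
swap_devices() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# preflight: runs all checks and returns non-zero if any of them failed.
preflight() {
  local failed=0 mem_mib
  mem_mib=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  # 1700 MiB is an assumed threshold, see the note above the block.
  at_least "RAM (MiB)" "$mem_mib" 1700 || failed=1
  at_least "CPUs" "$(nproc)" 2 || failed=1
  if [ "$(swap_devices)" -eq 0 ]; then
    echo "OK   swap is off"
  else
    echo "FAIL swap is on"
    failed=1
  fi
  return "$failed"
}
```

Call preflight on each machine. If it reports that swap is on, sudo swapoff -a turns it off until the next reboot; to keep it off, also comment out the swap entry in /etc/fstab.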

Setting up the cluster, deploying the application, and exposing it to the outside world is, on paper, a pretty straightforward task. Still, when following official guides, you'll inevitably run into various roadblocks. I'll try to make this guide as complete as possible so you can avoid most of them.

I tested everything below on virtual machines with exactly the minimum requirements, running Ubuntu Server 20.04 on 30 GB drives.

Necessary steps:

1. Installing curl, kubeadm, kubelet, and kubectl
    1.1 Configure prerequisites
    1.2 Installation
2. Installing runtime - Containerd
    2.1 Configure prerequisites
    2.2 Installation
3. Bootstrapping the cluster
4. Calico network plugin for CNI
5. Joining worker(s)
6. Deploying Nginx
7. MetalLB

Quick explainer:

pod: the basic unit of a deployment. It runs one or more containers that can communicate over localhost and share the pod's IP address.

node: A single server in the cluster.

kubeadm: a toolbox to bootstrap the cluster.

kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.

kubectl: the command line util to talk to your cluster.

master node: controls and manages worker nodes.

worker node: does the actual work, that is, runs the pods scheduled onto it.

Installing Kubernetes - curl, kubeadm, kubelet and kubectl

1.1 Configure prerequisites

First, you'll want to enable IP forwarding and let iptables see bridged traffic.

echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sudo modprobe br_netfilter
sudo sysctl --system
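To confirm that the settings actually took effect, you can compare the file you just wrote with the live kernel values. This is only a sketch; check_sysctls is a hypothetical helper, not part of any Kubernetes tooling.

```shell
#!/usr/bin/env bash
# check_sysctls FILE: reads "key = value" lines from a sysctl .conf file and
# compares each one with the live kernel value reported by `sysctl -n`.
# Returns non-zero if any value differs or cannot be read.
check_sysctls() {
  local file=$1 key want have failed=0
  while IFS='=' read -r key want; do
    key=${key//[[:space:]]/}
    want=${want//[[:space:]]/}
    case $key in ''|'#'*) continue ;; esac   # skip blank lines and comments
    have=$(sysctl -n "$key" 2>/dev/null) || have="missing"
    if [ "$have" = "$want" ]; then
      echo "OK   $key = $have"
    else
      echo "FAIL $key = $have (want $want)"
      failed=1
    fi
  done < "$file"
  return "$failed"
}

# Usage: check_sysctls /etc/sysctl.d/k8s.conf
```

A value of missing for the net.bridge keys usually means the br_netfilter module isn't loaded; lsmod | grep br_netfilter will tell you.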

1.2 Installation

Add repo and install.

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

There's nothing to go wrong so far.

Containerd

You will need a container runtime to, well, run containers. Docker, containerd, and CRI-O are well documented and commonly used.

Official description:

containerd is an industry-standard container runtime with an emphasis on simplicity, robustness, and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.

Docker is the most popular runtime by far, but for future compatibility purposes, let's use containerd.

There's been a bit of fuss about Docker being deprecated in the future and the death of Docker as a result. To put your mind at ease, it came from a series of misconceptions. CliffsNotes version: Kubernetes uses the Container Runtime Interface (CRI) to let the kubelet talk to container runtimes. The Docker runtime was not built with Kubernetes in mind and is not CRI compliant, so the kubelet needs an extra adapter layer (dockershim) to talk to it, and it is that layer that is being deprecated. Docker, however, produces OCI images that will work with any runtime.

2.1 Configure prerequisites

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

Set up the required sysctl parameters. These persist across reboots.

cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

Apply sysctl params without reboot

sudo sysctl --system

2.2 Installation

sudo apt-get update && sudo apt-get install -y containerd

Configure containerd

sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml

Restart containerd

sudo systemctl restart containerd

Bootstrap the cluster

You have now installed everything you need to initialize your cluster. Before running init, you have to choose a pod network. Keep in mind that every pod gets its own IP address. I use a very small /24 network in the example, but you will need more addresses for most real applications. The pod network should also not overlap with the network your host machines are on.
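To get a feel for how many pod addresses a prefix gives you, and to check that the pod network stays clear of the host network, here is a small sketch in plain bash arithmetic. The helper names are hypothetical, and 192.168.0.0/24 as the host network is an assumption based on the addresses used in this guide.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# cidr_size A.B.C.D/N: number of addresses in the block, 2^(32-N).
cidr_size() {
  echo $(( 1 << (32 - ${1#*/}) ))
}

# cidr_range A.B.C.D/N: prints the first and last address of the block as integers.
cidr_range() {
  local ip=${1%/*} size first
  size=$(cidr_size "$1")
  first=$(( $(ip_to_int "$ip") & ~(size - 1) ))
  echo "$first $(( first + size - 1 ))"
}

# cidrs_overlap X Y: succeeds (exit 0) when the two blocks share any address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}

cidr_size 192.168.2.0/24      # prints 256
# Host network below is an assumption; replace it with your own.
if cidrs_overlap 192.168.2.0/24 192.168.0.0/24; then
  echo "pod network overlaps the host network"
else
  echo "pod network is separate from the host network"
fi
```

With the values from this guide the two networks don't overlap. A /16 pod network such as 192.168.0.0/16 would, which is why the prefix matters.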

Run on your master node:

sudo kubeadm init --pod-network-cidr=192.168.2.0/24

If everything went well, you should see this message:

## Your Kubernetes control-plane has initialized successfully!
## 
## To start using your cluster, you need to run the following as a regular user:
## 
##   mkdir -p $HOME/.kube
##   sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
##   sudo chown $(id -u):$(id -g) $HOME/.kube/config
## 
## You should now deploy a pod network to the cluster.
## Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
##   https://kubernetes.io/docs/concepts/cluster-administration/addons/
## 
## Then you can join any number of worker nodes by running the following on each as root:
## 
## kubeadm join 192.168.0.10:6443 --token boq2jb.qk3gu4v01l5cg2xc \
##     --discovery-token-ca-cert-hash sha256:35ad26fc926cb98e16f10447a1b43bc947d07c2c19b380c148d4c1478c7bf834

At this point, kubeadm is very instructive about what to do next. First, enable Kubernetes environment configuration for a regular user.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Now you must deploy a pod network plugin.

Calico

Why is a network plugin necessary?

Right now, your pods cannot communicate with one another. Even the official documentation admits that it can be challenging to understand exactly how Kubernetes networking is expected to work. Kubernetes itself only defines the rules the network must follow and leaves the implementation to plugins. Luckily, there are numerous implementations available.

Popular CNI plugins include Flannel, Calico, Weave Net, and Canal; more can be found here.

For easy setup, let's use Calico in this example.

Official:

Calico is an open-source networking and network security solution for containers, virtual machines, and native host-based workloads.

Kubernetes uses YAML files to describe objects. For the first time now, we'll need to edit one.

kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
wget https://docs.projectcalico.org/manifests/custom-resources.yaml

Change the cidr field in custom-resources.yaml to match your pod network CIDR (192.168.2.0/24 in this guide) before creating the resources. Moreover, master nodes are tainted by default, which means that regular pods won't be scheduled on them. While it's possible to set up tolerations for specific pods, let's just remove the taint altogether for now.
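If you'd rather not open an editor, the cidr change can be scripted. A minimal sketch, assuming GNU sed and that the manifest holds the value on a line of the form cidr: VALUE; set_pod_cidr is a hypothetical helper.

```shell
#!/usr/bin/env bash
# set_pod_cidr FILE CIDR: rewrites the value of every "cidr:" line in FILE, in place.
# Indentation and the key name are preserved; only the value changes.
# '#' is used as the sed delimiter because the CIDR itself contains '/'.
set_pod_cidr() {
  local file=$1 cidr=$2
  sed -i -E "s#^([[:space:]]*cidr:[[:space:]]*).*#\1${cidr}#" "$file"
}

# Usage:
#   set_pod_cidr custom-resources.yaml 192.168.2.0/24
#   grep 'cidr:' custom-resources.yaml   # check the result before applying
```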

kubectl create -f custom-resources.yaml 
kubectl taint nodes --all node-role.kubernetes.io/master-

Now we can join a worker node.

Joining worker node

The installation steps you've done so far on the master node (the Kubernetes packages and containerd) must be done on the worker node as well. Don't run kubeadm init there, and don't copy /etc/kubernetes/admin.conf to $HOME/.kube/config.

To add a node as a worker, you'll need to run the join command you saw when you initialized the cluster instead. You can print the join command any time using:

kubeadm token create --print-join-command

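You don't need to install the network plugin on worker nodes separately: Calico runs as a DaemonSet, so its pods start automatically on every node that joins the cluster.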

Now if you run kubectl get nodes on the master you should get similar output:

NAME     STATUS   ROLES                  AGE   VERSION
mando    Ready    <none>                 7d    v1.20.2
master   Ready    control-plane,master   7d    v1.20.2
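
A freshly joined node may show NotReady for a while until its networking pods are up. Here is a rough sketch of a polling loop to run on the master; the helper names and the POLL_SECONDS variable are made up for this example.

```shell
#!/usr/bin/env bash
# not_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin and
# prints the name of every node whose STATUS column is not exactly "Ready".
# Note that a cordoned node ("Ready,SchedulingDisabled") is reported too.
not_ready_nodes() {
  awk 'NF && $2 != "Ready" { print $1 }'
}

# wait_for_nodes [TRIES]: polls until every node is Ready.
# Returns 0 on success, 1 when TRIES run out, 2 when kubectl itself fails.
wait_for_nodes() {
  local tries=${1:-30} out pending
  while [ "$tries" -gt 0 ]; do
    out=$(kubectl get nodes --no-headers) || return 2
    pending=$(not_ready_nodes <<< "$out")
    if [ -z "$pending" ]; then
      return 0
    fi
    echo "waiting for:" $pending
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}

# Usage: wait_for_nodes 30 && echo "all nodes are Ready"
```

kubectl also ships a built-in alternative: kubectl wait --for=condition=Ready nodes --all --timeout=300s.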

If you try to run the same command on the worker node, you'll get an error: either "connection to the server localhost:8080 was refused - did you specify the right host or port?" or "failed to find any PEM data in certificate input". If you tried copying /etc/kubernetes/admin.conf to $HOME/.kube/config, you noticed that the file doesn't exist on the worker. A quick workaround is to copy the contents of $HOME/.kube/config on the master to the same file on the worker. After that, you should be able to get the output above even on the worker.

Your cluster is now ready to deploy.

Nginx

Now it's time to deploy some pods. Let's say our goal is to run five replicas of Nginx at any given time, and we want those replicas distributed as evenly as possible among the nodes.

Without us explicitly telling it not to, Kubernetes would have no problem scheduling all the pods on the same node provided the node has the necessary resources available.

Let's get some example configuration for us to modify.

wget https://k8s.io/examples/controllers/nginx-deployment.yaml

Now we want to change the number of replicas to 5 and set topologySpreadConstraints. The final file should look roughly like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx

Topology spread constraints have been generally available since Kubernetes v1.19 and are quite fun to play with.

maxSkew defines the maximum permitted difference in the number of matching pods between any two topology domains, which here are nodes because topologyKey is kubernetes.io/hostname. This is exactly what we want from our cluster. The difference is set to 1, which for five replicas on our two-node cluster means three pods on one node and two on the other. To learn more about topologyKey, visit the documentation here.

whenUnsatisfiable: DoNotSchedule means that if maxSkew can't be satisfied, Kubernetes will leave the remaining pods unscheduled (Pending), even if that leaves fewer replicas running than specified. The alternative is ScheduleAnyway, a soft constraint: the scheduler still prefers placements that reduce the skew, but it may exceed maxSkew.
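The arithmetic behind maxSkew is easy to check by hand. As a sketch (even_spread and skew are hypothetical helpers for illustration, not anything the scheduler exposes):

```shell
#!/usr/bin/env bash
# even_spread REPLICAS NODES: prints the most even per-node pod counts,
# which is the distribution maxSkew: 1 allows once all replicas are scheduled.
even_spread() {
  local replicas=$1 nodes=$2 base extra i counts=()
  base=$(( replicas / nodes ))     # every node gets at least this many
  extra=$(( replicas % nodes ))    # this many nodes get one more
  for (( i = 0; i < nodes; i++ )); do
    counts+=( $(( i < extra ? base + 1 : base )) )
  done
  echo "${counts[*]}"
}

# skew COUNT...: difference between the largest and the smallest count.
skew() {
  local min=$1 max=$1 c
  for c in "$@"; do
    (( c < min )) && min=$c
    (( c > max )) && max=$c
  done
  echo $(( max - min ))
}

even_spread 5 2    # prints: 3 2
skew 3 2           # prints: 1 (within maxSkew: 1)
skew 4 1           # prints: 3 (exceeds maxSkew: 1, so DoNotSchedule would block it)
```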

And finally, we can deploy.

kubectl apply -f nginx-deployment.yaml

You can watch the pods being created with watch kubectl get pods -o wide.
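To see at a glance how the replicas ended up distributed, you can count pods per node. A small sketch, assuming the app=nginx label from the manifest above; pods_per_node is a hypothetical helper.

```shell
#!/usr/bin/env bash
# pods_per_node [SELECTOR]: prints "<count> <node>" for every node that runs
# pods matching the label selector (default: app=nginx).
# Pods that are still Pending have no node name yet and are left out.
pods_per_node() {
  local selector=${1:-app=nginx}
  kubectl get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' |
    awk 'NF' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage: pods_per_node            # or: pods_per_node app=something-else
```

On the two-node cluster from this guide, you should see one node with three pods and the other with two.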

When all pods are running, you should be able to query all Nginx pods from any node in a cluster on port 80. Note that pods do not respond to ICMP. To test the setup, you can use curl with the corresponding port. In my case:

curl 192.168.2.87:80

You should now see the "Welcome to nginx!" page. If you run into a "No route to host" problem, check your firewall settings.

Right now, your webserver is running in a cluster in multiple replicas, but one more problem remains to be solved. So far, you can only access it from inside of the cluster.

MetalLB

In this final part, you'll set up access to your server from the outside. Since it runs in the cluster, you don't really care which of the pods responds, but it would make sense to distribute the load equally among them.

Kubernetes uses services to make multiple pods accessible through one IP address. You can very easily create such a service for your deployment:

kubectl expose deployment/nginx-deployment

The service will be assigned a cluster IP. Running curl against this address on port 80 should return the welcome page. But when you run kubectl get services, you'll notice that no external IP has been assigned. You will need a load balancer for that.

Kubernetes supports two ways of exposing services to the outside world: NodePorts and LoadBalancers. NodePorts, while easy to set up, come with the additional burden of manual port management, which you probably want to avoid.

The LoadBalancer service type is available in Kubernetes but is, by itself, unusable on bare-metal clusters. It only assigns an external IP when you are running Kubernetes in the cloud, and it depends on your provider's implementation. You can use the NGINX Ingress Controller on bare metal, which is well tested in production but not exactly easy to get up and running.

Luckily, there is a simple solution in the form of MetalLB. MetalLB is currently in beta, but for the most part, it just works and is very easy to set up. It can work in multiple modes, including BGP, but the Layer 2 configuration is sufficient for us.

First, a few essentials:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"

kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system

You need to create config.yaml and specify the address pool for MetalLB to hand out. You can use your public IPs directly, or addresses from your private network combined with additional forwarding. Unlike the pod network, this pool can be in the same network as the host machines.

Example config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: my-ip-space
      protocol: layer2
      addresses:
      - 192.168.0.210-192.168.0.250

Apply:

kubectl apply -f config.yaml
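
Every LoadBalancer service takes one address from the pool, so it's worth knowing how many you've handed to MetalLB. A quick sketch in bash arithmetic; ip_to_int and pool_size are hypothetical helpers.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# pool_size FIRST-LAST: number of addresses in an inclusive range,
# written the same way as in the addresses list of config.yaml.
pool_size() {
  local first=${1%-*} last=${1#*-}
  echo $(( $(ip_to_int "$last") - $(ip_to_int "$first") + 1 ))
}

pool_size 192.168.0.210-192.168.0.250   # prints 41
```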

Now that you have the address pool ready, the last step is to create a LoadBalancer service to get one of the addresses assigned to the service. Example loadbalancer.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-balancer
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

And finally:

kubectl apply -f loadbalancer.yaml

And that's it! When you check kubectl get services, you should see an external address, assigned by MetalLB, through which your server is accessible from the outside of the cluster.
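
For a scripted check, you can read the assigned address from the service status and fetch the page through it. A sketch under the assumption that the service is named nginx-balancer as above; the helper names and the POLL_SECONDS variable are made up.

```shell
#!/usr/bin/env bash
# external_ip SERVICE: prints the first load-balancer IP assigned to the
# service, or nothing while the address is still pending.
external_ip() {
  kubectl get service "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null
}

# wait_for_external_ip SERVICE [TRIES]: polls until an address shows up and
# prints it. Returns 1 when TRIES run out without an address.
wait_for_external_ip() {
  local service=$1 tries=${2:-12} ip
  while [ "$tries" -gt 0 ]; do
    ip=$(external_ip "$service") || true
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}

# Usage, from any machine that can reach the pool's network:
#   ip=$(wait_for_external_ip nginx-balancer) && curl -s "http://$ip/"
```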