Build Kubernetes bare metal cluster with external access
Kubernetes has been around for a while now. Although you can find numerous guides for various applications, they often require you to already be familiar with Kubernetes or to be running a cluster in a cloud environment from one of the major providers.
This makes Kubernetes unnecessarily hard to get into. All you need to follow this guide is a basic understanding of networking concepts and some basic knowledge about containers. Since Kubernetes is a container orchestration tool, you'll want to run containerized applications.
I won't go into too much detail about the internal workings of the cluster or Kubernetes itself. My goal is to provide you with an easy-to-follow guide to get your first cluster and application up and running. A lot of the tutorial is based on official documentation of components I used.
We'll use kubeadm to build a cluster that comes with minimum requirements:
- One or more machines running one of:
- Ubuntu 16.04+
- Debian 9+
- CentOS 7+
- Red Hat Enterprise Linux (RHEL) 7+
- Fedora 25+
- HypriotOS v1.0.1+
- Flatcar Container Linux (tested with 2512.3.0)
- 2 GB or more of RAM per machine (any less will leave little room for your apps).
- 2 CPUs or more.
- Full network connectivity between all machines in the cluster (public or private network is fine).
- Unique hostname, MAC address, and product_uuid for every node. See here for more details.
- Certain ports are open on your machines. See here for more details.
- Swap disabled. You **MUST** disable swap in order for the kubelet to work properly.
These are the official requirements. I would add one more: sufficient disk space available on each machine. By default, the kubelet starts evicting pods when a node uses more than 90% of its disk space. You can change this threshold, but let's not complicate things any further.
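The swap requirement above trips up many first-time setups. On Ubuntu, the following is usually enough (this assumes swap is configured via /etc/fstab rather than a systemd swap unit):

```shell
# Turn swap off immediately
sudo swapoff -a
# Comment out swap entries so it stays off after reboot (keeps a .bak backup)
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
```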
Setting up the cluster, deploying the application, and exposing it to the outside world is, on paper, a pretty straightforward task. Still, when following official guides, you'll inevitably run into various roadblocks. I'll try to make this guide as complete as possible so you can avoid most of them.
I tested everything below on virtual machines meeting exactly the minimum requirements, running Ubuntu Server 20.04 with 30 GB drives.
- Installing curl, kubeadm, kubelet, and kubectl
1.1 Configure prerequisites
- Installing runtime - Containerd
2.1 Configure prerequisites
- Joining worker(s)
- Calico network plugin for CNI
- Deploying Nginx
pod: the basic unit of deployment; runs one or more containers that can communicate over localhost and share the pod's IP address.
node: A single server in the cluster.
kubeadm: a toolbox to bootstrap the cluster.
kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers.
kubectl: the command line util to talk to your cluster.
master node: controls and manages worker nodes.
worker node: works
Installing Kubernetes - curl, kubeadm, kubelet and kubectl
First, you'll want to enable IP forwarding and let iptables see bridged traffic.
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
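You can verify that the module is loaded and the settings took effect (standard checks; the key names match the files created above):

```shell
# Confirm the br_netfilter module is loaded
lsmod | grep br_netfilter
# Confirm all three values are reported as 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
```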
Add repo and install.
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
So far, there's not much that can go wrong.
You will need a container runtime to, well, run containers. Docker, containerd, and CRI-O are well documented and commonly used.
containerd is an industry-standard container runtime with an emphasis on simplicity, robustness, and portability. It is available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system: image transfer and storage, container execution and supervision, low-level storage and network attachments, etc.
Docker is the most popular runtime by far, but for future compatibility purposes, let's use containerd.
There's been a bit of fuss about Docker being deprecated in the future and the death of Docker as a result. To put your mind at ease: it came from a series of misconceptions. CliffsNotes version: Kubernetes uses the Container Runtime Interface (CRI) to let the kubelet interact with container runtimes. The Docker runtime was not built with Kubernetes in mind and is not CRI compliant. Docker, however, produces OCI images that work with any runtime.
2.1 Configure prerequisites
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
Set up the required sysctl params. These persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
Apply sysctl params without reboot
sudo sysctl --system
sudo apt-get update && sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
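One gotcha worth mentioning: on distributions that use systemd (Ubuntu 20.04 does), the kubelet and the runtime should agree on the cgroup driver. To make containerd use the systemd driver, find the runc section in the generated /etc/containerd/config.toml, set SystemdCgroup to true, and restart containerd again:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```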
Bootstrap the cluster
You have now installed everything you need to initialize your cluster. Before init, you have to specify a pod network. Keep in mind that every pod gets its own IP address. I use a small /24 range in the example, but you will need more addresses for most real applications. The pod network must also not overlap with the network your host machines are in.
Run on your master node:
sudo kubeadm init --pod-network-cidr=192.168.2.0/24
If everything went well, you should see this message:
## Your Kubernetes control-plane has initialized successfully!
## To start using your cluster, you need to run the following as a regular user:
## mkdir -p $HOME/.kube
## sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
## sudo chown $(id -u):$(id -g) $HOME/.kube/config
## You should now deploy a pod network to the cluster.
## Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
## Then you can join any number of worker nodes by running the following on each as root:
## kubeadm join 192.168.0.10:6443 --token boq2jb.qk3gu4v01l5cg2xc \
## --discovery-token-ca-cert-hash sha256:35ad26fc926cb98e16f10447a1b43bc947d07c2c19b380c148d4c1478c7bf834
At this point, kubeadm is very instructive about what to do next. First, enable Kubernetes environment configuration for a regular user.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Now you must deploy a pod network plugin.
Why is a network plugin necessary?
Right now, your pods cannot communicate with one another. Even the official documentation admits that it can be challenging to understand exactly how Kubernetes networking is expected to work. Kubernetes only provides guidelines on how networking should behave. Luckily, there are numerous implementations available.
The most popular CNI plugins include Flannel, Calico, Weave, and Canal; more can be found here.
For easy setup, let's use Calico in this example.
Calico is an open-source networking and network security solution for containers, virtual machines, and native host-based workloads.
Kubernetes uses YAML files to describe objects. For the first time now, we'll need to edit one.
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
Download custom-resources.yaml from the same place (curl -O https://docs.projectcalico.org/manifests/custom-resources.yaml) and change the cidr field to match your pod network CIDR. Moreover, master nodes are tainted by default, which means they can't run regular pods. While it's possible to set up tolerations for specific pods, let's just remove the taint altogether for now.
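After the edit, the relevant part of custom-resources.yaml should look roughly like this (the surrounding fields are Calico's defaults; only cidr was changed to match the kubeadm init flag):

```yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 192.168.2.0/24   # must match --pod-network-cidr
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
```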
kubectl create -f custom-resources.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-
Now we can join a worker node.
Joining worker node
Everything you've done so far on the master node must be done on the worker node as well, with the exception of running kubeadm init and copying /etc/kubernetes/admin.conf to $HOME/.kube/config.
To add a node as a worker, you'll instead run the join command you saw when you initialized the cluster. You can print the join command at any time by running this on the master:
kubeadm token create --print-join-command
Note that Calico runs as a DaemonSet, so its pods are scheduled on new worker nodes automatically; you only need the per-node prerequisites (kernel modules and sysctl settings) in place there.
Now if you run kubectl get nodes on the master you should get similar output:
NAME     STATUS   ROLES                  AGE   VERSION
mando    Ready    <none>                 7d    v1.20.2
master   Ready    control-plane,master   7d    v1.20.2
If you try to run the same on the worker node, you'll get an error: either "The connection to the server localhost:8080 was refused - did you specify the right host or port?" or "failed to find any PEM data in certificate input". If you tried copying /etc/kubernetes/admin.conf to $HOME/.kube/config, you noticed that the file doesn't exist on the worker. A quick workaround is to copy the contents of $HOME/.kube/config on the master to the same file on the worker. After that, you should get the output above even on the worker.
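One way to do that copy, run from the worker ("master" stands in for whatever your control-plane machine's hostname is):

```shell
# Pull the kubeconfig from the control-plane node over SSH
mkdir -p $HOME/.kube
scp master:~/.kube/config $HOME/.kube/config
```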
Your cluster is now ready to deploy.
Now it's time to deploy some pods. Let's say our goal is to run five replicas of Nginx at any given time, and we want those replicas distributed as evenly as possible among the nodes.
Without us explicitly telling it not to, Kubernetes would have no problem scheduling all the pods on the same node provided the node has the necessary resources available.
Let's get some example configuration for us to modify.
Now we want to change the number of replicas to 5 and set topologySpreadConstraints. The final file should look roughly like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
Topology Spread Constraints are available from Kubernetes v1.19 and quite fun to play with.
maxSkew defines the maximum permitted difference between the numbers of matching pods across topology domains (here, nodes). This is exactly what we want from our cluster. The difference is set to 1, which for five replicas on our two-node cluster means three pods on one node and two on the other. To learn more about topologyKey, visit the documentation here.
whenUnsatisfiable: DoNotSchedule means that if maxSkew can't be satisfied, Kubernetes will not start more pods even if the number of replicas is lower than specified. The alternative is ScheduleAnyway, which is, however, a soft constraint and may result in ignoring maxSkew for a number of reasons.
And finally, we can deploy.
kubectl apply -f nginx-deployment.yaml
You can watch the pods being created with watch kubectl get pods -o wide.
When all pods are running, you should be able to reach any of the Nginx pods from any node in the cluster on port 80. Note that the pods do not respond to ICMP, so don't bother pinging them; test the setup with curl against a pod IP instead.
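For example, with a pod IP taken from the kubectl get pods -o wide output (the address below is just a placeholder; substitute one of yours):

```shell
# Query one of the Nginx pods directly on its pod IP
curl http://192.168.2.73
```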
You should now see the "Welcome to nginx!" page. If you run into a "No route to host" error, check your firewall settings.
Right now, your webserver is running in a cluster in multiple replicas, but one more problem remains to be solved. So far, you can only access it from inside of the cluster.
In this final part, we'll set up access to the server from outside the cluster. Since it runs as multiple replicas, you don't really care which of the pods responds, but it makes sense to distribute the load evenly among them.
Kubernetes uses services to make multiple pods accessible through one IP address. You can very easily create such a service for your deployment:
kubectl expose deployment/nginx-deployment
The service will be assigned a cluster IP; curl on that address on port 80 should return the welcome page. But when you run kubectl get services, you'll notice that no external IP has been assigned. You'll need a load balancer for that.
Kubernetes supports two ways of exposing services to the outside world: NodePorts and LoadBalancers. NodePorts, while easy to set up, come with the additional burden of manual port management, which you probably want to avoid.
The LoadBalancer service type is available in Kubernetes, but by itself it is unusable on bare metal clusters: an external IP is only assigned when you run Kubernetes in the cloud, via your provider's implementation. You can use the NGINX Ingress Controller on bare metal, which is well tested in production, but it is not exactly easy to get up and running.
Luckily, there is a simple solution in the form of MetalLB. MetalLB is currently in beta, but for the most part, it just works and is very easy to set up. It can work in multiple modes, including BGP, but Layer 2 configuration is sufficient for us.
First, a few essentials:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.5/manifests/metallb.yaml
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
kubectl get configmap kube-proxy -n kube-system -o yaml | \
sed -e "s/strictARP: false/strictARP: true/" | \
kubectl apply -f - -n kube-system
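The strictARP change is only strictly required when kube-proxy runs in IPVS mode, but it is harmless otherwise. If you want to see what the sed step does before piping it into kubectl apply, you can test the substitution locally:

```shell
# The substitution flips exactly one line of the kube-proxy config
echo "strictARP: false" | sed -e "s/strictARP: false/strictARP: true/"
# prints: strictARP: true
```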
You need to create config.yaml and specify the address pool MetalLB is allowed to hand out. You can use public IPs right away, or a private range with additional forwarding. This time, the addresses can come from the same network as the host machines'. The range below is an example; adjust it to fit your network (the hosts in this guide sit in 192.168.0.0/24):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: my-ip-space
      protocol: layer2
      addresses:
      - 192.168.0.240-192.168.0.250
kubectl apply -f config.yaml
Now that you have the address pool ready, the last step is to create a LoadBalancer service so that one of the addresses gets assigned to it. Example loadbalancer.yaml (the service name is arbitrary; the selector must match the deployment's pod labels):

apiVersion: v1
kind: Service
metadata:
  name: nginx-loadbalancer
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
kubectl apply -f loadbalancer.yaml
And that's it! When you check kubectl get services, you should see an external address, assigned by MetalLB, through which your server is accessible from the outside of the cluster.