How to Set Up a Proxmox Cluster


Gerard Samuel
Generated by Gemini; Proxmox logo added by Gerard Samuel

I needed a way to spin up virtual machines on a long-term basis to try out solutions such as Kubernetes or GitLab runners, without incurring the cost of running them on cloud infrastructure. ESXi was definitely not happening, as Broadcom had muddied the waters at the time. I tried Proxmox first, then SUSE Harvester, and I contemplated XCP-ng. After weighing what I needed, I settled back on Proxmox VE.

Requirements

Here are the requirements to follow along:

  • At least three compute nodes
  • Each with an Intel or AMD CPU
  • Each with sufficient memory to run your virtual machines
  • Each with a primary disk and one or more data disks
  • Each with network interfaces capable of 802.1Q VLANs
  • Network switches capable of 802.1Q VLANs
  • A thumb drive or PXE server (I highly recommend netboot.xyz) for the installer
  • An ACME PKI infrastructure is optional

Initial boot

If you’re using a USB drive for the installation, head over to Proxmox VE’s download page. Write the installer to the USB drive and insert it into a USB slot on the computer/server.
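
For the write itself, something like the following works from a Linux workstation; the ISO filename and the /dev/sdX target below are placeholders for your download and your USB device, so double-check the device name before running it, since dd overwrites it completely.

# Replace the ISO filename and /dev/sdX with your download and USB device
dd if=proxmox-ve_<version>.iso of=/dev/sdX bs=1M status=progress conv=fsync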

This part will vary depending on the hardware being used; here, I am using Supermicro servers. Power on your nodes and boot off the USB drive or use PXE.

  • Agree to the EULA

  • Create a ZFS RAID1 mirrored volume for the operating system and Proxmox

ZFS boot disk configuration

  • Enter your country, time zone, and keyboard layout

  • For the root account, enter a password and email address (which can be fake for now)

  • Select the management network interface and configure the hostname, IP address, gateway, and DNS server

  • Review the summary and click Install. Give the process a few minutes to complete the installation and reboot
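
Once a node comes back up, one quick sanity check from the console (or over SSH as root) is to confirm the installed version:

# Print the installed Proxmox VE version
pveversion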

Package management

Proxmox VE is a mixture of Debian and Proxmox-specific software targeted at enterprises. However, I do not need the enterprise subscription in this situation, so I will modify where Proxmox gets its packages and updates. Perform the following on all nodes.

  • Disable the enterprise repositories
sed -i'.bak' -e '1 s/^/# /' /etc/apt/sources.list.d/pve-enterprise.list
sed -i'.bak' -e '1 s/^/# /' /etc/apt/sources.list.d/ceph.list
  • Add “non-free-firmware” software sources
sed -i'.bak' -e '/.debian.org/ s/$/ non-free-firmware/' /etc/apt/sources.list
  • Add the “no-subscription” software repositories
tee --append /etc/apt/sources.list << EOF > /dev/null

# Proxmox VE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
EOF

tee /etc/apt/sources.list.d/ceph.list << EOF > /dev/null

# Ceph Reef no-subscription repository
deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription
EOF
  • Install updates, CPU-specific microcode, and then reboot
apt-get update
apt-get dist-upgrade -y

case $(lscpu | awk '/^Vendor ID:/{print $3}') in \
  "AuthenticAMD") apt-get install amd64-microcode -y ;; \
  "GenuineIntel") apt-get install intel-microcode -y ;; \
  esac
reboot
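
After the reboot, one way to confirm the microcode package took effect is to check the kernel log for the current boot; the exact wording varies by CPU vendor, so treat this as a rough sanity check.

# Look for microcode load/revision messages from the running kernel
journalctl -k | grep -i microcode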

Networking

I chose not to use Proxmox’s SDN feature for networking. Instead, I will use VLANs to leverage my physical switching/routing infrastructure.

  • On each node, back up the network configuration file /etc/network/interfaces
cp /etc/network/interfaces /etc/network/interfaces.bak

I will write a custom network configuration file to move the management IP address to a dedicated network interface and form a bond with the 10Gb interfaces. I will also create two sub-interfaces for virtual machine migration and Ceph storage replication on the bond.

LAST_OCTET=$(hostname -i | cut -d . -f 4)

tee /etc/network/interfaces << EOF > /dev/null
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet static
	address 192.168.100.${LAST_OCTET}/25
	gateway 192.168.100.1

auto eno2
iface eno2 inet manual

auto enp67s0f0
iface enp67s0f0 inet manual

auto enp67s0f1
iface enp67s0f1 inet manual

auto bond0
iface bond0 inet manual
       bond-slaves enp67s0f0 enp67s0f1
       bond-miimon 100
       bond-mode 802.3ad
       bond-xmit-hash-policy layer2+3

auto bond0.102
iface bond0.102 inet static
       address 192.168.102.${LAST_OCTET}/26

auto bond0.104
iface bond0.104 inet static
       address 192.168.104.${LAST_OCTET}/26

auto vmbr0
iface vmbr0 inet manual
       bridge-ports bond0
       bridge-stp off
       bridge-fd 0
       bridge-vlan-aware yes
       bridge-vids 2-4094

source /etc/network/interfaces.d/*
EOF
  • Tell the operating system to start using the updated network configuration
ifreload --all
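
Before moving on, it is worth confirming that the bond, the VLAN sub-interfaces, and the bridge all came up with the expected addresses, and that the management gateway is reachable:

# Brief summary of every interface and its addresses
ip -br address show

# Confirm the bond negotiated LACP (802.3ad) with the switch
cat /proc/net/bonding/bond0

# Confirm the management gateway is reachable
ping -c 3 192.168.100.1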

Time synchronization

  • On each node, back up the chrony configuration file
cp /etc/chrony/chrony.conf /etc/chrony/chrony.conf.bak
  • Then, write out a new configuration file that includes the primary NTP source on an internal IP address and us.pool.ntp.org as the backup
tee /etc/chrony/chrony.conf << EOF > /dev/null
server 172.16.2.1 prefer iburst minpoll 2 maxpoll 2 xleave
pool us.pool.ntp.org iburst
driftfile /var/lib/chrony/chrony.drift
makestep 0.1 3
rtcsync
keyfile /etc/chrony/chrony.keys
logdir /var/log/chrony
leapsectz right/UTC
EOF
  • Restart the chrony service so the configuration change takes effect right away
systemctl restart chronyd
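
To confirm chrony picked up the new configuration, check the source list and tracking state; the internal server should eventually show as the selected source:

# List configured time sources and their reachability
chronyc sources -v

# Show the offset from the currently selected source
chronyc tracking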

ACME Certificates

  • Add the root certificate from the certificate authority server to each node and update the truststore
curl -k https://ca.lab.howto.engineer:443/roots.pem -o /usr/local/share/ca-certificates/root_ca.crt

update-ca-certificates
  • On each node, register with the certificate authority server. Substitute <email@tld> with an appropriate email address
pvenode acme account register default <email@tld> \
  --directory https://ca.lab.howto.engineer/acme/acme/directory
  • Now set the domain for the ACME HTTP-01 challenge
pvenode config set --acme domains=$HOSTNAME.lab.howto.engineer
  • Request a certificate for each node
pvenode acme cert order
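
If the order succeeds, the node’s web interface certificate is replaced automatically; you can confirm what is currently installed on a node with:

# Show the certificates currently in use by this node
pvenode cert info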

Cluster formation

  • On the first node, create a cluster
pvecm create PROXMOX-CLUSTER
  • On the remaining nodes, execute the following. Enter the root password when prompted and accept adding the SSH key fingerprint
pvecm add compute-node01.lab.howto.engineer
  • Review the status of the cluster
pvecm status
  • Back on the first node, configure cluster-wide settings
sed -i "/console:/d" /etc/pve/datacenter.cfg
sed -i "/crs:/d" /etc/pve/datacenter.cfg
sed -i "/ha:/d" /etc/pve/datacenter.cfg
sed -i "/migration:/d" /etc/pve/datacenter.cfg

tee --append /etc/pve/datacenter.cfg << EOF > /dev/null
console: html5
crs: ha=static,ha-rebalance-on-start=1
ha: shutdown_policy=migrate
migration: insecure,network=192.168.102.0/26
EOF
  • Then create an HA (High Availability) group, giving each node an equal weight
ha-manager groupadd ha-global-group \
  --nodes "compute-node01:100,compute-node02:100,compute-node03:100"

NFS Backend

I use an NFS share to store ISO and cloud-init files. The next step is to add this share to the cluster.

  • On the first node, validate that the NFS server exports are available to the cluster nodes. The output should list the IP addresses for each node associated with the NFS share

pvesm nfsscan <NFS-SERVER-IP> | grep -E '192\.168\.100\.2[1-3]'
  • From the first node, run the following. Substitute <NFS-SERVER-IP> and <NFS-SHARE-PATH> with the NFS server and share path details
tee --append /etc/pve/storage.cfg << EOF > /dev/null

nfs: artifacts
        path /mnt/pve/artifacts
        server <NFS-SERVER-IP>
        export <NFS-SHARE-PATH>
        options vers=4,soft
        content iso,snippets
EOF
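
Because /etc/pve/storage.cfg is shared cluster-wide, the new storage should appear on every node within a few seconds; a quick check from any node:

# The 'artifacts' storage should report as active with its capacity listed
pvesm status --storage artifacts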

Ceph storage

  • On each node, install the required Ceph package. Select “y” when prompted
pveceph install --repository no-subscription --version reef
  • On the first node, initialize Ceph’s configuration
pveceph init --network 192.168.104.0/26
  • On each node, create a Ceph monitor and manager
pveceph mon create && pveceph mgr create
  • On each node, identify the available disks that will be used for Ceph object storage
lsblk
  • Prepare the disks on all nodes by wiping them. This will destroy any data on the disks
ceph-volume lvm zap /dev/nvme0n1 --destroy
ceph-volume lvm zap /dev/nvme1n1 --destroy
  • Designate the disks to Ceph object storage
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1
  • On the first node, create a Ceph storage pool and assign the available disks to it
pveceph pool create <pool-name> --add_storages
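
After the pool is created and added as Proxmox storage, give Ceph a moment to peer, then check the overall health; HEALTH_OK with all OSDs up and in is what you want to see before placing VM disks on the pool:

# Cluster health, monitor quorum, and OSD/PG summary
ceph -s

# Per-OSD placement and utilization
ceph osd df tree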

Wrap up

With those steps out of the way, point your web browser to https://<node-IP>:8006 and log in with the root credentials. This setup is excellent for creating virtual machines for further learning and development without worrying about the monthly cost of similar infrastructure in someone else’s data center.