# Runbook: Onboard a Proxmox Node

You install Proxmox. You give CC an IP and a root password. CC does the rest.

---

## Current Cluster

| Alias    | Local IP        | Tailscale IP    |
|----------|-----------------|-----------------|
| data     | 192.168.1.240   | 100.64.0.20     |
| utility  | 192.168.1.241   | 100.64.0.19     |
| cloud    | 192.168.1.242   | 100.64.0.22     |
| media    | 192.168.1.243   | 100.64.0.21     |

Management host: **cortex**

---

## Inputs

```
NODE_IP=          # e.g. 192.168.1.244
NODE_ALIAS=       # e.g. storage (lowercase, no dots)
ROOT_PASS=        # root password for initial key copy
```

---

## Phase 1: SSH Access

Nothing works without this.

```bash
# Ensure sshpass is installed
which sshpass || sudo apt install -y sshpass

# Test access immediately
sshpass -p "$ROOT_PASS" ssh \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  root@$NODE_IP 'hostname'
```

### Gate

Must return the hostname. **Stop if this fails.**

### Add host alias

```bash
# Ensure ~/.ssh/config has global defaults (idempotent)
grep -q "IdentitiesOnly yes" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << 'EOF'

Host *
    IdentitiesOnly yes
    StrictHostKeyChecking accept-new
    ConnectTimeout 10
    ServerAliveInterval 30
    ServerAliveCountMax 3
EOF

# Add alias (idempotent)
grep -q "Host $NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host $NODE_ALIAS
    HostName $NODE_IP
    User root
EOF
```

### Optional: Set up key auth

Eliminates the need for sshpass on every command to this node.

```bash
# Generate a key only if one does not exist yet
ls ~/.ssh/id_ed25519 || ssh-keygen -t ed25519 -C "cortex" -N "" -f ~/.ssh/id_ed25519

sshpass -p "$ROOT_PASS" ssh-copy-id \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  root@$NODE_IP

# Verify key auth works (no password)
ssh $NODE_ALIAS 'hostname'
```

### How CC connects for the rest of this runbook

If key auth is set up:

```bash
ssh $NODE_ALIAS ''
```

If not:

```bash
sshpass -p "$ROOT_PASS" ssh $NODE_ALIAS ''
```

---

## Phase 2: Base Configuration

```bash
# Disable enterprise repo first: without a subscription it returns 401,
# which can abort the `apt update && ...` chain below
ssh $NODE_ALIAS 'sed -i "s/^deb/# deb/" /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null; true'

# Add no-subscription repo
ssh $NODE_ALIAS 'grep -q "pve-no-subscription" /etc/apt/sources.list.d/pve-no-subscription.list 2>/dev/null || \
  echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list'

# Upgrade once the repos are in place
ssh $NODE_ALIAS 'apt update && apt dist-upgrade -y'

# Timezone and time sync
ssh $NODE_ALIAS 'timedatectl set-timezone America/Boise'
ssh $NODE_ALIAS 'timedatectl status | grep -i sync'
```

---

## Phase 3: Tailscale

```bash
ssh $NODE_ALIAS 'curl -fsSL https://tailscale.com/install.sh | sh'
ssh $NODE_ALIAS 'tailscale up --login-server=https:// --auth-key='

# Get Tailscale IP and add alias
TSIP=$(ssh $NODE_ALIAS 'tailscale ip -4')
echo "Tailscale IP: $TSIP"

grep -q "Host ts-$NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host ts-$NODE_ALIAS
    HostName $TSIP
    User root
EOF

ssh ts-$NODE_ALIAS 'hostname'
```

---

## Phase 4: Verify Cluster Membership

You join the node to the cluster. CC verifies it's there.

```bash
ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep "Cluster Member"'
ssh data 'pvecm nodes'
```

If the node is not in the cluster yet, **stop and tell the user**. Do not run `pvecm add`.

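
The Phase 4 gate can be made mechanical instead of read by eye. The following is a minimal sketch, not part of the original runbook: `node_in_cluster` is a hypothetical helper, and it assumes that `pvecm nodes` prints the node name in the third whitespace-separated column and that the node's hostname matches `NODE_ALIAS`. The sample table is illustrative, not captured from a real cluster.

```shell
# Sketch: decide from `pvecm nodes` output whether a node is a cluster member.
# Assumption: the node name is the third whitespace-separated column.
node_in_cluster() {
  # $1 = node name; the membership table is read from stdin
  awk -v want="$1" '$3 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# Illustrative sample output (hypothetical, shortened)
sample_nodes='Membership information
----------------------
    Nodeid      Votes Name
         1          1 data (local)
         2          1 utility'

if printf '%s\n' "$sample_nodes" | node_in_cluster utility; then
  echo "utility: member"
else
  echo "utility: NOT a member, stop and tell the user"
fi
```

Against a live cluster the same helper would be fed from an existing member, for example `ssh data 'pvecm nodes' | node_in_cluster "$NODE_ALIAS"`. A non-zero exit status is the signal to stop.
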

---

## Phase 5: Verify

```bash
# Authentik SSO (syncs via cluster)
ssh $NODE_ALIAS 'pveum realm list | grep authentik'

# Storage
ssh $NODE_ALIAS 'pvesm status'
ssh $NODE_ALIAS 'lsblk && echo "---" && vgs && lvs'
```

---

## Phase 6: Update Inventory

Add to CLAUDE.md cluster table:

```
| | | |
```

Update any hardcoded node lists:

- proxmox-audit.sh (NODES array)
- Monitoring/backup targets

---

## Final Verification

Every check must print OK. The last two lines are informational: the PVE version string, and `yes` for time sync.

```bash
echo "=== $NODE_ALIAS ==="
echo -n "SSH (local):      "; ssh $NODE_ALIAS 'echo OK' 2>&1
echo -n "SSH (tailscale):  "; ssh ts-$NODE_ALIAS 'echo OK' 2>&1
echo -n "Cluster:          "; ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -q "Cluster Member: Yes" && echo OK || echo FAIL'
echo -n "Tailscale:        "; ssh $NODE_ALIAS 'tailscale status --self >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "OIDC realm:       "; ssh $NODE_ALIAS 'pveum realm list 2>/dev/null | grep -q authentik && echo OK || echo FAIL'
echo -n "Storage:          "; ssh $NODE_ALIAS 'pvesm status >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "PVE version:      "; ssh $NODE_ALIAS 'pveversion'
echo -n "Time sync:        "; ssh $NODE_ALIAS 'timedatectl show -p NTPSynchronized --value'
```

---

## Troubleshooting

**"Too many authentication failures"**
`IdentitiesOnly yes` is missing from `Host *` in `~/.ssh/config`.

**sshpass "Permission denied"**
Add `-o PreferredAuthentications=password -o IdentitiesOnly=yes`.

**Cluster join corosync errors**
Check that `/etc/hosts` on all nodes includes the new hostname and IP.

**Authentik realm missing**
Check `systemctl status pve-cluster`. The realm syncs via pmxcfs in `/etc/pve/domains.cfg`.

**Can't migrate VMs to node**
Storage mismatch. Compare `pvesm status` on both nodes.
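
For the storage mismatch case, comparing two `pvesm status` listings by eye gets tedious. The following is a rough sketch of one way to diff them, not part of the original runbook. `storage_ids` and `storage_diff` are hypothetical helpers, and the sketch assumes `pvesm status` prints one header row followed by one storage per line with the storage ID in the first column. The sample files are illustrative and use shortened columns.

```shell
# Sketch: list storage IDs that appear on only one of two nodes.
# Assumption: first line is a header, column 1 is the storage ID.
storage_ids() {
  # Skip the header, keep column 1, sort so comm(1) can compare
  awk 'NR > 1 { print $1 }' | LC_ALL=C sort
}

storage_diff() {
  # $1, $2 = files holding `pvesm status` output from each node
  a=$(mktemp) && b=$(mktemp) || return 1
  storage_ids < "$1" > "$a"
  storage_ids < "$2" > "$b"
  # comm -3 hides IDs present on both nodes:
  # unindented = only in $1, tab-indented = only in $2
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}

# Illustrative data (hypothetical storages, shortened columns)
tmp=$(mktemp -d)
printf '%s\n' 'Name Type Status' 'local dir active' \
  'local-lvm lvmthin active' 'nas nfs active' > "$tmp/data.txt"
printf '%s\n' 'Name Type Status' 'local dir active' \
  'local-lvm lvmthin active' > "$tmp/new.txt"

storage_diff "$tmp/data.txt" "$tmp/new.txt"   # prints: nas
rm -rf "$tmp"
```

In practice the two input files would come from something like `ssh data 'pvesm status' > data.txt` and `ssh $NODE_ALIAS 'pvesm status' > new.txt`. Any ID that shows up in the output exists on only one side and is a candidate cause for the failed migration.
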