echo6-docs/runbooks/proxmox-onboard-node.md
Matt Johnson e9231ac24a Migration: consolidate Echo6 docs to cortex with full infrastructure cleanup sync
2026-04-13 06:02:16 +00:00

Runbook: Onboard a Proxmox Node

You install Proxmox. You give CC an IP and a root password. CC does the rest.


Current Cluster

| Alias | Local IP | Tailscale IP |
|---|---|---|
| data | 192.168.1.240 | 100.64.0.20 |
| utility | 192.168.1.241 | 100.64.0.19 |
| cloud | 192.168.1.242 | 100.64.0.22 |
| media | 192.168.1.243 | 100.64.0.21 |

Management host: cortex


Inputs

NODE_IP=                # e.g. 192.168.1.244
NODE_ALIAS=             # e.g. storage (lowercase, no dots)
ROOT_PASS=              # root password for initial key copy
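
A typo in the inputs is cheaper to catch before any SSH attempt. The following is a minimal sketch of an input check; `validate_inputs` is a hypothetical helper, not part of the original runbook, and it assumes the alias rule is "lowercase letters, digits, and hyphens, starting with a letter".

```shell
# Hypothetical helper (assumption, not in the original runbook):
# reject a malformed NODE_IP or NODE_ALIAS before anything touches the node.
validate_inputs() {
  # $1 = NODE_IP, $2 = NODE_ALIAS
  octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  # IPv4 dotted quad, each octet in 0-255
  if ! printf '%s\n' "$1" | grep -Eq "^$octet\.$octet\.$octet\.$octet\$"; then
    echo "NODE_IP is not a valid IPv4 address: $1" >&2
    return 1
  fi
  # Lowercase, no dots (assumed rule: letters, digits, hyphens)
  if ! printf '%s\n' "$2" | grep -Eq '^[a-z][a-z0-9-]*$'; then
    echo "NODE_ALIAS must be lowercase with no dots: $2" >&2
    return 1
  fi
}

# Example: validate_inputs "$NODE_IP" "$NODE_ALIAS" || exit 1
```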

Phase 1: SSH Access

Nothing works without this.

# Ensure sshpass is installed
which sshpass || sudo apt install -y sshpass

# Test access immediately
sshpass -p "$ROOT_PASS" ssh \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  root@$NODE_IP 'hostname'

Gate

Must return the hostname. Stop if this fails.
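
A gate like this can be made explicit in a script. The sketch below uses a hypothetical `gate` helper (an assumption, not part of the original runbook) that runs a command and reports OK or FAIL. It checks only the exit status, so it is slightly weaker than confirming the hostname by eye.

```shell
# Hypothetical helper (assumption): run a check and fail loudly if it does not pass.
gate() {
  # $1 = label, remaining arguments = command to run
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "GATE $label: OK"
  else
    echo "GATE $label: FAIL" >&2
    return 1
  fi
}

# Example (same options as the test command above):
# gate "password ssh" sshpass -p "$ROOT_PASS" ssh \
#   -o StrictHostKeyChecking=accept-new -o IdentitiesOnly=yes \
#   -o PreferredAuthentications=password "root@$NODE_IP" hostname || exit 1
```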

Add host alias

# Ensure ~/.ssh/config has global defaults (idempotent)
grep -q "IdentitiesOnly yes" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << 'EOF'

Host *
    IdentitiesOnly yes
    StrictHostKeyChecking accept-new
    ConnectTimeout 10
    ServerAliveInterval 30
    ServerAliveCountMax 3
EOF

# Add alias (idempotent)
grep -q "Host $NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host $NODE_ALIAS
    HostName $NODE_IP
    User root
EOF
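
The same append-if-absent pattern is used again for the Tailscale alias, so it may be worth wrapping in a function. This is a sketch only; `add_ssh_alias` is a hypothetical name, and taking the config file as a parameter is an assumption made so the helper can be tried against a scratch file first.

```shell
# Hypothetical helper (assumption): append a Host block only if it is not already present.
add_ssh_alias() {
  # $1 = ssh config file, $2 = alias, $3 = hostname or IP
  touch "$1"
  # -x matches the whole line, -F treats the pattern as a literal string
  if grep -qxF "Host $2" "$1"; then
    return 0
  fi
  printf '\nHost %s\n    HostName %s\n    User root\n' "$2" "$3" >> "$1"
}

# Example: add_ssh_alias ~/.ssh/config "$NODE_ALIAS" "$NODE_IP"
```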

Optional: Set up key auth

Eliminates the need for sshpass on every command to this node.

# Generate a key only if one does not already exist
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -C "cortex" -N "" -f ~/.ssh/id_ed25519

sshpass -p "$ROOT_PASS" ssh-copy-id \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  root@$NODE_IP

# Verify key auth works (no password)
ssh $NODE_ALIAS 'hostname'

How CC connects for the rest of this runbook

If key auth is set up:

ssh $NODE_ALIAS '<command>'

If not:

sshpass -p "$ROOT_PASS" ssh $NODE_ALIAS '<command>'

Phase 2: Base Configuration

# Disable the enterprise repo first; without a subscription it makes apt update fail
ssh $NODE_ALIAS 'sed -i "s/^deb/# deb/" /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null; true'

# Add no-subscription repo
ssh $NODE_ALIAS 'grep -q "pve-no-subscription" /etc/apt/sources.list.d/pve-no-subscription.list 2>/dev/null || \
  echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list'

# Upgrade packages. If apt update still reports 401 errors, another enterprise
# entry (for example in /etc/apt/sources.list.d/ceph.list) may need disabling too.
ssh $NODE_ALIAS 'apt update && apt dist-upgrade -y'

# Set the timezone and confirm the clock is synchronized
ssh $NODE_ALIAS 'timedatectl set-timezone America/Boise'
ssh $NODE_ALIAS 'timedatectl status | grep -i sync'
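
The repo line above hardcodes `bookworm`. A possible alternative is to derive the codename from the node's `/etc/os-release`. The sketch below assumes the one-line `.list` format used above and a `VERSION_CODENAME` field in os-release; `pve_repo_line` is a hypothetical helper.

```shell
# Hypothetical helper (assumption): build the no-subscription repo line
# from os-release content supplied on stdin.
pve_repo_line() {
  codename=$(sed -n 's/^VERSION_CODENAME=//p' | tr -d '"')
  # Fail rather than emit a repo line with an empty codename
  [ -n "$codename" ] || return 1
  echo "deb http://download.proxmox.com/debian/pve $codename pve-no-subscription"
}

# Example: ssh "$NODE_ALIAS" 'cat /etc/os-release' | pve_repo_line
```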

Phase 3: Tailscale

ssh $NODE_ALIAS 'curl -fsSL https://tailscale.com/install.sh | sh'
ssh $NODE_ALIAS 'tailscale up --login-server=https://<HEADSCALE_URL> --auth-key=<PREAUTH_KEY>'

# Get Tailscale IP and add alias
TSIP=$(ssh $NODE_ALIAS 'tailscale ip -4')
echo "Tailscale IP: $TSIP"

grep -q "Host ts-$NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host ts-$NODE_ALIAS
    HostName $TSIP
    User root
EOF

ssh ts-$NODE_ALIAS 'hostname'
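
If `tailscale ip -4` returns nothing (for example, the node is not logged in yet), the alias block would be written with an empty HostName. A sanity check before writing the alias could look like the sketch below. It assumes the Headscale server hands out addresses from the default 100.64.0.0/10 range, which matches the cluster table; `is_cgnat_ip` is a hypothetical helper.

```shell
# Hypothetical helper (assumption): succeed only for addresses inside 100.64.0.0/10.
is_cgnat_ip() {
  # $1 = IPv4 address
  case "$1" in
    100.*.*.*) ;;
    *) return 1 ;;
  esac
  second=${1#100.}      # strip the leading "100."
  second=${second%%.*}  # keep only the second octet
  case "$second" in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric
  esac
  # 100.64.0.0/10 covers second octets 64 through 127
  [ "$second" -ge 64 ] && [ "$second" -le 127 ]
}

# Example: is_cgnat_ip "$TSIP" || { echo "Unexpected Tailscale IP: '$TSIP'" >&2; exit 1; }
```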

Phase 4: Verify Cluster Membership

You join the node to the cluster. CC verifies it's there.

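# On a cluster member this prints the cluster name and "Quorate: Yes"; a standalone node prints nothing
ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -E "^(Name|Quorate):"'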
ssh data 'pvecm nodes'

If not in the cluster yet, stop and tell the user. Do not run pvecm add.
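
The membership check can also be scripted against the `pvecm nodes` output from an existing member. This is a sketch that assumes the node name appears in the third whitespace-separated column of that output; `node_in_cluster` is a hypothetical helper. Pass the node's actual hostname, which may differ from the SSH alias.

```shell
# Hypothetical helper (assumption): read `pvecm nodes` output on stdin and
# succeed if the given node name appears in the Name column.
node_in_cluster() {
  # $1 = node hostname as the cluster knows it
  awk -v name="$1" 'NF >= 3 && $3 == name { found = 1 } END { exit !found }'
}

# Example:
# ssh data 'pvecm nodes' | node_in_cluster "$(ssh "$NODE_ALIAS" hostname)" \
#   || echo "Node is not in the cluster yet. Stop and tell the user."
```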


Phase 5: Verify

# Authentik SSO (syncs via cluster)
ssh $NODE_ALIAS 'pveum realm list | grep authentik'

# Storage
ssh $NODE_ALIAS 'pvesm status'
ssh $NODE_ALIAS 'lsblk && echo "---" && vgs && lvs'

Phase 6: Update Inventory

Add to CLAUDE.md cluster table:

| <NODE_ALIAS> | <NODE_IP> | <TSIP> |
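
To avoid retyping values, the row can be generated from the variables already set in this runbook. A minimal sketch; `inventory_row` is a hypothetical helper and only prints the row, leaving the edit of CLAUDE.md to be done by hand.

```shell
# Hypothetical helper (assumption): print one cluster-table row.
inventory_row() {
  # $1 = alias, $2 = local IP, $3 = Tailscale IP
  printf '| %s | %s | %s |\n' "$1" "$2" "$3"
}

# Example: inventory_row "$NODE_ALIAS" "$NODE_IP" "$TSIP"
```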

Update any hardcoded node lists:

  • proxmox-audit.sh (NODES array)
  • Monitoring/backup targets

Final Verification

Every check must print OK. The PVE version line is informational.

echo "=== $NODE_ALIAS ==="
echo -n "SSH (local):     "; ssh $NODE_ALIAS 'echo OK' 2>&1
echo -n "SSH (tailscale): "; ssh ts-$NODE_ALIAS 'echo OK' 2>&1
# A standalone node has no corosync config, so pvecm status prints nothing and this reports FAIL
echo -n "Cluster:         "; ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -Eq "^Quorate:[[:space:]]+Yes" && echo OK || echo FAIL'
echo -n "Tailscale:       "; ssh $NODE_ALIAS 'tailscale status --self >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "OIDC realm:      "; ssh $NODE_ALIAS 'pveum realm list 2>/dev/null | grep -q authentik && echo OK || echo FAIL'
echo -n "Storage:         "; ssh $NODE_ALIAS 'pvesm status >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "PVE version:     "; ssh $NODE_ALIAS 'pveversion'
echo -n "Time sync:       "; ssh $NODE_ALIAS '[ "$(timedatectl show -p NTPSynchronized --value)" = yes ] && echo OK || echo FAIL'
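
When the checks run unattended, a non-zero exit status is easier to act on than reading the output. The sketch below passes the check output through and fails if any line ends in FAIL. `count_failures` is a hypothetical helper; it only catches the literal word FAIL, so SSH connection errors still need a human eye.

```shell
# Hypothetical helper (assumption): echo the check output unchanged and
# exit non-zero if any line ends in FAIL.
count_failures() {
  awk '{ print } /FAIL$/ { bad++ } END { exit (bad > 0) }'
}

# Example, with the checks above saved as a script (hypothetical name):
# sh ./final-verification.sh | count_failures || echo "Onboarding incomplete"
```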

Troubleshooting

"Too many authentication failures": IdentitiesOnly yes is missing from Host * in ~/.ssh/config.

sshpass "Permission denied": Add -o PreferredAuthentications=password -o IdentitiesOnly=yes.

Cluster join corosync errors: Check that /etc/hosts on all nodes includes the new hostname and IP.

Authentik realm missing: Check systemctl status pve-cluster. The realm syncs via pmxcfs in /etc/pve/domains.cfg.

Can't migrate VMs to node: Storage mismatch. Compare pvesm status on both nodes.
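
For the corosync case, the /etc/hosts check can be scripted. This is a sketch; `hosts_has_entry` is a hypothetical helper that takes a hosts file path, so each node's file would need to be fetched first (for example with `ssh <node> 'cat /etc/hosts'`).

```shell
# Hypothetical helper (assumption): succeed if the hosts file maps the IP to the hostname.
hosts_has_entry() {
  # $1 = hosts file, $2 = IP, $3 = hostname
  awk -v ip="$2" -v name="$3" '
    $1 ~ /^#/ { next }             # skip comment lines
    $1 == ip {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break       # stop at a trailing comment
        if ($i == name) found = 1
      }
    }
    END { exit !found }
  ' "$1"
}

# Example:
# ssh data 'cat /etc/hosts' > /tmp/hosts.data
# hosts_has_entry /tmp/hosts.data "$NODE_IP" "$NODE_ALIAS" || echo "data: entry missing"
```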