# Runbook: Onboard a Proxmox Node

You install Proxmox. You give CC an IP and a root password. CC does the rest.

---

## Current Cluster

| Alias   | Local IP      | Tailscale IP |
|---------|---------------|--------------|
| data    | 192.168.1.240 | 100.64.0.20  |
| utility | 192.168.1.241 | 100.64.0.19  |
| cloud   | 192.168.1.242 | 100.64.0.22  |
| media   | 192.168.1.243 | 100.64.0.21  |

Management host: **cortex**

---

## Inputs

```
NODE_IP=       # e.g. 192.168.1.244
NODE_ALIAS=    # e.g. storage (lowercase, no dots)
ROOT_PASS=     # root password for initial key copy
```

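The inputs above can be sanity-checked before Phase 1 touches the node. This is a minimal sketch, not part of the original runbook: `validate_inputs` is a hypothetical helper, and the IPv4-only rule and the "letters, digits, hyphens" alias rule are assumptions that are slightly stricter than "lowercase, no dots".

```bash
# Hypothetical helper (not in the original runbook): sanity-check the inputs.
# Usage: validate_inputs "$NODE_IP" "$NODE_ALIAS" "$ROOT_PASS" || exit 1
validate_inputs() {
    # Assumption: NODE_IP is a dotted-quad IPv4 address, each octet 0-255.
    local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
    if ! printf '%s\n' "$1" | grep -Eq "^${octet}(\.${octet}){3}\$"; then
        echo "invalid NODE_IP: $1" >&2
        return 1
    fi
    # Assumption: alias starts with a letter; lowercase letters, digits, hyphens only.
    if ! printf '%s\n' "$2" | grep -Eq '^[a-z][a-z0-9-]*$'; then
        echo "invalid NODE_ALIAS: $2" >&2
        return 1
    fi
    # The password is only needed for the initial key copy, but it must not be empty.
    if [ -z "$3" ]; then
        echo "ROOT_PASS is empty" >&2
        return 1
    fi
}
```
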
---

## Phase 1: SSH Access

Nothing works without this.

```bash
# Ensure sshpass is installed
command -v sshpass >/dev/null || sudo apt install -y sshpass

# Test access immediately
sshpass -p "$ROOT_PASS" ssh \
    -o StrictHostKeyChecking=accept-new \
    -o IdentitiesOnly=yes \
    -o PreferredAuthentications=password \
    "root@$NODE_IP" 'hostname'
```

### Gate

Must return the hostname. **Stop if this fails.**

### Add host alias

```bash
# Ensure ~/.ssh/config has global defaults (idempotent)
grep -q "IdentitiesOnly yes" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << 'EOF'

Host *
    IdentitiesOnly yes
    StrictHostKeyChecking accept-new
    ConnectTimeout 10
    ServerAliveInterval 30
    ServerAliveCountMax 3
EOF

# Add alias (idempotent)
grep -q "Host $NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host $NODE_ALIAS
    HostName $NODE_IP
    User root
EOF
```

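The grep-then-append pattern above recurs later for the Tailscale alias, so it can be wrapped once. This is a sketch under stated assumptions: `add_ssh_host` is a hypothetical helper that is not in the original runbook, and the optional third argument (an alternate config path) exists only so the function can be tried against a scratch file.

```bash
# Hypothetical helper: append a Host block only if the alias is not already defined.
# Usage: add_ssh_host "$NODE_ALIAS" "$NODE_IP"
add_ssh_host() {
    local alias_name="$1" host="$2" config="${3:-$HOME/.ssh/config}"
    # -x matches the whole line and -F disables regex,
    # so "storage" never matches an existing "storage2".
    if grep -qxF "Host $alias_name" "$config" 2>/dev/null; then
        return 0
    fi
    # Leading blank line keeps the new block separate from earlier entries.
    printf '\nHost %s\n    HostName %s\n    User root\n' "$alias_name" "$host" >> "$config"
}
```
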
### Optional: Set up key auth

Eliminates the need for sshpass on every command to this node.

```bash
# Generate a key only if one does not exist yet
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -C "cortex" -N "" -f ~/.ssh/id_ed25519

sshpass -p "$ROOT_PASS" ssh-copy-id \
    -o StrictHostKeyChecking=accept-new \
    -o IdentitiesOnly=yes \
    -o PreferredAuthentications=password \
    "root@$NODE_IP"

# Verify key auth works (no password)
ssh $NODE_ALIAS 'hostname'
```

### How CC connects for the rest of this runbook

If key auth is set up:

```bash
ssh $NODE_ALIAS '<command>'
```

If not:

```bash
sshpass -p "$ROOT_PASS" ssh $NODE_ALIAS '<command>'
```

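The two connection modes above can be folded into one wrapper so later phases do not need to care which one applies. This is a hedged sketch: `node_run` is a hypothetical name, and probing with `BatchMode=yes` on every call is an assumption about how to detect key auth (it costs one extra connection per command). `sshpass -e` reads the password from the `SSHPASS` environment variable instead of the command line.

```bash
# Hypothetical wrapper: run a command on the node with whichever auth works.
# Usage: node_run 'pvesm status'
node_run() {
    # BatchMode=yes forbids password prompts, so the probe succeeds only with key auth.
    # -n keeps the probe from consuming stdin.
    if ssh -n -o BatchMode=yes "$NODE_ALIAS" true >/dev/null 2>&1; then
        ssh "$NODE_ALIAS" "$@"
    else
        # Fall back to password auth; the password travels via the environment.
        SSHPASS="$ROOT_PASS" sshpass -e ssh "$NODE_ALIAS" "$@"
    fi
}
```
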
---

## Phase 2: Base Configuration

Fix the repositories first: without a subscription the enterprise repo returns 401 errors, so `apt update` fails until it is disabled.

```bash
# Disable enterprise repo
ssh $NODE_ALIAS 'sed -i "s/^deb/# deb/" /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null; true'

# Add no-subscription repo ("bookworm" matches Proxmox VE 8; adjust the suite for other releases)
ssh $NODE_ALIAS 'grep -q "pve-no-subscription" /etc/apt/sources.list.d/pve-no-subscription.list 2>/dev/null || \
    echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list'

# Upgrade packages, then set the timezone and confirm time sync
ssh $NODE_ALIAS 'apt update && apt dist-upgrade -y'
ssh $NODE_ALIAS 'timedatectl set-timezone America/Boise'
ssh $NODE_ALIAS 'timedatectl status | grep -i sync'
```

---

## Phase 3: Tailscale

```bash
ssh $NODE_ALIAS 'curl -fsSL https://tailscale.com/install.sh | sh'
ssh $NODE_ALIAS 'tailscale up --login-server=https://<HEADSCALE_URL> --auth-key=<PREAUTH_KEY>'

# Get Tailscale IP and add alias
TSIP=$(ssh $NODE_ALIAS 'tailscale ip -4')
echo "Tailscale IP: $TSIP"

grep -q "Host ts-$NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host ts-$NODE_ALIAS
    HostName $TSIP
    User root
EOF

ssh ts-$NODE_ALIAS 'hostname'
```

---

## Phase 4: Verify Cluster Membership

You join the node to the cluster. CC verifies it's there.

```bash
# Current corosync-based releases report membership as "Quorate: Yes";
# a standalone node prints an error instead, so the grep returns nothing.
ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -E "^(Name|Quorate):"'
ssh data 'pvecm nodes'
```

If not in the cluster yet, **stop and tell the user**. Do not run `pvecm add`.

---

## Phase 5: Verify

```bash
# Authentik SSO (syncs via cluster)
ssh $NODE_ALIAS 'pveum realm list | grep authentik'

# Storage
ssh $NODE_ALIAS 'pvesm status'
ssh $NODE_ALIAS 'lsblk && echo "---" && vgs && lvs'
```

---

## Phase 6: Update Inventory

Add to CLAUDE.md cluster table:

```
| <NODE_ALIAS> | <NODE_IP> | <TSIP> |
```

Update any hardcoded node lists:

- proxmox-audit.sh (NODES array)
- Monitoring/backup targets

---

## Final Verification

Every check must say OK. The last two lines print the PVE version and `yes` for time sync instead.

```bash
echo "=== $NODE_ALIAS ==="
echo -n "SSH (local):     "; ssh $NODE_ALIAS 'echo OK' 2>&1
echo -n "SSH (tailscale): "; ssh ts-$NODE_ALIAS 'echo OK' 2>&1
# Current corosync-based releases report membership as "Quorate: Yes"
echo -n "Cluster:         "; ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -Eq "^Quorate:[[:space:]]+Yes" && echo OK || echo FAIL'
echo -n "Tailscale:       "; ssh $NODE_ALIAS 'tailscale status --self >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "OIDC realm:      "; ssh $NODE_ALIAS 'pveum realm list 2>/dev/null | grep -q authentik && echo OK || echo FAIL'
echo -n "Storage:         "; ssh $NODE_ALIAS 'pvesm status >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "PVE version:     "; ssh $NODE_ALIAS 'pveversion'
echo -n "Time sync:       "; ssh $NODE_ALIAS 'timedatectl show -p NTPSynchronized --value'
```

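When the final checks run unattended, a failure count is easier to act on than a column of OK/FAIL lines. The sketch below is one possible way to do that, not the runbook's own method: `check` and `FAILS` are hypothetical names, and the commented usage lines assume the aliases created in Phases 1 and 3.

```bash
# Hypothetical helper: run one check, print OK or FAIL, and count failures.
FAILS=0
check() {
    local label="$1"
    shift
    # The remaining arguments are the command to run; its output is discarded.
    if "$@" >/dev/null 2>&1; then
        printf '%-18s OK\n' "$label:"
    else
        printf '%-18s FAIL\n' "$label:"
        FAILS=$((FAILS + 1))
    fi
}

# Usage (commented out so the block can be loaded without contacting a node):
# check "SSH (local)"     ssh "$NODE_ALIAS" true
# check "SSH (tailscale)" ssh "ts-$NODE_ALIAS" true
# check "Storage"         ssh "$NODE_ALIAS" 'pvesm status'
# [ "$FAILS" -eq 0 ] || { echo "$FAILS check(s) failed" >&2; exit 1; }
```
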
---

## Troubleshooting

**"Too many authentication failures"**
`IdentitiesOnly yes` is missing from `Host *` in `~/.ssh/config`.

**sshpass "Permission denied"**
Add `-o PreferredAuthentications=password -o IdentitiesOnly=yes` to the command.

**Cluster join corosync errors**
Check that `/etc/hosts` on all nodes includes the new hostname and IP.

**Authentik realm missing**
Check `systemctl status pve-cluster`. The realm syncs via pmxcfs in `/etc/pve/domains.cfg`.

**Can't migrate VMs to node**
Storage mismatch. Compare `pvesm status` on both nodes.

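For the corosync `/etc/hosts` check, the comparison can be scripted instead of eyeballed on each node. This is a sketch under assumptions: `hosts_has_node` is a hypothetical helper, it reads hosts-file text on stdin, and it ignores whole-line comments only. The commented loop uses the node aliases from the cluster table.

```bash
# Hypothetical helper: succeed if the hosts-file text on stdin maps IP ($1) to hostname ($2).
hosts_has_node() {
    awk -v ip="$1" -v name="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $1 == ip {
            # Any field after the address may be the hostname or an alias of it.
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit !found }
    '
}

# Usage (commented out so the block can be loaded without contacting a node):
# NEW_HOST=$(ssh "$NODE_ALIAS" hostname)
# for n in data utility cloud media "$NODE_ALIAS"; do
#     ssh "$n" cat /etc/hosts | hosts_has_node "$NODE_IP" "$NEW_HOST" \
#         || echo "$n: /etc/hosts is missing $NODE_IP $NEW_HOST"
# done
```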