Migration: consolidate Echo6 docs to cortex with full infrastructure cleanup sync

- Documents recent infrastructure cleanup (8 CTs destroyed, 35 DNS records removed, Headscale cleanup)
- Adds 24 new runbooks covering Authentik, PeerTube, Meshtastic, RECON, Proxmox, Mailcow, Internet Archive, GPU routing
- Adds project documentation for headscale, vaultwarden, peertube, matrix, mmud, advbbs, arr stack
- Updates services.md, environment.md, caddy.md, authentik.md to match live infrastructure
- Removes 4 deprecated runbook duplicates (canonical versions live in projects/)
- Adds .gitignore for binary archives and editor temp files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

parent 89834796ff
commit e9231ac24a
93 changed files with 51223 additions and 254 deletions

runbooks/proxmox-onboard-node.md (normal file, 215 lines)
@@ -0,0 +1,215 @@

# Runbook: Onboard a Proxmox Node

You install Proxmox. You give CC an IP and a root password. CC does the rest.

---

## Current Cluster

| Alias    | Local IP        | Tailscale IP    |
|----------|-----------------|-----------------|
| data     | 192.168.1.240   | 100.64.0.20     |
| utility  | 192.168.1.241   | 100.64.0.19     |
| cloud    | 192.168.1.242   | 100.64.0.22     |
| media    | 192.168.1.243   | 100.64.0.21     |

Management host: **cortex**

---

## Inputs

```
NODE_IP=       # e.g. 192.168.1.244
NODE_ALIAS=    # e.g. storage (lowercase, no dots)
ROOT_PASS=     # root password for initial key copy
```

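The inputs end up in `~/.ssh/config` and in remote commands, so a typo here spreads. A minimal sketch of a pre-flight check, assuming two hypothetical helpers (`valid_ipv4`, `valid_alias`) that are not part of the runbook. The alias rule (lowercase letters, digits, hyphens, starting with a letter) is an assumption that is stricter than the "lowercase, no dots" note above.

```bash
# Hypothetical helpers; names and rules are assumptions, not part of the runbook.

# Succeeds if $1 is a dotted-quad IPv4 address with every octet <= 255.
valid_ipv4() {
  # Shape check: exactly four dot-separated groups of 1-3 digits
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
  # Range check: split on dots into positional parameters
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
}

# Succeeds if $1 is lowercase letters, digits, or hyphens, starting with a letter.
valid_alias() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]*$'
}

# Usage before Phase 1:
#   valid_ipv4 "$NODE_IP"     || echo "bad NODE_IP: $NODE_IP"
#   valid_alias "$NODE_ALIAS" || echo "bad NODE_ALIAS: $NODE_ALIAS"
```
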

---

## Phase 1: SSH Access

Nothing works without this.

```bash
# Ensure sshpass is installed
command -v sshpass >/dev/null || sudo apt install -y sshpass

# Test access immediately
sshpass -p "$ROOT_PASS" ssh \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  "root@$NODE_IP" 'hostname'
```

### Gate

The command must print the node's hostname. **Stop if this fails.**

### Add host alias

```bash
# Make sure the config directory exists
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Ensure ~/.ssh/config has global defaults (idempotent)
grep -q "IdentitiesOnly yes" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << 'EOF'

Host *
    IdentitiesOnly yes
    StrictHostKeyChecking accept-new
    ConnectTimeout 10
    ServerAliveInterval 30
    ServerAliveCountMax 3
EOF

# Add alias (idempotent); anchored so only a real "Host <alias>" line matches
grep -q "^Host $NODE_ALIAS\$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host $NODE_ALIAS
    HostName $NODE_IP
    User root
EOF
```

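The same grep-then-append pattern is used again for the Tailscale alias in Phase 3. A minimal sketch of it as a reusable function, assuming a hypothetical helper name (`add_ssh_host`) and a whole-line, fixed-string match instead of a regex. It takes the config path as an argument so it can be tried against a scratch file first.

```bash
# Hypothetical helper; not part of the runbook.
# Usage: add_ssh_host CONFIG ALIAS HOSTNAME [USER]
add_ssh_host() {
  cfg=$1 alias_name=$2 host=$3 user=${4:-root}
  # -x matches the whole line, -F treats the alias literally (no regex surprises)
  if grep -qxF "Host $alias_name" "$cfg" 2>/dev/null; then
    return 0   # block already present, nothing to do
  fi
  printf '\nHost %s\n    HostName %s\n    User %s\n' "$alias_name" "$host" "$user" >> "$cfg"
}

# Usage:
#   add_ssh_host ~/.ssh/config "$NODE_ALIAS" "$NODE_IP"
```

Running it twice with the same alias leaves a single `Host` block in the file.
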
### Optional: Set up key auth

Eliminates the need for sshpass on every command to this node.

```bash
# Generate a key only if one does not exist yet
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -C "cortex" -N "" -f ~/.ssh/id_ed25519

sshpass -p "$ROOT_PASS" ssh-copy-id \
  -o StrictHostKeyChecking=accept-new \
  -o IdentitiesOnly=yes \
  -o PreferredAuthentications=password \
  "root@$NODE_IP"

# Verify key auth works (no password)
ssh $NODE_ALIAS 'hostname'
```

### How CC connects for the rest of this runbook

If key auth is set up:
```bash
ssh $NODE_ALIAS '<command>'
```

If not, prefix every command with sshpass (the later phases show only the key-auth form):
```bash
sshpass -p "$ROOT_PASS" ssh $NODE_ALIAS '<command>'
```

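The two connection forms can be folded into one wrapper so the later phases do not need two variants of every command. A minimal sketch, assuming a hypothetical function name (`node_ssh`) and a hypothetical `KEY_AUTH` toggle, neither of which appears in the runbook. The password branch adds the two `-o` options that the Troubleshooting section recommends.

```bash
# Hypothetical wrapper; KEY_AUTH is an assumed toggle ("yes" once key auth works).
node_ssh() {
  if [ "${KEY_AUTH:-no}" = yes ]; then
    ssh "$NODE_ALIAS" "$@"
  else
    sshpass -p "$ROOT_PASS" ssh \
      -o PreferredAuthentications=password \
      -o IdentitiesOnly=yes \
      "$NODE_ALIAS" "$@"
  fi
}

# Usage:
#   KEY_AUTH=yes node_ssh 'hostname'
```
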

---

## Phase 2: Base Configuration

```bash
# Disable enterprise repo first: without a subscription it makes `apt update` fail,
# and the `&&` below would then skip the upgrade
ssh $NODE_ALIAS 'sed -i "s/^deb/# deb/" /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null; true'

# Add no-subscription repo (bookworm = PVE 8; adjust for other releases)
ssh $NODE_ALIAS 'grep -q "pve-no-subscription" /etc/apt/sources.list.d/pve-no-subscription.list 2>/dev/null || \
  echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list'

# Upgrade, then set the timezone and confirm time sync
ssh $NODE_ALIAS 'apt update && apt dist-upgrade -y'
ssh $NODE_ALIAS 'timedatectl set-timezone America/Boise'
ssh $NODE_ALIAS 'timedatectl status | grep -i sync'
```

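The repo line hardcodes `bookworm`. A minimal sketch of building it from the node's own codename instead, assuming a hypothetical helper (`pve_repo_line`) and that `/etc/os-release` on the node defines `VERSION_CODENAME`, as Debian does.

```bash
# Hypothetical helper; not part of the runbook.
# Prints the no-subscription repo line for the Debian codename given as $1.
pve_repo_line() {
  printf 'deb http://download.proxmox.com/debian/pve %s pve-no-subscription\n' "$1"
}

# Usage (run on the node, where /etc/os-release provides VERSION_CODENAME):
#   . /etc/os-release && pve_repo_line "$VERSION_CODENAME" \
#     > /etc/apt/sources.list.d/pve-no-subscription.list
```
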

---

## Phase 3: Tailscale

```bash
ssh $NODE_ALIAS 'curl -fsSL https://tailscale.com/install.sh | sh'
ssh $NODE_ALIAS 'tailscale up --login-server=https://<HEADSCALE_URL> --auth-key=<PREAUTH_KEY>'

# Get Tailscale IP and add alias
TSIP=$(ssh $NODE_ALIAS 'tailscale ip -4')
echo "Tailscale IP: $TSIP"

grep -q "Host ts-$NODE_ALIAS$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << EOF

Host ts-$NODE_ALIAS
    HostName $TSIP
    User root
EOF

ssh ts-$NODE_ALIAS 'hostname'
```

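If `tailscale up` did not complete, `TSIP` can come back empty and the heredoc above writes a `HostName` line with no value. A minimal sketch of a guard, assuming a hypothetical helper (`is_cgnat_ip`) and that Headscale hands out addresses from 100.64.0.0/10, which matches the addresses in the cluster table.

```bash
# Hypothetical helper; the 100.64.0.0/10 range is an assumption about this tailnet.
# Succeeds if $1 is an IPv4 address inside 100.64.0.0/10 (100.64.0.0 - 100.127.255.255).
is_cgnat_ip() {
  # Shape check: first octet must be 100, the rest 1-3 digits each
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^100\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' || return 1
  # Range check on the remaining octets
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  [ "$2" -ge 64 ] && [ "$2" -le 127 ] && [ "$3" -le 255 ] && [ "$4" -le 255 ]
}

# Usage, before writing the ts- alias:
#   is_cgnat_ip "$TSIP" || echo "unexpected Tailscale IP: '$TSIP'"
```
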

---

## Phase 4: Verify Cluster Membership

You join the node to the cluster. CC verifies it's there.

```bash
# The wording of `pvecm status` varies by PVE release, so match either status line
ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -E "Cluster Member|Quorate"'
ssh data 'pvecm nodes'
```

If the node is not in the cluster yet, **stop and tell the user**. Do not run `pvecm add`.

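The second command only lists members; someone still has to read the list. A minimal sketch of turning it into a gate, assuming a hypothetical helper (`node_listed`) that scans the listing for a field equal to the node's name. It assumes the cluster node name equals the hostname you pass in, which may differ from `NODE_ALIAS`.

```bash
# Hypothetical helper; not part of the runbook.
# Reads a node listing on stdin; succeeds if any whitespace-separated field equals $1.
node_listed() {
  awk -v name="$1" '
    { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  '
}

# Usage:
#   ssh data 'pvecm nodes' | node_listed "$NODE_ALIAS" || echo "STOP: not in cluster"
```
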

---

## Phase 5: Verify

```bash
# Authentik SSO (syncs via cluster)
ssh $NODE_ALIAS 'pveum realm list | grep authentik'

# Storage
ssh $NODE_ALIAS 'pvesm status'
ssh $NODE_ALIAS 'lsblk && echo "---" && vgs && lvs'
```


---

## Phase 6: Update Inventory

Add to CLAUDE.md cluster table:
```
| <NODE_ALIAS> | <NODE_IP> | <TSIP> |
```

Update any hardcoded node lists:
- proxmox-audit.sh (NODES array)
- Monitoring/backup targets

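The new row can be printed with the same column widths as the Current Cluster table so it pastes in cleanly. A minimal sketch, assuming a hypothetical helper (`inventory_row`); the widths 8, 15, and 15 are read off the table's separator row.

```bash
# Hypothetical helper; not part of the runbook.
# Usage: inventory_row ALIAS LOCAL_IP TAILSCALE_IP
inventory_row() {
  printf '| %-8s | %-15s | %-15s |\n' "$1" "$2" "$3"
}

# Usage:
#   inventory_row "$NODE_ALIAS" "$NODE_IP" "$TSIP"
```
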

---

## Final Verification

Every check must print OK. The last two lines print the PVE version and `yes` for time sync instead.

```bash
echo "=== $NODE_ALIAS ==="
echo -n "SSH (local):      "; ssh $NODE_ALIAS 'echo OK' 2>&1
echo -n "SSH (tailscale):  "; ssh ts-$NODE_ALIAS 'echo OK' 2>&1
# Matches either status line, since the wording of `pvecm status` varies by PVE release
echo -n "Cluster:          "; ssh $NODE_ALIAS 'pvecm status 2>/dev/null | grep -Eq "(Cluster Member|Quorate): +Yes" && echo OK || echo FAIL'
echo -n "Tailscale:        "; ssh $NODE_ALIAS 'tailscale status --self >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "OIDC realm:       "; ssh $NODE_ALIAS 'pveum realm list 2>/dev/null | grep -q authentik && echo OK || echo FAIL'
echo -n "Storage:          "; ssh $NODE_ALIAS 'pvesm status >/dev/null 2>&1 && echo OK || echo FAIL'
echo -n "PVE version:      "; ssh $NODE_ALIAS 'pveversion'
echo -n "Time sync:        "; ssh $NODE_ALIAS 'timedatectl show -p NTPSynchronized --value'
```

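The OK/FAIL pattern repeats on most lines and nothing counts the failures. A minimal sketch of a helper that does both, assuming hypothetical names (`check`, `failures`) that are not in the runbook.

```bash
# Hypothetical helper; not part of the runbook.
failures=0

# Usage: check LABEL COMMAND [ARGS...]
# Runs the command quietly, prints "LABEL: OK" or "LABEL: FAIL", and counts failures.
check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf '%-18s OK\n' "$label:"
  else
    printf '%-18s FAIL\n' "$label:"
    failures=$((failures + 1))
  fi
}

# Usage:
#   check "SSH (local)" ssh "$NODE_ALIAS" true
#   check "Storage"     ssh "$NODE_ALIAS" 'pvesm status'
#   [ "$failures" -eq 0 ] || echo "$failures check(s) failed"
```
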

---

## Troubleshooting

**"Too many authentication failures"**
`IdentitiesOnly yes` is missing from `Host *` in `~/.ssh/config`, so ssh offers every available key and the server disconnects once its attempt limit is hit.

**sshpass "Permission denied"**
Add `-o PreferredAuthentications=password -o IdentitiesOnly=yes` to the ssh command.

**Cluster join corosync errors**
Check that `/etc/hosts` on all nodes includes the new node's hostname and IP.

**Authentik realm missing**
Check `systemctl status pve-cluster`. The realm lives in `/etc/pve/domains.cfg`, which pmxcfs syncs across the cluster.

**Can't migrate VMs to node**
Storage mismatch. Compare `pvesm status` on both nodes.

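For the storage comparison, only the storage names matter. A minimal sketch that reduces each listing to a sorted name list for `diff`, assuming a hypothetical helper (`storage_names`) and that `pvesm status` prints one header line followed by one storage per line with the name in the first column.

```bash
# Hypothetical helper; the column layout of `pvesm status` is an assumption.
# Reads a storage listing on stdin, skips the header, prints sorted storage names.
storage_names() {
  awk 'NR > 1 { print $1 }' | sort
}

# Usage:
#   ssh data 'pvesm status'        | storage_names > /tmp/storage-data
#   ssh $NODE_ALIAS 'pvesm status' | storage_names > /tmp/storage-new
#   diff /tmp/storage-data /tmp/storage-new
```
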