Migration: consolidate Echo6 docs to cortex with full infrastructure cleanup sync

- Documents recent infrastructure cleanup (8 CTs destroyed, 35 DNS records removed, Headscale cleanup)
- Adds 24 new runbooks covering Authentik, PeerTube, Meshtastic, RECON, Proxmox, Mailcow, Internet Archive, GPU routing
- Adds project documentation for headscale, vaultwarden, peertube, matrix, mmud, advbbs, arr stack
- Updates services.md, environment.md, caddy.md, authentik.md to match live infrastructure
- Removes 4 deprecated runbook duplicates (canonical versions live in projects/)
- Adds .gitignore for binary archives and editor temp files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 06:02:16 +00:00 · 2026-04-13 06:02:16 +00:00 · e9231ac24a
commit e9231ac24a
parent 89834796ff
93 changed files with 51223 additions and 254 deletions
--- a/runbooks/peertube-remote-runner.md
+++ b/runbooks/peertube-remote-runner.md
@@ -0,0 +1,458 @@
+# PeerTube Remote Runner — GPU Transcoding
+
+Deploy a PeerTube remote runner with NVENC GPU transcoding. The runner pulls jobs from PeerTube over WebSocket, transcodes with the GPU, and uploads HLS streams back.
+
+Use this when adding a new runner node, rebuilding an existing one, or re-registering after a PeerTube rebuild.
+
+---
+
+## Prerequisites
+
+- PeerTube instance running and reachable from the runner (plain HTTP is fine; HTTPS is not required)
+- PeerTube admin credentials or API access to generate runner registration tokens
+- Target machine with:
+  - NVIDIA GPU with NVENC support (Maxwell gen 2+ / GTX 950+)
+  - NVIDIA drivers installed and working (`nvidia-smi` returns output)
+  - Node.js 18+ installed
+  - SSH access from CC host
+
+If the target machine needs NVIDIA drivers or Node.js, see `proxmox-create-ubuntu-vm.md` Steps 9 and 11.
+
+---
+
+## Inputs
+
+Prompt the user for all of these before executing:
+
+```
+RUNNER_HOST=            # SSH alias or IP for the runner machine (e.g., cortex)
+RUNNER_NAME=            # Human-readable runner name (e.g., "cortex-nvenc")
+RUNNER_USER=            # User to run the service as (e.g., "zvx")
+PT_URL=                 # PeerTube instance URL reachable from runner (e.g., "http://100.64.0.23:9000")
+PT_HOST_HEADER=         # PeerTube's public hostname for Host header (e.g., "stream.echo6.co")
+PT_ADMIN_USER=          # PeerTube admin username (e.g., "root")
+PT_ADMIN_PASS=          # PeerTube admin password
+INSTALL_DIR=            # Directory for the health-check script (default: /opt/peertube-runner)
+```
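+
+A minimal pre-flight sketch, assuming the inputs are collected as shell variables on the CC host. `require_inputs` is a hypothetical helper (not part of peertube-runner): it applies the documented `INSTALL_DIR` default and fails while any other input is still empty.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: validate the runbook inputs before running any step.
+require_inputs() {
+  local name missing=0
+  # Apply the documented default (sets the global INSTALL_DIR if unset or empty)
+  : "${INSTALL_DIR:=/opt/peertube-runner}"
+  for name in RUNNER_HOST RUNNER_NAME RUNNER_USER PT_URL PT_HOST_HEADER PT_ADMIN_USER PT_ADMIN_PASS; do
+    # ${!name} is bash indirect expansion: the value of the variable named by $name
+    if [[ -z "${!name:-}" ]]; then
+      echo "missing input: $name" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+```
+
+Call `require_inputs || exit 1` at the top of any script that drives the steps below, so a forgotten variable stops the run instead of expanding to an empty string inside an `ssh` command.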
+
+---
+
+## Step 1: Verify GPU
+
+```bash
+ssh $RUNNER_HOST 'nvidia-smi --query-gpu=name,driver_version,memory.total,encoder.stats.sessionCount --format=csv,noheader'
+```
+
+### Gate
+
+Must return GPU name, driver version, and VRAM. If it fails:
+
+- No output → NVIDIA drivers not installed. See `proxmox-create-ubuntu-vm.md` Step 9.
+- "NVML: Driver/library version mismatch" → reboot the machine.
+- "No devices found" → GPU passthrough not configured (VMs) or hardware issue.
+
+Check NVENC specifically:
+
+```bash
+ssh $RUNNER_HOST 'nvidia-smi -q | grep -A 5 "Encoder"'
+```
+
+Must show encoder session info. If "N/A", the GPU doesn't support NVENC or drivers are too old.
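+
+To script the first gate instead of reading it by eye, a sketch under stated assumptions: `trim` and `check_gpu_line` are hypothetical helpers that take one line of the Step 1 CSV query (fields separated by `, `) and fail unless it carries a GPU name, a dotted driver version and a VRAM size in MiB.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical gate helpers for Step 1 (not part of nvidia-smi).
+trim() {
+  local s=$1
+  s=${s#"${s%%[![:space:]]*}"}   # strip leading whitespace
+  s=${s%"${s##*[![:space:]]}"}   # strip trailing whitespace
+  printf '%s' "$s"
+}
+
+# Input: one line of
+#   nvidia-smi --query-gpu=name,driver_version,memory.total,encoder.stats.sessionCount --format=csv,noheader
+check_gpu_line() {
+  local name driver mem sessions
+  IFS=',' read -r name driver mem sessions <<< "$1"
+  name=$(trim "$name"); driver=$(trim "$driver"); mem=$(trim "$mem")
+  if [[ -z $name || ! $driver =~ ^[0-9]+(\.[0-9]+)+$ || ! $mem =~ ^[0-9]+\ MiB$ ]]; then
+    echo "GPU gate failed: '$1'" >&2
+    return 1
+  fi
+  echo "GPU ok: $name, driver $driver, $mem VRAM"
+}
+```
+
+Usage: `check_gpu_line "$(ssh $RUNNER_HOST 'nvidia-smi --query-gpu=name,driver_version,memory.total,encoder.stats.sessionCount --format=csv,noheader' | head -n1)"`. `head -n1` checks only the first GPU on multi-GPU hosts; an empty line (no driver) or `[N/A]` fields fail the gate.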
+
+---
+
+## Step 2: Install peertube-runner
+
+PeerTube runner is distributed via npm.
+
+```bash
+ssh $RUNNER_HOST 'which node && node --version'  # Must be 18+
+```
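+
+The "Must be 18+" check can be made explicit. A minimal sketch: `node_version_ok` is a hypothetical helper that parses the `vMAJOR.MINOR.PATCH` string printed by `node --version` and compares only the major number.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: succeed if a `node --version` string is 18 or newer.
+node_version_ok() {
+  local ver=${1#v}          # drop the leading "v"
+  local major=${ver%%.*}    # keep everything before the first "."
+  [[ $major =~ ^[0-9]+$ ]] && (( major >= 18 ))
+}
+```
+
+Usage: `node_version_ok "$(ssh $RUNNER_HOST 'node --version')" || echo "Node.js 18+ required"`. An empty string (Node.js missing) also fails.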
+
+Install the runner:
+
+```bash
+ssh $RUNNER_HOST 'sudo npm install -g @peertube/peertube-runner'
+ssh $RUNNER_HOST 'which peertube-runner && peertube-runner --version'
+```
+
+### Gate
+
+`peertube-runner --version` must return a version number. If npm install fails, check Node.js version (must be 18+).
+
+Create the install directory (holds the Step 7 health-check script):
+
+```bash
+ssh $RUNNER_HOST "sudo mkdir -p $INSTALL_DIR && sudo chown $RUNNER_USER:$RUNNER_USER $INSTALL_DIR"
+```
+
+---
+
+## Step 3: Generate Registration Token on PeerTube
+
+Get a registration token from the PeerTube instance. This requires admin access.
+
+### Option A: Via API
+
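+```bash
+# Get OAuth client credentials (jq runs locally on the CC host)
+CLIENT_CREDS=$(ssh $RUNNER_HOST "curl -s $PT_URL/api/v1/oauth-clients/local -H 'Host: $PT_HOST_HEADER'")
+CLIENT_ID=$(echo "$CLIENT_CREDS" | jq -r '.client_id')
+CLIENT_SECRET=$(echo "$CLIENT_CREDS" | jq -r '.client_secret')
+
+# Get admin token; --data-urlencode keeps passwords containing &, = or spaces intact
+TOKEN_RESP=$(ssh $RUNNER_HOST "curl -s $PT_URL/api/v1/users/token \
+  -H 'Host: $PT_HOST_HEADER' \
+  --data-urlencode 'client_id=$CLIENT_ID' \
+  --data-urlencode 'client_secret=$CLIENT_SECRET' \
+  --data-urlencode 'grant_type=password' \
+  --data-urlencode 'username=$PT_ADMIN_USER' \
+  --data-urlencode 'password=$PT_ADMIN_PASS'")
+ACCESS_TOKEN=$(echo "$TOKEN_RESP" | jq -r '.access_token')
+
+# Generate a runner registration token (this endpoint returns no body)
+ssh $RUNNER_HOST "curl -s -X POST $PT_URL/api/v1/runners/registration-tokens/generate \
+  -H 'Host: $PT_HOST_HEADER' \
+  -H 'Authorization: Bearer $ACCESS_TOKEN'"
+
+# Read the newest token back from the list endpoint
+REG_TOKEN=$(ssh $RUNNER_HOST "curl -s '$PT_URL/api/v1/runners/registration-tokens?count=100' \
+  -H 'Host: $PT_HOST_HEADER' \
+  -H 'Authorization: Bearer $ACCESS_TOKEN'" | jq -r '.data | sort_by(.createdAt) | last | .registrationToken')
+
+echo "Registration token: $REG_TOKEN"
+```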
+
+### Option B: Via PeerTube Admin UI
+
+1. Log into PeerTube as admin
+2. Administration → System → Runners
+3. Click "Generate registration token"
+4. Copy the token
+
+### Gate
+
+Must have a registration token string. If the API returns errors:
+
+- 401 → wrong admin credentials
+- 404 → PeerTube version too old (runners require v5.2+)
+- Connection refused → PeerTube not reachable from runner. Check URL and network.
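+
+The gate's error cases can be mapped from the HTTP status instead of from raw response bodies. A sketch: `diagnose_pt_status` is a hypothetical helper, and it relies on curl's `-w '%{http_code}'` printing `000` when no HTTP response arrived (connection refused, timeout).
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: turn an HTTP status code into the hints from the gate above.
+diagnose_pt_status() {
+  case "$1" in
+    2??) echo "ok" ;;
+    401) echo "wrong admin credentials" ;;
+    404) echo "PeerTube too old (remote runners need v5.2+)" ;;
+    000) echo "PeerTube not reachable from runner: check PT_URL and network" ;;
+    *)   echo "unexpected HTTP status $1" ;;
+  esac
+}
+```
+
+For example, capture the code of the generate call with `code=$(ssh $RUNNER_HOST "curl -s -o /dev/null -w '%{http_code}' -X POST $PT_URL/api/v1/runners/registration-tokens/generate -H 'Host: $PT_HOST_HEADER' -H 'Authorization: Bearer $ACCESS_TOKEN'")` and pass it to `diagnose_pt_status "$code"`.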
+
+---
+
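+## Step 4: Register Runner
+
+`peertube-runner register` does not contact PeerTube on its own: it hands the request to a running `peertube-runner server` process over a local socket, so the server must already be up, running as the same user that will run the service ($RUNNER_USER). If the systemd service from Step 6 isn't running yet, either do Step 6 first and return here, or start `peertube-runner server` in a second SSH session as $RUNNER_USER and stop it once registration succeeds.
+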
+```bash
+ssh $RUNNER_HOST "peertube-runner register \
+  --url $PT_URL \
+  --registration-token $REG_TOKEN \
+  --runner-name $RUNNER_NAME"
+```
+
+### Gate
+
+Must complete without errors. Verify registration:
+
+```bash
+ssh $RUNNER_HOST 'peertube-runner list-registered'
+```
+
+Must show the PeerTube instance URL. If registration fails:
+
+- "Invalid registration token" → token already used or expired. Generate a new one.
+- "ECONNREFUSED" → runner can't reach PeerTube. Test: `curl -s $PT_URL/api/v1/config`
+- "self-signed certificate" → if PeerTube uses HTTPS with self-signed cert, use `NODE_TLS_REJECT_UNAUTHORIZED=0` (not recommended) or fix the cert.
+
+---
+
+## Step 5: Configure NVENC
+
+The runner auto-detects ffmpeg capabilities, but verify NVENC is available to ffmpeg:
+
+```bash
+ssh $RUNNER_HOST 'ffmpeg -encoders 2>/dev/null | grep nvenc'
+```
+
+Must show `h264_nvenc` and `hevc_nvenc`. If missing, install ffmpeg with NVENC support:
+
+```bash
+# Ubuntu/Debian — the default ffmpeg usually includes NVENC if drivers are installed
+ssh $RUNNER_HOST 'sudo apt install -y ffmpeg'
+
+# Re-check
+ssh $RUNNER_HOST 'ffmpeg -encoders 2>/dev/null | grep nvenc'
+```
+
+If still missing, the NVIDIA drivers may not include the encoding libraries. Install:
+
+```bash
+ssh $RUNNER_HOST 'sudo apt install -y libnvidia-encode-550'  # Match your driver version
+```
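+
+The "must show `h264_nvenc` and `hevc_nvenc`" check can be scripted. A minimal sketch: `has_nvenc` is a hypothetical helper that reads `ffmpeg -encoders` output on stdin and succeeds only when both encoders appear as whole words.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: succeed only if both NVENC encoders are listed on stdin.
+has_nvenc() {
+  local encoders
+  encoders=$(cat)
+  # -w: match whole words, so e.g. "h264_nvenc" is not matched inside a longer name
+  grep -qw h264_nvenc <<< "$encoders" && grep -qw hevc_nvenc <<< "$encoders"
+}
+```
+
+Usage: `ssh $RUNNER_HOST 'ffmpeg -hide_banner -encoders 2>/dev/null' | has_nvenc && echo "NVENC ok"`.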
+
+---
+
+## Step 6: Create systemd Service
+
+```bash
+ssh $RUNNER_HOST "sudo tee /etc/systemd/system/peertube-runner.service > /dev/null << 'EOF'
+[Unit]
+Description=PeerTube Remote Runner (NVENC)
+After=network-online.target nvidia-persistenced.service
+Wants=network-online.target
+Requires=nvidia-persistenced.service
+
+[Service]
+Type=simple
+User=$RUNNER_USER
+Group=$RUNNER_USER
+Environment=NODE_ENV=production
+ExecStart=/usr/bin/peertube-runner server \
+  --enable-job vod-hls-transcoding \
+  --enable-job vod-audio-merge-transcoding \
+  --enable-job live-rtmp-hls-transcoding \
+  --enable-job video-studio-transcoding \
+  --enable-job video-transcription
+WorkingDirectory=/home/$RUNNER_USER
+Restart=always
+RestartSec=30
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=peertube-runner
+MemoryMax=20G
+
+[Install]
+WantedBy=multi-user.target
+EOF"
+
+ssh $RUNNER_HOST 'sudo systemctl daemon-reload'
+ssh $RUNNER_HOST 'sudo systemctl enable peertube-runner'
+ssh $RUNNER_HOST 'sudo systemctl start peertube-runner'
+```
+
+### Gate
+
+```bash
+ssh $RUNNER_HOST 'systemctl is-active peertube-runner'
+```
+
+Must return `active`. If it fails, check logs:
+
+```bash
+ssh $RUNNER_HOST 'journalctl -u peertube-runner -n 50 --no-pager'
+```
+
+Common failures:
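+- "Cannot find module" or `status=203/EXEC` → peertube-runner not installed globally, or `ExecStart=` points at the wrong binary. Compare it with `which peertube-runner`: npm's global bin directory may be `/usr/local/bin` rather than `/usr/bin`, depending on how Node.js was installed.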
+- "nvidia-persistenced.service not found" → remove the `Requires=` line if nvidia-persistenced isn't set up (it's optional but recommended).
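+
+To compare the unit's binary with the installed one, a sketch: `unit_exec_path` is a hypothetical helper that reads a unit file on stdin and prints the first word of its `ExecStart=` line (the executable path).
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: print the executable named on the ExecStart= line of a unit file.
+unit_exec_path() {
+  local line
+  while IFS= read -r line; do
+    if [[ $line == ExecStart=* ]]; then
+      line=${line#ExecStart=}
+      echo "${line%% *}"     # first word = executable path
+      return 0
+    fi
+  done
+  return 1                   # no ExecStart= line found
+}
+```
+
+Usage: run `ssh $RUNNER_HOST 'cat /etc/systemd/system/peertube-runner.service' | unit_exec_path` and `ssh $RUNNER_HOST 'command -v peertube-runner'`; the two paths should be identical.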
+
+---
+
+## Step 7: Install Health Check
+
+Cron script that auto-restarts the runner if it crashes or the GPU becomes inaccessible.
+
+```bash
+ssh $RUNNER_HOST "sudo tee $INSTALL_DIR/health.sh > /dev/null << 'HEALTH'
+#!/bin/bash
+LOG_TAG=\"peertube-runner-health\"
+
+if ! systemctl is-active --quiet peertube-runner; then
+    logger -t \$LOG_TAG \"Runner not active, restarting...\"
+    systemctl restart peertube-runner
+    sleep 10
+fi
+
+if ! pgrep -f \"peertube-runner server\" > /dev/null; then
+    logger -t \$LOG_TAG \"Runner process not found, restarting service...\"
+    systemctl restart peertube-runner
+fi
+
+if ! nvidia-smi > /dev/null 2>&1; then
+    logger -t \$LOG_TAG \"GPU not accessible, restarting nvidia-persistenced and runner...\"
+    systemctl restart nvidia-persistenced 2>/dev/null
+    sleep 5
+    systemctl restart peertube-runner
+fi
+HEALTH
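+sudo chmod +x $INSTALL_DIR/health.sh"
+
+# Add cron job to root's crontab (the script calls systemctl restart), every 5 minutes.
+# Any previous entry for the same script path is dropped first, so re-runs don't duplicate it.
+ssh $RUNNER_HOST "(sudo crontab -l 2>/dev/null | grep -vF '$INSTALL_DIR/health.sh'; echo '*/5 * * * * $INSTALL_DIR/health.sh') | sudo crontab -"
+```
+
+Verify:
+
+```bash
+ssh $RUNNER_HOST 'sudo crontab -l | grep health.sh'
+```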
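+
+The cron step above is meant to replace, not duplicate, an existing entry. A sketch of that merge as a hypothetical function, `merge_cron_entry`, which matches on the script path itself (fixed-string), since that is what must be unique in the crontab:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: read a crontab on stdin, drop lines mentioning the script,
+# then append exactly one entry for it.
+merge_cron_entry() {
+  local script=$1 schedule=${2:-'*/5 * * * *'}
+  # grep exits 1 when it outputs nothing (empty crontab, or only old entries); that is fine here
+  grep -vF -- "$script" || true
+  printf '%s %s\n' "$schedule" "$script"
+}
+```
+
+Usage on the runner host: `sudo crontab -l 2>/dev/null | merge_cron_entry /opt/peertube-runner/health.sh | sudo crontab -`. Other crontab lines pass through unchanged.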
+
+---
+
+## Step 8: Test Transcoding
+
+Upload a test video and verify the full pipeline works.
+
+### Quick test via API
+
+```bash
+# Download a short test video
+ssh $RUNNER_HOST 'curl -L -o /tmp/test-video.mp4 "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4" 2>/dev/null'
+
+# Upload to PeerTube (reuses ACCESS_TOKEN from Step 3 Option A)
+# privacy=1 is Public; channelId=1 assumes the admin's default channel
+ssh $RUNNER_HOST "curl -s -X POST $PT_URL/api/v1/videos/upload \
+  -H 'Host: $PT_HOST_HEADER' \
+  -H 'Authorization: Bearer $ACCESS_TOKEN' \
+  -F 'videofile=@/tmp/test-video.mp4' \
+  -F 'name=Runner Test Video' \
+  -F 'channelId=1' \
+  -F 'privacy=1' \
+  -F 'waitTranscoding=true'" | jq '.video'   # response is {"video": {id, shortUUID, uuid}}
+```
+
+### Verify GPU is processing
+
+Within 30 seconds of upload:
+
+```bash
+ssh $RUNNER_HOST 'nvidia-smi --query-gpu=utilization.gpu,utilization.encoder,temperature.gpu,power.draw --format=csv,noheader'
+```
+
+GPU utilization and encoder utilization should be non-zero. If encoder shows 0% but GPU shows activity, NVENC isn't being used — check ffmpeg encoder detection (Step 5).
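+
+The rule above (encoder at 0% while the GPU is busy means NVENC is not in use) can be applied to the query output directly. A sketch: `nvenc_activity` is a hypothetical helper that takes one line of the `nvidia-smi` query above (units included, e.g. `45 %, 30 %, 55, 80.12 W`) and prints `nvenc-active`, `gpu-only` or `idle`.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: classify one line of
+#   nvidia-smi --query-gpu=utilization.gpu,utilization.encoder,temperature.gpu,power.draw --format=csv,noheader
+nvenc_activity() {
+  local gpu enc rest
+  IFS=',' read -r gpu enc rest <<< "$1"
+  gpu=${gpu//[!0-9]/}; enc=${enc//[!0-9]/}   # keep digits only ("45 %" -> "45")
+  if [[ -z $gpu || -z $enc ]]; then
+    echo "unparsed"; return 1
+  fi
+  if (( enc > 0 )); then echo "nvenc-active"
+  elif (( gpu > 0 )); then echo "gpu-only"   # GPU busy, encoder idle: NVENC not used
+  else echo "idle"
+  fi
+}
+```
+
+Usage: `nvenc_activity "$(ssh $RUNNER_HOST 'nvidia-smi --query-gpu=utilization.gpu,utilization.encoder,temperature.gpu,power.draw --format=csv,noheader' | head -n1)"`. A `gpu-only` result points back to Step 5.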
+
+### Check runner logs
+
+```bash
+ssh $RUNNER_HOST 'journalctl -u peertube-runner -n 20 --no-pager | grep -i "transcod\|job\|error"'
+```
+
+Should show job pickup, transcoding progress, and completion.
+
+### Clean up
+
+```bash
+# Delete test video via API (optional; use the uuid from the upload response)
+ssh $RUNNER_HOST "curl -s -X DELETE $PT_URL/api/v1/videos/<VIDEO_UUID> \
+  -H 'Host: $PT_HOST_HEADER' \
+  -H 'Authorization: Bearer $ACCESS_TOKEN'"
+
+ssh $RUNNER_HOST 'rm -f /tmp/test-video.mp4'
+```
+
+---
+
+## Verification Checklist
+
+```bash
+# Run on the runner host itself, not from the CC host
+echo "=== PeerTube Runner Check ==="
+echo ""
+echo "GPU:          $(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null || echo 'MISSING')"
+echo "NVENC:        $(ffmpeg -encoders 2>/dev/null | grep -c nvenc) encoders"
+echo "Runner ver:   $(peertube-runner --version 2>/dev/null || echo 'NOT INSTALLED')"
+# grep -c and pgrep -c already print 0 when nothing matches, so no '|| echo 0' fallback
+echo "Registered:   $(peertube-runner list-registered 2>/dev/null | grep -c 'http') instance(s)"
+echo "Service:      $(systemctl is-active peertube-runner 2>/dev/null)"
+echo "Health cron:  $({ crontab -l; sudo -n crontab -l; } 2>/dev/null | grep -c health.sh) entries"
+echo "ffmpeg procs: $(pgrep -c ffmpeg) active"
+echo "GPU util:     $(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader 2>/dev/null || echo 'N/A')"
+```
+
+---
+
+## Adding a Second PeerTube Instance
+
+To register the same runner with another PeerTube instance:
+
+```bash
+peertube-runner register \
+  --url <SECOND_PT_URL> \
+  --registration-token <TOKEN> \
+  --runner-name $RUNNER_NAME
+```
+
+The runner handles multiple registrations automatically — it polls all registered instances for jobs.
+
+## Unregistering
+
+```bash
+# List registrations
+peertube-runner list-registered
+
+# Unregister this runner from a specific instance
+peertube-runner unregister --url $PT_URL --runner-name $RUNNER_NAME
+```
+
+---
+
+## Troubleshooting
+
+### Runner picks up jobs but transcoding fails immediately
+
+Check ffmpeg NVENC access:
+
+```bash
+ffmpeg -y -f lavfi -i testsrc=duration=5:size=1280x720:rate=30 -c:v h264_nvenc /tmp/nvenc-test.mp4
+```
+
+If this fails, NVENC isn't accessible to ffmpeg. Common causes: wrong driver version, missing libnvidia-encode, or GPU in use by another process that's holding all NVENC sessions.
+
+### Runner shows 0 active jobs despite pending queue
+
+- WebSocket connection issue. Check: `journalctl -u peertube-runner | grep -i websocket`
+- Runner registered with wrong URL. Verify: `peertube-runner list-registered`
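+- Remote runners not enabled in PeerTube. In `production.yaml`, `transcoding.remote_runners.enabled` must be `true` for VOD jobs; live, studio and transcription jobs have their own switches (`live.transcoding.remote_runners.enabled`, `video_studio.remote_runners.enabled`, `video_transcription.remote_runners.enabled`).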
+
+### GPU utilization stuck at 100% / NVENC sessions maxed
+
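+The RTX A4000 is a professional (Quadro-class) card, and NVIDIA's NVENC support matrix lists it without the concurrent-session cap that applies to consumer GeForce cards (historically 3 sessions, raised in newer drivers). On this runner, parallelism is bounded instead by the runner's `concurrency` setting (see Whisper Transcription Setup) and by the encoder engine's throughput, so extra jobs wait in the queue. This is normal: throughput is limited by NVENC capacity, not GPU compute.
+
+To increase throughput: raise `concurrency` if the encoder still has headroom, or add a second runner node. On consumer GeForce cards, the session cap itself can be lifted with the community NVENC patch (search "nvidia nvenc patch").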
+
+### Runner keeps disconnecting / restarting
+
+- Check memory: `free -h`. The 20GB MemoryMax in the service file may be too low if processing many concurrent jobs. Increase if needed.
+- Check disk space: transcoding uses temp space. Ensure `/tmp` or the runner's working directory has sufficient free space (10GB+ recommended).
+- Network instability between runner and PeerTube. Use Tailscale IP instead of public URL for reliability.
+
+### After PeerTube rebuild, runner can't connect
+
+Registration is tied to the PeerTube instance. After a rebuild:
+
+1. Unregister: `peertube-runner unregister --url $PT_URL --runner-name $RUNNER_NAME`
+2. Generate new registration token on the new PeerTube
+3. Re-register: `peertube-runner register --url $PT_URL --registration-token <NEW_TOKEN> --runner-name $RUNNER_NAME`
+4. Restart: `sudo systemctl restart peertube-runner`
+
+---
+
+## Quick Reference: Current Runners
+
+| Runner | Host | GPU | PeerTube Instance | Status |
+|--------|------|-----|-------------------|--------|
+| cortex-nvenc | cortex (VM 150 on TOC) | RTX A4000 16GB | stream.echo6.co (CT 110) | Active |
+
+---
+
+## Whisper Transcription Setup
+
+The runner also handles `video-transcription` jobs (auto-captioning via Whisper). A smart wrapper routes jobs based on audio duration:
+
+### Smart Wrapper (`/usr/local/bin/whisper-smart` on cortex)
+
+- **Model:** `medium` (good accuracy, fits in VRAM on float16)
+- **GPU path (< 1 hour):** `--device cuda --compute_type float16` — ~3.7GB VRAM, fast
+- **CPU path (>= 1 hour):** `--device cpu --compute_type int8` — ~8-11GB RAM, slow but avoids VRAM exhaustion
+- **CPU serialization:** `flock --nonblock /tmp/whisper-cpu.lock` — only one CPU transcription at a time. If lock is held, wrapper exits 1 and the runner retries the job later.
+- **Concurrency:** Runner config set to `concurrency = 2` — allows one GPU + one CPU job in parallel
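+
+The wrapper itself isn't reproduced in this runbook. As a hedged sketch of the routing rules listed above only: `audio_seconds`, `whisper_args` and `run_cpu_locked` are hypothetical names, and measuring duration with `ffprobe` is an assumption about how the real wrapper does it. The one-hour threshold, the device/compute-type flags and the lock path come from the list.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch of the whisper-smart routing rules (not the real wrapper).
+LOCK=/tmp/whisper-cpu.lock
+
+# Duration in whole seconds (assumed probe method: ffprobe)
+audio_seconds() {
+  local d
+  d=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$1")
+  echo "${d%.*}"
+}
+
+# < 1 hour -> GPU float16; >= 1 hour -> CPU int8
+whisper_args() {
+  if (( $1 < 3600 )); then
+    echo "--device cuda --compute_type float16"
+  else
+    echo "--device cpu --compute_type int8"
+  fi
+}
+
+# CPU jobs are serialized: if the lock is held, flock --nonblock exits 1 at once
+# and the runner retries the job later.
+run_cpu_locked() {
+  flock --nonblock "$LOCK" "$@"
+}
+```
+
+For example, `whisper_args 1800` prints the GPU flags and `whisper_args 5400` the CPU ones; a CPU-path invocation would then be wrapped as `run_cpu_locked whisper-ctranslate2-real "$@"`.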
+
+### Symlink chain
+
+```
+/usr/local/bin/whisper-ctranslate2 → /usr/local/bin/whisper-smart (the smart wrapper)
+/usr/local/bin/whisper-ctranslate2-real → /home/zvx/.local/bin/whisper-ctranslate2 (actual Python binary)
+```
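+
+To confirm the chain is intact after an upgrade (for example, pip reinstalling `whisper-ctranslate2`), a sketch: `check_link` is a hypothetical helper that compares a symlink's literal target, as printed by `readlink`, with the expected path.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: confirm a symlink points exactly where expected.
+check_link() {
+  local link=$1 want=$2 got
+  got=$(readlink "$link") || { echo "$link: not a symlink" >&2; return 1; }
+  if [[ $got != "$want" ]]; then
+    echo "$link -> $got (expected $want)" >&2
+    return 1
+  fi
+  echo "$link -> $got"
+}
+```
+
+Usage on cortex: `check_link /usr/local/bin/whisper-ctranslate2 /usr/local/bin/whisper-smart` and `check_link /usr/local/bin/whisper-ctranslate2-real /home/zvx/.local/bin/whisper-ctranslate2`. Because the comparison is literal, a link created with a relative target will not match an absolute expected path.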
+
+### Key config
+
+- **Runner config:** `~/.config/peertube-runner-nodejs/default/config.toml` — `model = "medium"`, `concurrency = 2`
+- **PeerTube config:** `/var/www/peertube/config/production.yaml` on CT 110 — `model-name: medium`
+- **systemd MemoryMax:** `20G` (CPU int8 medium model peaks at ~11GB)
+
+### Wrapper log
+
+```bash
+tail -f /tmp/whisper-wrapper.log   # Shows mode (GPU/CPU/CPU-BLOCKED), duration, args
+```
+
+---
+
+*Last updated: 2026-02-17 — Added Whisper smart transcription setup, MemoryMax 12G→20G, concurrency 1→2*