# Deploy WATCHTOWER v2 — Modular Ops Dashboard
**Context:** CC runs on cortex. WATCHTOWER deploys to Contabo (100.64.0.1). The tarball is at `/home/zvx/projects/contabo/watchtower/watchtower-v2.tar.gz` on cortex. This runbook is at `/home/zvx/.ref/projects/` on cortex.
WATCHTOWER v2 is a modular FastAPI monitoring dashboard. Collectors are auto-discovered from `app/collectors/` and enabled via `{NAME}_ENABLED=true` in `.env`. Adding a new monitoring target requires no changes to existing code: add a collector module and enable it in `.env`.
## Pre-flight: Transfer tarball and SSH check
```bash
# SCP tarball from cortex (this machine) to Contabo
scp /home/zvx/projects/contabo/watchtower/watchtower-v2.tar.gz zvx@100.64.0.1:/tmp/
# Verify sshpass is installed on Contabo (-t gives sudo a terminal in case it prompts for a password)
ssh -t zvx@100.64.0.1 "command -v sshpass || sudo apt-get install -y sshpass"
# Test SSH from Contabo to each monitored node
ssh zvx@100.64.0.1 << 'SSHEOF'
echo "=== PeerTube (100.64.0.23) ==="
sshpass -p '7redditGold' ssh -o StrictHostKeyChecking=no zvx@100.64.0.23 "hostname && echo OK" 2>&1
echo "=== cortex/GPU (100.64.0.14) ==="
sshpass -p '7redditGold' ssh -o StrictHostKeyChecking=no zvx@100.64.0.14 "hostname && echo OK" 2>&1
SSHEOF
```
If either SSH fails, stop and report the error. Do not proceed without working SSH to at least one target.
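If you want one pass/fail verdict instead of reading two outputs, the checks above can be folded into a loop. This is a sketch under stated assumptions, not part of the tarball: `check_targets` is a hypothetical helper meant to be run on Contabo, and it reads the password from the `SSHPASS` environment variable via `sshpass -e` rather than putting it on the command line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: SSH to each target IP as zvx, print OK/FAIL per host,
# and succeed only if at least one target answered.
# Assumes SSHPASS is exported; sshpass -e reads the password from it.
check_targets() {
  local ip ok=0
  for ip in "$@"; do
    if sshpass -e ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
        "zvx@${ip}" hostname >/dev/null 2>&1; then
      echo "${ip}: OK"
      ok=$((ok + 1))
    else
      echo "${ip}: FAIL"
    fi
  done
  echo "${ok}/$# targets reachable"
  (( ok > 0 ))
}
```

Usage on Contabo: `export SSHPASS=...` once, then `check_targets 100.64.0.23 100.64.0.14`. A non-zero exit status means no target is reachable and the deploy should stop.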
---
## Phase 1: Deploy codebase
All remaining commands run on Contabo. SSH in:
```bash
ssh zvx@100.64.0.1
```
Then:
```bash
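# Stop any running old stack, then clean the old install
# (this also deletes the old .env and anything else stored under /opt/watchtower)
[ -d /opt/watchtower ] && (cd /opt/watchtower && docker compose down)
sudo rm -rf /opt/watchtower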
# Extract v2 tarball
sudo tar xzf /tmp/watchtower-v2.tar.gz -C /opt/
sudo mv /opt/watchtower-v2 /opt/watchtower
sudo chown -R $USER:$USER /opt/watchtower
cd /opt/watchtower
```
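The `mv` step assumes the tarball unpacks to a single `watchtower-v2/` directory. If a tarball might be packed differently, its top-level entries can be listed without extracting it. `tar_top_level` is a hypothetical helper name; this is a sketch, not a required step.

```bash
# Print the distinct top-level entries of a .tar.gz without extracting it.
# A leading "./" (as produced by `tar -C dir .`) is stripped first.
tar_top_level() {
  tar tzf "$1" | sed 's#^\./##' | cut -d/ -f1 | grep -v '^$' | sort -u
}
```

For this deploy, `tar_top_level /tmp/watchtower-v2.tar.gz` should print exactly `watchtower-v2`; anything else means the `mv` line needs adjusting.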
### Create .env from example
```bash
cp .env.example .env
```
The defaults in `.env.example` are already set to the correct current values:
| Target | IP | User | Notes |
|--------|-----|------|-------|
| GPU (cortex) | 100.64.0.14 | zvx | nvidia-smi |
| PeerTube | 100.64.0.23 | zvx | Native PostgreSQL (`peertube_prod`), pipeline at `/opt/bulk-import/` |
| RECON | disabled | — | Flip `RECON_ENABLED=true` when rebuilt |
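Because collectors are switched on by `{NAME}_ENABLED=true`, it can help to list what `.env` actually enables and compare it with the table (expected: `gpu` and `peertube`, not `recon`). This sketch assumes the variables follow that pattern exactly (e.g. `GPU_ENABLED`, `PEERTUBE_ENABLED`) with an unquoted lowercase `true`; `enabled_collectors` is a hypothetical helper.

```bash
# List collectors enabled in an env file (default ./.env) as lowercase names.
# Only uncommented lines of the form NAME_ENABLED=true are counted.
enabled_collectors() {
  local env_file="${1:-.env}"
  sed -n -E 's/^([A-Za-z0-9_]+)_ENABLED=true[[:space:]]*$/\1/p' "$env_file" \
    | tr '[:upper:]' '[:lower:]' | sort
}
```

Run it from `/opt/watchtower` as `enabled_collectors`; the output should match the collectors you expect to see in the startup logs.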
### Verify PeerTube PostgreSQL access
PostgreSQL runs natively on the PeerTube CT (not in Docker). Verify:
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "sudo -u postgres psql -d peertube_prod -t -A -c 'SELECT COUNT(*) FROM video;'"
```
It should print the video count (e.g., 207). If it errors, the database name may differ; list the databases with:
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "sudo -u postgres psql -l"
```
Update `PT_DB_NAME` in `.env` if needed.
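To check a database name non-interactively instead of reading the `psql -l` table by eye, the tuples-only listing (`psql -lqt`) can be searched. `db_in_list` is a hypothetical helper; it relies only on the database name being the first `|`-separated column of that output.

```bash
# Succeed if database "$1" appears in `psql -lqt` output read from stdin.
# The first |-separated column of each row is the database name.
db_in_list() {
  cut -d'|' -f1 | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -qxF -- "$1"
}
```

Pipe the same `sshpass ... ssh zvx@100.64.0.23` command into it, with `psql -lqt` in place of `psql -l`, followed by `db_in_list peertube_prod && echo found`.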
### Verify bulk-import pipeline paths
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "ls -la /opt/bulk-import/ 2>/dev/null || echo 'DIR NOT FOUND'; wc -l /opt/bulk-import/downloaded.txt 2>/dev/null || echo 'downloaded.txt NOT FOUND'"
```
---
## Phase 2: Build and start
```bash
cd /opt/watchtower
docker compose up -d --build
# Wait for startup then check logs
sleep 5
docker logs watchtower 2>&1 | tail -30
```
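A fixed `sleep 5` can race a slow build or container start. A polling alternative is sketched below. It assumes `/api/health` answers with a 2xx status once the app is up (the same endpoint is queried under *Verify collectors*); `wait_for_url` is a hypothetical name.

```bash
# Poll a URL until curl gets a successful response, or give up after $2 seconds.
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  until curl -fsS -o /dev/null --max-time 2 "$url"; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `wait_for_url http://localhost:8084/api/health 60 && docker logs watchtower 2>&1 | tail -30`.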
### Expected log output
```
WATCHTOWER starting up...
Database connected: /data/watchtower.db
[registry] Loaded collector: gpu (GPU (cortex))
[registry] Loaded collector: peertube (PeerTube Ingest)
[registry] Skipped collector: recon (RECON_ENABLED=false)
[registry] 2 collector(s) active: ['gpu', 'peertube']
[gpu] collector starting (interval: 60s)
[peertube] collector starting (interval: 60s)
```
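To pull the active collector list out of the logs rather than scanning them by eye, a small parser can read the registry summary line. It assumes that line keeps exactly the format shown above; `active_from_log` is a hypothetical name, and it uses the last such line in case the container restarted.

```bash
# Print collector names from the last "[registry] N collector(s) active: [...]"
# line on stdin, one per line.
active_from_log() {
  sed -n -E 's/.*\[registry] [0-9]+ collector\(s\) active: \[(.*)].*/\1/p' \
    | tail -n 1 | tr -d "' " | tr ',' '\n' | grep -v '^$'
}
```

`docker logs watchtower 2>&1 | active_from_log` should print `gpu` and `peertube`, matching the `*_ENABLED` flags in `.env`.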
### Verify collectors
```bash
# Wait for first poll cycle
sleep 65
echo "=== Health ==="
curl -s http://localhost:8084/api/health | python3 -m json.tool
echo "=== Collector Manifest ==="
curl -s http://localhost:8084/api/collectors | python3 -m json.tool
echo "=== GPU Data ==="
curl -s http://localhost:8084/api/c/gpu | python3 -m json.tool
echo "=== PeerTube Data ==="
curl -s http://localhost:8084/api/c/peertube | python3 -m json.tool
```
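To turn the per-collector output into a one-line status, the JSON can be checked with `python3` (already used above for pretty-printing). This sketch assumes each `/api/c/<name>` response is a JSON object with a top-level boolean `online` field, which is how this runbook phrases the check; adjust the lookup if the payload nests it. `collector_status` is a hypothetical helper.

```bash
# Print "<name>: online" or "<name>: OFFLINE" for a collector's JSON on stdin.
# Exits non-zero unless the top-level "online" field is exactly true.
collector_status() {
  python3 -c '
import json, sys
name = sys.argv[1]
try:
    data = json.load(sys.stdin)
except ValueError:
    print(name + ": NO JSON")
    sys.exit(1)
online = isinstance(data, dict) and data.get("online") is True
print(name + (": online" if online else ": OFFLINE"))
sys.exit(0 if online else 1)
' "$1"
}
```

For example: `for c in gpu peertube; do curl -s http://localhost:8084/api/c/$c | collector_status $c; done`.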
### ⛔ STOP — Report collector status
Tell me:
1. Which collectors show `"online": true`
2. Any errors from the logs or API responses
3. The PeerTube DB name if it wasn't `peertube_prod`
Do not proceed to Phase 3 until collectors are confirmed.
---
## Phase 3: Public access (Caddy + Authentik)
### Check DNS
```bash
dig +short wt.echo6.co
```
If it returns nothing, report that: the DNS record has to be added manually.
### Check/deploy Caddy config
Caddy is at 100.64.0.8 on the mesh.
```bash
echo "=== Check existing config ==="
sshpass -p '7redditGold' ssh -o StrictHostKeyChecking=no zvx@100.64.0.8 "cat ~/docker/caddy/sites/wt.echo6.co* 2>/dev/null || echo 'NO CONFIG FOUND'"
echo "=== Check Caddy is running ==="
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "docker ps --format '{{.Names}}' | grep -i caddy"
```
If no config exists, create it:
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "cat > ~/docker/caddy/sites/wt.echo6.co.caddy << 'CADDYEOF'
wt.echo6.co {
	forward_auth localhost:9000 {
		uri /outpost.goauthentik.io/auth/caddy
		copy_headers X-Authentik-Username X-Authentik-Groups X-Authentik-Email X-Authentik-Name X-Authentik-Uid
		trusted_proxies private_ranges
	}
	reverse_proxy 100.64.0.1:8084
}
CADDYEOF"
```
If a config already exists, verify that its `reverse_proxy` line points to `100.64.0.1:8084` (Contabo's current Tailscale IP). If it still says `100.64.0.6:8084`, fix it:
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "sed -i 's/100\.64\.0\.6:8084/100.64.0.1:8084/' ~/docker/caddy/sites/wt.echo6.co.caddy"
```
### Reload Caddy
```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "docker exec caddy caddy reload --config /etc/caddy/Caddyfile"
```
### Test
```bash
curl -sI https://wt.echo6.co 2>&1 | head -10
```
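Expect a 302 redirect to Authentik, since curl sends no session cookie. A 200 without logging in would mean requests are reaching WATCHTOWER without passing through `forward_auth`.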
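The curl check can be turned into a pass/fail test that also shows where the redirect points. This sketch assumes, as this runbook does, that an unauthenticated request should come back as a 302; `probe` and `expect_auth_redirect` are hypothetical names, while `%{http_code}` and `%{redirect_url}` are standard curl `-w` variables.

```bash
# Print the HTTP status and redirect target for a URL, without following redirects.
probe() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code} %{redirect_url}\n' "$1"
}

# Succeed only if an unauthenticated request is answered with a 302,
# i.e. the forward_auth gate sits in front of the dashboard.
expect_auth_redirect() {
  local status target
  read -r status target < <(probe "$1")
  echo "status=$status redirect=${target:-none}"
  [[ "$status" == "302" ]]
}
```

Run `expect_auth_redirect https://wt.echo6.co`; the printed redirect should point at the Authentik login flow.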
---
## Post-deploy: How updates work
Code is volume-mounted from `/opt/watchtower/app/` into the container on Contabo. To update:
```bash
ssh zvx@100.64.0.1
cd /opt/watchtower
# Edit files or git pull
docker restart watchtower
```
No rebuild needed for code changes. Only rebuild (`docker compose up -d --build`) if `requirements.txt` or `Dockerfile` changes.
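The restart-vs-rebuild rule above can be encoded as a small decision helper that reads a list of changed paths. `update_action` is hypothetical. It also treats `.env` as needing a recreate, on the assumption that Compose injects `.env` into the container, in which case `docker restart` keeps the old environment.

```bash
# Hypothetical helper: read changed paths (relative to /opt/watchtower, one per
# line) on stdin and print how to apply them:
#   rebuild  -> docker compose up -d --build           (Dockerfile / requirements.txt)
#   recreate -> docker compose up -d --force-recreate  (.env; assumes Compose injects it)
#   restart  -> docker restart watchtower              (everything else, e.g. app/ code)
update_action() {
  local path action=restart
  while IFS= read -r path; do
    case "$path" in
      Dockerfile|*/Dockerfile|requirements.txt|*/requirements.txt)
        echo rebuild
        return 0 ;;
      .env)
        action=recreate ;;
    esac
  done
  echo "$action"
}
```

If `/opt/watchtower` is a git checkout, `git diff --name-only ORIG_HEAD HEAD | update_action` after a `git pull` reports which of the three commands to run.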
## Post-deploy: Adding a new collector
1. Copy `app/collectors/_example.py` to `app/collectors/myservice.py`
2. Edit the class: set `name`, `display_name`, implement `fetch()`
3. Add to `.env`: `MYSERVICE_ENABLED=true` plus any config vars
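4. `docker compose up -d --force-recreate` from `/opt/watchtower` (a plain `docker restart watchtower` does not re-read `.env` if Compose injects it via `env_file:`)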
The frontend auto-discovers the new panel. No HTML/JS/route edits needed.
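Steps 1 and 3 can be scripted. `scaffold_collector` is a hypothetical helper: it only copies `_example.py` and appends the `{NAME}_ENABLED=true` flag, so you still edit the class and apply step 4 yourself. Requiring a lowercase name that starts with a letter is an assumption chosen to match the `gpu`/`peertube` naming and keep the file a valid Python module.

```bash
# Hypothetical helper: scaffold collector $1 under $2 (default /opt/watchtower)
# by copying app/collectors/_example.py and enabling it in .env.
scaffold_collector() {
  local name="$1" root="${2:-/opt/watchtower}"
  if [[ ! "$name" =~ ^[a-z][a-z0-9_]*$ ]]; then
    echo "collector name must be lowercase letters, digits or _" >&2
    return 1
  fi
  local target="$root/app/collectors/$name.py"
  local flag
  flag="$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')_ENABLED"
  if [[ -e "$target" ]]; then
    echo "$target already exists" >&2
    return 1
  fi
  cp "$root/app/collectors/_example.py" "$target" || return 1
  # Make sure .env ends with a newline before appending the flag.
  if [[ -s "$root/.env" && -n "$(tail -c 1 "$root/.env")" ]]; then
    echo >> "$root/.env"
  fi
  grep -q "^${flag}=" "$root/.env" 2>/dev/null || echo "${flag}=true" >> "$root/.env"
  echo "created $target; now set name/display_name and implement fetch()"
}
```

For example, `scaffold_collector myservice` creates `app/collectors/myservice.py` and adds `MYSERVICE_ENABLED=true`; any other config vars the collector needs still go into `.env` by hand.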