# Deploy WATCHTOWER v2 — Modular Ops Dashboard

**Context:** CC runs on cortex. WATCHTOWER deploys to Contabo (100.64.0.1). The tarball is at `/home/zvx/projects/contabo/watchtower/watchtower-v2.tar.gz` on cortex. This runbook is at `/home/zvx/.ref/projects/` on cortex.

WATCHTOWER v2 is a modular FastAPI monitoring dashboard. Collectors are auto-discovered from `app/collectors/` and enabled via `{NAME}_ENABLED=true` in `.env`. Adding new monitoring targets requires zero edits to existing files.

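To see at a glance which collectors a given `.env` would switch on, a small helper along these lines may be useful. It is only a sketch: the function name `enabled_collectors` is hypothetical, and it assumes plain `KEY=value` lines and that the collector name is the lowercase form of the `{NAME}` prefix, which this runbook does not confirm for every collector.

```bash
# Hypothetical helper: list collector names whose {NAME}_ENABLED flag is "true".
# Assumes plain KEY=value lines with no quoting or inline comments.
enabled_collectors() {
    env_file="${1:-.env}"
    # Keep only NAME_ENABLED=true lines, strip the suffix, then lowercase the name.
    sed -n 's/^\([A-Za-z0-9_]*\)_ENABLED=true[[:space:]]*$/\1/p' "$env_file" \
        | tr '[:upper:]' '[:lower:]'
}
```

Run `enabled_collectors /opt/watchtower/.env` once the file exists and compare the result with the collectors reported in the startup log.
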
## Pre-flight: Transfer tarball and SSH check

```bash
# SCP tarball from cortex (this machine) to Contabo
scp /home/zvx/projects/contabo/watchtower/watchtower-v2.tar.gz zvx@100.64.0.1:/tmp/

# Verify sshpass is installed on Contabo
ssh zvx@100.64.0.1 "which sshpass || sudo apt-get install -y sshpass"

# Test SSH from Contabo to each monitored node
# (-n keeps the inner ssh from consuming the rest of this heredoc as its stdin)
ssh zvx@100.64.0.1 << 'SSHEOF'
echo "=== PeerTube (100.64.0.23) ==="
sshpass -p '7redditGold' ssh -n -o StrictHostKeyChecking=no zvx@100.64.0.23 "hostname && echo OK" 2>&1

echo "=== cortex/GPU (100.64.0.14) ==="
sshpass -p '7redditGold' ssh -n -o StrictHostKeyChecking=no zvx@100.64.0.14 "hostname && echo OK" 2>&1
SSHEOF
```

If either SSH check fails, stop and report the error. Do not proceed without working SSH to at least one target.

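The "at least one target" rule can be turned into a scripted gate. The sketch below is an illustration, not part of the deployed tooling: `reachable_count` and `ssh_probe` are hypothetical names, and `ssh_probe` assumes the password has been exported as `SSHPASS` so that `sshpass -e` can read it from the environment instead of the command line.

```bash
# Hypothetical helper: count how many hosts pass a probe command.
# $1 is the probe (a command or function taking one host); the rest are hosts.
reachable_count() {
    probe="$1"; shift
    count=0
    for host in "$@"; do
        if "$probe" "$host" >/dev/null 2>&1; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example probe (assumption: SSHPASS is exported; -e makes sshpass read it).
ssh_probe() {
    sshpass -e ssh -n -o ConnectTimeout=5 "zvx@$1" true
}
```

On Contabo, `n=$(reachable_count ssh_probe 100.64.0.23 100.64.0.14)` followed by `[ "$n" -ge 1 ] || exit 1` would stop the run when no target answers.
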

---

## Phase 1: Deploy codebase

All remaining commands run on Contabo. SSH in:

```bash
ssh zvx@100.64.0.1
```

Then:

```bash
# Clean any old install (this also deletes an existing /opt/watchtower/.env)
sudo rm -rf /opt/watchtower

# Extract v2 tarball (the mv below expects a top-level watchtower-v2/ directory)
sudo tar xzf /tmp/watchtower-v2.tar.gz -C /opt/
sudo mv /opt/watchtower-v2 /opt/watchtower
sudo chown -R "$USER:$USER" /opt/watchtower

cd /opt/watchtower
```

### Create .env from example

```bash
cp .env.example .env
```

The defaults in `.env.example` are already set to the correct current values:

| Target | IP | User | Notes |
|--------|-----|------|-------|
| GPU (cortex) | 100.64.0.14 | zvx | nvidia-smi |
| PeerTube | 100.64.0.23 | zvx | Native PostgreSQL (`peertube_prod`), pipeline at `/opt/bulk-import/` |
| RECON | disabled | n/a | Flip `RECON_ENABLED=true` when rebuilt |

### Verify PeerTube PostgreSQL access

PostgreSQL runs natively on the PeerTube CT (not in Docker). Verify:

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "sudo -u postgres psql -d peertube_prod -t -A -c 'SELECT COUNT(*) FROM video;'"
```

This should return the video count (e.g., 207). If it errors, the DB name may be different. List the databases to check:

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "sudo -u postgres psql -l"
```

Update `PT_DB_NAME` in `.env` if needed.

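If `PT_DB_NAME` does need changing, editing `.env` by hand works; a scripted alternative might look like the sketch below. The helper name `set_env_var` is hypothetical, and it assumes simple `KEY=value` lines and a value that contains no `|`, `&`, or backslash.

```bash
# Hypothetical helper: set KEY=VALUE in an env file.
# Replaces the line if KEY already exists, appends it otherwise.
# Assumes VALUE contains no '|', '&', or backslash (they are special to sed here).
set_env_var() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file"; then
        # Rewrite via a temp file, then copy back so the original
        # file keeps its ownership and permissions.
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" \
            && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_env_var /opt/watchtower/.env PT_DB_NAME <name-from-psql-l>` would record whichever database name the listing showed.
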
### Verify bulk-import pipeline paths

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.23 "ls -la /opt/bulk-import/ 2>/dev/null && wc -l /opt/bulk-import/downloaded.txt 2>/dev/null || echo 'PATH NOT FOUND'"
```

---

## Phase 2: Build and start

```bash
cd /opt/watchtower

docker compose up -d --build

# Wait for startup then check logs
sleep 5
docker logs watchtower 2>&1 | tail -30
```

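A fixed `sleep 5` can be too short on a cold build or needlessly long on a warm one. As a hedged alternative, a small retry loop can poll until the service answers; `wait_until` is a hypothetical helper name, and using `/api/health` as the readiness signal is an assumption based on the endpoint this runbook queries during verification.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached. Returns 1 on timeout.
wait_until() {
    timeout="$1"; shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

Usage could be `wait_until 30 curl -sf http://localhost:8084/api/health && docker logs watchtower 2>&1 | tail -30`, which shows the logs as soon as the API responds and gives up after roughly 30 seconds.
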
### Expected log output

```
WATCHTOWER starting up...
Database connected: /data/watchtower.db
[registry] Loaded collector: gpu (GPU (cortex))
[registry] Loaded collector: peertube (PeerTube Ingest)
[registry] Skipped collector: recon (RECON_ENABLED=false)
[registry] 2 collector(s) active: ['gpu', 'peertube']
[gpu] collector starting (interval: 60s)
[peertube] collector starting (interval: 60s)
```

### Verify collectors

```bash
# Wait for first poll cycle
sleep 65

echo "=== Health ==="
curl -s http://localhost:8084/api/health | python3 -m json.tool

echo "=== Collector Manifest ==="
curl -s http://localhost:8084/api/collectors | python3 -m json.tool

echo "=== GPU Data ==="
curl -s http://localhost:8084/api/c/gpu | python3 -m json.tool

echo "=== PeerTube Data ==="
curl -s http://localhost:8084/api/c/peertube | python3 -m json.tool
```

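Before reading the JSON in detail, it can help to confirm that every endpoint answers at all. The sketch below checks HTTP status codes only; `http_status` and `check_endpoints` are hypothetical names, and a 200 here says nothing about whether a collector reports itself online, so the payloads above still need to be read.

```bash
# Hypothetical helper: print the HTTP status code for a URL
# (curl prints "000" when the connection itself fails).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: check each API path under a base URL.
# Prints "<path> <code>" per endpoint; returns non-zero if any is not 200.
check_endpoints() {
    base="$1"; shift
    failed=0
    for endpoint in "$@"; do
        code=$(http_status "${base}${endpoint}")
        echo "${endpoint} ${code}"
        [ "$code" = "200" ] || failed=1
    done
    return "$failed"
}
```

For this deployment the call would be `check_endpoints http://localhost:8084 /api/health /api/collectors /api/c/gpu /api/c/peertube`.
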
### ⛔ STOP — Report collector status

Tell me:
1. Which collectors show `"online": true`
2. Any errors from the logs or API responses
3. The PeerTube DB name if it wasn't `peertube_prod`

Do not proceed to Phase 3 until collectors are confirmed.

---

## Phase 3: Public access (Caddy + Authentik)

### Check DNS

```bash
dig +short wt.echo6.co
```

If it doesn't resolve, report that. The DNS record needs to be added manually.

### Check/deploy Caddy config

Caddy is at 100.64.0.8 on the mesh.

```bash
echo "=== Check existing config ==="
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "cat ~/docker/caddy/sites/wt.echo6.co* 2>/dev/null || echo 'NO CONFIG FOUND'"

echo "=== Check Caddy is running ==="
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "docker ps --format '{{.Names}}' | grep -i caddy"
```

If no config exists, create it:

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "cat > ~/docker/caddy/sites/wt.echo6.co.caddy << 'CADDYEOF'
wt.echo6.co {
    forward_auth localhost:9000 {
        uri /outpost.goauthentik.io/auth/caddy
        copy_headers X-Authentik-Username X-Authentik-Groups X-Authentik-Email X-Authentik-Name X-Authentik-Uid
        trusted_proxies private_ranges
    }
    reverse_proxy 100.64.0.1:8084
}
CADDYEOF"
```

If a config already exists, verify that the `reverse_proxy` line points to `100.64.0.1:8084` (Contabo's current Tailscale IP). If it still says `100.64.0.6`, fix it:

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "sed -i 's/100\.64\.0\.6:8084/100.64.0.1:8084/' ~/docker/caddy/sites/wt.echo6.co.caddy"
```

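To confirm the upstream without reading the whole file, the `reverse_proxy` target can be pulled out directly. This is a minimal sketch: `proxy_targets` is a hypothetical helper, and it assumes the directive and its address sit on one line, as in the config shown above.

```bash
# Hypothetical helper: print the upstream address from each reverse_proxy line.
# Reads the given file, or stdin when no file argument is passed.
proxy_targets() {
    awk '$1 == "reverse_proxy" { print $2 }' "${1:--}"
}
```

Piping the remote file through it, for example `sshpass -p '7redditGold' ssh zvx@100.64.0.8 "cat ~/docker/caddy/sites/wt.echo6.co.caddy" | proxy_targets`, should print `100.64.0.1:8084` once the config is correct.
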
### Reload Caddy

```bash
sshpass -p '7redditGold' ssh zvx@100.64.0.8 "docker exec caddy caddy reload --config /etc/caddy/Caddyfile"
```

### Test

```bash
curl -sI https://wt.echo6.co 2>&1 | head -10
```

Expect a 302 redirect to Authentik, or a 200 if already authenticated.

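The same expectation can be checked mechanically. The sketch below is illustrative only: `status_from_headers` and `describe_status` are hypothetical names, and the mapping simply restates the two outcomes described above.

```bash
# Hypothetical helper: extract the status code from `curl -sI` output on stdin.
# Strips carriage returns, then takes the second field of the status line.
status_from_headers() {
    tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: interpret the code per the expectation in this runbook.
describe_status() {
    case "$1" in
        302) echo "redirect to login (expected when unauthenticated)" ;;
        200) echo "ok (already authenticated)" ;;
        *)   echo "unexpected status: $1"; return 1 ;;
    esac
}
```

A combined check would be `describe_status "$(curl -sI https://wt.echo6.co | status_from_headers)"`, which exits non-zero for anything other than 302 or 200.
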

---

## Post-deploy: How updates work

Code is volume-mounted from `/opt/watchtower/app/` into the container on Contabo. To update:

```bash
ssh zvx@100.64.0.1
cd /opt/watchtower
# Edit files or git pull
docker restart watchtower
```

No rebuild needed for code changes. Only rebuild (`docker compose up -d --build`) if `requirements.txt` or `Dockerfile` changes.

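The restart-versus-rebuild rule is simple enough to encode. The following sketch takes a list of changed paths and reports which action the rule above calls for; `update_action` is a hypothetical helper, and how the list of changed files is obtained is left to the caller.

```bash
# Hypothetical helper: decide between restart and rebuild from changed paths.
# Per the rule above, only requirements.txt or Dockerfile changes need a rebuild.
update_action() {
    for changed in "$@"; do
        case "$(basename "$changed")" in
            requirements.txt|Dockerfile)
                echo "rebuild"
                return 0
                ;;
        esac
    done
    echo "restart"
}
```

If the install is tracked in git (an assumption, since the initial deploy here comes from a tarball), `update_action $(git diff --name-only ORIG_HEAD HEAD)` after a pull would print the action to take.
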
## Post-deploy: Adding a new collector

1. Copy `app/collectors/_example.py` to `app/collectors/myservice.py`
2. Edit the class: set `name`, `display_name`, implement `fetch()`
3. Add to `.env`: `MYSERVICE_ENABLED=true` plus any config vars
4. `docker restart watchtower`

The frontend auto-discovers the new panel. No HTML/JS/route edits needed.

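Steps 1 and 3 above are mechanical and could be scripted, roughly as follows. Both function names are hypothetical, the script must run from `/opt/watchtower`, and it assumes the flag is the upper-cased collector name plus `_ENABLED`, following the `MYSERVICE_ENABLED` example. Editing the class and restarting the container remain manual.

```bash
# Hypothetical helper: derive the .env flag, e.g. myservice -> MYSERVICE_ENABLED.
collector_flag() {
    printf '%s_ENABLED\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Hypothetical helper: copy the example collector and enable it in .env.
# Run from the project root; refuses to overwrite an existing collector.
scaffold_collector() {
    name="$1"
    target="app/collectors/${name}.py"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    cp app/collectors/_example.py "$target" &&
        printf '%s=true\n' "$(collector_flag "$name")" >> .env
}
```

After `scaffold_collector myservice`, edit the new file as in step 2 and run `docker restart watchtower`.
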