# Project: PeerTube Phase 2 — Import Pipeline Build
**Goal:** Build a complete YouTube download → local import → GPU transcode pipeline for 99 channels (~70K+ videos, ~15.3TB) on a fresh PeerTube v8 instance. Clean slate — no legacy code, no old pipeline files. Build it right from scratch.
**CC Host:** cortex (SSH to all nodes via aliases in ~/.ssh/config; Proxmox nodes use sshpass auth)
---
## SSH Prerequisites — RUN FIRST
**Every CC session must verify SSH connectivity before executing any remote commands. Never assume SSH works.**
### Verify cortex → CT 110 (PeerTube)
```bash
# CT 110 uses sshpass auth (same as all LXCs). Check ~/.ssh/config for alias.
# Try alias first, fall back to IP:
ssh -o ConnectTimeout=5 peertube 'hostname' 2>/dev/null \
|| sshpass -p '7redditGold' ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 zvx@192.168.1.170 'hostname'
```
### Verify cortex → media node (Proxmox host, for pct commands if needed)
```bash
sshpass -p '7redditGold' ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 root@192.168.1.243 'hostname'
```
### Gate
Both must return hostnames. **Stop and fix SSH before proceeding with ANY step.**
If aliases don't exist in `~/.ssh/config`, add them:
```bash
grep -q "Host peertube$" ~/.ssh/config 2>/dev/null || cat >> ~/.ssh/config << 'EOF'
Host peertube
HostName 192.168.1.170
User zvx
EOF
```
Note: Most pipeline work runs as the `peertube` user inside CT 110. SSH in as zvx, then `sudo -u peertube` or `sudo su - peertube` as needed.
---
## Runbook References
These runbooks live in `~/runbooks/` on cortex. Call them by name when their scope applies:
| Runbook | When to Use in Phase 2 |
|---------|----------------------|
| **`nordvpn-lxc.md`** | **Step 3 — RUN THIS RUNBOOK.** VPN setup on CT 110 with TUN device, NordVPN/WireGuard, split tunneling, rotation script |
| **`peertube-remote-runner.md`** | **ACTIVE — used for video-transcription (Whisper captioning).** Runner on cortex handles auto-captioning with smart GPU/CPU routing. Not used for H.265 video transcoding (pipeline handles that). See runbook for Whisper setup details. |
| `ct-runbook.md` | If CT 110 needs additional packages or baseline changes (provisioned in Phase 1 — reference only) |
| `expose-service-home.md` | stream.echo6.co is already exposed (Phase 1). Reference only if Caddy/DNS/cert issues arise |
| `authentik-oidc-application.md` | PeerTube OIDC already configured (Phase 1). Reference only if SSO breaks |
| `pi-nas-omv-runbook.md` | If NFS storage issues arise (mount problems, permissions, OMV config) |
| `proxmox-onboard-node.md` | SSH access patterns — the Phase 1 prereq pattern above follows this runbook's conventions |
| `proxmox-create-ubuntu-vm.md` | If cortex needs modifications (GPU passthrough, NVIDIA drivers, Docker). Reference only |
**Not applicable to Phase 2:** idahomesh-*, meshmonitor-*, meshtasticd-* runbooks.
---
## Infrastructure (Read-Only Context — Do Not Modify)
### PeerTube Instance
- **CT 110** on **media** node (Proxmox)
- Local IP: 192.168.1.170
- Tailscale IP: 100.64.0.23
- OS: Debian 12, privileged LXC
- PeerTube v8 — **native install** (NOT Docker). No `docker exec` for anything.
- Runs as user: `peertube`
- PostgreSQL: local, accessible via `sudo -u postgres psql peertube_prod` or `sudo -u peertube psql peertube_prod`
- Redis: local
- Nginx: local (port 80), proxied through Caddy on utility node
- Domain: stream.echo6.co
- NFS storage: 18TB from pi-nas (192.168.1.245) mounted at `/var/www/peertube/storage/`
- NFS export path: `/srv/dev-disk-by-uuid-822575b9-1549-4aab-823e-8160d2aa7c68/peertube/`
- PeerTube config: `/var/www/peertube/config/local-production.json` (v8 uses JSON, not YAML)
- PeerTube base dir: `/var/www/peertube/`
- Built-in channel sync: DISABLED (bulk pipeline handles imports)
- Signup: disabled (Authentik SSO only)
### GPU Pre-Transcoding (H.265 via NVENC)
- **cortex** — VM on TOC node, RTX A4000 GPU passthrough
- cortex is also the CC host and runs Ollama/Aurora
- NVENC is separate silicon from CUDA — transcoding won't conflict with LLM inference
- **PeerTube's built-in transcoding is DISABLED** — remote runners ignore transcoding plugins, so there's no way to get H.265 through the runner pipeline
- Instead: a `transcoder.py` service on cortex pulls downloaded videos from CT 110, re-encodes to H.265 with `hevc_nvenc`, pushes back. The importer then uploads already-transcoded files to PeerTube with `waitTranscoding=false`
- Target: H.265, 1080p only, single file per video (no HLS adaptive — LAN/Tailscale viewers don't need it)
- ffmpeg command (base form; Step 6 adds `-hwaccel cuda` and `-tag:v hvc1`): `ffmpeg -i input.mp4 -c:v hevc_nvenc -preset medium -cq 28 -c:a aac -b:a 128k output.mp4`
- File transfer: cortex pulls from CT 110 via rsync/SSH, transcodes locally to avoid NFS latency on GPU work, pushes result back
### Runner Service (ACTIVE — video-transcription/captioning)
Runner on cortex handles Whisper auto-captioning. Also registered for VOD transcoding jobs but H.265 video transcoding goes through the pipeline transcoder instead.
```ini
[Unit]
Description=PeerTube Remote Runner (NVENC)
After=network-online.target nvidia-persistenced.service
Wants=network-online.target
Requires=nvidia-persistenced.service
[Service]
Type=simple
User=zvx
Group=zvx
Environment=NODE_ENV=production
Environment=PATH=/opt/peertube-runner/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ExecStart=/usr/bin/peertube-runner server --enable-job vod-hls-transcoding --enable-job vod-audio-merge-transcoding --enable-job live-rtmp-hls-transcoding --enable-job video-studio-transcoding --enable-job video-transcription
WorkingDirectory=/home/zvx
Restart=always
RestartSec=30
StandardOutput=journal
StandardError=journal
SyslogIdentifier=peertube-runner
MemoryMax=20G
[Install]
WantedBy=multi-user.target
```
**Whisper config:** Smart wrapper at `/usr/local/bin/whisper-smart` routes <1hr to GPU (CUDA float16), >=1hr to CPU (int8). CPU jobs serialized via flock. Runner concurrency=2 (1 GPU + 1 CPU in parallel). Model: medium. See `peertube-remote-runner.md` for full details.
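The routing rule above comes down to one duration check. A minimal sketch of that decision, assuming the duration is already known in seconds; `choose_backend` and `GPU_CUTOFF_SECONDS` are hypothetical names, and the real `whisper-smart` wrapper (including its flock serialization) is not reproduced here:

```python
GPU_CUTOFF_SECONDS = 60 * 60  # 1 hour, per the routing rule above


def choose_backend(duration_seconds):
    """Pick device and compute type for a transcription job by media duration."""
    if duration_seconds < GPU_CUTOFF_SECONDS:
        return {"device": "cuda", "compute_type": "float16"}
    # Jobs of one hour or longer go to CPU with int8
    return {"device": "cpu", "compute_type": "int8"}


print(choose_backend(25 * 60))   # short video, routed to GPU
print(choose_backend(90 * 60))   # long video, routed to CPU
```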
### Recovered Runner Health Script
```bash
#!/bin/bash
# Restarts the runner if the service is inactive, its process is
# missing, or the GPU is unreachable.
LOG_TAG="peertube-runner-health"

# systemd reports the service as inactive
if ! systemctl is-active --quiet peertube-runner; then
    logger -t "$LOG_TAG" "Runner not active, restarting..."
    systemctl restart peertube-runner
    sleep 10
fi

# Service is active but the runner process is gone
if ! pgrep -f "peertube-runner server" > /dev/null; then
    logger -t "$LOG_TAG" "Runner process not found, restarting service..."
    systemctl restart peertube-runner
fi

# GPU unreachable: restart the persistence daemon first, then the runner
if ! nvidia-smi > /dev/null 2>&1; then
    logger -t "$LOG_TAG" "GPU not accessible, restarting nvidia-persistenced and runner..."
    systemctl restart nvidia-persistenced
    sleep 5
    systemctl restart peertube-runner
fi
```
### SSH / Access
- cortex → CT 110: `ssh peertube` or `ssh root@192.168.1.170` (check ~/.ssh/config)
- cortex → Proxmox nodes: uses sshpass (aliases in ~/.ssh/config)
- CT 110 user for pipeline: `peertube` (same user that runs the PeerTube process)
### VPN
- NordVPN account exists, needs fresh setup on CT 110
- LXC may not support NordVPN CLI (systemd issues) — WireGuard configs as fallback
- Rotation countries: US, CA, UK, DE, NL, SE
- Split tunnel / killswitch off so PeerTube stays accessible locally
---
## Channel Map — The 99 Channels
### Recovered Schema (from old WATCHTOWER add_channel.py)
```json
{
  "category": "Tactical/SUT",
  "channel_name": "(YT)Garand Thumb",
  "actor_name": "garand-thumb",
  "youtube_url": "https://www.youtube.com/@GarandThumb",
  "youtube_channel_id": null,
  "peertube_channel_id": null,
  "video_count": 0,
  "priority": "H",
  "est_videos": 500,
  "est_gb": 98
}
```
### Recovered Slug Function
```python
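import re


def slugify_channel(name):
    """Convert channel name to PeerTube-safe actor_name."""
    # Drop the "(YT)" display prefix before slugifying
    name = re.sub(r'^\(YT\)\s*', '', name)
    # Collapse every run of non-alphanumerics into a single hyphen
    slug = re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')
    # Cap at 50 characters; fall back to 'channel' if nothing is left
    return slug[:50] or 'channel'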
```
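A minimal sketch of how the slug function and the recovered schema could combine into one channel-map entry. `make_entry` is a hypothetical helper, the zeroed `est_videos` / `est_gb` values are placeholders, and storing the `(YT)` prefix in `channel_name` is an assumption based on the schema example:

```python
import re


def slugify_channel(name):
    """Same logic as the recovered slug function."""
    name = re.sub(r'^\(YT\)\s*', '', name)
    slug = re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')
    return slug[:50] or 'channel'


def make_entry(display_name, category, youtube_url, priority="M"):
    """Build one channel-map.json entry with the recovered schema fields."""
    return {
        "category": category,
        "channel_name": f"(YT){display_name}",
        "actor_name": slugify_channel(display_name),
        "youtube_url": youtube_url,
        "youtube_channel_id": None,   # filled in after URL resolution
        "peertube_channel_id": None,  # filled in after channel creation
        "video_count": 0,
        "priority": priority,
        "est_videos": 0,              # placeholder estimate
        "est_gb": 0,                  # placeholder estimate
    }


entry = make_entry("Garand Thumb", "Tactical/SUT",
                   "https://www.youtube.com/@GarandThumb", "H")
print(entry["actor_name"])
```

Note that `slug[:50]` truncates after the hyphens are stripped, so a very long name can still end in a hyphen once cut.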
### Known YouTube URLs (from old PeerTube sync records, 26 channels)
These 26 channels had active sync records with confirmed YouTube URLs. Several of them are not among the 99 finalized channels below:
```
Essential Craftsman → @essentialcraftsman
CommsPrepper → @CommsPrepper
Steven Lavimoniere → @StevenLavimoniere
Andreas Spiess → @AndreasSpiess
Mustie1 → @mustie1
Donyboy73 → @Donyboy73
Turn a Wood Bowl → @TurnaWoodBowl
RoseRed Homestead → @RoseRedHomestead
Homesteading Family → @HomesteadingFamily
My Self Reliance → @MySelfReliance
RegisteredNurseRN → @RegisteredNurseRN
Skinny Medic → @SkinnyMedic
Marine X → @MarineX
Plumberparts → @plumberparts
MedCram → @Medcram
City Prepping → @CityPrepping
Paul Kirtley → @PaulKirtley
Armando Hasudungan → playlist?list=UUesNt4_Z-Pm41RzpAClfVcg
Self Sufficient Me → @Selfsufficientme
Taryl Fixes All → @TarylFixesAll
Engineer775 → @engineer775
WeberAuto → @WeberAuto
Sun Knudsen → @sunknudsen
Master Your Medics → @MasterYourMedics
MCQBushcraft → @MCQBushcraft
ChrisFix → @ChrisFix
```
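The list above can be turned into full URLs mechanically. A minimal sketch, assuming every target is relative to `https://www.youtube.com/` (true for `@handle` entries and for the one `playlist?list=` entry); `parse_known_line` is a hypothetical helper:

```python
YOUTUBE_BASE = "https://www.youtube.com/"


def parse_known_line(line):
    """Split 'Name → target' into (name, full_url)."""
    name, target = (part.strip() for part in line.split("→", 1))
    return name, YOUTUBE_BASE + target


lines = [
    "ChrisFix → @ChrisFix",
    "Armando Hasudungan → playlist?list=UUesNt4_Z-Pm41RzpAClfVcg",
]
known = dict(parse_known_line(line) for line in lines)
for name, url in known.items():
    print(name, url)
```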
### The 99 Channels (Finalized Feb 2026)
#### OPSEC / Privacy (6)
| Channel | Priority | Notes |
|---------|----------|-------|
| Michael Bazzell / IntelTechniques | H | OSINT + digital privacy, ex-FBI |
| The Hated One | H | Privacy advocacy, surveillance deep-dives |
| Mental Outlaw | H | Linux + privacy + infosec news |
| Naomi Brockwell TV | M | Privacy-focused tech |
| Techlore | M | Privacy tools and comparisons |
| Sun Knudsen | M | Step-by-step privacy hardening |
#### Physical Security (2)
| Channel | Priority | Notes |
|---------|----------|-------|
| Deviant Ollam | H | Physical penetration testing, lock bypass |
| BosnianBill | M | Lock picking, physical security analysis |
#### Intelligence / OSINT (4)
| Channel | Priority | Notes |
|---------|----------|-------|
| OSINT Dojo | H | OSINT methodology training |
| Benjamin Strick | H | Professional OSINT investigations |
| OSINT Curious | M | OSINT tools and techniques |
| S2 Underground | H | Threat intel, analysis tradecraft |
#### Cybersecurity (7)
| Channel | Priority | Notes |
|---------|----------|-------|
| John Hammond | H | CTF walkthroughs, malware analysis |
| IppSec | H | HackTheBox walkthroughs |
| LiveOverflow | H | Binary exploitation, web security |
| Professor Messer | M | CompTIA certification training |
| The Cyber Mentor | M | Ethical hacking courses |
| Hak5 | M | Hacking tools and techniques |
| David Bombal | M | Networking + cybersecurity |
#### Tactical / SUT (6)
| Channel | Priority | Notes |
|---------|----------|-------|
| Garand Thumb | H | Tactics, gear testing, NV |
| Dirty Civilian | H | SUT for civilians |
| One Shepherd | H | Former SOF, tactical training |
| Brent0331 | H | USMC veteran, tactical analysis |
| Brass Facts | M | Firearms philosophy, gear testing |
| Sage Dynamics | H | Research-based torture tests |
#### Firearms (8)
| Channel | Priority | Notes |
|---------|----------|-------|
| Forgotten Weapons | H | Historical + technical firearms (largest channel, ~3K videos) |
| Paul Harrell | H | Terminal ballistics, practical shooting |
| 9-Hole Reviews | M | Precision rifle, historical accuracy |
| Lucky Gunner | M | Ammo testing, concealed carry |
| C&Rsenal | M | WWI/WWII firearms deep-dives |
| Jerry Miculek | M | Speed shooting, competition |
| InRangeTV | M | Firearms + mud tests |
| Hickok45 | M | Reviews + shooting demonstrations |
#### Comms / Signals (7)
| Channel | Priority | Notes |
|---------|----------|-------|
| OH8STN | H | Off-grid digital comms, Winlink |
| Andreas Spiess | H | Electronics + LoRa + radio |
| Ham Radio Crash Course | H | Amateur radio training |
| Tech Minds | M | SDR, radio tech |
| The Comms Channel | M | Comms gear and planning |
| KM4ACK | H | Build-a-Pi, ham radio software |
| Signals Everywhere | M | SDR + spectrum analysis |
#### Medical (5)
| Channel | Priority | Notes |
|---------|----------|-------|
| PrepMedic | H | Flight paramedic, trauma care |
| Skinny Medic | H | IFAK, trauma kits |
| MedWild | H | Wilderness medicine |
| Crisis Medicine | H | Former 18D SF Medic, TCCC |
| Ninja Nerd | H | Comprehensive physiology/pathology |
#### Linux / Infrastructure (6)
| Channel | Priority | Notes |
|---------|----------|-------|
| Lawrence Systems | H | Enterprise networking + Linux |
| Learn Linux TV | H | Linux tutorials and homelab |
| Jeff Geerling | H | Raspberry Pi, Ansible, self-hosting |
| Techno Tim | M | Homelab, Docker, Kubernetes |
| Level1Techs | M | Hardware + Linux deep-dives |
| Wolfgang's Channel | M | Self-hosting, privacy infra |
#### Hardware / Electronics (4)
| Channel | Priority | Notes |
|---------|----------|-------|
| Ben Eater | H | Computer architecture from scratch |
| EEVblog | H | Electronics engineering |
| GreatScott! | M | Electronics projects |
| Big Clive | M | Electronics teardowns |
#### Auto / Mechanical (7)
| Channel | Priority | Notes |
|---------|----------|-------|
| ChrisFix | H | DIY auto repair fundamentals |
| Mustie1 | H | Dead machinery resurrection |
| South Main Auto | H | Diagnostic logic |
| 1A Auto | H | Make/model/year repair encyclopedia (~4,500 videos) |
| Pine Hollow Auto Diagnostics | M | Advanced diagnostics |
| ScannerDanner | M | Master electrical diagnostics |
| Diesel Creek | M | Heavy equipment repair |
#### Construction / Trades (7)
| Channel | Priority | Notes |
|---------|----------|-------|
| Essential Craftsman | H | Construction + life skills |
| Matt Risinger | H | Building science |
| Mike Haduck Masonry | M | Foundations, concrete, stone |
| Awesome Framers | M | Structural framing |
| This Old House | M | Home renovation |
| Electrician U | M | Electrical trade training |
| Got2Learn | M | Plumbing/electrical tutorials |
#### Welding / Fabrication (3)
| Channel | Priority | Notes |
|---------|----------|-------|
| Welding Tips and Tricks | H | Welding instruction |
| ChuckE2009 | M | Welding + fabrication |
| Paul Sellers | H | Hand tool woodworking master |
#### Sustainment / Fieldcraft (2)
| Channel | Priority | Notes |
|---------|----------|-------|
| Corporals Corner | H | Field skills, shelter, fire |
| Gray Bearded Green Beret | H | SF wilderness medicine + fieldcraft |
#### Homesteading / Production (8)
| Channel | Priority | Notes |
|---------|----------|-------|
| City Prepping | H | Urban/suburban preparedness |
| My Self Reliance | H | Off-grid building |
| Engineer775 | H | Off-grid power systems |
| Project Farm | H | Tool and product testing |
| Will Prowse / DIY Solar Power | H | Solar power systems |
| Townsends | M | 18th century skills + cooking |
| RoseRed Homestead | M | Homesteading skills |
| The Urban Prepper | M | Urban preparedness, modular bags |
#### Preparedness (1)
| Channel | Priority | Notes |
|---------|----------|-------|
| The Provident Prepper | M | Preparedness planning methodology |
#### Energy / Alt-Fuel (1)
| Channel | Priority | Notes |
|---------|----------|-------|
| Adeptus Beta | M | Wood gasification (~7GB, tiny) |
#### Education / STEM (6)
| Channel | Priority | Notes |
|---------|----------|-------|
| Practical Engineering | H | Civil engineering with demos |
| Real Engineering | M | Aerospace, energy, transport |
| The Efficient Engineer | M | Core engineering fundamentals |
| NurdRage | M | Chemistry experiments |
| NileRed | M | Chemistry deep-dives |
| Veritasium | M | Science + engineering |
#### Education / Math (2)
| Channel | Priority | Notes |
|---------|----------|-------|
| Professor Leonard | H | Full calculus + stats lectures |
| Organic Chemistry Tutor | M | Math + science tutorials |
#### Education / CS (2)
| Channel | Priority | Notes |
|---------|----------|-------|
| Computerphile | H | Crypto, networking theory, security concepts |
| MIT Missing Semester | M | Shell, git, dev tools (tiny, ~50 videos) |
#### Small Engine (1)
| Channel | Priority | Notes |
|---------|----------|-------|
| Donyboy73 | M | Small engine repair |
#### Woodworking (1)
| Channel | Priority | Notes |
|---------|----------|-------|
| Steve Ramsey | M | Beginner woodworking |
#### Home Repair (2)
| Channel | Priority | Notes |
|---------|----------|-------|
| Home RenoVision DIY | M | Home repair tutorials |
| Roger Wakefield | M | Plumbing |
#### Bushcraft (1)
| Channel | Priority | Notes |
|---------|----------|-------|
| Joe Robinet | M | Bushcraft and camping |
**Total: 99 channels across 24 categories**
---
## Execution Steps
### Step 1: Channel Map Generation
**Where:** CT 110
**What:** Build `/opt/bulk-import/config/channel-map.json`
**SSH Gate:** `ssh peertube 'hostname'` must succeed before proceeding.
1. Create directory structure:
```bash
# Scripts and config on local disk
mkdir -p /opt/bulk-import/{config,logs}
chown -R peertube:peertube /opt/bulk-import
# Video data on NFS (18TB pi-nas mount)
mkdir -p /var/www/peertube/storage/pipeline/{staging,completed,transcoded,failed}
chown -R peertube:peertube /var/www/peertube/storage/pipeline
# Symlink data dirs so scripts use /opt/bulk-import/ paths
ln -sfn /var/www/peertube/storage/pipeline/staging /opt/bulk-import/staging
ln -sfn /var/www/peertube/storage/pipeline/completed /opt/bulk-import/completed
ln -sfn /var/www/peertube/storage/pipeline/transcoded /opt/bulk-import/transcoded
ln -sfn /var/www/peertube/storage/pipeline/failed /opt/bulk-import/failed
```
2. For each of the 99 channels:
- Look up the actual YouTube channel URL (use `yt-dlp --print channel_url --playlist-items 1 --skip-download "https://www.youtube.com/@ChannelHandle"` for any that need verification)
- Generate `actor_name` via slugify
- Write to channel-map.json
3. Use the known URLs from old sync records as a head start. Only 11 of those channels appear in the finalized list of 99, so the remaining 88 need URL resolution.
**⚠️ This step requires yt-dlp installed and working on CT 110. If yt-dlp isn't installed yet, install it first:**
```bash
curl -L https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /usr/local/bin/yt-dlp
chmod +x /usr/local/bin/yt-dlp
```
**⚠️ YouTube may rate-limit channel lookups. Space requests 2-3 seconds apart. If rate-limited, use cookies or VPN.**
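A minimal sketch of the spaced lookup loop, assuming `yt-dlp` is on `PATH`. `build_lookup_cmd` and `resolve_channels` are hypothetical names; the `run` and `sleep` parameters exist only so the loop can be exercised without touching the network:

```python
import subprocess
import time


def build_lookup_cmd(url):
    """Step 1 lookup: print the channel URL for the first item, no download."""
    return ["yt-dlp", "--print", "channel_url", "--playlist-items", "1",
            "--skip-download", url]


def resolve_channels(urls, run=subprocess.run, sleep=time.sleep, delay_seconds=2.5):
    """Resolve URLs one at a time, pausing between requests."""
    resolved = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # 2-3 s spacing, per the warning above
        proc = run(build_lookup_cmd(url), capture_output=True, text=True)
        channel_url = proc.stdout.strip() if proc.returncode == 0 else ""
        # None marks a failed lookup so it can be retried with cookies or VPN
        resolved[url] = channel_url or None
    return resolved
```

Failed lookups stay in the result as `None`, which keeps the retry list explicit instead of silently dropping channels.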
### Step 2: PeerTube Channel Creation
**Where:** CT 110
**What:** Batch-create all 99 channels via PeerTube API
**SSH Gate:** `ssh peertube 'curl -s http://localhost:9000/api/v1/config | head -c 50'` — must return JSON. Confirms both SSH and PeerTube are up.
1. Get OAuth token from PeerTube API (local, port 9000):
```bash
# Get client credentials
curl -s http://localhost:9000/api/v1/oauth-clients/local -H "Host: stream.echo6.co"
# Get user token
curl -s http://localhost:9000/api/v1/users/token \
-H "Host: stream.echo6.co" \
--data "client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&grant_type=password&username=root&password=<PASSWORD>"
```
2. For each channel in channel-map.json:
```bash
curl -s -X POST http://localhost:9000/api/v1/video-channels \
-H "Host: stream.echo6.co" \
-H "Authorization: Bearer <TOKEN>" \
-H "Content-Type: application/json" \
-d '{"name": "<actor_name>", "displayName": "(YT)<channel_name>", "description": "Imported from YouTube: <youtube_url>"}'
```
3. Capture the returned channel ID and update `peertube_channel_id` in channel-map.json
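4. Verify: `curl -s "http://localhost:9000/api/v1/video-channels?count=1" -H "Host: stream.echo6.co" | python3 -c 'import json,sys; print(json.load(sys.stdin)["total"])'` should print 100 (the 99 new channels plus the default channel). The list endpoint is paginated, so counting `"name"` matches in a single page undercounts channels and also picks up nested account names.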
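The per-channel call in Step 2 could be scripted roughly as below. This is a sketch, not the final tool: `build_create_request` and `create_channel` are hypothetical names, the `videoChannel.id` response shape is my understanding of the PeerTube API and should be checked against the instance, and the prefix check assumes `channel_name` may already carry `(YT)` as in the recovered schema:

```python
import json
import urllib.request

API_BASE = "http://localhost:9000/api/v1"
HOST_HEADER = "stream.echo6.co"


def build_create_request(entry, token):
    """Build the POST /video-channels request for one channel-map entry."""
    name = entry["channel_name"]
    # Avoid a doubled prefix when channel_name already starts with "(YT)"
    display = name if name.startswith("(YT)") else f"(YT){name}"
    payload = {
        "name": entry["actor_name"],
        "displayName": display,
        "description": f"Imported from YouTube: {entry['youtube_url']}",
    }
    return urllib.request.Request(
        f"{API_BASE}/video-channels",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Host": HOST_HEADER,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_channel(entry, token):
    """Send the request and return the new channel ID (assumed response shape)."""
    with urllib.request.urlopen(build_create_request(entry, token)) as resp:
        body = json.load(resp)
    return body["videoChannel"]["id"]
```

The returned ID is what gets written back to `peertube_channel_id` in channel-map.json.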
### Step 3: NordVPN Setup
**Where:** CT 110
**What:** Install VPN for IP rotation during YouTube downloads
**SSH Gate:** `ssh peertube 'hostname'` must succeed.
**➡️ RUN RUNBOOK: `~/runbooks/nordvpn-lxc.md`**
Use these inputs:
```
CTID=110
CT_HOST=peertube
PVE_HOST=media # or root@192.168.1.243
NORDVPN_TOKEN= # ⚠️ Get from Matt
VPN_COUNTRIES="United_States,Canada,United_Kingdom,Germany,Netherlands,Sweden"
VPN_CONFIG_DIR=/opt/bulk-import/config/vpn
```
**Additional context for this deployment:**
- CT 110 runs PeerTube on port 9000 — split tunneling is MANDATORY so PeerTube stays reachable on 192.168.1.170 and 100.64.0.23 while VPN is active
- The rotation script at `/opt/bulk-import/config/vpn/vpn-rotate.sh` will be called by `downloader.py` (Step 5) on rate-limit detection
- After runbook completes, verify PeerTube still accessible: `curl -s http://192.168.1.170:9000/api/v1/config | head -c 50` (from another machine, while VPN is up on CT 110)
**⚠️ NordVPN token required from Matt. Cannot proceed without it.**
### Step 4: YouTube Cookies
**Where:** CT 110
**What:** Export browser cookies for yt-dlp bot detection bypass
1. Matt exports cookies from browser (Netscape format) using "Get cookies.txt LOCALLY" extension
2. SCP to CT 110: `scp cookies.txt root@192.168.1.170:/opt/bulk-import/config/cookies.txt`
3. Fix perms: `chown peertube:peertube /opt/bulk-import/config/cookies.txt && chmod 600 /opt/bulk-import/config/cookies.txt`
4. Test: `sudo -u peertube yt-dlp --cookies /opt/bulk-import/config/cookies.txt --simulate "https://www.youtube.com/watch?v=dQw4w9WgXcQ"`
**⚠️ Cookies expire every 2-4 weeks. Needs manual refresh.**
### Step 5: Build downloader.py
**Where:** CT 110 at `/opt/bulk-import/downloader.py`
**What:** Round-robin YouTube channel downloader with VPN rotation
**Deploy:** Write file locally on cortex, then `scp` to CT 110. Or write directly via `ssh peertube 'cat > /opt/bulk-import/downloader.py << "PYEOF" ... PYEOF'`
**SSH Gate:** `ssh peertube 'ls /opt/bulk-import/config/channel-map.json'` — channel map must exist (Step 1 complete).
Requirements:
- Round-robin across all 99 channels (don't hammer one channel)
- yt-dlp with: `--cookies`, `--download-archive downloaded.txt` (dedup), `--write-info-json`, `--write-thumbnail`, `--format "bestvideo[height<=1080]+bestaudio/best[height<=1080]"`, `--merge-output-format mp4`
- Downloads land in `/opt/bulk-import/staging/<actor_name>/<video_id>/` with .mp4 + .info.json + .jpg
- On successful download, move to `/opt/bulk-import/completed/<actor_name>/<video_id>/`
- **Note:** transcoder.py (Step 6) picks up from completed/ — downloader does NOT feed importer directly
- VPN rotation: detect rate-limit (HTTP 429, sign-in required, bot detection), disconnect current VPN, connect to next country in rotation list, retry
- State file: `/opt/bulk-import/config/downloader-state.json` — tracks current channel index, current VPN country, last activity timestamp
- Logging to `/opt/bulk-import/logs/downloader.log` — include `=== Channel: <name> ===` markers (WATCHTOWER parses these)
- Target throughput: ~30 videos/hr
- Graceful shutdown on SIGTERM/SIGINT
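The core decisions in the requirements above (command line, rate-limit detection, country rotation, round-robin order) can be sketched as pure functions. Everything here is illustrative: the function names are hypothetical, the marker strings are guesses that should be matched against real yt-dlp stderr, the archive file location is assumed, and `--max-downloads 1` is my choice for taking one video per channel per turn:

```python
import itertools

VPN_COUNTRIES = ["United_States", "Canada", "United_Kingdom",
                 "Germany", "Netherlands", "Sweden"]

# Assumed substrings; compare with the stderr yt-dlp actually produces
RATE_LIMIT_MARKERS = ("http error 429", "sign in to confirm", "not a bot")


def build_ytdlp_cmd(channel_url, actor_name,
                    config_dir="/opt/bulk-import/config",
                    staging_dir="/opt/bulk-import/staging"):
    """yt-dlp invocation with the flags listed in the requirements."""
    return [
        "yt-dlp",
        "--cookies", f"{config_dir}/cookies.txt",
        "--download-archive", f"{config_dir}/downloaded.txt",
        "--write-info-json", "--write-thumbnail",
        "--format", "bestvideo[height<=1080]+bestaudio/best[height<=1080]",
        "--merge-output-format", "mp4",
        "--max-downloads", "1",  # one video per turn keeps the rotation fair
        "-o", f"{staging_dir}/{actor_name}/%(id)s/%(id)s.%(ext)s",
        channel_url,
    ]


def is_rate_limited(stderr_text):
    """True if stderr looks like a 429, sign-in wall, or bot check."""
    text = stderr_text.lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS)


def next_country(current, countries=VPN_COUNTRIES):
    """Return the country after `current`, wrapping at the end of the list."""
    if current not in countries:
        return countries[0]
    return countries[(countries.index(current) + 1) % len(countries)]


def round_robin(channels, start_index=0):
    """Yield (index, channel) forever, resuming from the saved state index."""
    for step in itertools.count(start_index):
        yield step % len(channels), channels[step % len(channels)]
```

yt-dlp may save thumbnails as `.webp`; adding `--convert-thumbnails jpg` is one way to get the `.jpg` the pipeline expects.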
### Step 6: Build transcoder.py
**Where:** cortex (local — this IS the CC host) at `/opt/bulk-import/transcoder.py`
**What:** Pulls downloaded videos from CT 110, re-encodes to H.265 via NVENC, pushes back
**Connectivity Gate:**
```bash
nvidia-smi > /dev/null 2>&1 && echo "GPU OK" || echo "GPU MISSING"
ffmpeg -encoders 2>/dev/null | grep -q hevc_nvenc && echo "HEVC NVENC OK" || echo "HEVC NVENC MISSING"
ssh peertube 'ls /opt/bulk-import/completed/' > /dev/null 2>&1 && echo "SSH OK" || echo "SSH FAIL"
```
Requirements:
- Watch CT 110's `/opt/bulk-import/completed/` for new video directories (via SSH/rsync polling, not inotify — it's remote)
- For each video dir found:
1. `rsync` the dir from CT 110 to cortex local temp: `/opt/bulk-import/transcode-work/<actor_name>/<video_id>/`
2. Run ffmpeg: `ffmpeg -hwaccel cuda -i input.mp4 -c:v hevc_nvenc -preset medium -cq 28 -tag:v hvc1 -c:a aac -b:a 128k output.mp4`
- `-cq 28` = constant quality mode (NVENC equivalent of CRF)
- `-tag:v hvc1` = Apple/browser compatible HEVC tag
- `-preset medium` = balance speed/quality (can tune later)
- Preserve .info.json and .jpg (just copy, don't re-encode)
3. `rsync` the transcoded dir back to CT 110: `/opt/bulk-import/transcoded/<actor_name>/<video_id>/`
4. Remove the source from CT 110's `completed/` dir (it's been transcoded)
5. Clean up local temp
- Skip videos that already exist in `transcoded/`
- Logging to `/opt/bulk-import/logs/transcoder.log` on cortex (and/or stream to CT 110)
- State file: `/opt/bulk-import/config/transcoder-state.json` on cortex
- Graceful shutdown on SIGTERM/SIGINT — finish current transcode, don't start new ones
- Target throughput: depends on video length, but NVENC should handle ~2-5 videos/hr for typical 10-20min content at 1080p
- One video at a time to start (simplest to reason about; parallel NVENC sessions are listed under Step 11 as a way to accelerate)
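The per-video sequence above can be expressed as an ordered command plan. A minimal sketch, assuming the video file is named `<video_id>.mp4` and that both hosts use the same `/opt/bulk-import` base; `build_plan` and `run_plan` are hypothetical names:

```python
import shlex
import subprocess

REMOTE = "peertube"          # SSH alias for CT 110
BASE = "/opt/bulk-import"    # same base path on both hosts


def build_ffmpeg_cmd(src, dst):
    """ffmpeg invocation from the requirements above (H.265 via NVENC)."""
    return ["ffmpeg", "-hwaccel", "cuda", "-i", src,
            "-c:v", "hevc_nvenc", "-preset", "medium", "-cq", "28",
            "-tag:v", "hvc1", "-c:a", "aac", "-b:a", "128k", dst]


def build_plan(actor_name, video_id):
    """Ordered commands for one video: pull, transcode, push, clean up."""
    rel = f"{actor_name}/{video_id}"
    work = f"{BASE}/transcode-work/{rel}"
    src = f"{work}/{video_id}.mp4"        # assumed file naming
    tmp = f"{work}/{video_id}.hevc.mp4"
    remote_done = shlex.quote(f"{BASE}/completed/{rel}")
    remote_parent = shlex.quote(f"{BASE}/transcoded/{actor_name}")
    return [
        ["mkdir", "-p", work],
        ["rsync", "-a", f"{REMOTE}:{BASE}/completed/{rel}/", f"{work}/"],
        build_ffmpeg_cmd(src, tmp),
        ["mv", "-f", tmp, src],  # .info.json and .jpg stay as pulled
        ["ssh", REMOTE, f"mkdir -p {remote_parent}"],
        ["rsync", "-a", f"{work}/", f"{REMOTE}:{BASE}/transcoded/{rel}/"],
        ["ssh", REMOTE, f"rm -rf -- {remote_done}"],
        ["rm", "-rf", "--", work],
    ]


def run_plan(plan, run=subprocess.run):
    """Run commands in order; check=True stops at the first failure."""
    for cmd in plan:
        run(cmd, check=True)
```

Because the plan stops at the first failing command, the source in `completed/` is only removed after the push to `transcoded/` has succeeded.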
**Directory structure on cortex:**
```
/opt/bulk-import/ ← transcoder home on cortex
├── transcoder.py
├── config/
│ └── transcoder-state.json
├── logs/
│ └── transcoder.log
└── transcode-work/ ← temp working dir, cleaned after each video
```
**ffmpeg must be installed on cortex with NVENC support:**
```bash
sudo apt install -y ffmpeg
ffmpeg -encoders 2>/dev/null | grep hevc_nvenc # must show hevc_nvenc
# If missing: sudo apt install -y libnvidia-encode-550 (match driver version)
```
### Step 7: Build importer.py
**Where:** CT 110 at `/opt/bulk-import/importer.py`
**What:** Watches transcoded/ dir, uploads to PeerTube via API
**Deploy:** Same as Step 5 — write locally, scp to CT 110.
**SSH Gate:** `ssh peertube 'ls /opt/bulk-import/config/channel-map.json && curl -s http://localhost:9000/api/v1/config | head -c 50'` — channel map AND PeerTube API must be reachable.
Requirements:
- Watch `/opt/bulk-import/transcoded/` for new video directories (NOT completed/ — transcoder feeds this)
- For each video dir: read .info.json, extract title, description, upload_date (→ originallyPublishedAt), tags, thumbnail
- Map `<actor_name>` from dir path → `peertube_channel_id` from channel-map.json
- Upload via PeerTube API: `POST /api/v1/videos/upload` with multipart form data
- Set: name, description, channelId, originallyPublishedAt, tags (first 5), thumbnailfile, privacy (1=public), **waitTranscoding=false** (video is already H.265, no PeerTube transcoding needed)
- On success: **DELETE the video dir from `transcoded/`** — PeerTube's storage is the authoritative copy. No `imported/` directory.
- On failure: move to `/opt/bulk-import/failed/` with error log
- Rate: process one video at a time, ~50/hr max (don't overwhelm PeerTube)
- Dedup: check if video title + channel already exists before uploading
- Logging to `/opt/bulk-import/logs/importer.log`
- OAuth token management: cache token, refresh on 401
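The metadata mapping in the requirements above can be sketched as one function from a yt-dlp `.info.json` dict to upload form fields. `upload_fields` is a hypothetical name, and the length limits (120-character name, tags of 2 to 30 characters) reflect my understanding of PeerTube's validation and should be verified against the instance:

```python
from datetime import datetime, timezone


def upload_fields(info, channel_id):
    """Map a yt-dlp .info.json dict to PeerTube upload form fields."""
    fields = {
        "name": (info.get("title") or info["id"])[:120],  # assumed name limit
        "description": info.get("description") or "",
        "channelId": channel_id,
        "privacy": 1,                # 1 = public
        "waitTranscoding": "false",  # file is already H.265
    }
    upload_date = info.get("upload_date")  # yt-dlp format: YYYYMMDD
    if upload_date:
        published = datetime.strptime(upload_date, "%Y%m%d").replace(tzinfo=timezone.utc)
        fields["originallyPublishedAt"] = published.isoformat()
    # First 5 tags, skipping any outside the assumed 2-30 character range
    tags = [tag for tag in (info.get("tags") or []) if 2 <= len(tag) <= 30]
    fields["tags"] = tags[:5]
    return fields


sample = {"id": "abc123", "title": "Example", "upload_date": "20200115",
          "tags": ["radio", "sdr"]}
print(upload_fields(sample, 42))
```

The thumbnail is not part of this dict; it goes into the multipart body as the `thumbnailfile` part alongside the video file.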
### Step 8: Systemd Services
**Where:** CT 110 (downloader + importer) AND cortex (transcoder)
**What:** Service files for all three pipeline components
**SSH Gate:** `ssh peertube 'ls /opt/bulk-import/downloader.py /opt/bulk-import/importer.py'` — both CT 110 scripts must exist (Steps 5 and 7 complete). `/opt/bulk-import/transcoder.py` must exist on cortex (Step 6 complete).
**On CT 110:**
```ini
# /etc/systemd/system/pt-downloader.service
[Unit]
Description=PeerTube Bulk Downloader
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=peertube
Group=peertube
ExecStart=/usr/bin/python3 /opt/bulk-import/downloader.py
WorkingDirectory=/opt/bulk-import
Restart=always
RestartSec=60
StandardOutput=journal
StandardError=journal
SyslogIdentifier=pt-downloader
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/pt-importer.service — same pattern, ExecStart points to importer.py
```
**On cortex:**
```ini
# /etc/systemd/system/pt-transcoder.service
[Unit]
Description=PeerTube H.265 NVENC Transcoder
After=network-online.target nvidia-persistenced.service
Wants=network-online.target
[Service]
Type=simple
User=zvx
Group=zvx
ExecStart=/usr/bin/python3 /opt/bulk-import/transcoder.py
WorkingDirectory=/opt/bulk-import
Restart=always
RestartSec=60
StandardOutput=journal
StandardError=journal
SyslogIdentifier=pt-transcoder
MemoryMax=12G
[Install]
WantedBy=multi-user.target
```
Enable but **do not start** until testing is complete.
### Step 9: PeerTube Transcoding Config — DISABLED
**Where:** CT 110
**What:** Disable PeerTube's built-in transcoding — videos arrive pre-transcoded as H.265
**SSH Gate:** `ssh peertube 'hostname'` must succeed.
Edit `/var/www/peertube/config/local-production.json`:
```json
{
  "transcoding": {
    "enabled": false
  },
  "import": {
    "videos": {
      "concurrency": 4,
      "http": { "enabled": true },
      "torrent": { "enabled": false }
    },
    "video_channel_synchronization": {
      "enabled": false
    }
  }
}
```
Restart PeerTube after config changes: `sudo systemctl restart peertube`
**Why disabled:** Videos are pre-transcoded to H.265 by cortex (Step 6) before import. The importer uploads with `waitTranscoding=false`, and PeerTube serves the file as-is. No transcoding jobs are created, so nothing is re-encoded and no cycles are wasted. The remote runner on cortex stays in use for Whisper captioning only.
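Since `local-production.json` may already hold other settings, the edit is safer as a merge than as an overwrite. A minimal sketch, with `deep_merge`, `apply_overrides`, and `PHASE2_OVERRIDES` as hypothetical names; only a subset of the keys above is shown:

```python
import json


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in, recursing into dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


PHASE2_OVERRIDES = {
    "transcoding": {"enabled": False},
    "import": {"videos": {"concurrency": 4}},
}


def apply_overrides(path, overrides):
    """Merge overrides into the JSON config at `path`, keeping all other keys."""
    with open(path, encoding="utf-8") as fh:
        current = json.load(fh)
    merged = deep_merge(current, overrides)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh, indent=2)
        fh.write("\n")
    return merged
```

Back up the file before calling `apply_overrides`, and restart PeerTube afterwards as described above.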
### Step 10: Integration Test
**Full connectivity gate — ALL must pass:**
```bash
ssh peertube 'hostname' # SSH to CT 110
ssh peertube 'curl -s http://localhost:9000/api/v1/config | head -c 50' # PeerTube API
ssh peertube 'systemctl is-active peertube' # PeerTube service
nvidia-smi > /dev/null 2>&1 && echo "GPU OK" # cortex GPU
ffmpeg -encoders 2>/dev/null | grep -q hevc_nvenc && echo "NVENC OK" # HEVC encoder
ssh peertube 'ls /opt/bulk-import/completed/' > /dev/null && echo "Dirs OK" # Pipeline dirs
```
1. Start downloader — let it grab 5-10 videos from 2-3 different channels
2. Verify videos land in `/opt/bulk-import/completed/` with .mp4 + .info.json + .jpg
3. Start transcoder on cortex — verify it pulls videos, encodes H.265 via NVENC (`nvidia-smi` shows encoder utilization)
4. Verify transcoded files land in `/opt/bulk-import/transcoded/` on CT 110, and originals cleared from `completed/`
5. Verify transcoded file is H.265: `ffprobe -v error -select_streams v:0 -show_entries stream=codec_name -of csv=p=0 <file>` should return `hevc`
6. Start importer — verify videos appear in PeerTube UI with correct metadata, channel assignment, thumbnails
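7. Verify playback works at stream.echo6.co. Videos are served as single web-video files (no HLS is generated). H.265 support varies by browser and hardware decoder, so test in every browser viewers will use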
8. Check dedup — restart downloader, verify it skips already-downloaded videos
9. Check VPN rotation — trigger a rate limit (or simulate), verify country switches
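The codec check in item 5 is easy to automate across a batch of files. A minimal sketch wrapping the same `ffprobe` call, assuming `ffprobe` is on `PATH`; `video_codec` and `is_hevc` are hypothetical names, and the `run` parameter is only there so the logic can be tested without ffprobe:

```python
import subprocess


def video_codec(path, run=subprocess.run):
    """Return the codec name of the first video stream, as reported by ffprobe."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "stream=codec_name", "-of", "csv=p=0", path]
    proc = run(cmd, capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def is_hevc(path, run=subprocess.run):
    """True when the first video stream is H.265."""
    return video_codec(path, run=run) == "hevc"
```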
### Step 11: Go-Live
**On CT 110:**
```bash
systemctl start pt-downloader && systemctl start pt-importer
systemctl enable pt-downloader && systemctl enable pt-importer
```
**On cortex:**
```bash
systemctl start pt-transcoder
systemctl enable pt-transcoder
```
Monitor for 24 hours. Expected steady-state:
- Downloader: ~30 videos/hr
- Transcoder: ~2-5 videos/hr (bottleneck — NVENC is fast but 1080p H.265 takes time per video)
- Importer: keeps up with transcoder output, ~50/hr capacity but paced by transcoder
- GPU utilization: 80-100% encoder, minimal CUDA (no conflict with Ollama)
**⚠️ The transcoder is the bottleneck.** At ~3 videos/hr average, 70K videos = ~970 days. Strategies to accelerate:
- Lower quality preset: `-preset fast` or `-preset hp` (speed over quality)
- Raise the CQ value: `-cq 32` instead of 28 (smaller files, slightly lower quality)
- Run 2 NVENC sessions in parallel (A4000 supports ~3 concurrent)
- Add a second GPU node
- Accept H.264 for bulk and only H.265 for new imports going forward
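The backlog estimate above is simple division and can be rerun for other rates. A small sketch; `days_to_finish` is a hypothetical helper, and treating parallel sessions as a linear speedup is an optimistic assumption:

```python
def days_to_finish(total_videos, videos_per_hour, parallel_sessions=1):
    """Days needed to clear the backlog at a steady transcode rate."""
    hours = total_videos / (videos_per_hour * parallel_sessions)
    return hours / 24


for rate in (2, 3, 5):
    print(f"{rate}/hr -> {days_to_finish(70_000, rate):.0f} days")
```

At 3 videos/hr this gives about 972 days, which matches the ~970 figure above; two sessions at the same per-session rate would halve it.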
---
## Manual Inputs Required (Before CC Can Execute)
| Item | Who | When Needed |
|------|-----|-------------|
| NordVPN token | Matt | Step 3 |
| YouTube cookies.txt | Matt | Step 4 |
| PeerTube admin password | Matt | Step 2 (OAuth) |
---
## Dependencies Between Steps
```
Step 1 (channel map) ──→ Step 2 (create channels) ──→ Step 7 (importer needs channel IDs)
Step 3 (VPN) + Step 4 (cookies) ──→ Step 5 (downloader) ──→ Step 6 (transcoder reads completed/) ──→ Step 7 (importer reads transcoded/)
Step 9 (disable PT transcoding) ←── independent, do anytime before Step 10
Step 10 (integration test) ←── requires ALL of 1-9
Step 11 (go-live) ←── requires Step 10 pass
```
Steps 1-2 and Step 9 are independent workstreams. Steps 3-4 require Matt's manual input. Steps 5, 6, 7 are the three core scripts. Step 6 runs on cortex, Steps 8, 10, and 11 touch both hosts, and everything else runs on CT 110.
---
## What NOT to Build (Phase 3 — WATCHTOWER)
WATCHTOWER (the monitoring dashboard) is Phase 3. Don't build it now. The pipeline scripts should have enough logging that we can monitor via `journalctl` and log files during Phase 2. WATCHTOWER will eventually:
- SSH into CT 110 to read pipeline metrics (but CT 110 is native now, not Docker — queries change)
- Point to cortex instead of old TOC for GPU stats
- Read channel-map.json from `/opt/bulk-import/config/` instead of old `/mnt/data/bulk-import/`
- Need new .env config for all changed IPs
But that's later. Pipeline first.
---
## Channel Management (via RECON Dashboard)
**Added 2026-02-18.** Channel management UI is now in the RECON dashboard Upload tab at `http://192.168.1.130:8420/upload`. No more SSH + manual JSON editing to add channels.
- **Sudoers:** `/etc/sudoers.d/recon-mgmt` on CT 110 — allows zvx to run yt-dlp, psql, and tee as peertube
- **API endpoints** in `/opt/recon/lib/api.py`:
- `GET /api/peertube/channels` — list all channels with video counts from PeerTube DB
- `GET /api/peertube/channels/stats` — total channels, total videos, downloader status
- `POST /api/peertube/channels/add` — resolve YT URL via yt-dlp, create PeerTube channel, update channel-map.json
- `DELETE /api/peertube/channels/<actor_name>` — remove from JSON and PeerTube
- **UI features:** stats bar, add form (URL + category + priority), sortable channel table, remove button
- **All operations go through SSH from CT 130 → CT 110** using the existing `_ssh_peertube()` helper