# PeerTube Remote Runner: GPU Transcoding

Deploy a PeerTube remote runner with NVENC GPU transcoding. The runner pulls jobs from PeerTube over WebSocket, transcodes with the GPU, and uploads HLS streams back.

Use this when adding a new runner node, rebuilding an existing one, or re-registering after a PeerTube rebuild.

---

## Prerequisites

- PeerTube instance running and accessible (HTTP, not necessarily HTTPS)
- PeerTube admin credentials or API access to generate runner registration tokens
- Target machine with:
  - NVIDIA GPU with NVENC support (Maxwell gen 2+ / GTX 950+)
  - NVIDIA drivers installed and working (`nvidia-smi` returns output)
  - Node.js 18+ installed
  - SSH access from CC host

If the target machine needs NVIDIA drivers or Node.js, see `proxmox-create-ubuntu-vm.md` Steps 9 and 11.

---

## Inputs

Prompt the user for all of these before executing:

```
RUNNER_HOST=       # SSH alias or IP for the runner machine (e.g., cortex)
RUNNER_NAME=       # Human-readable runner name (e.g., "cortex-nvenc")
RUNNER_USER=       # User to run the service as (e.g., "zvx")
PT_URL=            # PeerTube instance URL reachable from runner (e.g., "http://100.64.0.23:9000")
PT_HOST_HEADER=    # PeerTube's public hostname for Host header (e.g., "stream.echo6.co")
PT_ADMIN_USER=     # PeerTube admin username (e.g., "root")
PT_ADMIN_PASS=     # PeerTube admin password
INSTALL_DIR=       # Where to put runner config (default: /opt/peertube-runner)
```

---

## Step 1: Verify GPU

```bash
ssh $RUNNER_HOST 'nvidia-smi --query-gpu=name,driver_version,memory.total,encoder.stats.sessionCount --format=csv,noheader'
```

### Gate

Must return GPU name, driver version, and VRAM. If it fails:

- No output → NVIDIA drivers not installed. See `proxmox-create-ubuntu-vm.md` Step 9.
- "NVML: Driver/library version mismatch" → reboot the machine.
- "No devices found" → GPU passthrough not configured (VMs) or hardware issue.

Check NVENC specifically:

```bash
ssh $RUNNER_HOST 'nvidia-smi -q | grep -A 5 "Encoder"'
```

Must show encoder session info.
If "N/A", the GPU doesn't support NVENC or the drivers are too old.

---

## Step 2: Install peertube-runner

The PeerTube runner is distributed via npm.

```bash
ssh $RUNNER_HOST 'which node && node --version'   # Must be 18+
```

Install the runner:

```bash
ssh $RUNNER_HOST 'sudo npm install -g @peertube/peertube-runner'
ssh $RUNNER_HOST 'which peertube-runner && peertube-runner --version'
```

### Gate

`peertube-runner --version` must return a version number. If npm install fails, check the Node.js version (must be 18+).

Create the config directory:

```bash
ssh $RUNNER_HOST "sudo mkdir -p $INSTALL_DIR && sudo chown $RUNNER_USER:$RUNNER_USER $INSTALL_DIR"
```

---

## Step 3: Generate Registration Token on PeerTube

Get a registration token from the PeerTube instance. This requires admin access.

### Option A: Via API

```bash
# Requires jq on the CC host and on the runner.
# Get OAuth client credentials
CLIENT_CREDS=$(ssh $RUNNER_HOST "curl -s $PT_URL/api/v1/oauth-clients/local -H 'Host: $PT_HOST_HEADER'")
CLIENT_ID=$(echo "$CLIENT_CREDS" | jq -r '.client_id')
CLIENT_SECRET=$(echo "$CLIENT_CREDS" | jq -r '.client_secret')

# Get admin token
TOKEN_RESP=$(ssh $RUNNER_HOST "curl -s $PT_URL/api/v1/users/token \
  -H 'Host: $PT_HOST_HEADER' \
  --data 'client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&grant_type=password&username=$PT_ADMIN_USER&password=$PT_ADMIN_PASS'")
ACCESS_TOKEN=$(echo "$TOKEN_RESP" | jq -r '.access_token')

# Generate runner registration token.
# Assumption: the generate call returns no body, so the newest token is
# read back from the list endpoint instead of parsed from the POST response.
ssh $RUNNER_HOST "curl -s -X POST $PT_URL/api/v1/runners/registration-tokens/generate \
  -H 'Host: $PT_HOST_HEADER' \
  -H 'Authorization: Bearer $ACCESS_TOKEN'"
REG_TOKEN=$(ssh $RUNNER_HOST "curl -s '$PT_URL/api/v1/runners/registration-tokens?sort=-createdAt&count=1' \
  -H 'Host: $PT_HOST_HEADER' \
  -H 'Authorization: Bearer $ACCESS_TOKEN' | jq -r '.data[0].registrationToken'")
echo "Registration token: $REG_TOKEN"
```

### Option B: Via PeerTube Admin UI

1. Log into PeerTube as admin
2. Administration → System → Runners
3. Click "Generate registration token"
4. Copy the token

### Gate

Must have a registration token string.
If the API returns errors:

- 401 → wrong admin credentials
- 404 → PeerTube version too old (runners require v5.2+)
- Connection refused → PeerTube not reachable from runner. Check URL and network.

---

## Step 4: Register Runner

```bash
ssh $RUNNER_HOST "peertube-runner register \
  --url $PT_URL \
  --registration-token $REG_TOKEN \
  --runner-name $RUNNER_NAME"
```

### Gate

Must complete without errors. Verify registration:

```bash
ssh $RUNNER_HOST 'peertube-runner list-registered'
```

Must show the PeerTube instance URL.

If registration fails:

- "Invalid registration token" → token already used or expired. Generate a new one.
- "ECONNREFUSED" → runner can't reach PeerTube. Test: `curl -s $PT_URL/api/v1/config`
- "self-signed certificate" → if PeerTube uses HTTPS with a self-signed cert, use `NODE_TLS_REJECT_UNAUTHORIZED=0` (not recommended) or fix the cert.
- The command cannot reach the runner itself → `register` talks to a running `peertube-runner server` process. If none is running yet, start the service (Step 6) and register again.

---

## Step 5: Configure NVENC

The runner auto-detects ffmpeg capabilities, but verify NVENC is available to ffmpeg:

```bash
ssh $RUNNER_HOST 'ffmpeg -encoders 2>/dev/null | grep nvenc'
```

Must show `h264_nvenc` and `hevc_nvenc`. If missing, install ffmpeg with NVENC support:

```bash
# Ubuntu/Debian: the default ffmpeg usually includes NVENC if drivers are installed
ssh $RUNNER_HOST 'sudo apt install -y ffmpeg'

# Re-check
ssh $RUNNER_HOST 'ffmpeg -encoders 2>/dev/null | grep nvenc'
```

If still missing, the NVIDIA drivers may not include the encoding libraries.
Install:

```bash
ssh $RUNNER_HOST 'sudo apt install -y libnvidia-encode-550'   # Match your driver version
```

---

## Step 6: Create systemd Service

```bash
ssh $RUNNER_HOST "sudo tee /etc/systemd/system/peertube-runner.service > /dev/null << 'EOF'
[Unit]
Description=PeerTube Remote Runner (NVENC)
After=network-online.target nvidia-persistenced.service
Wants=network-online.target
Requires=nvidia-persistenced.service

[Service]
Type=simple
User=$RUNNER_USER
Group=$RUNNER_USER
Environment=NODE_ENV=production
ExecStart=/usr/bin/peertube-runner server \
  --enable-job vod-hls-transcoding \
  --enable-job vod-audio-merge-transcoding \
  --enable-job live-rtmp-hls-transcoding \
  --enable-job video-studio-transcoding \
  --enable-job video-transcription
WorkingDirectory=/home/$RUNNER_USER
Restart=always
RestartSec=30
StandardOutput=journal
StandardError=journal
SyslogIdentifier=peertube-runner
MemoryMax=20G

[Install]
WantedBy=multi-user.target
EOF"

ssh $RUNNER_HOST 'sudo systemctl daemon-reload'
ssh $RUNNER_HOST 'sudo systemctl enable peertube-runner'
ssh $RUNNER_HOST 'sudo systemctl start peertube-runner'
```

### Gate

```bash
ssh $RUNNER_HOST 'systemctl is-active peertube-runner'
```

Must return `active`. If it fails, check logs:

```bash
ssh $RUNNER_HOST 'journalctl -u peertube-runner -n 50 --no-pager'
```

Common failures:

- "Cannot find module" → peertube-runner not installed globally, or PATH issue. Check `which peertube-runner` and make sure `ExecStart` uses that path.
- "nvidia-persistenced.service not found" → remove the `Requires=` line if nvidia-persistenced isn't set up (it's optional but recommended).

---

## Step 7: Install Health Check

A cron script that restarts the runner if it crashes or the GPU becomes inaccessible.

```bash
ssh $RUNNER_HOST "sudo tee $INSTALL_DIR/health.sh > /dev/null << 'HEALTH'
#!/bin/bash
LOG_TAG=\"peertube-runner-health\"

# Service is not active
if ! systemctl is-active --quiet peertube-runner; then
    logger -t \$LOG_TAG \"Runner not active, restarting...\"
    systemctl restart peertube-runner
    sleep 10
fi

# Service is active but the server process is gone
if ! pgrep -f \"peertube-runner server\" > /dev/null; then
    logger -t \$LOG_TAG \"Runner process not found, restarting service...\"
    systemctl restart peertube-runner
fi

# GPU is not reachable
if ! nvidia-smi > /dev/null 2>&1; then
    logger -t \$LOG_TAG \"GPU not accessible, restarting nvidia-persistenced and runner...\"
    systemctl restart nvidia-persistenced 2>/dev/null
    sleep 5
    systemctl restart peertube-runner
fi
HEALTH
sudo chmod +x $INSTALL_DIR/health.sh"

# Add cron job (every 5 minutes).
# Any existing entry for this script is dropped first so reruns don't duplicate it.
ssh $RUNNER_HOST "(crontab -l 2>/dev/null | grep -vF '$INSTALL_DIR/health.sh'; echo '*/5 * * * * $INSTALL_DIR/health.sh') | crontab -"
```

The script calls `systemctl restart`, which needs root. If `$RUNNER_HOST` logs in as a non-root user, install the entry in root's crontab (`sudo crontab`) instead, and use `sudo crontab -l` when verifying.

Verify:

```bash
ssh $RUNNER_HOST 'crontab -l | grep peertube'
```

---

## Step 8: Test Transcoding

Upload a test video and verify the full pipeline works.

### Quick test via API

```bash
# Download a short test video
ssh $RUNNER_HOST 'curl -L -o /tmp/test-video.mp4 "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4" 2>/dev/null'

# Upload to PeerTube (reuse ACCESS_TOKEN from Step 3).
# The upload response wraps the identifiers in a "video" object.
ssh $RUNNER_HOST "curl -s -X POST $PT_URL/api/v1/videos/upload \
  -H 'Host: $PT_HOST_HEADER' \
  -H 'Authorization: Bearer $ACCESS_TOKEN' \
  -F 'videofile=@/tmp/test-video.mp4' \
  -F 'name=Runner Test Video' \
  -F 'channelId=1' \
  -F 'privacy=1' \
  -F 'waitTranscoding=true' | jq '.video | {id, uuid, shortUUID}'"
```

### Verify GPU is processing

Within 30 seconds of upload:

```bash
ssh $RUNNER_HOST 'nvidia-smi --query-gpu=utilization.gpu,utilization.encoder,temperature.gpu,power.draw --format=csv,noheader'
```

GPU utilization and encoder utilization should be non-zero. If the encoder shows 0% but the GPU shows activity, NVENC isn't being used. Check ffmpeg encoder detection (Step 5).

### Check runner logs

```bash
ssh $RUNNER_HOST 'journalctl -u peertube-runner -n 20 --no-pager | grep -i "transcod\|job\|error"'
```

Should show job pickup, transcoding progress, and completion.
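
The encoder check in "Verify GPU is processing" can be scripted so the result doesn't have to be read by eye. The following is a minimal sketch, not part of the runner tooling: `classify_gpu_sample` is a hypothetical helper name, and it assumes the four-field CSV line produced by the `nvidia-smi` query in that step (GPU utilization, encoder utilization, temperature, power). Only the first line of input is inspected, so on a multi-GPU host it reports on the first GPU only.

```bash
# Hypothetical helper: classify one CSV sample from the nvidia-smi query in Step 8.
# Expected input shape (assumed): "45 %, 32 %, 61, 85.20 W"
classify_gpu_sample() {
  # Field 1 = utilization.gpu, field 2 = utilization.encoder; keep digits only.
  gpu=$(printf '%s\n' "$1" | head -n 1 | cut -d, -f1 | tr -cd '0-9')
  enc=$(printf '%s\n' "$1" | head -n 1 | cut -d, -f2 | tr -cd '0-9')

  if [ -z "$gpu" ] || [ -z "$enc" ]; then
    echo "unparseable"          # e.g. "[N/A]" fields or an empty line
  elif [ "$enc" -gt 0 ]; then
    echo "nvenc-active"         # encoder is busy: NVENC is in use
  elif [ "$gpu" -gt 0 ]; then
    echo "gpu-busy-no-nvenc"    # GPU active but encoder idle: recheck Step 5
  else
    echo "idle"                 # no job running yet, or the upload was not picked up
  fi
}

# Usage (run from the CC host while the test video is transcoding):
#   sample=$(ssh $RUNNER_HOST 'nvidia-smi --query-gpu=utilization.gpu,utilization.encoder,temperature.gpu,power.draw --format=csv,noheader')
#   classify_gpu_sample "$sample"
```

A result of `gpu-busy-no-nvenc` corresponds to the case described above, where the GPU shows activity but the encoder stays at 0%.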
### Clean up

```bash
# Delete test video via API (optional)
ssh $RUNNER_HOST "curl -s -X DELETE $PT_URL/api/v1/videos/ \
  -H 'Host: $PT_HOST_HEADER' \
  -H 'Authorization: Bearer $ACCESS_TOKEN'"
ssh $RUNNER_HOST 'rm -f /tmp/test-video.mp4'
```

---

## Verification Checklist

Run on the runner host:

```bash
echo "=== PeerTube Runner Check ==="
echo ""
echo "GPU:          $(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null || echo 'MISSING')"
echo "NVENC:        $(ffmpeg -encoders 2>/dev/null | grep -c h264_nvenc) encoders"
echo "Runner ver:   $(peertube-runner --version 2>/dev/null || echo 'NOT INSTALLED')"
# grep -c and pgrep -c already print 0 when nothing matches, so no fallback echo is needed
echo "Registered:   $(peertube-runner list-registered 2>/dev/null | grep -c 'http') instance(s)"
echo "Service:      $(systemctl is-active peertube-runner 2>/dev/null || echo 'NOT RUNNING')"
echo "Health cron:  $(crontab -l 2>/dev/null | grep -c peertube-runner) entries"
echo "ffmpeg procs: $(pgrep -c ffmpeg 2>/dev/null) active"
echo "GPU util:     $(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader 2>/dev/null || echo 'N/A')"
```

---

## Adding a Second PeerTube Instance

To register the same runner with another PeerTube instance:

```bash
peertube-runner register \
  --url \
  --registration-token \
  --runner-name $RUNNER_NAME
```

The runner handles multiple registrations automatically: it polls all registered instances for jobs.

## Unregistering

```bash
# List registrations
peertube-runner list-registered

# Unregister from a specific instance
peertube-runner unregister --url $PT_URL
```

---

## Troubleshooting

### Runner picks up jobs but transcoding fails immediately

Check ffmpeg NVENC access:

```bash
ffmpeg -y -f lavfi -i testsrc=duration=5:size=1280x720:rate=30 -c:v h264_nvenc /tmp/nvenc-test.mp4
```

If this fails, NVENC isn't accessible to ffmpeg. Common causes: wrong driver version, missing libnvidia-encode, or another process holding all NVENC sessions on the GPU.

### Runner shows 0 active jobs despite pending queue

- WebSocket connection issue.
  Check: `journalctl -u peertube-runner | grep -i websocket`
- Runner registered with wrong URL. Verify: `peertube-runner list-registered`
- PeerTube remote runners not enabled. Check PeerTube config: `transcoding.remote_runners.enabled` must be `true`

### GPU utilization stuck at 100% / NVENC sessions maxed

This setup assumes ~3 simultaneous NVENC sessions on the RTX A4000. Consumer cards are capped by the driver (historically at 3) and pro cards vary, so check NVIDIA's Video Encode and Decode GPU Support Matrix for the limit on your card and driver. If all sessions are in use, new jobs queue on the runner side. This is normal: throughput is limited by NVENC session count, not GPU compute.

To increase throughput: patch the NVENC session limit (search "nvidia nvenc patch") or add a second runner node.

### Runner keeps disconnecting / restarting

- Check memory: `free -h`. The 20GB `MemoryMax` in the service file may be too low when processing many concurrent jobs. Increase if needed.
- Check disk space: transcoding uses temp space. Ensure `/tmp` or the runner's working directory has sufficient free space (10GB+ recommended).
- Network instability between runner and PeerTube. Use the Tailscale IP instead of the public URL for reliability.

### After PeerTube rebuild, runner can't connect

Registration is tied to the PeerTube instance. After a rebuild:

1. Unregister: `peertube-runner unregister --url $PT_URL`
2. Generate a new registration token on the new PeerTube
3. Re-register: `peertube-runner register --url $PT_URL --registration-token --runner-name $RUNNER_NAME`
4. Restart: `sudo systemctl restart peertube-runner`

---

## Quick Reference: Current Runners

| Runner | Host | GPU | PeerTube Instance | Status |
|--------|------|-----|-------------------|--------|
| cortex-nvenc | cortex (VM 150 on TOC) | RTX A4000 16GB | stream.echo6.co (CT 110) | Active |

---

## Whisper Transcription Setup

The runner also handles `video-transcription` jobs (auto-captioning via Whisper).
A smart wrapper routes jobs based on audio duration:

### Smart Wrapper (`/usr/local/bin/whisper-smart` on cortex)

- **Model:** `medium` (good accuracy, fits in VRAM on float16)
- **GPU path (< 1 hour):** `--device cuda --compute_type float16` (~3.7GB VRAM, fast)
- **CPU path (>= 1 hour):** `--device cpu --compute_type int8` (~8-11GB RAM, slow but avoids VRAM exhaustion)
- **CPU serialization:** `flock --nonblock /tmp/whisper-cpu.lock` allows only one CPU transcription at a time. If the lock is held, the wrapper exits 1 and the runner retries the job later.
- **Concurrency:** Runner config set to `concurrency = 2`, which allows one GPU + one CPU job in parallel

### Symlink chain

```
/usr/local/bin/whisper-ctranslate2 → /usr/local/bin/whisper-smart (the smart wrapper)
/usr/local/bin/whisper-ctranslate2-real → /home/zvx/.local/bin/whisper-ctranslate2 (actual Python binary)
```

### Key config

- **Runner config:** `~/.config/peertube-runner-nodejs/default/config.toml` with `model = "medium"`, `concurrency = 2`
- **PeerTube config:** `/var/www/peertube/config/production.yaml` on CT 110 with `model-name: medium`
- **systemd MemoryMax:** `20G` (CPU int8 medium model peaks at ~11GB)

### Wrapper log

```bash
tail -f /tmp/whisper-wrapper.log   # Shows mode (GPU/CPU/CPU-BLOCKED), duration, args
```

---

*Last updated: 2026-02-17. Added Whisper smart transcription setup, MemoryMax 12G→20G, concurrency 1→2*