# Archive Receiver Discovery
# Generated: 2026-04-09 (Phase 6.0, Question 6)
#
# NOTE: Hookshot is BLOCKED for E2BE rooms with MAS. This analysis
# covers the receiver requirements IF hookshot were used, AND the
# alternative approaches that avoid hookshot entirely.
## Hookshot Receiver Requirements (if hookshot were viable)
### Minimum Functionality
1. Listen on an HTTP port (plain HTTP on the internal Docker network is fine; no TLS needed)
2. Accept multipart/form-data PUT or POST requests
3. Verify the `X-Matrix-Hookshot-Token` header against the known per-webhook token
4. Parse the `event` part as JSON
5. Write to durable storage
6. Return 200 OK (hookshot retries on non-2xx)
### Authentication
Hookshot sends a per-webhook auth token in the `X-Matrix-Hookshot-Token` header.
The receiver validates this token against a known list.
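
The requirements and token check above can be sketched as a small stdlib-only Python service. This is a minimal sketch under stated assumptions, not a tested hookshot integration. `ARCHIVE_TOKENS`, `ARCHIVE_DIR`, the `spool.jsonl` file name, the `serve()` helper and port 8080 are all hypothetical. The header name and the `event` part name come from the requirements above. Tokens are compared with `hmac.compare_digest` so the check does not leak timing information. Each write is fsynced before the 200 goes out, so a success response means the event is on disk.

```python
"""Minimal hookshot outbound-webhook receiver (stdlib-only sketch).

Hypothetical configuration: ARCHIVE_TOKENS (comma-separated known tokens),
ARCHIVE_DIR (spool directory) and port 8080 are illustrative choices.
"""
import hmac
import json
import os
from email import policy
from email.parser import BytesParser
from http.server import BaseHTTPRequestHandler, HTTPServer

KNOWN_TOKENS = [t for t in os.environ.get("ARCHIVE_TOKENS", "").split(",") if t]
ARCHIVE_DIR = os.environ.get("ARCHIVE_DIR", "/tmp/archive")


def token_is_valid(token, known_tokens):
    """Return True if token matches a known token (constant-time compare)."""
    if not token:
        return False
    return any(hmac.compare_digest(token.encode(), t.encode()) for t in known_tokens)


def parse_event_part(content_type, body):
    """Return the decoded JSON from the `event` part of a multipart/form-data body."""
    # The email parser needs the Content-Type header (with boundary) up front.
    raw = b"Content-Type: " + content_type.encode() + b"\r\n\r\n" + body
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if not msg.is_multipart():
        raise ValueError("expected a multipart body")
    for part in msg.iter_parts():
        if part.get_param("name", header="content-disposition") == "event":
            return json.loads(part.get_payload(decode=True))
    raise ValueError("no 'event' part in request")


def append_to_spool(event, archive_dir):
    """Append the event as one JSON line and fsync so a 200 means it is on disk."""
    os.makedirs(archive_dir, exist_ok=True)
    with open(os.path.join(archive_dir, "spool.jsonl"), "a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")
        f.flush()
        os.fsync(f.fileno())


class ReceiverHandler(BaseHTTPRequestHandler):
    def _reply(self, status):
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def _handle(self):
        if not token_is_valid(self.headers.get("X-Matrix-Hookshot-Token"), KNOWN_TOKENS):
            return self._reply(403)
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            event = parse_event_part(self.headers.get("Content-Type", ""), body)
        except ValueError:  # also covers JSON and UTF-8 decode errors
            return self._reply(400)
        append_to_spool(event, ARCHIVE_DIR)
        self._reply(200)

    # Hookshot may send either method.
    do_PUT = _handle
    do_POST = _handle


def serve(host="0.0.0.0", port=8080):
    """Plain HTTP, single-threaded: assumed sufficient for an internal endpoint."""
    HTTPServer((host, port), ReceiverHandler).serve_forever()
```

To start the service, call `serve()` from an entry point. The per-room, per-day file layout is left out here for brevity.
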
## Storage Format Comparison
| Format | Pros | Cons | Recommended For |
|--------|------|------|-----------------|
| JSONL files | Greppable, simple, no DB, easy backup | No query capability, no indexes, scattered across files | "Never look at it" archival |
| SQLite per room | Self-contained, portable, SQL queries | Multiple files to manage, concurrent write limits | Small-scale per-room analysis |
| Single SQLite | One file, SQL queries, simple backup | Write contention at scale, max ~10K writes/sec | Small-to-medium single-server |
| Postgres | Full SQL, concurrent writes, indexes, JSONB | Needs a running server, more ops overhead | Query-heavy, large-scale |

Given Matt's "I'll never look at the DB" feedback:
- **Primary: JSONL files** — append-only, one per day per room, greppable, zero ops
- **Secondary: Single SQLite** — for when he does need to query (and he will eventually)
Both can coexist: the receiver writes JSONL immediately, and a nightly job imports it into SQLite.
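
Here is a minimal sketch of the JSONL-first layout with a nightly SQLite import. It rests on several assumptions. The `<root>/<room_id>/<YYYY-MM-DD>.jsonl` layout, the UTC day boundary, the helper names and the `events` table schema are all hypothetical. Only `event_id`, `room_id`, `sender`, `origin_server_ts` and `type` are standard Matrix event fields.

```python
import json
import os
import sqlite3
from datetime import datetime, timezone


def jsonl_path(root, event):
    """Hypothetical layout: <root>/<room_id>/<YYYY-MM-DD>.jsonl (UTC day of origin_server_ts)."""
    day = datetime.fromtimestamp(event["origin_server_ts"] / 1000, tz=timezone.utc)
    room_dir = event["room_id"].replace("/", "_")  # guard against path separators
    return os.path.join(root, room_dir, day.strftime("%Y-%m-%d") + ".jsonl")


def append_event(root, event):
    """Append-only write: one JSON object per line, never rewritten."""
    path = jsonl_path(root, event)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")


def import_jsonl(root, db_path):
    """Nightly job: load every JSONL file under root into SQLite.

    INSERT OR IGNORE on the event_id primary key makes re-runs harmless."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events ("
                "event_id TEXT PRIMARY KEY, room_id TEXT, sender TEXT, "
                "origin_server_ts INTEGER, type TEXT, json TEXT)"
            )
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in sorted(filenames):
                    if not name.endswith(".jsonl"):
                        continue
                    with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                        for line in f:
                            if not line.strip():
                                continue
                            ev = json.loads(line)
                            conn.execute(
                                "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?, ?)",
                                (ev["event_id"], ev["room_id"], ev.get("sender"),
                                 ev["origin_server_ts"], ev.get("type"), line.strip()),
                            )
    finally:
        conn.close()
```

Because of the primary key, the nightly job can rescan the whole archive without creating duplicates. At larger volumes, it would be cheaper to track which day files have already been imported.
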
## Receiver Location Options
| Location | Pros | Cons |
|----------|------|------|
| Same Contabo host | Simplest networking, no cross-host latency | Adds load to already-busy server |
| Separate CT on Proxmox | Isolated, near /mnt/library storage | Cross-network traffic, more infrastructure |
| pi-nas (library host) | Direct /mnt/library access, no NFS | Pi is slow, limited CPU/RAM |

**Recommendation:** Separate CT on Proxmox (data node preferred: it has 1TB NVMe + 1TB SATA).
- /mnt/library is NFS-mounted on data node CTs
- Lightweight Python service, minimal resources (512MB RAM, 1 core)
- Keeps archive processing off Contabo
## Alternative Approaches (No Hookshot)
### Approach A: Synapse-Level Only (Simplest)
No bot, no receiver. Just Synapse config changes:
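```yaml
# Keep the original content of redacted events indefinitely (Synapse default: 7d)
redaction_retention_period: null
experimental_features:
  # MSC2815: let room moderators view the content of redacted events
  msc2815_enabled: true
```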
Data stays in Synapse's Postgres forever. Query via:
- Synapse admin API: `GET /_synapse/admin/v1/rooms/{room_id}/messages`
- Direct Postgres: `SELECT json FROM event_json WHERE room_id = '...'`

Export scripts run on Contabo and dump to /mnt/library via NFS or rsync.

Pros: Zero new infrastructure, zero ops burden, data already exists in DB

Cons: No real-time alerting, export is batch-only, tied to Synapse DB format
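
A nightly export over the admin API has to page through each room's history. Below is a minimal stdlib sketch. It assumes the admin Room Messages API returns the same shape as the client-server `/messages` endpoint: a `chunk` list plus an `end` token that is omitted on the last page. It also assumes the first request may omit `from`. Older Synapse docs mark `from` as required, so check the admin API reference for the deployed version. `fetch_page` and `iter_room_events` are hypothetical names. Obtaining an admin access token under MAS is out of scope here.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, admin_token, room_id, from_token=None, limit=500):
    """GET one page of /_synapse/admin/v1/rooms/{room_id}/messages, oldest first.

    Assumption: the first call may omit `from` (verify for your Synapse version)."""
    query = {"dir": "f", "limit": str(limit)}
    if from_token:
        query["from"] = from_token
    url = (f"{base_url}/_synapse/admin/v1/rooms/"
           f"{urllib.parse.quote(room_id, safe='')}/messages?"
           f"{urllib.parse.urlencode(query)}")
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {admin_token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_room_events(fetch, room_id):
    """Yield every event in the room, following `end` tokens until exhausted.

    `fetch(room_id, from_token)` returns one decoded page; injecting it keeps
    the pagination logic testable without a live server."""
    from_token = None
    while True:
        page = fetch(room_id, from_token)
        chunk = page.get("chunk", [])
        yield from chunk
        next_token = page.get("end")
        # Stop on an empty page, a missing token, or a token that did not advance.
        if not chunk or not next_token or next_token == from_token:
            return
        from_token = next_token
```

In a cron job, pass `functools.partial(fetch_page, base_url, admin_token)` as `fetch`.
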
### Approach B: Custom matrix-nio Bot (Original Phase 6 Plan)
Python bot using matrix-nio with E2EE + MSC4190 support.
- Handles MAS login correctly (unlike hookshot)
- Decrypts E2BE rooms natively
- Writes to its own DB (independent of Synapse retention)
- Real-time capture with custom schema

Pros: Full control, real-time, independent archive, custom schema

Cons: More code to write and maintain, another service to monitor
### Approach C: Hybrid (Recommended)
Combine Approach A + lightweight export:
1. Enable `redaction_retention_period: null` + `msc2815_enabled: true`
   → Synapse retains everything (including redacted content), MSC2815 provides moderator access
2. Build a simple export script (NOT a bot, NOT a service):
   - Runs nightly via cron
   - Queries the Synapse admin API for room events
   - Writes JSONL + markdown exports to /mnt/library
   - No E2EE handling in the script. Caveat: Synapse only stores ciphertext (`m.room.encrypted` events) for encrypted rooms, so readable exports cover unencrypted rooms only
3. No new services, no bot accounts, no device verification

This avoids the hookshot E2EE+MAS blocker entirely AND avoids the complexity
of a custom matrix-nio bot. The Synapse admin API already has the data.
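
The export step of such a script might look like the following minimal sketch. It assumes events arrive as client-format dicts, as in the admin API's `chunk` arrays. The file names, the markdown layout and the `markdown_line`/`export_room` helpers are hypothetical. Synapse stores only ciphertext for encrypted rooms, so `m.room.encrypted` events are written to the markdown as a placeholder rather than decrypted.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def markdown_line(event):
    """Render one event as a markdown bullet: UTC timestamp, sender, body."""
    ts = datetime.fromtimestamp(event["origin_server_ts"] / 1000, tz=timezone.utc)
    stamp = ts.strftime("%Y-%m-%d %H:%M:%S UTC")
    if event.get("type") == "m.room.encrypted":
        # The server only has ciphertext for encrypted rooms; nothing to render.
        text = "*(encrypted; not readable server-side)*"
    else:
        body = event.get("content", {}).get("body")
        # Non-message or redacted events have no body: show the event type instead.
        text = " ".join(str(body).splitlines()) if body else f"*({event.get('type')})*"
    return f"- **{stamp}** `{event.get('sender', '?')}`: {text}"


def export_room(events, out_dir, room_name):
    """Write <room_name>.jsonl (raw events) and <room_name>.md (readable log)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / f"{room_name}.jsonl", "w", encoding="utf-8") as jf, \
         open(out / f"{room_name}.md", "w", encoding="utf-8") as mf:
        mf.write(f"# {room_name}\n\n")
        for event in events:
            jf.write(json.dumps(event, ensure_ascii=False) + "\n")
            mf.write(markdown_line(event) + "\n")
```

Each run rewrites a room's files from scratch, which keeps the script stateless. The JSONL copy keeps every full event for later reprocessing.
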
## CT Number for Receiver/Export Service (if needed)
Current CT assignments on data node: CT 130 (RECON)
Free CTs on data node: 131-149
If a dedicated CT is needed: CT 131 (next available on data node)
But with Approach C (hybrid), no dedicated CT is needed — the export script
runs on Contabo via cron alongside the existing backup job.