# Archive Receiver Discovery
# Generated: 2026-04-09 (Phase 6.0, Question 6)
#
# NOTE: Hookshot is BLOCKED for E2EE rooms with MAS. This analysis
# covers the receiver requirements IF hookshot were used, AND the
# alternative approaches that avoid hookshot entirely.

## Hookshot Receiver Requirements (if hookshot were viable)

### Minimum Functionality
1. Listen on an HTTP port (plain HTTP on the internal Docker network is fine; no TLS needed)
2. Accept multipart/form-data PUT or POST requests
3. Verify the `X-Matrix-Hookshot-Token` header against the per-room shared secret
4. Parse the `event` part as JSON
5. Write to durable storage
6. Return 200 OK (hookshot retries on non-2xx)
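
The requirements above could be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not hookshot's documented contract: `extract_event`, `ArchiveHandler`, `store`, `serve`, and port 8080 are hypothetical names and values, the token check from step 3 is only marked with a comment, and the multipart body is split by handing it to the stdlib `email` parser.

```python
import json
from email import message_from_bytes
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_event(content_type: str, body: bytes) -> dict:
    """Return the JSON carried in the multipart part named 'event'."""
    # Prefix the body with its Content-Type so the MIME parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart/form-data body")
    for part in message.get_payload():
        if part.get_param("name", header="content-disposition") == "event":
            return json.loads(part.get_payload(decode=True))
    raise ValueError("no 'event' part in request")


class ArchiveHandler(BaseHTTPRequestHandler):
    # Hypothetical hook: replace with a function that writes to durable storage.
    store = staticmethod(lambda event: None)

    def _receive(self) -> None:
        # Step 3 (token verification) would run here, before any parsing.
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length)
        try:
            event = extract_event(self.headers.get("Content-Type", ""), body)
        except ValueError:  # also covers malformed JSON (JSONDecodeError)
            self.send_response(400)
            self.end_headers()
            return
        self.store(event)
        # Reply 200 only after the event has been handed to storage.
        self.send_response(200)
        self.end_headers()

    # Accept both methods listed in the requirements.
    do_POST = _receive
    do_PUT = _receive


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (plain HTTP, internal network only)."""
    HTTPServer(("0.0.0.0", port), ArchiveHandler).serve_forever()
```

Calling `serve()` starts the listener. Storage is kept behind the `store` hook so the HTTP handling can be exercised without touching disk.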

### Authentication
Hookshot sends a per-webhook auth token in the `X-Matrix-Hookshot-Token` header.
The receiver validates this token against a list of known tokens.
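
A minimal sketch of that check, assuming the known tokens live in an in-memory mapping. `ROOM_TOKENS`, its contents, and `token_is_known` are hypothetical, and the constant-time comparison is a suggested precaution rather than something hookshot requires.

```python
import hmac
from typing import Iterable, Optional

# Hypothetical mapping of room ID to the shared secret configured for its webhook.
ROOM_TOKENS = {
    "!ops:example.org": "s3cret-ops",
    "!general:example.org": "s3cret-general",
}


def token_is_known(presented: Optional[str], known_tokens: Iterable[str]) -> bool:
    """Return True if the presented header value matches any known token."""
    if not presented:
        return False
    presented_bytes = presented.encode("utf-8")
    matched = False
    # Check every token without short-circuiting, using a constant-time compare.
    for token in known_tokens:
        if hmac.compare_digest(presented_bytes, token.encode("utf-8")):
            matched = True
    return matched
```

In a handler this would be called with the value of the `X-Matrix-Hookshot-Token` header, rejecting the request (for example with 401) when it returns `False`.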

## Storage Format Comparison

| Format | Pros | Cons | Recommended For |
|--------|------|------|-----------------|
| JSONL files | Greppable, simple, no DB, easy backup | No query capability, no indexes, scattered across files | "Never look at it" archival |
| SQLite per room | Self-contained, portable, SQL queries | Multiple files to manage, concurrent write limits | Small-scale per-room analysis |
| Single SQLite | One file, SQL queries, simple backup | Write contention at scale, max ~10K writes/sec | Small-to-medium single-server |
| Postgres | Full SQL, concurrent writes, indexes, JSONB | Needs a running server, more ops overhead | Query-heavy, large-scale |

Given Matt's "I'll never look at the DB" feedback:
- **Primary: JSONL files** (append-only, one file per day per room, greppable, zero ops)
- **Secondary: Single SQLite** (for when he does need to query, and he will eventually)
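
The "one file per day per room" layout could look like the sketch below. The directory scheme, the character sanitising, and the names `jsonl_path` and `append_event` are assumptions for illustration; the only Matrix facts relied on are that events carry `room_id` and `origin_server_ts` (milliseconds since the epoch).

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def jsonl_path(root: Path, event: dict) -> Path:
    """Map an event to <root>/<room>/<YYYY-MM-DD>.jsonl using its UTC day."""
    ts = datetime.fromtimestamp(event["origin_server_ts"] / 1000, tz=timezone.utc)
    # Room IDs contain '!' and ':'; keep only filesystem-friendly characters.
    room = "".join(c if c.isalnum() or c in "-_." else "_" for c in event["room_id"])
    return root / room / (ts.strftime("%Y-%m-%d") + ".jsonl")


def append_event(root: Path, event: dict) -> Path:
    """Append one event as a single JSON line and return the file written."""
    path = jsonl_path(root, event)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, ensure_ascii=False) + "\n")
    return path
```

Because each line is a complete JSON object, the files stay greppable and can be concatenated or re-imported later without a schema.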

Both can coexist: the receiver writes JSONL immediately, and a nightly job imports it into SQLite.
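
The nightly import could be as small as the following sketch. The `events` table layout and the function name `import_jsonl` are hypothetical. `INSERT OR IGNORE` keyed on `event_id` is used so that re-running the job over the same file is harmless.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical schema: a few indexed-friendly columns plus the full event JSON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    event_id TEXT PRIMARY KEY,
    room_id TEXT NOT NULL,
    sender TEXT,
    type TEXT,
    origin_server_ts INTEGER,
    json TEXT NOT NULL
)
"""


def import_jsonl(db: sqlite3.Connection, jsonl_file: Path) -> int:
    """Load one JSONL file into SQLite and return the number of new rows."""
    db.execute(SCHEMA)
    inserted = 0
    with jsonl_file.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            cursor = db.execute(
                "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?, ?)",
                (
                    event["event_id"],
                    event["room_id"],
                    event.get("sender"),
                    event.get("type"),
                    event.get("origin_server_ts"),
                    line.strip(),
                ),
            )
            inserted += cursor.rowcount  # 0 when the event was already present
    db.commit()
    return inserted
```

A cron job would open the database with `sqlite3.connect(...)` and call `import_jsonl` for each of the previous day's files.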

## Receiver Location Options

| Location | Pros | Cons |
|----------|------|------|
| Same Contabo host | Simplest networking, no cross-host latency | Adds load to an already-busy server |
| Separate CT on Proxmox | Isolated, near /mnt/library storage | Cross-network traffic, more infrastructure |
| pi-nas (library host) | Direct /mnt/library access, no NFS | Pi is slow, limited CPU/RAM |

**Recommendation:** Separate CT on Proxmox (data node preferred; it has 1TB NVMe + 1TB SATA).
- /mnt/library is NFS-mounted on data node CTs
- Lightweight Python service, minimal resources (512MB RAM, 1 core)
- Keeps archive processing off Contabo

## Alternative Approaches (No Hookshot)

### Approach A: Synapse-Level Only (Simplest)

No bot, no receiver. Just Synapse config changes:
```yaml
redaction_retention_period: null
experimental_features:
  msc2815_enabled: true
```

Data stays in Synapse's Postgres forever. Query via:
- Synapse admin API: `GET /_synapse/admin/v1/rooms/{room_id}/messages`
- Direct Postgres: `SELECT json FROM event_json WHERE room_id = '...'`

Export scripts run on Contabo and dump to /mnt/library via NFS or rsync.
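
An export script built on the admin API route would mostly be a pagination loop. The sketch below rests on assumptions that should be checked against the Synapse admin API documentation: that the response carries `chunk` and `end` fields like the client-server `/messages` endpoint, that `dir=f` pages forward, and that the first request may omit `from`. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional


def messages_url(base_url: str, room_id: str,
                 from_token: Optional[str] = None, limit: int = 100) -> str:
    """Build the admin API URL for one page of room messages."""
    params = {"dir": "f", "limit": str(limit)}
    if from_token:
        params["from"] = from_token
    path = "/_synapse/admin/v1/rooms/" + urllib.parse.quote(room_id, safe="") + "/messages"
    return base_url.rstrip("/") + path + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str, admin_token: str) -> dict:
    """GET a URL with a server-admin access token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": "Bearer " + admin_token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_room_events(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield events page by page until the server returns an empty chunk."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        chunk = page.get("chunk", [])
        yield from chunk
        token = page.get("end")
        if not chunk or token is None:
            break
```

A caller would wire the pieces together with something like `iter_room_events(lambda t: fetch_json(messages_url(base, room, t), admin_token))`. Passing the fetcher in as a callable keeps the loop testable without a live homeserver.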

Pros: Zero new infrastructure, zero ops burden, data already exists in DB
Cons: No real-time alerting, export is batch-only, tied to Synapse DB format

### Approach B: Custom matrix-nio Bot (Original Phase 6 Plan)

Python bot using matrix-nio with E2EE + MSC4190 support.
- Handles MAS login correctly (unlike hookshot)
- Decrypts E2EE rooms natively
- Writes to its own DB (independent of Synapse retention)
- Real-time capture with custom schema

Pros: Full control, real-time, independent archive, custom schema
Cons: More code to write and maintain, another service to monitor

### Approach C: Hybrid (Recommended)

Combine Approach A + lightweight export:
1. Enable `redaction_retention_period: null` + `msc2815_enabled: true`
   → Synapse retains everything, MSC2815 provides moderator access
2. Build a simple export script (NOT a bot, NOT a service):
   - Runs nightly via cron
   - Queries Synapse admin API for room events
   - Writes JSONL + markdown exports to /mnt/library
   - No E2EE handling in the script. Note that Synapse stores only ciphertext (`m.room.encrypted` events) for E2EE rooms, so only unencrypted rooms export as readable text
3. No new services, no bot accounts, no device verification
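
The markdown half of the export in step 2 could be sketched as below. The line format and the names `event_to_markdown` and `render_room_markdown` are hypothetical. Events without a plaintext `body` (for example `m.room.encrypted`) get a placeholder rather than being dropped.

```python
from datetime import datetime, timezone
from typing import Iterable


def event_to_markdown(event: dict) -> str:
    """Render one Matrix event as a single markdown bullet."""
    ts = datetime.fromtimestamp(event.get("origin_server_ts", 0) / 1000, tz=timezone.utc)
    stamp = ts.strftime("%Y-%m-%d %H:%M:%S")
    sender = event.get("sender", "unknown")
    event_type = event.get("type", "unknown")
    if event_type == "m.room.message":
        text = event.get("content", {}).get("body", "")
    else:
        # Non-message or encrypted events: record the type only.
        text = "(" + event_type + ")"
    return "- **" + stamp + " UTC** " + sender + ": " + text


def render_room_markdown(room_id: str, events: Iterable[dict]) -> str:
    """Render a room's events, oldest first, under a heading with the room ID."""
    ordered = sorted(events, key=lambda e: e.get("origin_server_ts", 0))
    lines = ["# " + room_id, ""]
    lines.extend(event_to_markdown(e) for e in ordered)
    return "\n".join(lines) + "\n"
```

The JSONL export stays the lossless record; the markdown file is only a convenience view and can be regenerated from it at any time.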

This avoids the hookshot E2EE+MAS blocker entirely AND avoids the complexity
of a custom matrix-nio bot. The Synapse admin API already has the data.

## CT Number for Receiver/Export Service (if needed)

Current CT assignments on data node: CT 130 (RECON)
Free CTs on data node: 131-149

If a dedicated CT is needed: CT 131 (next available on data node)
But with Approach C (hybrid), no dedicated CT is needed: the export script
runs on Contabo via cron alongside the existing backup job.