Commit graph

18 commits

Author SHA1 Message Date
22dd6c19a7 Add cleanup-log.md: full triage through Phase 6k
First commit of the cleanup log to the repo — previously maintained
as an uncommitted working document across sessions.

31 original items triaged. 11 moved to Resolved section with phase
references (6a through 6k, PROJECT-BIBLE rewrite, pi-nas decommission).
5 new backlog items added (duplicate consolidation, legacy data/text
dirs, backup architecture, signal-archive, Phase 5a edge cases).
4,771 duplicate PDFs marked PARTIALLY RESOLVED (hash-match dupes
handled; same-content-different-bytes clusters split to new item).
2026-04-16 17:25:55 +00:00
b1c05c4d02 PROJECT-BIBLE: fix storage topology — library is LXC bind-mount, not NFS
- Section 2 topology diagram: 'Library (LXC bind) / data /mnt/data/library
  → /mnt/library/ (read/write, local SSD)'
- Section 10 Config table: library_root described as bind-mount root
- Section 13 Filesystem layout: /mnt/library annotated as LXC bind-mount
- Section 14 Refactor history: storage migration note added (NFS history
  preserved as historical context)
- Section 15 Operational runbook: replaced recon-backup.timer reference
  with planned/TBD note
- Section 16 Known Gotchas: new bullet on bind-mount file ownership and
  the absence of NFS / root_squash in the path
- Section 17 Credentials & Hosts: added data host row; rewrote pi-nas
  role to backup target (planned, not yet configured) reflecting the
  2026-04-15 wipe of /export/library
- Section 18 Open Follow-ups: added backup architecture entry capturing
  the missing rsync job and the now-available ~300G pi-nas headroom
2026-04-16 06:50:36 +00:00
d1cde5a56d PROJECT-BIBLE: bring refactor history current through Phase 6k
Updates:
- Fix Phase 5a description (was incorrectly describing the un-file)
- Fix Phase 5b description (2,259 drain cohort)
- Add Phase 6f (text processor)
- Add Phase 6f-2 (format normalizer)
- Add Phase 6g (Gemini null bug fix)
- Add Phase 6h (STATE 2 cleanup + PeerTube transcription trigger)
- Add Phase 6i (dashboard upload migration, multi-format)
- Add Phase 6j (library cleanup, 51G freed)
- Add Phase 6k (Phase 5a un-file, 16,340 transcripts restored)
- Update Open Follow-ups with backlog items identified through Phase 6k
- Update footer to reflect refactor feature-complete state
2026-04-16 05:21:17 +00:00
c9a8f1ecb5 Add PROJECT-BIBLE.md: canonical architectural reference for RECON
Consolidated orientation document for future sessions. Covers pipeline
lifecycle (acquire → dispatch → process → enrich/embed → file),
acquisition modules, dispatcher, per-type processors, filing,
StatusDB schema, config, service threads, dashboard/API, filesystem
layout, refactor history, runbook, known gotchas, and follow-ups.

Sourced from live code on CT 130 (/opt/recon/) including recon.py,
dispatcher.py, filing.py, status.py, the three processors,
acquisition/peertube.py, config.yaml, and api.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 04:41:03 +00:00
5b0d4eed90 Phase 6e doc: add 6e-2 revert note and 6e-3 destination fix
Document the api.py revert (6e-2) and the shadowlib download
destination fix (6e-3) that redirects all three sources from
/mnt/library/Acquired/[SUBDIR]/ to the new dispatcher hopper
at /opt/recon/data/acquired/pdf/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 03:42:40 +00:00
3a118064ee Phase 6c: code cleanup documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 23:46:56 +00:00
e83a8f7045 Phase 6b: dashboard bug fix documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 23:06:09 +00:00
263a81c1e2 Phase 6a: transcript organized-in-place documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 22:50:18 +00:00
1d4106643c Add Phase 5c-2 doc: service start, failure analysis, and recovery
Documents the initial 5c-2 failure (ignore_errors=True + root-owned
legacy files), the recovery procedure (hopper reconstitution, orphan
cleanup, processor fix), and the successful retry with pipeline drain
in progress.
2026-04-14 20:23:37 +00:00
581f0017f0 Phase 5c-1: service rewire (code only, not started)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:33:15 +00:00
6b49d4c107 Phase 5b: transcript unprocess — stage 2,259 skip_unclassified transcripts into hopper
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:19:35 +00:00
71dd0a1182 Phase 5a: transcript resweep (18,855 transcripts)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 17:41:38 +00:00
Ubuntu
1d9727f26f Phase 4: PDF processor with layered metadata extraction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 16:59:59 +00:00
0747cb761f Phase 3: transcript processor end-to-end test doc
Documents dispatcher, transcript processor, text_dir resolution,
and full pipeline test results (172f39ae → skip_unclassified).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 15:44:25 +00:00
2a1d211d7c Phase 2: shared filing function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 15:04:13 +00:00
b697404df2 Phase 1: scaffolding (directories, config, text_dir column)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:48:27 +00:00
878bc2744a Phase 0: baseline capture
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 06:14:35 +00:00
aa195825e3 Initial design docs for RECON pipeline refactor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 06:08:06 +00:00