mirror of
https://github.com/zvx-echo6/refactored-recon.git
synced 2026-05-20 14:44:39 +02:00
Phase 6c: code cleanup documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parent: e83a8f7045
Commit: 3a118064ee
1 changed file with 110 additions and 0 deletions
phases/phase-6c-code-cleanup.md (110 lines added)

# Phase 6c: Code Cleanup

## Objective

Remove dead code paths left over from the refactor. Investigation first,
deletion second: only remove what is confirmed dead.

## Investigation Findings

### Expected dead code vs reality

| Item | Expected status | Actual status |
|------|-----------------|---------------|
| `scanner_loop` | Dead function in `recon.py` | **Already removed** in Phase 5c-1 |
| `peertube_scanner_loop` | Dead function in `recon.py` | **Already removed** in Phase 5c-1 |
| `crawler_scheduler_loop` | Dead function in `recon.py` | **Already removed** in Phase 5c-1 |
| `organizer_loop` | Dead function in `recon.py` | **Already removed** in Phase 5c-1 |
| Extract worker thread | Vestigial in `cmd_service()` | **Confirmed dead**: 0 items queued, silent for 24h+ |
| `lib/crawler.py` | Legacy module | **Confirmed dead**: only used by the `recon crawl` CLI subcommand |
| `lib/web_scraper.py` | Legacy module | **ALIVE**: `chunk_text()` is used by `transcript_processor` |
| `lib/new_pipeline.py` | Legacy module | **ALIVE**: active Stream B library management tool (1,637 lines, created Apr 13) |
| `lib/peertube_scraper.py` | Legacy module | **ALIVE**: only mechanism for transcript ingestion |
| `lib/extractor.py` | Dead module | **ALIVE**: used by `cmd_run` CLI for batch processing |
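
A status such as "only used by CLI subcommand" can be cross-checked by parsing every `.py` file and listing the ones that import a given module. The following is a minimal sketch of that idea, not the check actually used in this phase; the helper name `find_importers` is hypothetical, and it only sees static, absolute imports.

```python
import ast
from pathlib import Path


def find_importers(root, module):
    """Return relative paths of .py files under `root` that import `module`.

    `module` is a dotted name such as "lib.crawler". Only absolute imports
    are checked; relative imports (from . import x) are ignored.
    """
    root = Path(root)
    parent, _, leaf = module.rpartition(".")
    hits = set()
    for path in root.rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # unparseable files are skipped, not reported
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import lib.crawler [as alias]
                matched = any(
                    a.name == module or a.name.startswith(module + ".")
                    for a in node.names
                )
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                source = node.module or ""
                # from lib.crawler import X, or from lib import crawler
                matched = (
                    source == module
                    or source.startswith(module + ".")
                    or (source == parent and any(a.name == leaf for a in node.names))
                )
            else:
                matched = False
            if matched:
                hits.add(path.relative_to(root).as_posix())
    return sorted(hits)
```

Calling `find_importers("/opt/recon", "lib.crawler")` would list the remaining importers, if any. Dynamic imports (for example via `importlib`) and plain string references are not detected, so a text search is still a useful second check.
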

### Additional findings

- **24 `.bak` files** found across `/opt/recon/` (untracked, manual pre-edit safety backups from Feb-Apr 2026). All originals are preserved in git history.
- **File ownership**: All 21 `.py` files plus `recon.py` are correctly owned by `zvx`. No corrections needed.
- **No TODO/DEPRECATED comments** found in any `lib/` file.
- **All imports in `recon.py`** confirmed used (no dead imports at module level).
- **PeerTube transcript ingestion** has had no automatic mechanism since Phase 5c-1 removed `peertube_scanner_loop`. Ingestion is manual only (CLI or dashboard API endpoint).
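
The backup files use several suffix styles (`.bak`, `.bak-pre-ux`, `.bak.202602171647`), so a plain `*.bak` glob would miss some of them. A minimal sketch of one way to list them with their sizes before deleting anything is shown below; the helper name `find_backups` and the regular expression are assumptions, not the method used in this phase, and the sketch does not check whether a file is tracked by git.

```python
import re
from pathlib import Path

# Matches names ending in ".bak", optionally followed by a "-" or "."
# separated suffix such as "-pre-streamb" or ".20260223".
BACKUP_RE = re.compile(r"\.bak(?:[.-][\w.-]+)?$")


def find_backups(root):
    """Return sorted (relative_path, size_in_bytes) pairs for backup files."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        if path.is_file() and BACKUP_RE.search(path.name):
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return sorted(found)
```

Reviewing the returned list first, and only then removing the files, keeps the step consistent with the "investigation first, deletion second" rule.
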

## What Was Removed

### recon.py edits (-89 lines, +3 lines)

1. **Extract worker thread** removed from `cmd_service()`:
   - `from lib.extractor import run_extraction` import
   - `extract_workers` variable
   - `'extract': 0` entry from the totals dict
   - Extract `threading.Thread(target=stage_loop, ...)` entry from the thread list
   - Extract workers from the startup log message

2. **`cmd_crawl` function** deleted (65 lines): CLI handler for `recon crawl`

3. **Crawl argparse subparser** deleted (15 lines): `recon crawl` subcommand registration

4. **Docstring** updated to remove `crawl` from the subcommand list
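
Once a subparser is no longer registered, `argparse` rejects the subcommand on its own, so no extra guard is needed for `recon crawl`. The sketch below illustrates this with a stand-in parser; the subcommand names `service` and `run` are inferred from `cmd_service` and `cmd_run`, and the real parser in `recon.py` may be organised differently.

```python
import argparse


def build_parser():
    """Build a stand-in CLI parser with no "crawl" subcommand registered."""
    parser = argparse.ArgumentParser(prog="recon")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("service", help="run the long-lived service (assumed name)")
    sub.add_parser("run", help="batch processing (assumed name)")
    return parser


def is_known_subcommand(argv):
    """Return True if argparse accepts argv, False if it exits with an error."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage error to stderr and exits on unknown choices
        return False
    return True
```

With this parser, `is_known_subcommand(["service"])` is accepted while `is_known_subcommand(["crawl"])` is rejected as an invalid choice.
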

### Files deleted

| File | Lines | Reason |
|------|-------|--------|
| `lib/crawler.py` | 432 | Only referenced by the deleted `cmd_crawl` CLI subcommand |

### .bak files deleted (24 files, untracked)

| File | Size |
|------|------|
| `recon.py.bak-pre-streamb` | 48K |
| `recon.py.bak-pre-ux` | 35K |
| `recon.py.bak-pre-crawler` | 35K |
| `recon.py.bak.202602171647` | 33K |
| `config.yaml.bak-pre-crawler` | 4K |
| `config.yaml.bak-pre-streamb` | 13K |
| `lib/api.py.bak` plus 5 more `api.py` backups | 498K total |
| `lib/embedder.py.bak` | 15K |
| `lib/enricher.py.bak` | 17K |
| `lib/extractor.py.bak` | 18K |
| `lib/status.py.bak-pre-ux` | 10K |
| `lib/status.py.bak-pre-streamb` | 13K |
| `scripts/validate.py.bak` | 6K |
| `scripts/rebuild_qdrant.py.bak` | 6K |
| `static/js/dashboard.js.bak` | 11K |
| `static/js/peertube.js.bak.20260223` | 5K |
| `templates/search.html.bak` | 2K |
| `templates/knowledge/dashboard.html.bak` | 3K |

## What Was Kept (and why)

| Module | Lines | Why kept |
|--------|-------|----------|
| `lib/web_scraper.py` | 324 | `transcript_processor.py` imports `chunk_text()` |
| `lib/new_pipeline.py` | 1,637 | Active Stream B library management CLI (created Apr 13) |
| `lib/peertube_scraper.py` | 580 | Only way to ingest PeerTube transcripts |
| `lib/extractor.py` | 601 | Used by `cmd_run` CLI for batch PDF processing |

## Verification

| Check | Result |
|-------|--------|
| Compile (`recon.py`) | OK |
| Import (`recon` module) | OK |
| Import (dispatcher, filing, processors) | OK |
| `cmd_service` assertions | extract worker absent, `dispatch_loop` present, `filing_worker_loop` present |
| Zero crawler references in `.py` files | Confirmed |
| Service restart | Clean, active |
| Thread count | 13 tasks (was 14; extract removed) |
| Threads started | enrich, embed, dispatcher, filing, progress, dashboard, metrics |
| Extract thread | Absent (confirmed by logs: no `[extract] Stage started`) |
| Errors (60s window) | 0 |
| DB rows | catalogue=29,812, documents=29,812 (unchanged) |
| Dashboard | Responsive |
| Hopper | Empty |

## Commit

- **Commit:** `efae402` on `refactor` branch
- **Diff:** 2 files changed, 3 insertions(+), 521 deletions(-)
- **Pushed to:** `forge.echo6.co/matt/recon` (origin/refactor)