diff --git a/phases/phase-6b-dashboard-bug-fix.md b/phases/phase-6b-dashboard-bug-fix.md new file mode 100644 index 0000000..deaee3b --- /dev/null +++ b/phases/phase-6b-dashboard-bug-fix.md @@ -0,0 +1,122 @@ +# Phase 6b: Dashboard "Untitled / WEB" Bug Fix + +## Bug Description + +The RECON dashboard's "Recently Completed" table showed every transcript as: +- **Title:** "Untitled" (instead of the real video title) +- **Type:** WEB badge (instead of a transcript indicator) + +The Sources table also tagged `stream.echo6.co` as WEB type. + +## Root Causes + +### Bug 1: Title = "Untitled" + +The Recently Completed SQL query read `documents.book_title`: +```sql +SELECT book_title, ... FROM documents WHERE status = 'complete' ... +``` + +`book_title` is populated by the PDF metadata voting pipeline (Gemini +extracts title/author from PDF content). Transcripts never go through +metadata voting, so `book_title` is always NULL for them. The Python-side +fallback `r['book_title'] or 'Untitled'` then kicked in. + +The real title was available in `catalogue.filename` all along. + +### Bug 2: Type = "WEB" + +The type was computed by a SQL CASE expression: +```sql +CASE WHEN path LIKE 'http%' THEN 'web' ELSE 'pdf' END as type +``` + +Since transcript `documents.path` holds the PeerTube watch URL +(`https://stream.echo6.co/w/`), it matched `http%` and was +classified as `web`. The JavaScript then rendered `WEB`. + +There was no `transcript` type — only `web` and `pdf` existed. + +## Fix Applied + +### api.py — Recently Completed query + +```sql +-- Before: +SELECT book_title, ... + CASE WHEN path LIKE 'http%' THEN 'web' ELSE 'pdf' END as type +FROM documents WHERE status = 'complete' ... + +-- After: +SELECT COALESCE(d.book_title, c.filename) as title, ... + CASE + WHEN c.source = 'stream.echo6.co' THEN 'transcript' + WHEN d.path LIKE 'http%' THEN 'web' + ELSE 'pdf' + END as type +FROM documents d +JOIN catalogue c ON d.hash = c.hash +WHERE d.status = 'complete' ... +``` + +Python mapping updated: `r['title'] or 'Untitled'` (was `r['book_title']`). + +### api.py — Sources query + +Same CASE expression fix applied to the sources aggregation query (line 885). + +### dashboard.js — Badge rendering + +Added transcript case to both badge-rendering locations (sources table +and recent completions table): +```javascript +var badge = r.type === 'transcript' + ? 'TRANSCRIPT' + : r.type === 'web' + ? 'WEB' + : 'PDF'; +``` + +### recon.css — Badge style + +```css +.badge-transcript { background: #3b1f5e; color: #c084fc; padding: 2px 8px; + border-radius: var(--radius); font-size: 11px; } +``` + +Purple badge to differentiate from blue (web) and green (pdf). + +## Files Changed + +| File | Change | +|------|--------| +| `lib/api.py` | COALESCE title fallback, transcript type branch (2 queries) | +| `static/js/dashboard.js` | Transcript badge rendering (2 locations) | +| `static/css/recon.css` | `.badge-transcript` class | + +## Verification + +### Before (all transcripts) +- Title: "Untitled" +- Type: WEB (blue badge) + +### After (sample from API response) +| Title | Type | Concepts | Vectors | +|-------|------|----------|---------| +| Pond Power | transcript | 3 | 3 | +| Prolonged Field Care Podcast 205: Sufentanil | transcript | 6 | 6 | +| Rate and Rhythm: Sinus Bradycardia and Sinus Tachycardia | transcript | 6 | 6 | +| American Persimmon - Identifying Male and Female Flowers | transcript | 2 | 2 | + +### Sources table +- `stream.echo6.co`: type=transcript, catalogued=19133, complete=19132 +- PDF sources (35 entries): all correctly typed as `pdf` +- Web sources: 0 (no web-scraped content in current data) + +### PDFs unaffected +PDF sources continue to display with correct `book_title` and PDF type badge. + +## Commit + +- **Commit:** `70b80cb` on `refactor` branch +- **Pushed to:** `forge.echo6.co/matt/recon` (origin/refactor)