Files changed: failed_documents/README.md failed_documents/corrupt_pdfs.md failed_documents/deleted_peertube_videos.md failed_documents/drm_encrypted_ia_pdfs.md failed_documents/macos_resource_forks.md failed_documents/test_artifacts.md
Failed Documents Cleanup Log
Date: 2026-04-14
Total rows purged: 56
Source: RECON pipeline documents table, status='failed'
All 56 entries failed during PDF extraction or transcript ingestion and never produced vectors, concepts, or usable text. They cannot be recovered without manual intervention (re-acquisition, DRM removal, or file repair).
Category Breakdown
| Category | Count | File |
|---|---|---|
| DRM-encrypted Internet Archive PDFs | 18 | drm_encrypted_ia_pdfs.md |
| Corrupt/malformed PDFs | 13 | corrupt_pdfs.md |
| macOS resource forks (._ files) | 22 | macos_resource_forks.md |
| Deleted PeerTube videos | 2 | deleted_peertube_videos.md |
| Test artifacts | 1 | test_artifacts.md |
| Total | 56 | |
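As a quick sanity check, the per-category counts can be summed and compared with the purge total. The snippet below is only an illustration: the dictionary keys are shortened labels chosen here, while the numbers are copied from the table.

```python
# Counts copied from the Category Breakdown table; keys are shortened labels.
category_counts = {
    "drm_encrypted_ia_pdfs": 18,
    "corrupt_pdfs": 13,
    "macos_resource_forks": 22,
    "deleted_peertube_videos": 2,
    "test_artifacts": 1,
}

total = sum(category_counts.values())

# The log reports 56 purged rows, so the categories should add up to that.
assert total == 56, f"category counts sum to {total}, expected 56"
print(total)  # prints 56
```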
What Was Deleted
- 56 rows from the catalogue table
- 56 rows from the documents table
- Physical PDF files on /mnt/library/ (where they still existed)
- Text directories under /opt/recon/data/text/{hash}/
- Concept directories under /opt/recon/data/concepts/{hash}/
- No Qdrant vectors existed (all entries failed before the embedding stage)
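The purge described above can be sketched as a dry-run planner. This is a minimal illustration under stated assumptions, not the script that was actually run: the function name `plan_cleanup`, the column name `hash`, and the `?` placeholder style (as used by SQLite) are all hypothetical. Only the two table names and the directory layout come from this log.

```python
from pathlib import Path

# Directory layout taken from this log; everything else is illustrative.
TEXT_ROOT = Path("/opt/recon/data/text")
CONCEPT_ROOT = Path("/opt/recon/data/concepts")


def plan_cleanup(doc_hashes):
    """Return the SQL statements and directories a purge would touch.

    Nothing is deleted here; the caller decides whether to act on the plan.
    The column name `hash` is a hypothetical placeholder.
    """
    statements = []
    directories = []
    for doc_hash in doc_hashes:
        # One parameterised DELETE per table, so the hash is never
        # interpolated into the SQL string itself.
        for table in ("catalogue", "documents"):
            statements.append((f"DELETE FROM {table} WHERE hash = ?", (doc_hash,)))
        directories.append(TEXT_ROOT / doc_hash)
        directories.append(CONCEPT_ROOT / doc_hash)
    return statements, directories


if __name__ == "__main__":
    stmts, dirs = plan_cleanup(["abc123"])  # "abc123" is a made-up hash
    for sql, params in stmts:
        print(sql, params)
    for directory in dirs:
        print("would remove", directory)
```

Separating the plan from its execution lets the list of rows and directories be reviewed before anything irreversible happens.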
Regrab Candidates
The 18 DRM-encrypted PDFs are the only category worth re-acquiring. Their Internet Archive identifiers are listed in drm_encrypted_ia_pdfs.md. These books can potentially be re-downloaded in a non-DRM format using ia download <identifier>, or borrowed via Open Library.
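A batch re-download could be scripted along the following lines. This is a hedged sketch: the helper names `build_regrab_commands` and `run_regrab` are hypothetical, the identifiers must be supplied by the caller (the layout of drm_encrypted_ia_pdfs.md is not assumed here), and the only CLI invocation used is the `ia download <identifier>` form mentioned above. Whether a non-DRM copy is actually available depends on the item.

```python
import shutil
import subprocess


def build_regrab_commands(identifiers):
    """Build one `ia download <identifier>` command per Internet Archive item."""
    return [["ia", "download", identifier] for identifier in identifiers]


def run_regrab(identifiers, dry_run=True):
    """Print each command, and execute it only when dry_run is False.

    Returns a list of (identifier, return_code) pairs for executed commands;
    the list is empty in dry-run mode.
    """
    commands = build_regrab_commands(identifiers)
    if not dry_run and shutil.which("ia") is None:
        raise RuntimeError("the `ia` command-line tool is not on PATH")
    results = []
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=False: one failed download should not abort the batch.
            completed = subprocess.run(command, check=False)
            results.append((command[2], completed.returncode))
    return results


if __name__ == "__main__":
    # "example-identifier" is a placeholder, not a real entry from the log.
    run_regrab(["example-identifier"], dry_run=True)
```

Defaulting to a dry run keeps the script safe to try before the `ia` tool is installed or configured.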