# Failed Documents Cleanup Log

**Date:** 2026-04-14
**Total rows purged:** 56
**Source:** RECON pipeline `documents` table, `status='failed'`

All 56 entries failed during PDF extraction or transcript ingestion and never produced vectors, concepts, or usable text. They are unrecoverable without manual intervention (re-acquisition, DRM removal, or file repair).

## Category Breakdown

| Category | Count | File |
|----------|-------|------|
| DRM-encrypted Internet Archive PDFs | 18 | [drm_encrypted_ia_pdfs.md](drm_encrypted_ia_pdfs.md) |
| Corrupt/malformed PDFs | 13 | [corrupt_pdfs.md](corrupt_pdfs.md) |
| macOS resource forks (`._` files) | 22 | [macos_resource_forks.md](macos_resource_forks.md) |
| Deleted PeerTube videos | 2 | [deleted_peertube_videos.md](deleted_peertube_videos.md) |
| Test artifacts | 1 | [test_artifacts.md](test_artifacts.md) |
| **Total** | **56** | |

## What Was Deleted

- 56 rows from `catalogue` table
- 56 rows from `documents` table
- Physical PDF files on `/mnt/library/` (where they still existed)
- Text directories under `/opt/recon/data/text/{hash}/`
- Concept directories under `/opt/recon/data/concepts/{hash}/`
- No Qdrant vectors existed (failed before embedding stage)

## Regrab Candidates

The 18 DRM-encrypted PDFs are the only category worth re-acquiring. Their Internet Archive identifiers are listed in [drm_encrypted_ia_pdfs.md](drm_encrypted_ia_pdfs.md). These books can potentially be re-downloaded in non-DRM format using `ia download` with the item identifier, or borrowed via Open Library.
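
## Cleanup Sketch

The deletion steps listed under "What Was Deleted" could be reproduced along the following lines. This is a minimal sketch, not the script that was actually run. It assumes a SQLite database as a stand-in (the log does not name the database engine), a hypothetical `hash` column shared by the `documents` and `catalogue` tables, and a `data_root` laid out like `/opt/recon/data/`. The function name `purge_failed` is illustrative. Removal of the physical PDFs on `/mnt/library/` is left out because the log does not say how those paths are stored.

```python
import shutil
import sqlite3
from pathlib import Path


def purge_failed(conn: sqlite3.Connection, data_root: Path) -> int:
    """Delete failed documents and their on-disk artifacts; return rows purged.

    Assumed (hypothetical) schema: documents(hash, status), catalogue(hash).
    """
    # Collect the hashes first so the same set drives both DB and disk cleanup.
    hashes = [
        row[0]
        for row in conn.execute(
            "SELECT hash FROM documents WHERE status = 'failed'"
        )
    ]

    # One transaction: both tables change together or not at all.
    with conn:
        for doc_hash in hashes:
            conn.execute("DELETE FROM catalogue WHERE hash = ?", (doc_hash,))
            conn.execute("DELETE FROM documents WHERE hash = ?", (doc_hash,))

    # Remove text/ and concepts/ directories; they may not exist for every
    # failed document, so missing paths are ignored.
    for doc_hash in hashes:
        for kind in ("text", "concepts"):
            shutil.rmtree(data_root / kind / doc_hash, ignore_errors=True)

    return len(hashes)
```

Running the database deletes before the filesystem cleanup means a failed transaction leaves the directories intact, so the state can still be inspected before retrying.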