
Failed Documents Cleanup Log

Date: 2026-04-14
Total rows purged: 56
Source: RECON pipeline documents table, status='failed'

All 56 entries failed during PDF extraction or transcript ingestion and never produced vectors, concepts, or usable text. None of them can be recovered without manual intervention (re-acquisition, DRM removal, or file repair).

Category Breakdown

| Category | Count | File |
|---|---|---|
| DRM-encrypted Internet Archive PDFs | 18 | drm_encrypted_ia_pdfs.md |
| Corrupt/malformed PDFs | 13 | corrupt_pdfs.md |
| macOS resource forks (._ files) | 22 | macos_resource_forks.md |
| Deleted PeerTube videos | 2 | deleted_peertube_videos.md |
| Test artifacts | 1 | test_artifacts.md |
| Total | 56 | |
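
The per-category counts can be cross-checked against the stated total of 56 purged rows. The snippet below is only a sanity check; the dictionary is copied by hand from the breakdown and is not read from the RECON database.

```python
# Category counts copied by hand from the breakdown in this README.
category_counts = {
    "DRM-encrypted Internet Archive PDFs": 18,
    "Corrupt/malformed PDFs": 13,
    "macOS resource forks (._ files)": 22,
    "Deleted PeerTube videos": 2,
    "Test artifacts": 1,
}

# The categories should account for every purged row.
total = sum(category_counts.values())
assert total == 56, f"expected 56 purged rows, got {total}"
```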

What Was Deleted

  • 56 rows from catalogue table
  • 56 rows from documents table
  • Physical PDF files on /mnt/library/ (where they still existed)
  • Text directories under /opt/recon/data/text/{hash}/
  • Concept directories under /opt/recon/data/concepts/{hash}/
  • No Qdrant vectors existed (failed before embedding stage)
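
The deletion steps listed above could be sketched roughly as follows. This is not the script that was actually run. It assumes a SQLite-style connection, a hypothetical `hash` column shared by the `documents` and `catalogue` tables, and caller-supplied roots for the text and concept directories. Only the `status='failed'` filter and the `{hash}` directory layout come from this log. Removal of the physical PDFs under /mnt/library/ is left out because the column holding their paths is not documented here.

```python
import shutil
import sqlite3
from pathlib import Path


def purge_failed(conn, text_root, concepts_root, dry_run=True):
    """Remove failed documents and their on-disk artifacts.

    Returns the list of affected hashes. With dry_run=True nothing is
    deleted, so the list can be reviewed first.
    """
    # Assumption: documents has `hash` and `status` columns.
    hashes = [
        row[0]
        for row in conn.execute(
            "SELECT hash FROM documents WHERE status = 'failed'"
        )
    ]
    if dry_run:
        return hashes

    for doc_hash in hashes:
        for root in (text_root, concepts_root):
            # Some directories may never have been created, so a
            # missing path is not treated as an error.
            shutil.rmtree(Path(root) / doc_hash, ignore_errors=True)

    params = [(doc_hash,) for doc_hash in hashes]
    with conn:  # commits both deletes together, rolls back on error
        # Assumption: catalogue rows are keyed by the same hash.
        conn.executemany("DELETE FROM catalogue WHERE hash = ?", params)
        conn.executemany("DELETE FROM documents WHERE hash = ?", params)
    return hashes
```

Running it once with `dry_run=True` and comparing the returned list against the category files before the real pass would be a cautious way to use a helper like this.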

Regrab Candidates

The 18 DRM-encrypted PDFs are the only category worth re-acquiring. Their Internet Archive identifiers are listed in drm_encrypted_ia_pdfs.md. These books can potentially be re-downloaded in a non-DRM format using ia download <identifier>, or borrowed via Open Library.
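
A regrab pass over those identifiers might look like the sketch below. It assumes the `ia` command-line tool is installed and configured, and that the identifiers have already been extracted from drm_encrypted_ia_pdfs.md into a list of strings. The helper names are hypothetical and the real identifiers are not reproduced here. A zero exit code only means the command finished; whether a usable non-DRM file was fetched still has to be checked by hand.

```python
import subprocess


def regrab_commands(identifiers):
    """Build one `ia download <identifier>` argv list per unique identifier."""
    seen = set()
    commands = []
    for raw in identifiers:
        identifier = raw.strip()
        # Skip blank lines and duplicates in the input list.
        if not identifier or identifier in seen:
            continue
        seen.add(identifier)
        commands.append(["ia", "download", identifier])
    return commands


def run_regrab(identifiers, dest_dir="."):
    """Run each download inside dest_dir; return identifiers that exited non-zero."""
    failed = []
    for cmd in regrab_commands(identifiers):
        result = subprocess.run(cmd, cwd=dest_dir)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Printing the output of `regrab_commands(...)` first gives a reviewable list of commands before anything is downloaded.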