auto: docs sync 2026-04-14T06:00:05+00:00
Files changed: failed_documents/README.md failed_documents/corrupt_pdfs.md failed_documents/deleted_peertube_videos.md failed_documents/drm_encrypted_ia_pdfs.md failed_documents/macos_resource_forks.md failed_documents/test_artifacts.md
parent fb7e8fbc3e
commit 1e65c3bbe8
6 changed files with 137 additions and 0 deletions

failed_documents/README.md (Normal file, 31 additions)
@@ -0,0 +1,31 @@
# Failed Documents Cleanup Log

**Date:** 2026-04-14
**Total rows purged:** 56
**Source:** RECON pipeline `documents` table, `status='failed'`

All 56 entries failed during PDF extraction or transcript ingestion and never produced vectors, concepts, or usable text. None can be recovered without manual intervention (re-acquisition, DRM removal, or file repair).

## Category Breakdown

| Category | Count | File |
|----------|-------|------|
| DRM-encrypted Internet Archive PDFs | 18 | [drm_encrypted_ia_pdfs.md](drm_encrypted_ia_pdfs.md) |
| Corrupt/malformed PDFs | 13 | [corrupt_pdfs.md](corrupt_pdfs.md) |
| macOS resource forks (`._` files) | 22 | [macos_resource_forks.md](macos_resource_forks.md) |
| Deleted PeerTube videos | 2 | [deleted_peertube_videos.md](deleted_peertube_videos.md) |
| Test artifacts | 1 | [test_artifacts.md](test_artifacts.md) |
| **Total** | **56** | |

## What Was Deleted

- 56 rows from the `catalogue` table
- 56 rows from the `documents` table
- Physical PDF files on `/mnt/library/` (where they still existed)
- Text directories under `/opt/recon/data/text/{hash}/`
- Concept directories under `/opt/recon/data/concepts/{hash}/`
- Qdrant vectors: none existed, because every entry failed before the embedding stage
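
The deletion steps above could be scripted roughly as follows. This is a minimal sketch, not the procedure that was actually run: it assumes both tables carry `hash` and `path` columns, uses an in-memory SQLite database as a stand-in for the real store, and the `purge_failed` helper, the second sample row, and its `complete` status are all hypothetical.

```python
import sqlite3
from pathlib import Path

# Directory roots taken from the list above.
TEXT_ROOT = Path("/opt/recon/data/text")
CONCEPTS_ROOT = Path("/opt/recon/data/concepts")


def purge_failed(conn: sqlite3.Connection) -> list[Path]:
    """Delete failed rows from both tables; return the paths left to remove."""
    rows = conn.execute(
        "SELECT hash, path FROM documents WHERE status = 'failed'"
    ).fetchall()
    with conn:  # one transaction: both tables change together or not at all
        for doc_hash, _ in rows:
            conn.execute("DELETE FROM catalogue WHERE hash = ?", (doc_hash,))
            conn.execute("DELETE FROM documents WHERE hash = ?", (doc_hash,))
    targets: list[Path] = []
    for doc_hash, path in rows:
        # Source file plus the per-hash text and concept directories.
        targets += [Path(path), TEXT_ROOT / doc_hash, CONCEPTS_ROOT / doc_hash]
    return targets


# Demo with an assumed schema; the 'aaaaaaaaaaaaaaaa' row is a placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalogue (hash TEXT PRIMARY KEY, path TEXT);
    CREATE TABLE documents (hash TEXT PRIMARY KEY, path TEXT, status TEXT);
    INSERT INTO catalogue VALUES
        ('f95452d04916154d', '/mnt/library/Technical/recon-test-cli.pdf'),
        ('aaaaaaaaaaaaaaaa', '/mnt/library/ok.pdf');
    INSERT INTO documents VALUES
        ('f95452d04916154d', '/mnt/library/Technical/recon-test-cli.pdf', 'failed'),
        ('aaaaaaaaaaaaaaaa', '/mnt/library/ok.pdf', 'complete');
""")
targets = purge_failed(conn)
remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

The function returns the filesystem paths instead of deleting them, so the list can be reviewed (and filtered to paths that still exist) before anything is removed.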

## Regrab Candidates

The 18 DRM-encrypted PDFs are the only category worth re-acquiring. Their Internet Archive identifiers are listed in [drm_encrypted_ia_pdfs.md](drm_encrypted_ia_pdfs.md). These books can potentially be re-downloaded in non-DRM format using `ia download <identifier>` or borrowed via Open Library.

failed_documents/corrupt_pdfs.md (Normal file, 27 additions)
@@ -0,0 +1,27 @@
# Corrupt/Malformed PDFs

**Count:** 13
**Subcategories:**
- Truncated (EOF marker not found): 10
- Negative seek value: 2
- Missing /Root object: 1

**Failure reason:** These PDFs are structurally damaged: truncated during download, missing required PDF objects, or otherwise malformed. Both PyPDF2 and pdftotext/pdfinfo return 0 extractable pages. Manual repair is theoretically possible but not worth the effort for these titles.
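
The truncation and missing-`/Root` failures can usually be spotted with plain byte checks before a file ever reaches the extractors. The sketch below is a heuristic only, assuming the file content is already in memory; `pdf_structure_problems` is a hypothetical helper, not part of RECON, PyPDF2, or pdftotext, and it does not attempt to detect the negative-seek case.

```python
def pdf_structure_problems(data: bytes) -> list[str]:
    """Return a list of structural problems found by simple byte checks."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    # A complete PDF ends with an %%EOF marker; readers conventionally
    # look for it near the end of the file.
    if b"%%EOF" not in data[-1024:]:
        problems.append("EOF marker not found")
    # The trailer (or cross-reference stream dictionary) must point at
    # the document catalog through a /Root entry.
    if b"/Root" not in data:
        problems.append("no /Root reference")
    return problems


# Tiny synthetic inputs, not real documents.
complete = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF\n"
truncated = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

complete_problems = pdf_structure_problems(complete)
truncated_problems = pdf_structure_problems(truncated)
```

A file that passes these checks can still be damaged, so this only works as a cheap pre-filter ahead of a real parser.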

## Entries

| # | Filename | Path | Hash | Size (bytes) | Discovered | Error |
|---|----------|------|------|--------------|------------|-------|
| 1 | Depression Era Recipies.pdf | `/mnt/library/Survival-Companion-Library/Companion Survival Resource Library/Food, Nutrition & Recipes/Recipes/Depression Era Recipies.pdf` | `b9bcabfe1d0d9aac` | 11,483 | 2026-02-16 00:22:23 | EOF marker not found |
| 2 | EMP-1.pdf | `/mnt/library/Survival-Companion-Library/EMP/EMP-1.pdf` | `ebdfc16840b35ad6` | 1,589 | 2026-04-13 01:06:51 | EOF marker not found |
| 3 | Food Storage and Disaster Calendar.pdf | `/mnt/library/Survival-Companion-Library/Food Storage/Food Storage and Disaster Calendar.pdf` | `faf8212fea01d991` | 96,014 | 2026-02-16 00:22:23 | EOF marker not found |
| 4 | Food_storage_guide.pdf | `/mnt/library/Survival-Companion-Library/Food Storage/Food_storage_guide.pdf` | `48905846ef1f395f` | 1,525,293 | 2026-02-16 00:22:23 | EOF marker not found |
| 5 | Homemade C4 - A Recipe For Survival - Ragnar Benson.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/Homemade C4 - A Recipe For Survival - Ragnar Benson.pdf` | `68f01f006e5c05b2` | 8,146,967 | 2026-02-16 00:22:24 | '/Root' |
| 6 | Homemade Grenade Launchers - Ragnar Benson.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/Homemade Grenade Launchers - Ragnar Benson.pdf` | `3d82ec1f8e0cefae` | 6,172,870 | 2026-02-16 00:22:24 | negative seek value -1 |
| 7 | PPS_complete.pdf | `/mnt/library/Survival-Companion-Library/Medicine - Health - Hygiene - Sanitation/PPS_complete.pdf` | `30a694dbee39f98b` | 17,986 | 2026-02-16 00:22:24 | EOF marker not found |
| 8 | Survivalist #01 - Premier Issue.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/American Survival Guide/Survivalist #01 - Premier Issue.pdf` | `c691de4341ac4ad0` | 33,372,463 | 2026-02-16 00:22:24 | EOF marker not found |
| 9 | Survivalist #03 - Self-Reliance.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/American Survival Guide/Survivalist #03 - Self-Reliance.pdf` | `318b6a9749672666` | 53,833,074 | 2026-02-16 00:22:24 | EOF marker not found |
| 10 | Survivalist #05 - Societal Collapse.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/American Survival Guide/Survivalist #05 - Societal Collapse.pdf` | `0c6505dcbaf7de70` | 55,616,920 | 2026-02-16 00:22:24 | EOF marker not found |
| 11 | Survivalist #07 – When the Lights go Out!.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/American Survival Guide/Survivalist #07 – When the Lights go Out!.pdf` | `4b69384d20e64d31` | 75,493,362 | 2026-02-16 00:22:24 | EOF marker not found |
| 12 | Survivalist #11 - Real Self Defense.pdf | `/mnt/library/Survival-Companion-Library/Books-Magazines/American Survival Guide/Survivalist #11 - Real Self Defense.pdf` | `bdcb548d8bfc99e0` | 86,043,525 | 2026-02-16 00:22:24 | EOF marker not found |
| 13 | fm3-22-68_2006.pdf | `/mnt/library/Survival-Companion-Library/Companion Survival Resource Library/Army Field Manuals/fm3-22-68_2006.pdf` | `7c6695360d1f03c1` | 8,126,464 | 2026-04-13 01:06:51 | negative seek value -1 |

failed_documents/deleted_peertube_videos.md (Normal file, 11 additions)
@@ -0,0 +1,11 @@
# Deleted PeerTube Videos

**Count:** 2
**Failure reason:** These PeerTube videos were deleted from the instance after RECON catalogued them but before enrichment/embedding completed. The transcript text was extracted, but the pipeline later marked the videos as failed when it could no longer reach the source URL.
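
A pre-flight check could confirm that a video still exists before the pipeline retries it. The sketch below assumes the instance exposes PeerTube's REST endpoint `/api/v1/videos/{uuid}` and answers 404 for a deleted video; `video_api_url` and `video_exists` are hypothetical helpers, not RECON functions.

```python
import urllib.error
import urllib.request


def video_api_url(watch_url: str) -> str:
    """Map a /w/<uuid> watch URL to the assumed REST endpoint for that video."""
    base, _, uuid = watch_url.partition("/w/")
    return f"{base}/api/v1/videos/{uuid}"


def video_exists(watch_url: str, timeout: float = 10.0) -> bool:
    """Return False when the instance reports the video as gone (HTTP 404)."""
    try:
        with urllib.request.urlopen(video_api_url(watch_url), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (5xx, auth) are not evidence of deletion


api_url = video_api_url(
    "https://stream.echo6.co/w/13761144-1bc7-46c4-895f-0d450b91007f"
)
```

Re-raising on anything other than 404 keeps a temporary outage from being recorded as a permanent deletion.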

## Entries

| # | Title | Video UUID | Channel | Hash | Discovered | URL |
|---|-------|-----------|---------|------|------------|-----|
| 1 | I brought home a farm animal we’ve never had before \| VLOG | `13761144-1bc7-46c4-895f-0d450b91007f` | Roots and Refuge Farm | `1cdc40b72b95db78` | 2026-04-05 15:41:48 | `https://stream.echo6.co/w/13761144-1bc7-46c4-895f-0d450b91007f` |
| 2 | Christmas Giveaway Day 3: CCNA, CCNA Cyber Ops, Amazon eGift and more! | `f67b4a3a-02f0-4b0e-a3a4-2f18b8e20833` | David Bombal | `0dc816878cae5a9c` | 2026-04-06 00:46:33 | `https://stream.echo6.co/w/f67b4a3a-02f0-4b0e-a3a4-2f18b8e20833` |

failed_documents/drm_encrypted_ia_pdfs.md (Normal file, 29 additions)
@@ -0,0 +1,29 @@
# DRM-Encrypted Internet Archive PDFs

**Count:** 18
**Failure reason:** These are Adobe DRM (ACSM) encrypted PDFs downloaded from Internet Archive's lending library. The RECON pipeline's PyPDF2/pdftotext extractors cannot decrypt them; they require Adobe Digital Editions or equivalent DRM removal tooling.

**Regrab note:** Most of these titles are available on Internet Archive. The `ia_identifier` column can be used with `ia download <identifier>` to re-download, or the books can be borrowed via Open Library in a non-DRM format.

## Entries

| # | Filename | IA Identifier | Path | Hash | Size (bytes) | Discovered | Error |
|---|----------|---------------|------|------|--------------|------------|-------|
| 1 | Root-Cellaring.pdf | `Root-Cellaring` | `/mnt/library/Root-Cellaring.pdf` | `12be9cab19173c9f` | 14,056,345 | 2026-03-19 01:12:29 | encryption handler |
| 2 | Storeys-Guide-Raising-Beef-Cattle.pdf | `Storeys-Guide-Raising-Beef-Cattle` | `/mnt/library/Storeys-Guide-Raising-Beef-Cattle.pdf` | `6f5127e5e861cd7f` | 23,168,056 | 2026-03-19 01:12:29 | encryption handler |
| 3 | Storeys-Guide-Raising-Pigs.pdf | `Storeys-Guide-Raising-Pigs` | `/mnt/library/Storeys-Guide-Raising-Pigs.pdf` | `d58acf724f6e75c0` | 18,779,175 | 2026-03-19 01:12:29 | encryption handler |
| 4 | Storeys-Guide-Raising-Rabbits.pdf | `Storeys-Guide-Raising-Rabbits` | `/mnt/library/Storeys-Guide-Raising-Rabbits.pdf` | `cc83e27c348205c4` | 12,058,415 | 2026-03-19 01:12:29 | encryption handler |
| 5 | Storeys-Guide-Raising-Sheep.pdf | `Storeys-Guide-Raising-Sheep` | `/mnt/library/Storeys-Guide-Raising-Sheep.pdf` | `a2740665539480f0` | 22,925,624 | 2026-03-19 01:12:29 | encryption handler |
| 6 | The-Complete-Medicinal-Herbal.pdf | `The-Complete-Medicinal-Herbal` | `/mnt/library/The-Complete-Medicinal-Herbal.pdf` | `166024bed0da3899` | 33,058,636 | 2026-03-19 01:12:29 | encryption handler |
| 7 | barefootarchitec00leng_encrypted.pdf | `barefootarchitec00leng` | `/mnt/library/Shelter-and-Construction/barefootarchitec00leng_encrypted.pdf` | `47929d6547c949bc` | 27,130,488 | 2026-03-19 01:12:29 | encryption handler |
| 8 | beginnersguideto0000shol_o6v9_encrypted.pdf | `beginnersguideto0000shol_o6v9` | `/mnt/library/Acquired/Food/beginnersguideto0000shol_o6v9_encrypted.pdf` | `d18fa3ce98f8d572` | 8,203,078 | 2026-04-13 01:06:51 | file not found (moved) |
| 9 | bestloveddepress0000unse_encrypted.pdf | `bestloveddepress0000unse` | `/mnt/library/Acquired/Food/bestloveddepress0000unse_encrypted.pdf` | `d54acd11ed6a2faa` | 5,503,960 | 2026-04-13 01:06:51 | file not found (moved) |
| 10 | bushcraftoutdoor0000mors_encrypted.pdf | `bushcraftoutdoor0000mors` | `/mnt/library/Wilderness-Skills/bushcraftoutdoor0000mors_encrypted.pdf` | `7bcee33c6dfca5e9` | 13,798,508 | 2026-03-19 01:12:29 | encryption handler |
| 11 | completemedicina00odyp_encrypted.pdf | `completemedicina00odyp` | `/mnt/library/Acquired/Medical/Herbalism/completemedicina00odyp_encrypted.pdf` | `be52efb423c699b8` | 25,304,735 | 2026-04-13 01:06:51 | file not found (moved) |
| 12 | hamradiofordummi0000silv_encrypted.pdf | `hamradiofordummi0000silv` | `/mnt/library/Acquired/Skills/hamradiofordummi0000silv_encrypted.pdf` | `127ad91c8035e3a2` | 21,170,450 | 2026-04-13 01:06:51 | file not found (moved) |
| 13 | howtostayalivein0000angi_encrypted.pdf | `howtostayalivein0000angi` | `/mnt/library/Wilderness-Skills/howtostayalivein0000angi_encrypted.pdf` | `99dbc9394ef38b9b` | 12,855,750 | 2026-03-19 01:12:29 | encryption handler |
| 14 | justincasehowtob0000harr_encrypted.pdf | `justincasehowtob0000harr` | `/mnt/library/Scenario-Playbooks/justincasehowtob0000harr_encrypted.pdf` | `41e8a2158b53f604` | 16,135,469 | 2026-03-19 01:12:29 | encryption handler |
| 15 | livingreadypocke0000hubb_encrypted.pdf | `livingreadypocke0000hubb` | `/mnt/library/Acquired/Skills/livingreadypocke0000hubb_encrypted.pdf` | `4224c213bf549716` | 6,689,915 | 2026-04-13 01:06:51 | file not found (moved) |
| 16 | multitudewardemo00hard_encrypted.pdf | `multitudewardemo00hard` | `/mnt/library/Acquired/Scenario/multitudewardemo00hard_encrypted.pdf` | `2807e76d8393e018` | 37,795,358 | 2026-04-13 01:06:51 | file not found (moved) |
| 17 | seedtoseedseedsa0000ashw_encrypted.pdf | `seedtoseedseedsa0000ashw` | `/mnt/library/Agriculture-and-Livestock/seedtoseedseedsa0000ashw_encrypted.pdf` | `75c393d852a75f2d` | 20,084,059 | 2026-03-19 01:12:29 | encryption handler |
| 18 | teamingwithmicro0000lowe_encrypted.pdf | `teamingwithmicro0000lowe` | `/mnt/library/Acquired/Permaculture/teamingwithmicro0000lowe_encrypted.pdf` | `9e4d5b170276627c` | 14,145,070 | 2026-04-13 01:06:51 | file not found (moved) |
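
In every row above, the IA identifier is the filename with the `.pdf` extension and any `_encrypted` suffix removed, so the regrab commands can be generated from the filenames alone. This is a minimal sketch under that observation; `ia_identifier`, `regrab_command`, and `regrab` are hypothetical helpers, and only the bare `ia download <identifier>` form mentioned in the regrab note is used. Whether a plain download returns a non-DRM copy for a lending-library title is not guaranteed.

```python
import subprocess
from pathlib import Path


def ia_identifier(filename: str) -> str:
    """Derive the Internet Archive identifier from a library filename."""
    return Path(filename).stem.removesuffix("_encrypted")


def regrab_command(identifier: str) -> list[str]:
    """Build the argument list for `ia download <identifier>`."""
    return ["ia", "download", identifier]


def regrab(identifier: str, dest_dir: str) -> None:
    """Run the download inside dest_dir; requires the `ia` CLI on PATH."""
    subprocess.run(regrab_command(identifier), cwd=dest_dir, check=True)


# Build (but do not run) the commands for two filenames from the table.
commands = [
    regrab_command(ia_identifier(name))
    for name in ("Root-Cellaring.pdf", "barefootarchitec00leng_encrypted.pdf")
]
```

Building the argument list separately from running it makes it easy to print the commands for review before any download starts.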

failed_documents/macos_resource_forks.md (Normal file, 29 additions)
@@ -0,0 +1,29 @@
# macOS Resource Fork Files

**Count:** 22
**Failure reason:** These are macOS `._` resource fork / extended attribute sidecar files, not real PDFs. Each is 4,096 bytes (one filesystem block) and contains Apple-specific metadata. The RECON scanner picked them up because their names end in `.pdf`, but they have no extractable content.
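
A scanner can avoid cataloguing these sidecars by skipping any name that starts with `._`. The sketch below shows one way to do that, with an optional content check against the AppleDouble magic number `0x00051607`; the function names are hypothetical and this is not RECON's actual scanner code.

```python
from pathlib import Path

# First four bytes of an AppleDouble sidecar file.
APPLEDOUBLE_MAGIC = bytes.fromhex("00051607")


def is_sidecar_name(path: Path) -> bool:
    """True for macOS `._name` sidecar files, judged by filename alone."""
    return path.name.startswith("._")


def is_appledouble(header: bytes) -> bool:
    """True when the leading bytes of a file carry the AppleDouble magic."""
    return header[:4] == APPLEDOUBLE_MAGIC


def scan_candidates(paths: list[Path]) -> list[Path]:
    """Keep .pdf paths that are not `._` sidecars."""
    return [
        p for p in paths
        if p.suffix.lower() == ".pdf" and not is_sidecar_name(p)
    ]


found = [
    Path("/mnt/library/Survival-Companion-Library/Food Storage/._canning.pdf"),
    Path("/mnt/library/Survival-Companion-Library/Food Storage/Food_storage_guide.pdf"),
]
kept = scan_candidates(found)
```

The name check needs no file I/O, so it can run during directory traversal; the magic-number check is a fallback for sidecars that have been renamed.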

## Paths

1. `/mnt/library/Survival-Companion-Library/Books-Magazines/._Life after Doomsday.pdf` (`b333d5d5e0796c3f`)
2. `/mnt/library/Survival-Companion-Library/Books-Magazines/._Survivalist #09 - Urban Survival.pdf` (`5e9bbf4c04347ac4`)
3. `/mnt/library/Survival-Companion-Library/Books-Magazines/._The Survival Medicine Handbook_ - Alton, Joseph.pdf` (`bd5519cb0db9e1be`)
4. `/mnt/library/Survival-Companion-Library/Books-Magazines/._The Survival handbook.pdf` (`80ef6bcc79982214`)
5. `/mnt/library/Survival-Companion-Library/Firearms - Defense/._Basic_Manual_On_Knife_Throwing_2003.pdf` (`7725b720e8f1799d`)
6. `/mnt/library/Survival-Companion-Library/Firearms - Defense/._Home And Family Security System.pdf` (`951327b09f27d881`)
7. `/mnt/library/Survival-Companion-Library/Firearms - Defense/._survival battery.pdf` (`9686c66a7c17d601`)
8. `/mnt/library/Survival-Companion-Library/Food Storage/._3monthfoodsupply.pdf` (`58f4ed9073d2e6d0`)
9. `/mnt/library/Survival-Companion-Library/Food Storage/._3monthsupplyoffoodschedule.pdf` (`c8c88f3cdd8dc88c`)
10. `/mnt/library/Survival-Companion-Library/Food Storage/._5 dollar a week food storage plan - Unknown.pdf` (`14c4c65f03e45128`)
11. `/mnt/library/Survival-Companion-Library/Food Storage/._Food+Prepping+Checklist.pdf` (`45222452dddcb0d1`)
12. `/mnt/library/Survival-Companion-Library/Food Storage/._canning.pdf` (`4e85ec1524f6ebe8`)
13. `/mnt/library/Survival-Companion-Library/General Survival/EXTREME FAMILY SURVIVAL/._Bonus-Riot_Safety_for_patriots.pdf` (`2cda0807321d0a8d`)
14. `/mnt/library/Survival-Companion-Library/General Survival/Family Survival Course/._survive_any_disaster_v4.pdf` (`492643d20f610ff3`)
15. `/mnt/library/Survival-Companion-Library/General Survival/Massive Download 2/._GunFlash.pdf` (`a2224706e1567229`)
16. `/mnt/library/Survival-Companion-Library/General Survival/Prepping for Pennies/._BONUS 3 - What to Stockpile.pdf` (`6331b36047b611a8`)
17. `/mnt/library/Survival-Companion-Library/General Survival/SurvivalSpin Stuff/._Camping+Supplies+Checklist.pdf` (`3413af6b7af47578`)
18. `/mnt/library/Survival-Companion-Library/Medicine - Health - Hygiene - Sanitation/._FINAL COPY PDF DOOMSDAY BOOK OF MEDICINE.pdf` (`7f5a1c8b840c8229`)
19. `/mnt/library/Survival-Companion-Library/Medicine - Health - Hygiene - Sanitation/._First Aid FM 4-25.pdf` (`e5583ff012cc17ba`)
20. `/mnt/library/Survival-Companion-Library/Military guides - Manuals/._ar350-30_Survival_Evasion_Resistance_Escape.pdf` (`2536226e1dcdab11`)
21. `/mnt/library/Survival-Companion-Library/Survival Uploads/._NATO-emergency-war-surgery.pdf` (`809748c7dafb7647`)
22. `/mnt/library/Survival-Companion-Library/Survival Uploads/Bugging Out/._BugoutBag.pdf` (`c4fae8882cdb1441`)

failed_documents/test_artifacts.md (Normal file, 10 additions)
@@ -0,0 +1,10 @@
# Test Artifacts

**Count:** 1
**Failure reason:** This was a CLI test file created during RECON development and subsequently deleted from the library. The catalogue/documents rows were never cleaned up.

## Entry

| Filename | Path | Hash | Size (bytes) | Discovered | Error |
|----------|------|------|--------------|------------|-------|
| recon-test-cli.pdf | `/mnt/library/Technical/recon-test-cli.pdf` | `f95452d04916154d` | 381 | 2026-04-13 01:06:51 | File not found: /mnt/library/Technical/recon-test-cli.pdf |
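
Orphaned rows like this one could be found by checking each catalogued path against the filesystem. The sketch below assumes rows arrive as `(hash, path)` pairs; `orphaned_rows` is a hypothetical helper, and the `exists` parameter is there only so the check can be exercised without touching a real disk.

```python
import os
from typing import Callable, Iterable


def orphaned_rows(
    rows: Iterable[tuple[str, str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> list[tuple[str, str]]:
    """Return the (hash, path) rows whose file is no longer on disk."""
    return [(doc_hash, path) for doc_hash, path in rows if not exists(path)]


rows = [("f95452d04916154d", "/mnt/library/Technical/recon-test-cli.pdf")]

# Simulate a library where the file has already been deleted.
orphans = orphaned_rows(rows, exists=lambda path: False)
```

Running such a check after files are removed from the library would flag stale rows before they surface as extraction failures.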