recon/lib/processors
Matt b741e217f6 fix: ZIM table extraction — pipe-delimited cells instead of concatenation
Pre-processes the HTML tree before calling lxml's .text_content(),
which otherwise concatenates the text of adjacent elements:
- <table> cells joined with ' | ' delimiter, rows with newlines
- <br> tags produce newlines
- <li> items get '- ' prefix and newline separation
- <dt>/<dd> definition list items get newline separation

Fixes ~868 mangled Qdrant points where table content was jammed
together (e.g. 'Freq51Primary1A==' instead of 'Freq51 | Primary | 1A==').

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-07 01:32:25 +00:00
__init__.py
pdf_processor.py          Fix: Gemini "null" string bug in pdf_processor metadata voting                2026-04-15 23:30:59 +00:00
text_processor.py         Phase 6f: text processor for .txt file ingestion                              2026-04-15 22:39:31 +00:00
transcript_processor.py
zim_processor.py          fix: ZIM table extraction — pipe-delimited cells instead of concatenation     2026-05-07 01:32:25 +00:00
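
The pre-processing described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in zim_processor.py: the function names preprocess_tree and extract_text are hypothetical, and the sketch assumes the lxml package is installed and that the fix works by editing the .text and .tail of elements in place before .text_content() is called.

```python
import lxml.html


def preprocess_tree(root):
    """Insert delimiters into the tree so .text_content() keeps structure.

    Hypothetical helper: mutates `root` in place and returns it.
    """
    # <br> carries no text of its own, so put the newline in front of its tail.
    for br in root.iter("br"):
        br.tail = "\n" + (br.tail or "")

    # List items: '- ' prefix, newline after each item.
    for li in root.iter("li"):
        li.text = "- " + (li.text or "")
        li.tail = "\n" + (li.tail or "")

    # Definition list terms and descriptions: newline after each.
    for tag in ("dt", "dd"):
        for el in root.iter(tag):
            el.tail = "\n" + (el.tail or "")

    # Table rows: ' | ' between cells, newline after each row.
    for tr in root.iter("tr"):
        # Only direct children, so cells of nested tables are handled
        # when their own <tr> is visited.
        cells = [c for c in tr if c.tag in ("td", "th")]
        for cell in cells[:-1]:
            cell.tail = " | " + (cell.tail or "")
        tr.tail = "\n" + (tr.tail or "")

    return root


def extract_text(html):
    """Parse an HTML fragment, pre-process it, and return its text."""
    root = lxml.html.fromstring(html)
    return preprocess_tree(root).text_content().strip()


row = "<div><table><tr><td>Freq51</td><td>Primary</td><td>1A==</td></tr></table></div>"
print(extract_text(row))
```

Without the pre-processing step, .text_content() on the same fragment yields the three cell values run together, which is the failure mode the commit describes. Writing delimiters into .tail keeps the change local to the tree, so the rest of the extraction path can keep calling .text_content() unchanged.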