mirror of
https://github.com/zvx-echo6/recon.git
synced 2026-05-20 14:44:54 +02:00
Switch domain assignment to Qdrant as source of truth
Replace on-disk concept file reads with Qdrant payload queries for domain assignment. This unlocks assignment for ~10,120 items that had missing or legacy-only concept files on disk while Qdrant held the correct 18-domain taxonomy data.

Changes:
- domain_assigner.py: Replace _count_concept_domains (disk) with _count_domains_from_qdrant and _count_domains_from_qdrant_batch (Qdrant scroll queries). Add _get_qdrant_client helper. Remove pass 3 defensive re-run (Qdrant reads are consistent). Add no_concepts terminal status for zero-vector documents.
- embedder.py: Post-embed hook passes existing qdrant client to compute_assignment, avoiding a second connection.
- recon.py: Backfill creates one QdrantClient for the batch. SQL filter includes existing needs_reprocess items. Dry-run reports no_concepts as separate bucket. --reprocess-missing removes concept-dir deletion step (no longer reads from disk).
- docs/domain-assignment.md: Algorithm references Qdrant, documents no_concepts status, removes pass 3 description.

Dry-run results: 20,453 assigned, 1,392 tied, 298 no_concepts, 0 needs_reprocess, 0 errors (previously 10,416 needs_reprocess).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent c04ccc5011
commit 3b37d96c4d
4 changed files with 186 additions and 135 deletions
@@ -411,7 +411,7 @@ def embed_single(file_hash, db, config):
 from .domain_assigner import compute_assignment
 from .peertube_writer import push_category, extract_uuid
 from .recon_domains import DOMAIN_CATEGORY_MAP
-domain, status = compute_assignment(file_hash, db, config)
+domain, status = compute_assignment(file_hash, db, config, qdrant=qdrant)
 db.set_domain_assignment(file_hash, domain, status)
 if domain and status == 'assigned':
     cat_id = DOMAIN_CATEGORY_MAP[domain]
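The hunk above passes an existing client into compute_assignment so the post-embed hook does not open a second connection. The following is a minimal sketch of that pattern, not the repository's implementation. It assumes a payload key named `domains`, a collection named `recon`, a majority-vote rule with a `tied` result when the top two counts are equal, and illustrative domain names. `FakeQdrant`, `count_domains`, and `sketch_assignment` are hypothetical names; the fake only mimics the `(points, next_offset)` return shape of a Qdrant scroll call and takes a plain dict as its filter.

```python
from collections import Counter
from types import SimpleNamespace


class FakeQdrant:
    """Hypothetical stand-in for a Qdrant client (assumption, for illustration only)."""

    def __init__(self, payloads):
        self.payloads = payloads
        self.scroll_calls = 0

    def scroll(self, collection_name, scroll_filter=None, limit=10,
               offset=None, with_payload=True, with_vectors=False):
        # Returns (points, next_offset); next_offset is None on the last page.
        self.scroll_calls += 1
        wanted = (scroll_filter or {}).get("file_hash")
        rows = [p for p in self.payloads if p["file_hash"] == wanted]
        start = offset or 0
        page = [SimpleNamespace(payload=p) for p in rows[start:start + limit]]
        next_offset = start + limit if start + limit < len(rows) else None
        return page, next_offset


def count_domains(client, file_hash, collection="recon", page_size=256):
    """Tally domain labels across every point stored for one file (assumed payload layout)."""
    counts = Counter()
    offset = None
    while True:
        points, offset = client.scroll(
            collection_name=collection,
            scroll_filter={"file_hash": file_hash},  # dict filter is a simplification
            limit=page_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        for point in points:
            counts.update(point.payload.get("domains", []))
        if offset is None:
            return counts


def sketch_assignment(file_hash, qdrant=None, make_client=None):
    """Reuse the caller's client when given; otherwise build one via make_client."""
    client = qdrant if qdrant is not None else make_client()
    counts = count_domains(client, file_hash)
    if not counts:
        return None, "no_concepts"   # nothing stored for this file
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None, "tied"          # assumed tie rule: top two counts equal
    return ranked[0][0], "assigned"


# One shared client serves every call, as in the batch backfill described in the commit message.
shared = FakeQdrant([
    {"file_hash": "a", "domains": ["medical", "water"]},
    {"file_hash": "a", "domains": ["medical"]},
    {"file_hash": "a", "domains": ["medical"]},
    {"file_hash": "b", "domains": ["water"]},
    {"file_hash": "b", "domains": ["power"]},
])
results = {h: sketch_assignment(h, qdrant=shared) for h in ("a", "b", "c")}
```

Passing the client as an optional keyword argument keeps the original three-argument call working while letting batch callers share a single connection.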