Docs: domain assignment guide, migration runbook, blast radius
- domain-assignment.md: algorithm walkthrough (pass 1/2/3), status values,
CLI command reference, dashboard review guide
- migration-runbook.md: step-by-step deploy with pre-deploy backups,
8 STOP pause points for operator verification, staged push rollout,
quarantined --reprocess-missing procedure, 5 rollback procedures
- deploy-blast-radius.md: per-step risk reference with worst case,
detection signals, rollback procedures, and risk tiers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 00:06:49 +00:00
# Domain Assignment — Algorithm & Operations Guide
## Overview
Switch domain assignment to Qdrant as source of truth
Replace on-disk concept file reads with Qdrant payload queries for
domain assignment. This unlocks assignment for ~10,120 items that had
missing or legacy-only concept files on disk while Qdrant held the
correct 18-domain taxonomy data.
Changes:
- domain_assigner.py: Replace _count_concept_domains (disk) with
_count_domains_from_qdrant and _count_domains_from_qdrant_batch
(Qdrant scroll queries). Add _get_qdrant_client helper. Remove
pass 3 defensive re-run (Qdrant reads are consistent). Add
no_concepts terminal status for zero-vector documents.
- embedder.py: Post-embed hook passes existing qdrant client to
compute_assignment, avoiding a second connection.
- recon.py: Backfill creates one QdrantClient for the batch. SQL
filter includes existing needs_reprocess items. Dry-run reports
no_concepts as separate bucket. --reprocess-missing removes
concept-dir deletion step (no longer reads from disk).
- docs/domain-assignment.md: Algorithm references Qdrant, documents
no_concepts status, removes pass 3 description.
Dry-run results: 20,453 assigned, 1,392 tied, 298 no_concepts,
0 needs_reprocess, 0 errors (previously 10,416 needs_reprocess).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 03:59:06 +00:00
RECON's domain assignment feature maps each PeerTube video to one of 18 knowledge domains by analyzing the concept vectors stored in Qdrant. Assignments are pushed to PeerTube as category metadata via a custom plugin.
## Data Source
Domain counts are read from the `domain` payload field on concept vectors in Qdrant (`recon_knowledge_hybrid` collection on cortex:6333). Each concept vector has a `domain` string in its payload, set during enrichment and validated at embed time. This provides 100% coverage for all embedded documents with zero legacy domain residue.
Previously, domain counts were read from on-disk concept JSON files (`data/concepts/{hash}/window_*.json`). This was replaced with Qdrant queries on 2026-04-28 because ~10,000 items had missing or legacy-only concept files on disk while Qdrant had the correct data.
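As an illustration of reading domain counts from payloads, here is a minimal sketch of a paginated tally. It assumes a page-fetching callable shaped like qdrant-client's `scroll`, which returns a `(points, next_offset)` pair and signals the last page with a `None` offset. The names `fetch_page`, `count_domains`, and `fake_fetch_page`, and the sample domain values, are hypothetical and are not RECON's actual helpers.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical page fetcher: takes an offset (None for the first page) and
# returns (payloads, next_offset); next_offset is None on the last page.
PageFetcher = Callable[[Optional[int]], Tuple[Iterable[dict], Optional[int]]]


def count_domains(fetch_page: PageFetcher, valid_domains: set) -> Counter:
    """Tally the `domain` payload field across all pages, skipping unknown values."""
    counts: Counter = Counter()
    offset: Optional[int] = None
    while True:
        payloads, offset = fetch_page(offset)
        for payload in payloads:
            domain = payload.get("domain")
            if domain in valid_domains:
                counts[domain] += 1
        if offset is None:
            return counts


# In-memory stand-in for a collection, served two payloads per page.
_PAYLOADS = [
    {"domain": "medicine"},
    {"domain": "medicine"},
    {"domain": "legacy_topic"},  # not in the valid set, so it is skipped
    {"domain": "energy"},
    {},                          # payload without a domain field
]


def fake_fetch_page(offset: Optional[int]):
    start = offset or 0
    end = start + 2
    next_offset = end if end < len(_PAYLOADS) else None
    return _PAYLOADS[start:end], next_offset


domain_counts = count_domains(fake_fetch_page, {"medicine", "energy"})
```

Injecting the page fetcher keeps the counting logic independent of the Qdrant connection, so the same function can be exercised against an in-memory list.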
## Algorithm
### Pass 1: Concept Domain Count (inline, per-document)
Runs automatically via the post-embed hook when a video completes the pipeline, or in bulk via `--backfill`.
1. Query Qdrant for all points with `doc_hash` matching the document
2. Count `domain` payload occurrences, filtering to `VALID_DOMAINS` only
3. If zero concept vectors → `no_concepts` (terminal)
4. If single top domain → `assigned`
5. If tied → `tied_pass_1` (deferred to tiebreaker)
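The decision in steps 3 to 5 can be sketched as a small pure function over the per-document tally. This is an illustration only: `classify_pass_1` is a hypothetical name, and the sketch treats an empty tally as `no_concepts`, which may differ from how the real code handles documents whose vectors all carry filtered-out domains.

```python
from collections import Counter
from typing import Optional, Tuple


def classify_pass_1(counts: Counter) -> Tuple[str, Optional[str]]:
    """Map per-document domain counts to a (status, domain) pair."""
    if not counts:
        # Assumption: an empty tally is treated as "no concept vectors".
        return "no_concepts", None
    top = max(counts.values())
    leaders = [domain for domain, n in counts.items() if n == top]
    if len(leaders) == 1:
        return "assigned", leaders[0]
    # Two or more domains share the top count: defer to the tiebreaker pass.
    return "tied_pass_1", None
```

For example, a tally of `{"medicine": 7, "energy": 3}` yields `("assigned", "medicine")`, while `{"medicine": 4, "energy": 4}` yields `("tied_pass_1", None)`.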
### Pass 2: Channel Tiebreaker (batch)
Runs via `assign-categories --tiebreaker-pass`.
For each `tied_pass_1` document:
1. Identify the tied domains from Qdrant
2. Look up the document's channel (`catalogue.category`)
3. **Mega-channel rule:** If the channel has >500 videos, skip tiebreaking → `tied_manual`
4. Query Qdrant for domain counts across all other videos in the same channel (single batch query with `MatchAny` filter)
5. Among the tied domains only, pick the one with the highest channel-wide concept count
6. If resolved → `tied_pass_2`
7. If still tied → `tied_manual` (alphabetical fallback assigned, flagged for review)
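A rough sketch of the tiebreaker decision, assuming the channel-wide counts have already been fetched. `break_tie` and `MEGA_CHANNEL_THRESHOLD` are hypothetical names. Two details are assumptions rather than documented behavior: the alphabetical fallback is taken among the domains that remain tied, and no fallback domain is returned for mega-channels.

```python
from collections import Counter
from typing import List, Optional, Tuple

# From the guide: channels with more than 500 videos skip tiebreaking.
MEGA_CHANNEL_THRESHOLD = 500


def break_tie(
    tied: List[str], channel_counts: Counter, channel_size: int
) -> Tuple[str, Optional[str]]:
    """Resolve a pass-1 tie using channel-wide domain counts."""
    if channel_size > MEGA_CHANNEL_THRESHOLD:
        # Assumption: whether a fallback domain is stored here is not stated.
        return "tied_manual", None
    # Only the tied domains compete; other channel domains are ignored.
    best = max(channel_counts[domain] for domain in tied)
    leaders = sorted(d for d in tied if channel_counts[d] == best)
    if len(leaders) == 1:
        return "tied_pass_2", leaders[0]
    # Still tied: first domain alphabetically, flagged for manual review.
    return "tied_manual", leaders[0]
```

Restricting the comparison to the tied domains matters: a channel may be dominated by a third domain that never tied for this document, and that domain should not win.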
### Mega-Channel Rule
Channels with >500 videos (like the "Transcript" catch-all with ~9,200 videos) are not topically coherent. Scanning their concepts produces meaningless aggregate data. These go straight to `tied_manual` for dashboard review.
## Status Values
| Status | Meaning | Terminal? | Next Action |
|--------|---------|-----------|-------------|
| `assigned` | Clear winner from pass 1 | No | Push to PeerTube |
| `tied_pass_1` | Concept tie, awaiting tiebreaker | No | Run `--tiebreaker-pass` |
| `tied_pass_2` | Resolved by channel tiebreaker | No | Push to PeerTube |
| `tied_manual` | Needs human review | No | Review at `/peertube/review` |
| `no_concepts` | Zero concept vectors in Qdrant | **Yes** | None; typically non-topical content (vlogs, giveaways, announcements) |
| `needs_reprocess` | Transient failure (Qdrant error) | No | Run `--reprocess-missing` |
| `manual_assigned` | Human override from dashboard | No | Already pushed |
**"Categorized" filter** = `{'assigned', 'tied_pass_2', 'manual_assigned'}`
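As a small illustration, the filter can be applied to `(doc_hash, status)` rows like this. The row shape and the name `categorized_docs` are assumptions made for the example, not the actual schema.

```python
from typing import Iterable, List, Tuple

# The three statuses the guide counts as "categorized".
CATEGORIZED_STATUSES = frozenset({"assigned", "tied_pass_2", "manual_assigned"})


def categorized_docs(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the doc hashes whose status counts as categorized."""
    return [doc_hash for doc_hash, status in rows if status in CATEGORIZED_STATUSES]


# Hypothetical sample rows.
sample_rows = [
    ("a1", "assigned"),
    ("b2", "tied_manual"),
    ("c3", "manual_assigned"),
    ("d4", "no_concepts"),
    ("e5", "tied_pass_2"),
]
```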
## CLI Commands
```bash
cd /opt/recon && source venv/bin/activate
# Show current assignment status
python3 recon.py assign-categories
# Pass 1: backfill all unassigned complete stream documents
python3 recon.py assign-categories --backfill --dry-run
python3 recon.py assign-categories --backfill
# Pass 2: resolve ties via channel analysis
python3 recon.py assign-categories --tiebreaker-pass
# Push all assigned-but-unpushed categories to PeerTube API
python3 recon.py assign-categories --push-pending
# Re-queue items with transient failures for full re-processing
python3 recon.py assign-categories --reprocess-missing
# Limit processing count
python3 recon.py assign-categories --backfill --limit 100
```
## Dashboard Review
The review UI at `recon.echo6.co/peertube/review` shows only `tied_manual` items. Each row displays:
- Video title and channel
- Top concept domains with counts
- Dropdown to select the correct domain
- Assign button (pushes to PeerTube immediately)
Items with `no_concepts` or `needs_reprocess` status do NOT appear in the review UI.
## Pipeline Integration
New videos ingested via the PeerTube collector are automatically assigned a domain when they complete the embed stage. The post-embed hook in `embedder.py`:
1. Runs `compute_assignment()` (pass 1 only), reusing the embedder's existing Qdrant client
2. If clear winner: pushes category to PeerTube immediately
3. If tied: marks as `tied_pass_1` for the next tiebreaker batch run
4. If no concepts: marks as `no_concepts` (terminal)
5. On Qdrant error: logs warning and continues — does not block the pipeline
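The non-blocking behavior of the hook can be sketched as follows. This is a simplified illustration with injected callables; `post_embed_hook`, `compute`, and `push` are hypothetical stand-ins rather than the real signatures in `embedder.py`, and catching a broad `Exception` is an assumption about which errors are swallowed.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("recon.post_embed")

Assignment = Tuple[str, Optional[str]]  # (status, domain)


def post_embed_hook(
    doc_hash: str,
    compute: Callable[[str], Assignment],
    push: Callable[[str, str], None],
) -> Optional[str]:
    """Run pass 1 after embedding without ever failing the pipeline."""
    try:
        status, domain = compute(doc_hash)
    except Exception as exc:  # assumption: any lookup failure is non-fatal
        log.warning("domain assignment skipped for %s: %s", doc_hash, exc)
        return None
    if status == "assigned" and domain is not None:
        # Clear winner: push the category right away.
        push(doc_hash, domain)
    # Ties and no_concepts are only recorded; nothing is pushed.
    return status
```

Returning `None` on failure lets the caller carry on with the next document while the item stays eligible for a later backfill run.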
## Source Files
| File | Purpose |
|------|---------|
| `lib/recon_domains.py` | Domain↔Category ID mapping, VALID_DOMAINS |
| `lib/domain_assigner.py` | `compute_assignment()` + `run_tiebreaker_pass()` + Qdrant helpers |
| `lib/peertube_writer.py` | OAuth2 client, `push_category()`, `push_pending()` |
| `lib/embedder.py` | Post-embed hook (passes qdrant client) |
| `lib/status.py` | DB columns + helper methods |
| `lib/api.py` | Dashboard review routes |
| `recon.py` | CLI `assign-categories` command |