central/tests/test_geocoder_enricher.py


feat(3-J): enrichment framework + GeocoderEnricher + NoOpBackend + FIRMS pilot

First of three PRs for v0.5.0 (J: framework; K: real geocoder backends + doc revisions; L: operator events tab + per-adapter render + events-map fix).

Design pivot: the Phase 2 "no enrichment, upstream verbatim" reading of Matt's principle is reframed. Consumers can't do follow-up lookups; they only see what's on the wire, so whatever Central doesn't enrich is effectively missing downstream. Enrichment is now expected. The producer-doc §2/§10.1 rewrite lands in PR K; this PR builds the framework PR K documents.

New package src/central/enrichment/:
- base.py: Enricher Protocol (name + async enrich(location) -> dict).
- geocoder.py: GeocoderEnricher + GeocoderBackend Protocol + the locked GEOCODER_FIELDS set (name, city, county, state, country, postal_code, timezone, landclass, elevation_m) + all_null_bundle().
- cache.py: EnrichmentCache. stdlib sqlite3 off the event loop via asyncio.to_thread (no async-sqlite dep). Keyed on (enricher_name, lat_4dp, lon_4dp); per-enricher TTL (24h default); fresh connection per op (sqlite3 isn't thread-safe to share). Cache even all-null; never cache backend failures.
- backends/no_op.py: NoOpBackend, which returns the all-null bundle and is the PR J default.

Provenance: enrichment results land under event.data["_enriched"][<name>]; everything else in data stays upstream verbatim.

Wiring:
- adapter.py: enrichment_locations: list[tuple[str,str]] = [] class attr. Empty (default) = publish as-is, no enrichment.
- config_models.py: EnrichmentConfig (enricher_class, backend_class, backend_settings, cache_ttl_s). Read once at startup.
- supervisor.py: build_enrichers() + apply_enrichment(); enrichment runs after dedup, before wrap_event, in the poll loop. Class-name registries for enricher/backend resolution (PR K extends).
- firms.py: enrichment_locations = [("latitude","longitude")] as the pilot.

Enrichment config is read once at supervisor startup; hot-reload is out of scope for PR J (noted in EnrichmentConfig + build_enrichers docstrings).

Tests (16 new):
- test_enrichment_framework.py (9): parent-dir/table init, cache miss->hit, TTL expiry, 4dp rounding, nearby-coord collapse, concurrent-set single-row, backend-failure all-null-not-cached (retries), success cached (one backend call), all-null cached.
- test_geocoder_enricher.py (5): NoOp all-null, field-set == GEOCODER_FIELDS, null-coords short-circuit (no backend call), name=="geocoder", sequential same-coords single backend call.
- test_firms.py (+2): enrichment_locations declared + paths resolve to floats in a real event (structural, not literal); event through supervisor apply_enrichment emerges with data._enriched.geocoder == all-null bundle.

Verification: full pytest 495 passed (was 479; +16). grep for subject_for_event/_ADAPTER_REGISTRY clean. Module imports cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 04:39:49 +00:00
"""Tests for GeocoderEnricher with the default NoOpBackend."""

from typing import Any

import pytest

from central.enrichment.backends.no_op import NoOpBackend
from central.enrichment.cache import EnrichmentCache
from central.enrichment.geocoder import (
    GEOCODER_FIELDS,
    GeocoderEnricher,
    all_null_bundle,
)


@pytest.mark.asyncio
async def test_noop_backend_returns_all_null_bundle():
    enricher = GeocoderEnricher(NoOpBackend())
    result = await enricher.enrich({"lat": 45.0, "lon": -116.0})
    assert result == all_null_bundle()
    assert all(v is None for v in result.values())


@pytest.mark.asyncio
async def test_field_set_matches_locked_protocol():
    """Every field in the locked GEOCODER_FIELDS set is present (all None for
    NoOpBackend), and no extra keys leak through: set equality checks both
    directions."""
    enricher = GeocoderEnricher(NoOpBackend())
    result = await enricher.enrich({"lat": 1.0, "lon": 2.0})
    assert set(result.keys()) == set(GEOCODER_FIELDS)


@pytest.mark.asyncio
async def test_missing_coords_returns_all_null_without_backend_call():
    # Backend stub that fails the test if the enricher ever reaches it.
    class _Tripwire:
        async def reverse(self, lat: float, lon: float) -> dict[str, Any]:
            raise AssertionError("backend must not be called for null coords")

    enricher = GeocoderEnricher(_Tripwire())
    assert await enricher.enrich({"lat": None, "lon": None}) == all_null_bundle()  # type: ignore[dict-item]
    assert await enricher.enrich({}) == all_null_bundle()


@pytest.mark.asyncio
async def test_enricher_name_is_geocoder():
    """The name keys the result under event.data['_enriched'][name]."""
    assert GeocoderEnricher(NoOpBackend()).name == "geocoder"


@pytest.mark.asyncio
async def test_sequential_calls_same_coords_hit_cache(tmp_path):
    # Backend stub that counts how often the enricher falls through to it.
    class _CountingNoOp:
        def __init__(self) -> None:
            self.calls = 0

        async def reverse(self, lat: float, lon: float) -> dict[str, Any]:
            self.calls += 1
            return all_null_bundle()

    cache = EnrichmentCache(tmp_path / "c.db", ttl_s=3600)
    backend = _CountingNoOp()
    enricher = GeocoderEnricher(backend, cache=cache)
    for _ in range(5):
        await enricher.enrich({"lat": 33.5, "lon": -111.9})
    assert backend.calls == 1, "repeated identical coords must collapse to one backend call"
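

For readers without the central package at hand, the behaviour these tests pin down (null-coordinate short-circuit, cache lookup before the backend, failures returned as all-null but not cached, results stored under data["_enriched"][name]) can be approximated with a stand-in. The sketch below is an assumption-laden illustration, not GeocoderEnricher or the supervisor's apply_enrichment: SketchEnricher, apply_sketch, CountingBackend, the in-memory dict cache, and the "Exampleville" value are all hypothetical; only the field names and the "geocoder" name come from the commit message.

```python
import asyncio
from typing import Any, Optional

# Field names come from the commit message; every class and function below is
# a hypothetical stand-in, not the code in central.enrichment.
FIELDS = (
    "name", "city", "county", "state", "country",
    "postal_code", "timezone", "landclass", "elevation_m",
)


def null_bundle() -> dict[str, Any]:
    return {field: None for field in FIELDS}


class SketchEnricher:
    name = "geocoder"

    def __init__(self, backend: Any, cache: Optional[dict] = None) -> None:
        self._backend = backend
        # A plain dict stands in for the sqlite-backed cache in this sketch.
        self._cache: dict = cache if cache is not None else {}

    async def enrich(self, location: dict[str, Any]) -> dict[str, Any]:
        lat, lon = location.get("lat"), location.get("lon")
        if lat is None or lon is None:
            return null_bundle()  # short-circuit: backend is never called
        key = (self.name, round(lat, 4), round(lon, 4))
        if key in self._cache:
            return dict(self._cache[key])
        try:
            raw = await self._backend.reverse(lat, lon)
        except Exception:
            return null_bundle()  # failure is not cached, so the next call retries
        # Keep only the locked fields; unknown backend keys are dropped.
        bundle = {field: raw.get(field) for field in FIELDS}
        self._cache[key] = bundle  # cached even when every value is None
        return dict(bundle)


async def apply_sketch(
    data: dict[str, Any], enricher: SketchEnricher, lat_key: str, lon_key: str
) -> dict[str, Any]:
    # Upstream keys stay untouched; the result lands under data["_enriched"][name].
    bundle = await enricher.enrich({"lat": data.get(lat_key), "lon": data.get(lon_key)})
    data.setdefault("_enriched", {})[enricher.name] = bundle
    return data


class CountingBackend:
    def __init__(self) -> None:
        self.calls = 0

    async def reverse(self, lat: float, lon: float) -> dict[str, Any]:
        self.calls += 1
        # Made-up placeholder values, purely for illustration.
        return {"city": "Exampleville", "unexpected": "dropped"}


async def enricher_demo() -> tuple[int, dict[str, Any]]:
    backend = CountingBackend()
    enricher = SketchEnricher(backend)
    for _ in range(5):
        await enricher.enrich({"lat": 33.5, "lon": -111.9})
    event = await apply_sketch(
        {"latitude": 33.5, "longitude": -111.9}, enricher, "latitude", "longitude"
    )
    return backend.calls, event


if __name__ == "__main__":
    print(asyncio.run(enricher_demo()))
```
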