central/tests/test_producer_doc.py


docs(2-I): producer integration spec — docs/PRODUCER-INTEGRATION.md The producer-side contract for adapter authors, mirroring PR H's consumer spec. Self-contained — readers should not need to grep the codebase to understand what a new SourceAdapter subclass must implement. Bakes in the Phase 2 design principle ("Central takes it all and gives it all. It's up to the pipe to do with it what it will.") so future authors reject enrichment / silent-drop / opinionated-translation proposals on sight. The previously-proposed Phase 3 NWIS metadata-enrichment ticket is called out by name as an example of what gets rejected. 12-section outline locked with PM: design principle, quick start (clone swpc_kindex), SourceAdapter base class, settings, subject namespace, dedup keys, StreamEntry registry, removal/fall-off, anti-patterns, preview hook, acceptance gate. Sibling test (tests/test_producer_doc.py) mirrors test_consumer_doc.py discipline: - bidirectional == between SourceAdapter API and §4 method coverage - preview_for_settings contract verbatim against live docstring - top-level domain enumeration vs central.streams.STREAMS prefixes - §8 STREAMS snippet vs central.streams.STREAMS - anti-patterns adapter-name examples vs discover_adapters() No hardcoded stream / adapter / domain lists anywhere in the test — every expected value derives from central.streams, central.adapter_discovery, or central.adapter at runtime. Honest about the pre-existing `:` vs `|` dedup-key separator inconsistency (swpc_alerts and swpc_protons use `|`; everyone else uses `:`). Recommends `:` for new adapters without forcing a rename PR on the SWPC pair (separators are persisted in cursors.db rows). 
Acceptance bars: (a) grep -rn 'subject_for_event\|_ADAPTER_REGISTRY' src tests → empty (b) bidirectional override-method coverage asserted in test (c) tests/test_producer_doc.py → 6/6 pass (d) full pytest suite → 469 pass (was 463 pre-PR; +6 new) (e) doc length: 823 lines (within 500–1200 envelope) (f) code fences balanced; JSON/Python blocks parse Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 21:17:48 +00:00
"""Consistency tests for docs/PRODUCER-INTEGRATION.md.

The doc is the producer-side contract: what an adapter author implements and
the conventions Central enforces around it. These tests catch drift between
the doc and the live code:

- Every overridable SourceAdapter method documented in §4 must exist on
  central.adapter.SourceAdapter, and vice versa.
- The preview_for_settings contract quoted in §11.1 must come from the
  actual SourceAdapter.preview_for_settings docstring.
- The set of top-level domain tokens documented in §6.1 must equal the set
  derived from central.streams.STREAMS subject_filter prefixes.
- The verbatim STREAMS snippet quoted in §8 must match the live registry.
- §2, §10.1, and §13 must keep their required quote, subsection, and
  enrichment-contract references; the §13 coverage matrix must list exactly
  GEOCODER_FIELDS.
- Quoted adapter names in the §10 anti-patterns must resolve via
  discover_adapters().

Per the doc's own §10.4, there are NO hardcoded stream / adapter list
literals: every expected value derives from central.streams, central.adapter,
central.adapter_discovery, or central.enrichment.geocoder at runtime.
"""

import inspect
import re
from pathlib import Path

from central.adapter import SourceAdapter
from central.adapter_discovery import discover_adapters
feat(3-K): real geocoder backends + producer-doc reframe + consumer-doc enrichment Second of three PRs for v0.5.0 (J shipped the framework; this fills in real backends + documents the reframed design principle in-tree; L is the events tab + map fix, then tag). Backends (all satisfy GeocoderBackend; never raise, all-null on any failure): - NaviBackend — composed Navi /api/reverse/<lat>/<lon> (name/address + timezone + landclass + elevation in one call). Near-passthrough: response already matches the canonical 9-field shape. Best-effort warmup ping (Boise) on construction when a loop is running; config `headers` slot for a future Authorization: Bearer (config-only, no code change). Default base_url http://192.168.1.130:8440. - PhotonBackend — raw Photon /reverse?lat&lon&limit=1 (name/address only). Maps features[0].properties; postal_code <- postcode; timezone/landclass/ elevation_m null (Navi-composed-endpoint extras). - NominatimBackend — OSM Nominatim /reverse?format=jsonv2 (name/address only). Configurable rate limit (default 1/sec; 0 disables for self-hosted) + required User-Agent. Maps the address block; landclass/elevation_m/timezone null. Registered all three in supervisor _BACKEND_REGISTRY (resolved by EnrichmentConfig backend_class name). Docs — design pivot now in-tree: - PRODUCER §2 reframed: the verbatim Matt quote stays; the translation inverts. Central is the consumer's only data plane (consumers can't do follow-up lookups), so enrich deliberately and centrally, namespaced under _enriched, failing to null. "No enrichment" is gone. - PRODUCER §10.1 inverted: enrichment is expected; the anti-pattern is doing it OUTSIDE the framework (inline in poll(), bypassing cache + _enriched namespacing + the never-raise safety net). 
- PRODUCER new §13 Enrichment contract: Enricher / GeocoderEnricher / GeocoderBackend Protocols, NoOpBackend default, sqlite cache + TTL + cache-all-null + don't-cache-on-raise semantics, _enriched.<name> provenance, per-field coverage matrix (cross-checked against GEOCODER_FIELDS), and the landclass antimeridian known wrinkle. - CONSUMER FIRMS section: documents the data._enriched.geocoder bundle (9 fields), per-region coverage (US-full, non-US timezone+elevation), and the antimeridian landclass caveat. Tests: - test_navi/photon/nominatim_backend.py — happy-path field mapping, null handling, extra-key drop, network/timeout/non-200/malformed -> all-null (never raises), Nominatim rate-limit (disabled + spacing) + User-Agent. Env-gated live Navi smoke (NAVI_INTEGRATION_TEST=1; skipped by default — the 192.168.1.130 endpoint isn't reachable from CT104's segment). - test_producer_doc.py — +4: §2 verbatim quote present, §10.1 subsection exists, §13 names all four protocol types, §13 coverage matrix == GEOCODER_FIELDS (derived from code, not hardcoded). Verification: full pytest 525 passed, 1 skipped (was 495; +30 backend + 4 doc tests, -1 the env-gated skip). grep subject_for_event/_ADAPTER_REGISTRY clean. All three backends import + resolve via the registry. Flagged for later (NOT done here): adapters besides FIRMS that should declare enrichment_locations (nwis, eonet, gdacs, usgs_quake, wfigs_*) — that's PR L scope alongside the events tab. See PR description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 16:10:44 +00:00
from central.enrichment.geocoder import GEOCODER_FIELDS
from central.streams import STREAMS

# The verbatim design-principle quote that must stay in §2 (Matt, 2026-05-19).
# Only this leading portion is checked, as a substring of the §2 body.
_DESIGN_PRINCIPLE_QUOTE = (
    "Central takes it all and gives it all. It's up to the pipe to do with it"
)

DOC_PATH = Path(__file__).resolve().parents[1] / "docs" / "PRODUCER-INTEGRATION.md"


def _doc_text() -> str:
    assert DOC_PATH.is_file(), f"missing: {DOC_PATH}"
    # Explicit encoding: the doc contains non-ASCII characters (e.g. "§").
    return DOC_PATH.read_text(encoding="utf-8")


def _documented_override_methods(doc: str) -> set[str]:
    """Extract method names documented under '## 4. The SourceAdapter base class'.

    Looks for the '**`async def <name>(...)`**' / '**`def <name>(...)`**'
    method headings inside §4.
    """
    section_re = re.compile(
        r"^## 4\. The SourceAdapter base class\s*\n(.*?)(?=^## )",
        re.DOTALL | re.MULTILINE,
    )
    m = section_re.search(doc)
    assert m, "doc missing '## 4. The SourceAdapter base class' section"
    section = m.group(1)
    heading_re = re.compile(r"\*\*`(?:async\s+)?def\s+(\w+)\s*\(", re.MULTILINE)
    return set(heading_re.findall(section))
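

# Illustrative, self-contained check of the §4 heading pattern used by
# _documented_override_methods. It mirrors the regex rather than calling the
# helper, so it documents the expected markdown shape instead of guarding the
# helper itself. The markdown sample and its signatures are hypothetical (not
# copied from PRODUCER-INTEGRATION.md): only bold, backticked `def` /
# `async def` headings count, not inline mentions in prose.
def test_override_heading_pattern_on_sample_markdown():
    import re

    heading_re = re.compile(r"\*\*`(?:async\s+)?def\s+(\w+)\s*\(", re.MULTILINE)
    sample = (
        "**`async def poll(self)`**\n"
        "Fetch upstream records.\n"
        "**`def preview_for_settings(self, settings)`**\n"
        "Prose that mentions `def helper()` inline is ignored.\n"
    )
    assert set(heading_re.findall(sample)) == {"poll", "preview_for_settings"}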
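

def _sourceadapter_overridable_methods() -> set[str]:
    """Methods on SourceAdapter that an adapter author is expected to implement
    or may override. Excludes Python internals (dunder), the constructor, and
    private helpers.
    """
    methods: set[str] = set()
    for name, member in inspect.getmembers(SourceAdapter):
        # Dunders, __init__, and _private helpers all start with "_".
        if name.startswith("_"):
            continue
        # Plain, static, and `async def` functions all pass inspect.isfunction
        # when looked up on the class; iscoroutinefunction also admits bound
        # async methods (e.g. async classmethods). Sync classmethods and
        # properties are not collected.
        if not (inspect.isfunction(member) or inspect.iscoroutinefunction(member)):
            continue
        methods.add(name)
    return methods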
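

# Illustrative, self-contained check of the member filter used by
# _sourceadapter_overridable_methods, applied to a hypothetical toy class
# instead of SourceAdapter. It shows which member kinds the filter collects:
# a sync classmethod or a property on SourceAdapter would not be picked up,
# so the §4 coverage check would not see it.
def test_override_predicate_on_toy_class():
    import inspect

    class Toy:
        def __init__(self) -> None: ...
        def _helper(self) -> None: ...
        def plain(self) -> None: ...
        async def poll(self) -> None: ...

        @staticmethod
        def static() -> None: ...

        @classmethod
        def sync_cls(cls) -> None: ...

        @classmethod
        async def async_cls(cls) -> None: ...

        @property
        def prop(self) -> int:
            return 1

    collected = {
        name
        for name, member in inspect.getmembers(Toy)
        if not name.startswith("_")
        and (inspect.isfunction(member) or inspect.iscoroutinefunction(member))
    }
    # sync_cls (bound method, not a coroutine) and prop (property) are skipped.
    assert collected == {"plain", "poll", "static", "async_cls"}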


def _streams_domains() -> set[str]:
    """Top-level <domain> tokens derived from STREAMS subject filters
    (central.<domain>.>).
    """
    domain_re = re.compile(r"^central\.([a-z_]+)\.>$")
    out: set[str] = set()
    for s in STREAMS:
        m = domain_re.match(s.subject_filter)
        assert m, f"unexpected subject filter shape: {s.subject_filter!r}"
        out.add(m.group(1))
    return out
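

# Illustrative, self-contained check of the subject-filter shape that
# _streams_domains accepts. The tokens are synthetic placeholders, not STREAMS
# entries, so this is not one of the hardcoded lists §10.4 forbids. Only
# single-domain, full-wildcard filters match; anything deeper or using "*"
# would trip the helper's assertion.
def test_domain_filter_pattern_examples():
    import re

    domain_re = re.compile(r"^central\.([a-z_]+)\.>$")
    m = domain_re.match("central.alpha.>")
    assert m is not None and m.group(1) == "alpha"
    assert domain_re.match("central.alpha.sub.>") is None
    assert domain_re.match("central.alpha.*") is None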


def _documented_domains(doc: str) -> set[str]:
    """Domain tokens enumerated in §6.1 as backtick literals (`wx`, `fire`, …)."""
    section_re = re.compile(
        r"`<domain>` is one of ([^.]+)\.",
        re.DOTALL,
    )
    m = section_re.search(doc)
    assert m, "doc missing the '`<domain>` is one of ...' enumeration in §6.1"
    enum_text = m.group(1)
    return set(re.findall(r"`([a-z_]+)`", enum_text))
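

# Illustrative, self-contained check of the §6.1 enumeration pattern used by
# _documented_domains, on a synthetic sentence (placeholder tokens, not real
# domains). The capture stops at the first ".", so the enumeration has to fit
# in one sentence and cannot contain dotted tokens; backticked words after
# that period are ignored.
def test_domain_enumeration_pattern_on_sample_sentence():
    import re

    sample = (
        "`<domain>` is one of `alpha`, `beta`, and `gamma`. "
        "A later sentence mentioning `delta` is not part of the enumeration."
    )
    m = re.search(r"`<domain>` is one of ([^.]+)\.", sample, re.DOTALL)
    assert m is not None
    assert set(re.findall(r"`([a-z_]+)`", m.group(1))) == {"alpha", "beta", "gamma"}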


def test_doc_exists():
    assert DOC_PATH.is_file(), f"doc missing: {DOC_PATH}"


def test_documented_methods_match_sourceadapter_api():
    """Every overridable SourceAdapter method must appear in the §4 contract,
    and the doc may not advertise methods that don't exist."""
    doc_methods = _documented_override_methods(_doc_text())
    code_methods = _sourceadapter_overridable_methods()
    assert doc_methods == code_methods, (
        f"override-method drift: "
        f"doc-only={doc_methods - code_methods}, "
        f"code-only={code_methods - doc_methods}"
    )


def test_preview_hook_contract_matches_docstring():
    """The contract block quoted in §11.1 must come from the live
    SourceAdapter.preview_for_settings docstring.

    Normalizes both sides by collapsing whitespace and stripping the doc's
    Markdown blockquote prefix (`> `).
    """
    doc = _doc_text()
    section_re = re.compile(
        r"^### 11\.1[^\n]*\n(.*?)(?=^### |^## )",
        re.DOTALL | re.MULTILINE,
    )
    m = section_re.search(doc)
    assert m, "doc missing '### 11.1' subsection"
    # Keep only blockquote lines and drop their leading '>' markers. Leading
    # whitespace is stripped first so indented blockquotes are handled too;
    # any remaining whitespace is collapsed by norm() below.
    blockquote = "\n".join(
        line.lstrip().lstrip(">")
        for line in m.group(1).splitlines()
        if line.lstrip().startswith(">")
    )
    docstring = inspect.getdoc(SourceAdapter.preview_for_settings) or ""

    def norm(s: str) -> str:
        # Strip markdown backticks; collapse whitespace.
        return re.sub(r"\s+", " ", s.replace("`", "")).strip()

    def sentences(s: str) -> set[str]:
        # Split after '.' or ':' followed by whitespace; drop empty fragments.
        return {x.strip() for x in re.split(r"(?<=[.:])\s+", s) if x.strip()}

    # Bidirectional: every non-empty sentence of the docstring must appear in
    # the doc's blockquote, and the blockquote must not introduce new sentences
    # the docstring lacks.
    doc_sents = sentences(norm(blockquote))
    code_sents = sentences(norm(docstring))
    assert doc_sents == code_sents, (
        f"preview_for_settings contract drift: "
        f"doc-only={doc_sents - code_sents}, "
        f"code-only={code_sents - doc_sents}"
    )
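

# Illustrative, self-contained check of the normalize-then-compare-sentence-sets
# approach used for the §11.1 contract. The docstring text below is
# hypothetical, not the real preview_for_settings docstring. Line wrapping,
# extra spaces, and backticks do not count as drift; because the comparison
# is set-based, reordered sentences are not flagged either.
def test_sentence_set_comparison_examples():
    import re

    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s.replace("`", "")).strip()

    def sentences(s: str) -> set[str]:
        return {x.strip() for x in re.split(r"(?<=[.:])\s+", s) if x.strip()}

    docstring = "Return a preview.\n\nMust not raise: return an empty list instead."
    wrapped = "Return a   preview. Must not\nraise: return an `empty list` instead."
    reordered = "Must not raise: return an empty list instead. Return a preview."
    assert sentences(norm(docstring)) == sentences(norm(wrapped))
    assert sentences(norm(docstring)) == sentences(norm(reordered))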


def test_top_level_domains_match_streams_registry():
    """The §6.1 domain enumeration must equal the domain tokens derived from
    central.streams.STREAMS (bidirectional; no hardcoded list)."""
    doc_domains = _documented_domains(_doc_text())
    code_domains = _streams_domains()
    assert doc_domains == code_domains, (
        f"domain-token drift: "
        f"doc-only={doc_domains - code_domains}, "
        f"code-only={code_domains - doc_domains}"
    )


def test_streams_snippet_quotes_live_registry():
    """The §8 verbatim STREAMS snippet must agree with central.streams.STREAMS
    on (name, subject_filter, event_bearing).
    """
    doc = _doc_text()
    section_re = re.compile(
        r"^## 8\. The StreamEntry registry\s*\n(.*?)(?=^## )",
        re.DOTALL | re.MULTILINE,
    )
    m = section_re.search(doc)
    assert m, "doc missing '## 8. The StreamEntry registry' section"
    section = m.group(1)
    # Each documented entry: StreamEntry("NAME", "central.x.>"[, event_bearing=False])
    entry_re = re.compile(
        r'StreamEntry\(\s*"([A-Z_]+)"\s*,\s*"(central\.[a-z_]+\.>)"'
        r'(?:\s*,\s*event_bearing\s*=\s*(False|True))?\s*\)',
    )
    doc_rows: set[tuple[str, str, bool]] = set()
    for name, subj, eb in entry_re.findall(section):
        event_bearing = eb != "False"  # default True if unspecified
        doc_rows.add((name, subj, event_bearing))
    code_rows = {(s.name, s.subject_filter, s.event_bearing) for s in STREAMS}
    assert doc_rows == code_rows, (
        f"STREAMS snippet drift: "
        f"doc-only={doc_rows - code_rows}, code-only={code_rows - doc_rows}"
    )
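

# Illustrative, self-contained check of the §8 StreamEntry pattern on a
# synthetic snippet (placeholder names; `max_age` is a hypothetical extra
# keyword, not necessarily a real StreamEntry field). An omitted event_bearing
# defaults to True. An entry in any other shape is not matched at all, so in
# the real test it would surface as code-only drift.
def test_streamentry_pattern_on_sample_snippet():
    import re

    entry_re = re.compile(
        r'StreamEntry\(\s*"([A-Z_]+)"\s*,\s*"(central\.[a-z_]+\.>)"'
        r'(?:\s*,\s*event_bearing\s*=\s*(False|True))?\s*\)',
    )
    snippet = (
        'StreamEntry("CENTRAL_ALPHA", "central.alpha.>"),\n'
        'StreamEntry("CENTRAL_BETA", "central.beta.>", event_bearing=False),\n'
        'StreamEntry("CENTRAL_GAMMA", "central.gamma.>", max_age=3600),\n'
    )
    rows = {(name, subj, eb != "False") for name, subj, eb in entry_re.findall(snippet)}
    assert rows == {
        ("CENTRAL_ALPHA", "central.alpha.>", True),
        ("CENTRAL_BETA", "central.beta.>", False),
    }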


def _section(doc: str, header_re: str) -> str:
    """Return the body of the section whose header matches header_re, up to
    the next level-2 ('## ') header or the end of the document."""
    m = re.search(header_re + r"\s*\n(.*?)(?=^## |\Z)", doc, re.DOTALL | re.MULTILINE)
    assert m, f"doc missing section matching {header_re!r}"
    return m.group(1)
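

# Illustrative, self-contained check of where _section stops, using a copy of
# its search pattern on a synthetic document. '### ' subsections stay inside
# the returned body; the next '## ' header or the end of the text ends it.
def test_section_helper_boundaries_on_sample_doc():
    import re

    def section(doc: str, header_re: str) -> str:
        m = re.search(header_re + r"\s*\n(.*?)(?=^## |\Z)", doc, re.DOTALL | re.MULTILINE)
        assert m, header_re
        return m.group(1)

    doc = "## 2. Intro\nalpha\n### 2.1 Detail\nbeta\n## 3. Last\ngamma\n"
    body = section(doc, r"^## 2\. Intro")
    assert "beta" in body and "gamma" not in body
    assert section(doc, r"^## 3\. Last").strip() == "gamma"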


def test_design_principle_quote_present_in_section_2():
    """§2 must still carry the verbatim Matt quote. The reframe changes the
    translation beneath it, not the quote itself."""
    section = _section(_doc_text(), r"^## 2\. The design principle")
    assert _DESIGN_PRINCIPLE_QUOTE in section, "verbatim design-principle quote missing from §2"


def test_anti_pattern_10_1_section_exists():
    """§10.1 must still exist as a subsection (content reframed to
    'enrichment outside the framework', structure preserved)."""
    doc = _doc_text()
    assert re.search(r"^### 10\.1 ", doc, re.MULTILINE), "doc missing '### 10.1' subsection"


def test_enrichment_contract_section_13_has_all_protocol_references():
    """New §13 must name all four enrichment contract types verbatim."""
    section = _section(_doc_text(), r"^## 13\. Enrichment contract")
    for ref in ("Enricher", "GeocoderEnricher", "GeocoderBackend", "NoOpBackend"):
        assert ref in section, f"§13 missing reference to {ref!r}"


def test_enrichment_coverage_matrix_lists_exactly_geocoder_fields():
    """The §13 per-field coverage matrix must list exactly the canonical
    GEOCODER_FIELDS derived from code, not hardcoded here."""
    section = _section(_doc_text(), r"^## 13\. Enrichment contract")
    # Matrix rows look like: | `field_name` | ... |
    row_fields = set(re.findall(r"^\|\s*`([a-z_]+)`\s*\|", section, re.MULTILINE))
    assert row_fields == set(GEOCODER_FIELDS), (
        f"coverage-matrix field drift: "
        f"doc-only={row_fields - set(GEOCODER_FIELDS)}, "
        f"code-only={set(GEOCODER_FIELDS) - row_fields}"
    )
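

# Illustrative, self-contained check of the §13 coverage-matrix row pattern on
# a synthetic Markdown table. The column layout and field names are
# placeholders, not the real matrix or GEOCODER_FIELDS. Header and separator
# rows have no backticked first cell, so only data rows are collected.
def test_coverage_matrix_row_pattern_on_sample_table():
    import re

    table = (
        "| Field | Backend A | Backend B |\n"
        "|---|---|---|\n"
        "| `alpha_field` | yes | null |\n"
        "| `beta_field` | yes | yes |\n"
    )
    fields = set(re.findall(r"^\|\s*`([a-z_]+)`\s*\|", table, re.MULTILINE))
    assert fields == {"alpha_field", "beta_field"}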


def test_no_orphan_adapter_references_in_anti_patterns():
    """The anti-patterns section names two real adapter modules as examples
    (firms and inciweb, in §10.4). Those names must still resolve via
    central.adapter_discovery; this protects against a silent rename leaving
    dead example references in the doc.
    """
    doc = _doc_text()
    section_re = re.compile(
        r"^## 10\. Anti-patterns.*?\n(.*?)(?=^## )",
        re.DOTALL | re.MULTILINE,
    )
    m = section_re.search(doc)
    assert m, "doc missing '## 10. Anti-patterns' section"
    section = m.group(1)
    quoted = set(re.findall(r'"([a-z][a-z_]*)"', section))
    # Stream names also appear quoted in §10.4 as examples and would otherwise
    # look like orphan adapter references, so exclude them. The exclusion set
    # is derived from STREAMS (per §10.4, not hardcoded); every other quoted
    # lowercase token is asserted to be a real adapter name.
    stream_names = {s.name.lower() for s in STREAMS}
    candidate_adapter_names = quoted - stream_names
    known_adapters = set(discover_adapters().keys())
    orphans = {n for n in candidate_adapter_names if n not in known_adapters}
    assert not orphans, (
        f"anti-patterns section references unknown adapter names: {orphans} "
        f"(known adapters: {sorted(known_adapters)})"
    )
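

# Illustrative, self-contained check of the quoted-token scan used by the
# anti-patterns test, on synthetic prose. The adapter names are placeholders
# standing in for discover_adapters(), not a real registry. It shows the
# check's main sensitivity: any other quoted lowercase word in §10 (for
# example a JSON key) is reported as an orphan adapter reference.
def test_quoted_token_scan_on_sample_text():
    import re

    section = (
        'Calling a geocoder inline in "alpha_adapter" poll() bypasses the cache. '
        'See also "beta_adapter". A payload key such as "id" is quoted too.'
    )
    quoted = set(re.findall(r'"([a-z][a-z_]*)"', section))
    known_adapters = {"alpha_adapter", "beta_adapter"}
    assert quoted - known_adapters == {"id"}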