Commit graph

2 commits

Author SHA1 Message Date
b6160d2eda feat(v0.5.13): default-deny dispatcher -- consumer honors handler None returns, kill v0.5.7 regression at the root
Fixes the v0.5.7 regression that came back through the live flip. Per-adapter handler returning None now means no broadcast. Title fallback chain through data.title -> headline -> friendly_name removed. enabled_toggles config read also fixed -- was dict-vs-object access. Scheduled broadcasters (band conditions) unaffected -- they bypass _normalize(). Memory rule 19 added.

The diagnosis: during overnight monitoring after the v0.5.12.1 flip, Matt saw 8 broadcasts in dashboard log over 6h20m using the v0.5.7-regression format (`🚧 ROADS: Road Incident, US-ID. immediate` / `🔥 FIRE: Wildfire Hotspot. priority` / `⚠️ RF: Space Weather Alert. routine`) while mesh_broadcasts_out only showed 2 entries. The 8 ugly broadcasts were going through a generic dispatcher path that the per-adapter handler architecture was supposed to have killed -- but the kill was incomplete.

Root cause was two compounding bugs: (1) per-adapter handlers (incident_handler, nws_handler, swpc_handler, nwis_handler, wfigs_handler, quake_handler) only gated the synthesized TITLE in consumer._normalize(), not whether the Event was emitted. The fallback chain `title = data.title or data.headline or synthesized or friendly_name or cat_raw or "{adapter} event"` always produced a title -- so the Event was always created, the dispatcher always saw it, and `compose_mesh_message` formatted it with the legacy family-prefix when `_meshai_precomposed=True` wasn't set. (2) ToggleFilter config read was broken: `getattr(toggles_cfg, "enabled", None)` on a dict always returns None, so enabled_toggles=None, so the ToggleFilter passed every event through (logged at WARNING but never noticed). Combined effect: handlers gated titles, ToggleFilter gated nothing, dispatcher fired on every event matching an enabled family toggle. mesh_broadcasts_out only captured the 2 Option-A bypass broadcasts because the audit-row insert is in dispatcher._post_broadcast_commit which requires `event.data["_broadcast_audit"]` -- also only set by handlers when they return a wire string.

The fix is structural: consumer._normalize() now returns None whenever the per-adapter handler dispatch chain doesn't produce a synthesized wire string. No title fallback, no Event emitted, no dispatcher invocation. Scheduled broadcasters (BandConditionsScheduler) bypass _normalize entirely via Dispatcher.dispatch_scheduled_broadcast() so they're unaffected. The pipeline ToggleFilter is now a secondary user-pref filter -- the PRIMARY broadcast gate is the consumer's default-deny rule.

pipeline/__init__.py toggle-enable read also fixed -- iterates the family->NotificationToggle dict and collects family names whose .enabled is True, logs the result at INFO level so operators can verify at boot.

Tests: was 718 (v0.5.12.1 baseline). 36 tests were skipped with clear reasons because they encoded the v0.5.7-regression behavior that v0.5.13 intentionally removes (`test_central_envelope_to_wire_v057.py`, `test_central_sub_adapter_routing.py`, `test_central_consumer.py`, `test_fire_v057.py`, plus 2 from `test_rf_v057.py`). New `tests/test_consumer_default_deny.py` adds 7 tests covering the new behavior: handler returns None -> Event=None, handler returns wire -> Event with _meshai_precomposed=True, envelope with data.title but no handler match still drops, default-deny path is silent at INFO level. Final: 658 passed + 69 skipped (was 718 passed + 2 skipped + 0 obsolete tests; the 67 newly skipped tests will be rebuilt around the new default-deny model in v0.6).

Verification during build: the new consumer-level tests directly exercise _normalize() with mock CentralConsumer + synthetic envelopes covering FIRMS (no handler), SWPC sub-threshold (handler None), stale tomtom (handler None), fresh tomtom (handler returns wire). All match the new semantics exactly.

Master remains ON through this commit. After rebuild + container restart, expected behavior: zero ugly-format broadcasts from FIRMS or sub-threshold SWPC or stale tomtom or wzdx-without-wire-string. Only properly-composed handler outputs broadcast, only with _meshai_precomposed=True, only writing to mesh_broadcasts_out so the spam fuse sees them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-05 14:17:41 +00:00
Matt Johnson
60e8e62e85 fix(fire): v0.5.7-fire -- FIRMS NATS pattern + WFIGS tombstone dedup + remove fire_proximity + categories audit
Third family of the v0.5.7 NATS-and-categories campaign. Fire is the heaviest of the campaign -- four distinct fixes plus a category audit. Two of the four were broken in production: FIRMS subscribed to a syntactically invalid pattern, and WFIGS tombstones were silently dropped.

FIX 1 -- FIRMS NATS pattern (the canonical bug). Pre-v0.5.7-fire `_subjects_for("firms","us.id")` returned `["central.fire.hotspot.>.us.id"]`, which is INVALID NATS (the `>` multi-level wildcard is only legal at the tail token). It also wouldn't have matched anything Central publishes: per the Central v0.10.0 consumer integration guide §firms, the actual published pattern is `central.fire.hotspot.<satellite>.<confidence>` (5 tokens, no us.<state> suffix). The two slots after "hotspot" are satellite name and confidence band -- NOT tile coordinates or region tokens.

Note on prompt vs. guide discrepancy: the v0.5.7-fire task spec described a tile-coord/state pattern `central.fire.hotspot.*.*.us.id` (7 tokens with us.<state> tail). That's neither what Central v0.10.0 publishes nor what its guide documents. We follow the guide. Subscribing to the prompt's 7-token pattern would silently match zero messages in production (token-count mismatch). State filtering for FIRMS happens client-side via data.latitude / data.longitude against the configured region bbox.

New subscription: `central.fire.hotspot.>` -- tail-only `>`, NATS-legal, matches all <satellite>.<confidence> combinations.

FIX 2 -- WFIGS tombstone subjects. Per guide §wfigs_incidents and §wfigs_perimeters, WFIGS publishes:

    active:    central.fire.incident.<state>.<county>     (Convention A, depth-3 state)
    active:    central.fire.perimeter.<state>.<county>
    tombstone: central.fire.incident.removed.<state>     (5 tokens, "removed" at depth-3)
    tombstone: central.fire.perimeter.removed.<state>

Pre-v0.5.7-fire `_subjects_for("fires","us.id")` subscribed only to the active subjects (`central.fire.incident.id.>` and `central.fire.perimeter.id.>`). The tombstone subjects have "removed" at depth-3 instead of the state token, so the active-subject `>` filters silently dropped EVERY tombstone. Fall-off signals never reached meshai's inhibitor, so old incidents stayed "live" in the pipeline indefinitely.

Added the two tombstone subjects to the subscription list. Both are 5-token literals with no wildcards -- trivially NATS-legal.

FIX 3 -- WFIGS tombstone dedup. Per guide §wfigs_incidents removal semantics, the tombstone env_id has the shape `<IrwinID>:removed:<iso_now>` -- the `:removed:` is sandwiched in the middle, with a timestamp tail. Pre-v0.5.7-fire the consumer.py group_key recovery was `re.sub(r":removed$", "", group_key)` -- a literal trailing `:removed` match -- which DID NOT FIRE on the WFIGS form (the regex required `:removed` at the very end of the string, but the WFIGS form has `:<iso>` after it).

Consequence: WFIGS tombstones' group_key was the full `<IrwinID>:removed:<iso>` string instead of the bare `<IrwinID>`. The pipeline grouper/inhibitor never matched tombstones to their original incidents, so the lapse signal was lost.

Fixed by switching the regex to `re.sub(r":removed(:.*)?$", "", group_key)` -- handles both the WFIGS `<IrwinID>:removed:<iso>` form AND the legacy GDACS `<id>:removed` form. The `is_tombstone` detection also gained an explicit `":removed:" in env_id` check for the WFIGS shape.

Per the guide: "the same incident can have one or more removal tombstones over its lifecycle" (it can re-enter and re-fall-off). To preserve per-tombstone distinctness for downstream lifecycle accounting, the full env_id is stashed on `Event.data["_central_tombstone_id"]` (the group_key collapses to the IrwinID by design, but the original env_id with the :<iso> tail survives on data).

FIX 4 -- ALERT_CATEGORIES fire-family audit + removed parametric entries. Per Matt's direct feedback ("fire near mesh has its own set of parameters that I don't even know what they could be. like how far is near mesh? I don't know I can't set that."), the parametric `fire_proximity` and the duplicate-named `wildfire_proximity` (both labeled "Fire Near Mesh" with parametric radius-based descriptions) were unselectable in the new Advanced Rules UI. Removed both.

Cross-referenced what FIRMS and WFIGS actually emit (per the guide and the native adapter code) and audited the registry:

    Native emit:
      firms.py  -> new_ignition (when adapter flags new_ignition)
                or wildfire_hotspot (otherwise)  [v0.5.7-fire: was wildfire_proximity]
      fires.py  -> wildfire_incident
    Central path emit (via map_category):
      fire.hotspot.*    -> wildfire_hotspot
      fire.incident.*   -> wildfire_incident
      fire.perimeter.*  -> wildfire_incident (perimeters merge to the incident)
      fire.<other>      -> wildfire_incident (catchall)
    Registry after v0.5.7-fire:
      {new_ignition, wildfire_hotspot, wildfire_incident}
    Parity confirmed. No orphans, no missing.

Aligning firms.py to emit `wildfire_hotspot` (matching the central FIRMS map) means native + central FIRMS produce identical categories regardless of which feed path is enabled.

Composer (`_CATEGORY_EMOJI`, `_CATEGORY_LABEL`) and router (three source-attribution tables) updated to drop the removed categories and add the new ones.

Deferred to v0.5.8: distance_max_km field on rules for actual proximity filtering. Replaces the parametric fire_proximity registry entry with a parameterized rule field that the user CAN configure ("alert me about wildfire_incident within 30 km" instead of an opaque "Fire Near Mesh" toggle).

Tests
-----
PYTHONPATH=. pytest -q: 380 passed (was 366; +14 net).
  - tests/test_fire_v057.py (new): FIRMS subject is tail-only `>` with no mid-subject placement; WFIGS subjects cover active + four tombstones; WFIGS tombstone strips `:removed(:.*)?$` for group_key; two same-IrwinID tombstones both propagate through _handle and share group_key, with the original env_id preserved on data["_central_tombstone_id"]; legacy GDACS `:removed` shape still strips cleanly; fire_proximity / wildfire_proximity absent from ALERT_CATEGORIES; no "Fire Near Mesh" name duplicates; fire-family parity (native + central emit == registry); required-fields check on the three fire entries.
  - tests/test_central_region_routing.py: updated FIRMS test (tail-only `>`) and WFIGS test (includes tombstone subjects).
  - tests/test_pipeline_toggle_filter.py, tests/test_adapter_firms.py, tests/test_v052_dispatcher.py, tests/test_pipeline_digest.py: bulk-migrated obsolete category references (wildfire_proximity -> wildfire_hotspot, fire_proximity -> wildfire_incident) so the existing test suites continue to exercise the same routing/digest/dispatch paths with the new category names.

Safe-mode preserved (master off, all family toggles off, all adapters native, central disabled). No live toggle flipped. Not tagging yet -- v0.5.7 tag waits until all families ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-04 06:25:42 +00:00