Commit graph

3 commits

Author SHA1 Message Date
c333a97344 feat(v0.6-2): dispatcher state persistence -- cold-start, cooldowns, dedup LRU to SQLite
Closes Rule-20 dispatcher gap from audit doc v0.6-phase1-audit.md finding #1.
Before this commit, the cold-start anchor, 4 drop counters, per-toggle cooldown
map, and dedup OrderedDict all lived in Dispatcher instance memory and were
lost on every container restart.

v5.sql adds three tables:
  - dispatcher_state (singleton id=1): cold_start_anchor + 4 drop counters
  - dispatcher_cooldowns ((toggle,category,region) keyed): last_fired_at
  - dispatcher_dedup ((source,event_id) keyed): seen_at
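
A minimal sketch of what the three v5 tables might look like, applied through
Python's sqlite3 module. The table names and keys come from the list above; the
column types, the four counter names, and the apply_v5 helper are assumptions
for illustration, not the contents of v5.sql.

```python
import sqlite3

# Hypothetical DDL: table names and primary keys follow the commit message;
# column types and the four counter names are assumed.
V5_SCHEMA = """
CREATE TABLE IF NOT EXISTS dispatcher_state (
    id                 INTEGER PRIMARY KEY CHECK (id = 1),  -- singleton row
    cold_start_anchor  REAL,
    cold_start_dropped INTEGER NOT NULL DEFAULT 0,
    dedup_dropped      INTEGER NOT NULL DEFAULT 0,
    cooldown_dropped   INTEGER NOT NULL DEFAULT 0,
    stale_dropped      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS dispatcher_cooldowns (
    toggle        TEXT NOT NULL,
    category      TEXT NOT NULL,
    region        TEXT NOT NULL,
    last_fired_at REAL NOT NULL,
    PRIMARY KEY (toggle, category, region)
);
CREATE TABLE IF NOT EXISTS dispatcher_dedup (
    source   TEXT NOT NULL,
    event_id TEXT NOT NULL,
    seen_at  REAL NOT NULL,
    PRIMARY KEY (source, event_id)
);
INSERT OR IGNORE INTO dispatcher_state (id) VALUES (1);
"""


def apply_v5(conn: sqlite3.Connection) -> None:
    """Apply the sketch schema; safe to run more than once."""
    conn.executescript(V5_SCHEMA)
    conn.commit()
```

Because every statement is guarded by IF NOT EXISTS or OR IGNORE, running the
script twice leaves a single state row, which is one way to get the idempotent
migration the tests below check for.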

Dispatcher refactor:
  - __init__ calls _restore_from_db -- counters, cold-start anchor, cooldown
    map, and dedup LRU (most-recent 10k by seen_at) all rehydrated from the
    three new tables
  - write-through on every mutation: _persist_state for counter/anchor,
    _persist_cooldown for cooldown UPSERT + 2*cooldown_s prune,
    _persist_dedup for dedup INSERT OR REPLACE + 7-day cleanup
  - in-memory caches stay authoritative on the fast read path
  - cumulative-since-install counters (NOT since-boot); LLM will be able
    to answer "we have dropped 47 stale events this week" after commit #5
    (env_reporter) lands
  - graceful degrade: missing v5 tables / persistence outage falls back to
    fresh in-memory state without crashing the constructor
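
The restore, write-through, and graceful-degrade behaviour described above can
be sketched for the dedup cache alone. DedupStore and its method names are
hypothetical stand-ins, not the Dispatcher's real API; the 10k cap, the 7-day
cleanup, and INSERT OR REPLACE are taken from the commit message.

```python
import sqlite3
import time
from collections import OrderedDict

DEDUP_MAX = 10_000            # LRU cap (from the commit message)
DEDUP_TTL_S = 7 * 24 * 3600   # 7-day cleanup window (from the commit message)


class DedupStore:
    """Hypothetical stand-in for the Dispatcher's dedup cache."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.seen = OrderedDict()   # (source, event_id) -> seen_at
        self._restore()

    def _restore(self) -> None:
        try:
            rows = self.conn.execute(
                "SELECT source, event_id, seen_at FROM dispatcher_dedup "
                "ORDER BY seen_at DESC LIMIT ?", (DEDUP_MAX,)
            ).fetchall()
        except sqlite3.Error:
            return  # graceful degrade: start with an empty cache
        # Insert oldest first so the most recent entry ends up last (LRU order).
        for source, event_id, seen_at in reversed(rows):
            self.seen[(source, event_id)] = seen_at

    def is_duplicate(self, source: str, event_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        key = (source, event_id)
        duplicate = key in self.seen        # fast path: memory is authoritative
        self.seen[key] = now
        self.seen.move_to_end(key)
        while len(self.seen) > DEDUP_MAX:
            self.seen.popitem(last=False)   # evict least recently seen
        self._persist(source, event_id, now)
        return duplicate

    def _persist(self, source: str, event_id: str, now: float) -> None:
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT OR REPLACE INTO dispatcher_dedup "
                    "(source, event_id, seen_at) VALUES (?, ?, ?)",
                    (source, event_id, now),
                )
                self.conn.execute(
                    "DELETE FROM dispatcher_dedup WHERE seen_at < ?",
                    (now - DEDUP_TTL_S,),
                )
        except sqlite3.Error:
            pass  # persistence outage: keep serving from memory
```

A second DedupStore built on the same connection sees the first one's entries,
which is the restart-survival property this commit is after.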

Tests:
  - tests/test_dispatcher_persistence.py (17 tests): state restore on init,
    counter+cooldown+dedup survival across simulated restart, cooldown rearm
    within 2x window, dedup LRU rebuild caps at 10k, 7-day cleanup on insert,
    INSERT OR REPLACE on duplicate source+event_id, v5 migration idempotent,
    synthetic storm (50 events) -> restart -> replay (5 incl 1 duplicate)
    with the duplicate dedup-rejected and counters NOT resetting
  - tests/conftest.py (new): autouse MESHAI_DB_PATH redirection to per-test
    tmp file, so the dispatcher_* tables on production /data don't get
    polluted by tests that construct Dispatcher() without an explicit fixture
  - tests/test_notification_toggles.py: _dispatch helper wipes dedup/cooldown/
    state tables between calls (per-call independence preserved; pre-v0.6-2
    in-memory-only Dispatcher reset naturally per instance)

Test count: 680 -> 697 (+17 new, 0 regressions).

Refs audit doc v0.6-phase1-audit.md finding #1.
2026-06-05 16:35:40 +00:00
053d67db6e feat(v0.5.8b): persistence foundation + WFIGS handler + universal cold-start grace
Three integrated pieces that ship together because they were designed as one
safety story:

(1) PERSISTENCE FOUNDATION -- new meshai/persistence/ module with SQLite db.py,
schema migration framework (v1), 13 tables covering all adapter event shapes
(traffic_events, fires, firms_pixels, quake_events, nws_alerts, gauge_readings,
swpc_events) + mesh state (mesh_nodes, mesh_telemetry, mesh_positions,
mesh_messages_in, mesh_broadcasts_out, mesh_health_events) + cross-cutting
event_log + schema_meta. WAL mode for reader concurrency, single-writer pattern,
MESHAI_DB_PATH env var, mounted at /data/meshai.sqlite via existing
docker-compose meshai_data volume. .gitignore updated.

(2) WFIGS HANDLER -- meshai/central/wfigs_handler.py implements the first
per-adapter handler that uses the persistence layer. Format: MEDIUM style with
town/landclass/county fallback chain, lat/lon at 3-decimal precision,
New:/Update: prefix. 8h-rate-limited change-detection per IRWIN via
fires.last_broadcast_at. Skips tombstones and perimeters silently (logged to
event_log with handled=0). Acres fallback chain DailyAcres -> IncidentSize ->
raw.DiscoveryAcres -> raw.FinalAcres -> N/A. Pass-through Initial Attack
auto-numbered names (IA 1, IA 2).

(3) UNIVERSAL COLD-START GRACE -- meshai/notifications/pipeline/dispatcher.py
grows a configurable grace window (cold_start_grace_seconds, default 60s,
GUI-editable per Rule 17). Anchored to first-event-seen (not container boot),
so the grace activates the moment broadcasts could fire. Suppresses mesh
delivery during the window; handler-side persistence (fires UPSERT, event_log)
still happens normally. New _cold_start_dropped counter exposed in
dispatch_stats(). Designed to protect against JetStream backlog spam at
toggle-flip time, applies universally to ALL adapters.

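The first-event-seen anchoring in (3) can be sketched as a small gate. The
ColdStartGrace class, its method names, and the injectable clock are
hypothetical; only the 60s default and the idea of a dropped counter come from
the commit message.

```python
import time


class ColdStartGrace:
    """Hypothetical gate: suppress delivery for a window after the first event."""

    def __init__(self, grace_seconds: float = 60.0, clock=time.time):
        self.grace_seconds = grace_seconds
        self._clock = clock      # injectable so tests can control time
        self._anchor = None      # set on the first event, not at construction
        self.dropped = 0

    def should_suppress(self) -> bool:
        now = self._clock()
        if self._anchor is None:
            self._anchor = now   # window starts when events start flowing
        if now - self._anchor < self.grace_seconds:
            self.dropped += 1
            return True
        return False
```

Anchoring at the first event rather than at construction means an idle
container does not burn through its grace window before any backlog arrives.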
(4) WFIGS HANDLER CALLBACK REFACTOR -- New:/Update: prefix now keys on
fires.last_broadcast_at IS NULL (not row-missing), and last_broadcast_* field
updates moved to a post-broadcast commit callback that the dispatcher invokes
ONLY on successful delivery. This means: cold-start-suppressed events leave
fires.last_broadcast_at NULL, so when they eventually broadcast post-grace,
they correctly render as New: (first ACTUAL delivery for that IRWIN), not
Update:. event_log.handled and mesh_broadcasts_out audit row also gated on the
same callback -- decoupling persistence rows from broadcast rows for an honest
audit trail.

New tests: 15 in test_wfigs_handler.py, 15 in test_persistence.py, additional
cold-start grace tests in test_dispatcher.py (+4 WFIGS callback scenarios).

Synthetic probes wfigs-cleaned-samples.md (initial) and
wfigs-cleaned-samples-v2.md (cold-start verification) generated against
isolated temp SQLite databases. CT108 /data/meshai.sqlite untouched during
build. Master stays off. No live toggle flips.

Test count: was 535 (v0.5.7 baseline) -> 566 (persistence) -> 581 (wfigs
handler) -> 589 expected (cold-start grace).
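
A rough sketch of the post-broadcast commit callback from (4). The
build_broadcast function, the irwin_id column, and the callback shape are
assumptions for illustration; fires.last_broadcast_at and the New:/Update:
prefix rule are from the commit message.

```python
import sqlite3


def build_broadcast(conn: sqlite3.Connection, irwin_id: str, text: str, now: float):
    """Return (message, on_delivered). Hypothetical helper, not the real handler."""
    row = conn.execute(
        "SELECT last_broadcast_at FROM fires WHERE irwin_id = ?", (irwin_id,)
    ).fetchone()
    # Key on last_broadcast_at being NULL, not on whether the row exists.
    prefix = "New:" if row is None or row[0] is None else "Update:"

    def on_delivered() -> None:
        # The dispatcher would call this only after a successful delivery.
        with conn:
            conn.execute(
                "UPDATE fires SET last_broadcast_at = ? WHERE irwin_id = ?",
                (now, irwin_id),
            )

    return f"{prefix} {text}", on_delivered
```

If the dispatcher suppresses the event and never invokes on_delivered, the next
call still produces a New: message, which matches the behaviour described
above.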

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-05 03:54:04 +00:00
b90afc3a74 feat(notifications): v0.5.0 -- Master Toggles UX redesign + Central Connection GUI + grouped categories + region scoping
Per-family notification policy (PagerDuty/Grafana-style): each family gets a
severity threshold + region scope + a severity->channel routing matrix, so an
operator opts in per family rather than hand-writing rules.

SECTION 1 -- BACKEND
- config.py: new NotificationToggle dataclass (enabled, min_severity, regions,
  severity_channels{severity->[channel types]}, quiet_hours_override, + per-channel
  delivery config: broadcast_channel/node_ids/smtp_*/recipients/webhook_*).
  notifications.toggles is now a dict[family]->NotificationToggle with 8 family
  defaults (mesh_health, weather, fire, rf_propagation, roads, avalanche, seismic,
  tracking), all enabled=false (opt-in), min_severity=priority,
  severity_channels={priority:[mesh_broadcast], immediate:[mesh_broadcast, mesh_dm]},
  quiet_hours_override=true. (Old TogglesConfig.enabled was only read by
  build_pipeline via getattr -> degrades to ToggleFilter no-op, so the pipeline
  filter is unchanged; toggles now drive the Dispatcher instead.)
- region_scope:list added to NotificationRuleConfig; _matching_rules filters by
  event.region/regions ([] = all).
- Dispatcher: _dispatch_toggles runs IN PARALLEL to rule matching -- looks up
  get_toggle(event.category), checks enabled + region scope + severity threshold,
  then for each channel in severity_channels[event.severity] builds a synthetic
  rule (override_quiet set only for immediate when quiet_hours_override) and
  delivers. 'digest' channel is skipped in live dispatch (handled by accumulator).
- categories.py: get_toggle() prefix fallback maps the live phases-2.7-2.14
  categories (weather_warning, wildfire_incident, earthquake_event,
  traffic_congestion, geomagnetic/rf_*, stream_*, ...) to their family, fixing the
  v0.4 "category -> other" gap.
- config_loader.py: SECRET_FIELDS += notifications.toggles.*.smtp_password.
- _dataclass_to_dict now recurses dict-of-dataclasses, and the loader coerces the
  toggles dict -> NotificationToggle on both the full-load and section-PUT paths
  (so GUI save round-trips correctly).
- tests/test_notification_toggles.py (11): enabled/disabled, region filter
  (empty+populated+regions-list), severity threshold, per-severity channel routing,
  digest-skipped-live, quiet-hours-override immediate-only, category->family,
  rules+toggles both fire. Full suite: 294 passed (283 + 11).
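
The toggle checks described for _dispatch_toggles (enabled, region scope,
severity threshold, per-severity channel routing, digest skipped live) could
look roughly like the sketch below. The Toggle dataclass is a trimmed,
hypothetical version of NotificationToggle, channels_for is an invented helper,
and the "routine" level in SEVERITY_ORDER is an assumption; only "priority" and
"immediate" appear in this commit.

```python
from dataclasses import dataclass, field

# Assumed ordering, lowest to highest. "routine" is a placeholder level.
SEVERITY_ORDER = ["routine", "priority", "immediate"]


@dataclass
class Toggle:
    """Trimmed, hypothetical stand-in for NotificationToggle."""
    enabled: bool = False                      # opt-in by default
    min_severity: str = "priority"
    regions: list = field(default_factory=list)  # [] means all regions
    severity_channels: dict = field(default_factory=lambda: {
        "priority": ["mesh_broadcast"],
        "immediate": ["mesh_broadcast", "mesh_dm"],
    })


def channels_for(toggle: Toggle, severity: str, region: str) -> list:
    """Return the live channels an event should be delivered to."""
    if not toggle.enabled:
        return []
    if toggle.regions and region not in toggle.regions:
        return []
    if SEVERITY_ORDER.index(severity) < SEVERITY_ORDER.index(toggle.min_severity):
        return []
    # 'digest' is handled by the accumulator, so it is skipped in live dispatch.
    return [c for c in toggle.severity_channels.get(severity, []) if c != "digest"]
```

For example, a weather toggle enabled for regions=["Magic Valley"] routes a
priority event from that region to mesh_broadcast and drops the same event from
any other region.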

SECTION 2 -- FRONTEND
- Notifications.tsx: MasterToggles component above the rules section -- 8 family
  cards (icon + enable toggle; collapsed summary 'OFF' or 'N regions, M channels at
  <sev>+'; expanded: severity threshold, severity x channel checkbox matrix,
  region list, quiet-hours-override toggle, per-channel config:
  broadcast_channel/DM node IDs/recipients/SMTP host+port/webhook URL).
- Environment.tsx: CentralConnectionPanel above the family tabs (url, durable,
  enabled) wired to environmental.central.
- npm run build clean (tsc strict); rebuilt static committed (index-CfYlhn4e.js).

SECTION 3 -- VERIFICATION
- py_compile + tsc strict clean; pytest 294 passed.
- Rebuilt prod: /notifications serves Master Toggles, /environment serves Central
  Connection (strings confirmed in the served bundle); 8 adapters, pipeline
  started, no tracebacks, healthy.
- GUI round-trip: enable weather toggle (min_severity=priority,
  regions=[Magic Valley], severity_channels.priority=[mesh_broadcast]) -> PUT
  {saved:true} -> notifications.yaml reflects it; env_feeds traffic.api_key stayed
  ${TOMTOM_API_KEY} (C.3.1 secret preservation holds). Restored to clean opt-in
  baseline.
- Synthetic NWS weather_warning/priority/Magic Valley -> routes through the weather
  toggle to mesh_broadcast; out-of-region and below-threshold events correctly
  dropped.

DEFERRED (noted for a follow-up, not blocking Matt's morning config): Section 2B
rules-editor polish -- grouped-by-family category checkboxes, region_scope
multi-select in the rule editor (backend field + filtering ARE in), tooltips, and
the fire-count Active/No-activity badge -- were not built tonight to keep the build
shippable and verified; the Advanced Rules section is otherwise unchanged and
still functional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 07:00:10 +00:00