mirror of
https://github.com/zvx-echo6/meshai.git
synced 2026-06-11 01:14:45 +02:00
feat(v0.5.8b): persistence foundation + WFIGS handler + universal cold-start grace
Three integrated pieces that ship together because they were designed as one safety story: (1) PERSISTENCE FOUNDATION -- new meshai/persistence/ module with SQLite db.py, schema migration framework (v1), 13 tables covering all adapter event shapes (traffic_events, fires, firms_pixels, quake_events, nws_alerts, gauge_readings, swpc_events) + mesh state (mesh_nodes, mesh_telemetry, mesh_positions, mesh_messages_in, mesh_broadcasts_out, mesh_health_events) + cross-cutting event_log + schema_meta. WAL mode for reader concurrency, single-writer pattern, MESHAI_DB_PATH env var, mounted at /data/meshai.sqlite via existing docker-compose meshai_data volume. .gitignore updated. (2) WFIGS HANDLER -- meshai/central/wfigs_handler.py implements the first per-adapter handler that uses the persistence layer. Format: MEDIUM style with town/landclass/county fallback chain, lat/lon at 3-decimal precision, New:/Update: prefix. 8h-rate-limited change-detection per IRWIN via fires.last_broadcast_at. Skips tombstones and perimeters silently (logged to event_log with handled=0). Acres fallback chain DailyAcres -> IncidentSize -> raw.DiscoveryAcres -> raw.FinalAcres -> N/A. Pass-through Initial Attack auto-numbered names (IA 1, IA 2). (3) UNIVERSAL COLD-START GRACE -- meshai/notifications/pipeline/dispatcher.py grows a configurable grace window (cold_start_grace_seconds, default 60s, GUI-editable per Rule 17). Anchored to first-event-seen (not container boot), so the grace activates the moment broadcasts could fire. Suppresses mesh delivery during the window; handler-side persistence (fires UPSERT, event_log) still happens normally. New _cold_start_dropped counter exposed in dispatch_stats(). Designed to protect against JetStream backlog spam at toggle-flip time, applies universally to ALL adapters. (4) WFIGS HANDLER CALLBACK REFACTOR -- New:/Update: prefix now keys on fires.last_broadcast_at IS NULL (not row-missing), and last_broadcast_* field updates moved to a post-broadcast commit callback that the dispatcher invokes ONLY on successful delivery. This means: cold-start-suppressed events leave fires.last_broadcast_at NULL, so when they eventually broadcast post-grace, they correctly render as New: (first ACTUAL delivery for that IRWIN), not Update:. event_log.handled and mesh_broadcasts_out audit row also gated on the same callback -- decoupling persistence rows from broadcast rows for an honest audit trail. New tests: 15 in test_wfigs_handler.py, 15 in test_persistence.py, additional cold-start grace tests in test_dispatcher.py (+4 WFIGS callback scenarios). Synthetic probes wfigs-cleaned-samples.md (initial) and wfigs-cleaned-samples-v2.md (cold-start verification) generated against isolated temp SQLite databases. CT108 /data/meshai.sqlite untouched during build. Master stays off. No live toggle flips. Test count: was 535 (v0.5.7 baseline) -> 566 (persistence) -> 581 (wfigs handler) -> 589 expected (cold-start grace). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
7751a40c6c
commit
053d67db6e
16 changed files with 2652 additions and 1 deletions
194
meshai/persistence/db.py
Normal file
194
meshai/persistence/db.py
Normal file
|
|
@ -0,0 +1,194 @@
|
|||
"""SQLite persistence connection management + migration runner.
|
||||
|
||||
Single-writer SQLite pattern with WAL journal mode for reader concurrency.
|
||||
One connection per thread (threading.local) -- callers should not share
|
||||
connections across threads.
|
||||
|
||||
Path resolution:
|
||||
1. MESHAI_DB_PATH env var (explicit override)
|
||||
2. DEFAULT_DB_PATH = /data/meshai.sqlite (prod container mount)
|
||||
|
||||
Special value ":memory:" or any path containing "memory" routes to an
|
||||
in-memory SQLite for tests.
|
||||
|
||||
Migrations live in meshai/persistence/migrations/v*.sql. The runner
|
||||
applies them in version order, recording the applied version in
|
||||
schema_meta. Idempotent re-run is a no-op.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import sqlite3
|
||||
import threading
|
||||
from pathlib import Path
|
||||
from typing import Iterable, Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
DEFAULT_DB_PATH = "/data/meshai.sqlite"
|
||||
MESHAI_DB_PATH_ENV = "MESHAI_DB_PATH"
|
||||
SCHEMA_VERSION = 1
|
||||
SCHEMA_META_TABLE = "schema_meta"
|
||||
MIGRATIONS_DIR = Path(__file__).parent / "migrations"
|
||||
|
||||
# Per-thread connection pool. Each thread that calls get_db() gets its
|
||||
# own sqlite3.Connection cached on threading.local. Tests can clear
|
||||
# via close_thread_connection() between cases.
|
||||
_local = threading.local()
|
||||
# Module-level lock guards init_db() so concurrent first-callers don't
|
||||
# race on migration application.
|
||||
_init_lock = threading.Lock()
|
||||
# Cache of initialised database paths in this process so init_db() is
|
||||
# idempotent without re-reading migration files on every call.
|
||||
_initialised: set[str] = set()
|
||||
|
||||
|
||||
def MESHAI_DB_PATH() -> str:
|
||||
"""Resolve the active SQLite path (env var override or default)."""
|
||||
return os.environ.get(MESHAI_DB_PATH_ENV) or DEFAULT_DB_PATH
|
||||
|
||||
|
||||
def _is_memory_path(path: str) -> bool:
|
||||
return path == ":memory:" or "mode=memory" in path or path.startswith("file::memory:")
|
||||
|
||||
|
||||
def _connect(path: str) -> sqlite3.Connection:
|
||||
"""Open a SQLite connection with sane defaults for this project."""
|
||||
if _is_memory_path(path):
|
||||
# For in-memory tests, use a shared cache so multiple connections
|
||||
# in the same thread can see the same DB. Tests that want isolation
|
||||
# call close_thread_connection() between cases.
|
||||
uri = "file::memory:?cache=shared"
|
||||
conn = sqlite3.connect(uri, uri=True, isolation_level=None,
|
||||
check_same_thread=False)
|
||||
else:
|
||||
# Ensure parent dir exists for file-backed DBs.
|
||||
Path(path).parent.mkdir(parents=True, exist_ok=True)
|
||||
conn = sqlite3.connect(path, isolation_level=None,
|
||||
check_same_thread=False, timeout=30.0)
|
||||
conn.row_factory = sqlite3.Row
|
||||
# Enable foreign keys (off by default in SQLite); WAL mode for reader
|
||||
# concurrency; reasonable busy timeout for the single-writer pattern.
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
if not _is_memory_path(path):
|
||||
conn.execute("PRAGMA journal_mode = WAL")
|
||||
conn.execute("PRAGMA synchronous = NORMAL")
|
||||
conn.execute("PRAGMA busy_timeout = 30000")
|
||||
return conn
|
||||
|
||||
|
||||
def get_db(path: Optional[str] = None) -> sqlite3.Connection:
|
||||
"""Return a SQLite connection for the current thread (cached).
|
||||
|
||||
First call initialises the database (runs pending migrations).
|
||||
Subsequent calls in the same thread return the cached connection.
|
||||
"""
|
||||
target = path or MESHAI_DB_PATH()
|
||||
cached = getattr(_local, "conn", None)
|
||||
cached_path = getattr(_local, "path", None)
|
||||
if cached is not None and cached_path == target:
|
||||
return cached
|
||||
# Different path requested or no cached conn -- (re)open.
|
||||
if cached is not None:
|
||||
try: cached.close()
|
||||
except Exception: pass
|
||||
conn = _connect(target)
|
||||
_local.conn = conn
|
||||
_local.path = target
|
||||
if target not in _initialised:
|
||||
with _init_lock:
|
||||
if target not in _initialised:
|
||||
_apply_migrations(conn)
|
||||
_initialised.add(target)
|
||||
return conn
|
||||
|
||||
|
||||
def close_thread_connection() -> None:
|
||||
"""Close + drop the cached connection for the current thread.
|
||||
|
||||
Tests call this between cases to ensure a clean slate. The shared-cache
|
||||
in-memory database is reset on the LAST close in the process.
|
||||
"""
|
||||
conn = getattr(_local, "conn", None)
|
||||
if conn is not None:
|
||||
try: conn.close()
|
||||
except Exception: pass
|
||||
if hasattr(_local, "conn"): del _local.conn
|
||||
if hasattr(_local, "path"): del _local.path
|
||||
|
||||
|
||||
def init_db(path: Optional[str] = None) -> sqlite3.Connection:
|
||||
"""Explicit init entry point (idempotent). Equivalent to get_db()
|
||||
semantically but documents intent at startup. Returns the connection."""
|
||||
return get_db(path)
|
||||
|
||||
|
||||
def _read_migration_files() -> list[tuple[int, str, str]]:
|
||||
"""Return [(version_int, filename, sql_text), ...] sorted ascending."""
|
||||
if not MIGRATIONS_DIR.is_dir():
|
||||
return []
|
||||
out: list[tuple[int, str, str]] = []
|
||||
for p in sorted(MIGRATIONS_DIR.iterdir()):
|
||||
if not p.is_file() or p.suffix.lower() != ".sql":
|
||||
continue
|
||||
# Filename format: v<N>.sql or v<N>_<label>.sql
|
||||
stem = p.stem
|
||||
if not stem.startswith("v"):
|
||||
continue
|
||||
n_str = stem[1:].split("_", 1)[0]
|
||||
try: n = int(n_str)
|
||||
except ValueError: continue
|
||||
out.append((n, p.name, p.read_text()))
|
||||
return out
|
||||
|
||||
|
||||
def _current_version(conn: sqlite3.Connection) -> int:
|
||||
"""Return the highest applied migration version, or 0 if none."""
|
||||
# Does the schema_meta table exist?
|
||||
row = conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type='table' AND name=?",
|
||||
(SCHEMA_META_TABLE,),
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return 0
|
||||
row = conn.execute(
|
||||
f"SELECT value FROM {SCHEMA_META_TABLE} WHERE key='version'",
|
||||
).fetchone()
|
||||
if row is None:
|
||||
return 0
|
||||
try: return int(row["value"])
|
||||
except (TypeError, ValueError): return 0
|
||||
|
||||
|
||||
def _apply_migrations(conn: sqlite3.Connection) -> None:
|
||||
"""Apply any pending migrations in version order. Idempotent."""
|
||||
migrations = _read_migration_files()
|
||||
if not migrations:
|
||||
logger.warning("persistence: no migration files found in %s", MIGRATIONS_DIR)
|
||||
return
|
||||
current = _current_version(conn)
|
||||
pending = [(n, name, sql) for n, name, sql in migrations if n > current]
|
||||
if not pending:
|
||||
logger.debug("persistence: schema up to date at v%d", current)
|
||||
return
|
||||
for version, filename, sql in pending:
|
||||
logger.info("persistence: applying migration %s (v%d)", filename, version)
|
||||
# sqlite3.Connection.executescript() ISSUES ITS OWN COMMIT before
|
||||
# starting the script and another at the end, so wrapping it in an
|
||||
# explicit BEGIN/COMMIT is both unnecessary and broken (the
|
||||
# ROLLBACK in the except clause would fire against an already-
|
||||
# committed-or-empty transaction). Migration atomicity is bounded
|
||||
# by executescript's own transaction.
|
||||
try:
|
||||
conn.executescript(sql)
|
||||
conn.execute(
|
||||
f"INSERT OR REPLACE INTO {SCHEMA_META_TABLE}(key, value) VALUES('version', ?)",
|
||||
(str(version),),
|
||||
)
|
||||
except Exception:
|
||||
logger.exception("persistence: migration %s failed", filename)
|
||||
raise
|
||||
logger.info("persistence: schema now at v%d", pending[-1][0])
|
||||
Loading…
Add table
Add a link
Reference in a new issue