meshai/tests/test_weather_v057.py
Matt Johnson b87696bf67 fix(weather): v0.5.7-weather -- NWS HTML strip + ALERT_CATEGORIES audit (NATS pattern already valid)
First family of the v0.5.7 NATS-and-categories campaign (Matt review of Central v0.10.0 meshai_integration_guide.md). Weather lands first because the NWS NATS pattern is already legal; the other five families still use an invalid mid-subject > wildcard and need their patterns rewritten, which will ship per family.

FIX 1 -- NWS NATS pattern validated. _subjects_for("nws", "us.id") -> ["central.wx.alert.us.id.>"]. The wildcard token > sits at the tail only (token index -1), so the subject is a legal NATS multi-level wildcard. No code change. Live introspection confirmed in-container.
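
The tail-only rule for > can be checked mechanically. The following is a minimal sketch, not the project's code: is_tail_wildcard_legal is a hypothetical helper, and it checks only the rule discussed here (a > token must be the last token, and no token may be empty).

```python
def is_tail_wildcard_legal(subject: str) -> bool:
    """Hypothetical helper: True if every '>' token sits at the tail.

    NATS subjects are dot-separated tokens. The multi-level wildcard '>'
    is only legal as the final token; empty tokens are never legal.
    """
    tokens = subject.split(".")
    if any(token == "" for token in tokens):
        return False  # covers "", "a..b", and a trailing "."
    last = len(tokens) - 1
    return all(token != ">" or index == last for index, token in enumerate(tokens))


if __name__ == "__main__":
    # The weather pattern quoted in this commit: '>' is the final token.
    print(is_tail_wildcard_legal("central.wx.alert.us.id.>"))  # True
    # A mid-subject '>' of the kind the other families still need to fix.
    print(is_tail_wildcard_legal("central.>.alert.us.id"))  # False
```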

FIX 2 -- NWS HTML strip in mesh composer. Per Central guide Surprise 3, data["description"] and data["instruction"] arrive as raw HTML (<p>, <br>, <strong>, &nbsp;, &mdash;, ...). Until now the composer fed event.title / event.summary straight to LoRa, so any future title/summary populated from those fields would have leaked literal markup onto the wire.

Added strip_html_tags(text) -> str in meshai/notifications/renderers/composer.py. Block-level tags (br, p, div, li, tr, h1-h6) become a single space so adjacent paragraphs do not fuse; all other tags are removed; HTML entities are decoded via html.unescape; whitespace is collapsed. Applied in _primary_identifier (title and summary paths) and _region_segment BEFORE byte-budget truncation, so the 150 B cap counts real glyphs, not markup. Universal (not NWS-gated) since strip is a no-op on plain text -- protects against future adapters that surface raw HTML too.
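
For readers without the composer source at hand, the behaviour described above can be approximated as follows. This is a sketch under assumptions, not the code in composer.py: strip_html_tags_sketch and truncate_utf8 are hypothetical names, the regexes are one possible way to get the described behaviour, and the 150-byte limit is taken from the text above.

```python
import html
import re

# Block-level tags named in the commit message; each becomes one space.
_BLOCK_TAG_RE = re.compile(r"</?(?:br|p|div|li|tr|h[1-6])\b[^>]*>", re.IGNORECASE)
# Any other tag. Requiring a letter after '<' leaves text like "5 < 10" alone.
_ANY_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
_WHITESPACE_RE = re.compile(r"\s+")


def strip_html_tags_sketch(text):
    """Hypothetical approximation of strip_html_tags()."""
    if not text:
        return ""
    text = _BLOCK_TAG_RE.sub(" ", text)  # keep adjacent paragraphs apart
    text = _ANY_TAG_RE.sub("", text)  # drop inline tags such as <strong>
    text = html.unescape(text)  # &amp; -> &, &nbsp; -> U+00A0, ...
    # \s matches U+00A0 on str patterns, so &nbsp; ends up as a plain space.
    return _WHITESPACE_RE.sub(" ", text).strip()


def truncate_utf8(text: str, limit: int) -> str:
    """Hypothetical helper: cut to `limit` bytes without splitting a character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    raw = '<div class="alert"><p>Tornado <strong>WARNING</strong></p></div>'
    clean = strip_html_tags_sketch(raw)
    # Strip first, truncate second, so the budget is spent on visible text.
    print(truncate_utf8(clean, 150))
```

Stripping before truncating matters because a byte cap applied to the raw string would spend part of the budget on tags that are later discarded.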

FIX 3 -- ALERT_CATEGORIES weather audit. Cross-referenced ALERT_CATEGORIES{toggle="weather"} against meshai/env/nws.py:_derive_category() emission set:

  nws.py emits:        weather_warning, weather_watch, weather_advisory, weather_statement
  registry weather:    weather_warning, weather_watch, weather_advisory, weather_statement

Parity. No additions, no removals. The v0.5.2 stream_* migration to the seismic family (USGS hydro under the GUI Geohazards tab) is already reflected; weather is clean at 4 entries. Added a comment block above the weather section pointing at test_alert_categories_weather_complete which now enforces this parity going forward -- if a new branch is added to _derive_category(), the test fails and forces a matching registry entry.
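
The parity check itself is a pair of set differences. The sketch below illustrates the idea on a toy function. It is not the shipped test, which scans the real _derive_category() source with a regex; here the return literals are collected with the ast module instead. EXAMPLE_SOURCE, returned_string_literals, parity_report and the extra "weather_outlook" entry are all hypothetical.

```python
import ast
import textwrap

# Toy stand-in for _derive_category(); the real method body is not shown here.
EXAMPLE_SOURCE = '''
def _derive_category(event_name):
    if "Warning" in event_name:
        return "weather_warning"
    if "Watch" in event_name:
        return "weather_watch"
    if "Advisory" in event_name:
        return "weather_advisory"
    return "weather_statement"
'''


def returned_string_literals(source: str) -> set[str]:
    """Collect every string literal that appears directly in a return statement."""
    tree = ast.parse(textwrap.dedent(source))
    return {
        node.value.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Return)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    }


def parity_report(emitted: set[str], registry: set[str]) -> dict[str, set[str]]:
    """Missing = emitted but unregistered; orphans = registered but never emitted."""
    return {"missing": emitted - registry, "orphans": registry - emitted}


if __name__ == "__main__":
    emitted = returned_string_literals(EXAMPLE_SOURCE)
    # A registry with one made-up extra entry shows up as an orphan.
    print(parity_report(emitted, emitted | {"weather_outlook"}))
```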

Tests
-----
PYTHONPATH=. pytest -q: 345 passed (was 328; +17 new in tests/test_weather_v057.py).
  - strip_html_tags: simple tags, br/paragraph -> space, entity decode (&amp; &nbsp; &mdash;), nested/attrs, plain-text no-op, empty input, whitespace collapse.
  - compose_mesh_message integration: HTML in title scrubbed; HTML in summary fallback scrubbed; 150 B budget still holds.
  - Weather parity: reflection-based scan of NWSAlertsAdapter._derive_category() vs registry; both must match.
  - Required-fields check on the four weather entries.

Safe-mode preserved (master off, all family toggles off, all adapters native, central disabled). No live toggle flipped. Not tagging yet -- v0.5.7 tag waits until all families ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-04 06:00:10 +00:00


"""v0.5.7-weather: NWS HTML strip + ALERT_CATEGORIES weather audit.
Covers three things shipped in v0.5.7-weather:
1. strip_html_tags() — NWS data.description / data.instruction arrive as raw
HTML (per Central guide §Surprise 3). Verify tags are stripped, entities
decoded, paragraph breaks become spaces, plain text is a no-op.
2. compose_mesh_message() integration — an Event whose title contains HTML
produces a clean LoRa string (no literal <p>/<br>).
3. Weather category parity — ALERT_CATEGORIES{toggle=weather} is exactly the
set that nws.py._derive_category() can emit. Fail loudly if either side
drifts so the weather family stays "every event meshai sees is selectable".
"""
import inspect
import pytest
from meshai.notifications.categories import ALERT_CATEGORIES
from meshai.notifications.events import make_event
from meshai.notifications.renderers.composer import (
compose_mesh_message,
strip_html_tags,
)


# ---------- strip_html_tags() ----------------------------------------------


def test_strip_html_tags_removes_simple_tags():
    assert strip_html_tags("<p>Severe</p>") == "Severe"


def test_strip_html_tags_br_becomes_space():
    # <br> separates two sentences in NWS bodies; must not fuse.
    assert strip_html_tags("hello<br>world") == "hello world"


def test_strip_html_tags_paragraph_break_becomes_space():
    assert strip_html_tags("<p>hello</p><p>world</p>") == "hello world"


def test_strip_html_tags_decodes_entities():
    assert strip_html_tags("Wind gusts 25 &amp; 35 mph") == "Wind gusts 25 & 35 mph"
    # &nbsp; decodes to U+00A0, which the whitespace collapse normalizes to a
    # regular space; tight ASCII whitespace is what we want on LoRa.
    assert strip_html_tags("Twin Falls&nbsp;County") == "Twin Falls County"
    assert strip_html_tags("12 &mdash; 35 mph") == "12 — 35 mph"


def test_strip_html_tags_nested_and_attrs():
    raw = '<div class="alert"><p style="color:red">Tornado <strong>WARNING</strong></p></div>'
    assert strip_html_tags(raw) == "Tornado WARNING"


def test_strip_html_tags_plain_text_noop():
    assert strip_html_tags("Red Flag Warning until 04:00Z") == "Red Flag Warning until 04:00Z"


def test_strip_html_tags_empty_inputs():
    assert strip_html_tags("") == ""
    assert strip_html_tags(None) == ""  # type: ignore[arg-type]


def test_strip_html_tags_collapses_whitespace():
    raw = "<p>line 1</p>\n<p>line\t2</p>"
    assert strip_html_tags(raw) == "line 1 line 2"


# ---------- compose_mesh_message integration -------------------------------


def test_compose_mesh_message_strips_html_in_title():
    event = make_event(
        source="nws",
        category="weather_warning",
        severity="priority",
        title="<p>Severe Thunderstorm Warning</p>",
        summary="",
        region="Twin Falls",
    )
    line = compose_mesh_message(event)
    # No literal markup escapes onto the wire.
    assert "<" not in line
    assert "</p>" not in line
    assert "Severe Thunderstorm Warning" in line


def test_compose_mesh_message_strips_html_with_entities_and_br():
    event = make_event(
        source="nws",
        category="weather_advisory",
        severity="routine",
        title="Wind Advisory&nbsp;&mdash;<br>SW gusts 50 mph",
        summary="",
        region="Magic Valley",
    )
    line = compose_mesh_message(event)
    assert "<br>" not in line
    assert "&nbsp;" not in line
    assert "&mdash;" not in line
    # Byte budget still holds.
    assert len(line.encode("utf-8")) <= 150


def test_compose_mesh_message_html_fallthrough_to_summary():
    # title empty -> summary path also strips HTML.
    event = make_event(
        source="nws",
        category="weather_statement",
        severity="routine",
        title="",
        summary="<p>Special Weather Statement</p>",
    )
    line = compose_mesh_message(event)
    assert "<" not in line
    assert "Special Weather Statement" in line


# ---------- ALERT_CATEGORIES weather audit ---------------------------------


def _nws_emitted_categories() -> set[str]:
    """Walk nws.py source for every literal returned by _derive_category().

    Reflection-style audit: read the method body's source and collect the
    quoted return values. Keeps the test honest if someone adds a 5th branch
    without thinking about ALERT_CATEGORIES.
    """
    import re

    from meshai.env.nws import NWSAlertsAdapter

    src = inspect.getsource(NWSAlertsAdapter._derive_category)
    return set(re.findall(r'return\s+"([a-z_]+)"', src))


def test_nws_emits_exactly_four_weather_categories():
    emitted = _nws_emitted_categories()
    assert emitted == {
        "weather_warning",
        "weather_watch",
        "weather_advisory",
        "weather_statement",
    }, f"nws.py emission set drifted: {emitted}"


def test_alert_categories_weather_complete():
    """Every weather category nws.py can emit must exist in ALERT_CATEGORIES
    with toggle='weather'. Anything tagged toggle='weather' that nws.py
    cannot emit is an orphan (no UI selectable event would ever surface it).
    """
    registry_weather = {
        cid for cid, info in ALERT_CATEGORIES.items()
        if info.get("toggle") == "weather"
    }
    emitted = _nws_emitted_categories()
    missing = emitted - registry_weather
    orphans = registry_weather - emitted
    assert not missing, f"nws.py emits categories missing from ALERT_CATEGORIES: {missing}"
    assert not orphans, f"ALERT_CATEGORIES has orphan weather entries: {orphans}"


@pytest.mark.parametrize(
    "cat",
    ["weather_warning", "weather_watch", "weather_advisory", "weather_statement"],
)
def test_weather_categories_have_required_fields(cat):
    info = ALERT_CATEGORIES[cat]
    assert info["toggle"] == "weather"
    assert info["name"]
    assert info["description"]
    assert info["default_severity"] in {"routine", "priority", "immediate"}
    assert info["example_message"]