v0.10.6: extract mile_marker from itd_511 comment field as _enriched.mile_marker (#94)

itd_511's free-text Comment field carries a milepost in roughly a third of
the live samples ('milepost 32.5', 'MP 80 to MP 81', etc.). meshai's roads
integration needs that as a structured field; wzdx and tomtom_incidents
already expose structured milepost / from-to fields, so itd_511 is the only
adapter that needs the regex extraction layer.

Design (per Step-0 review):
- Shared module src/central/enrichment/mile_marker.py exporting
  extract(text) -> {value, source, confidence} | None. Pure regex, no I/O,
  re-usable by future per-state-DOT adapters (Wyoming, Montana, ...).
- itd_511 calls extract on the Comment in _build_event_record; result lands
  under the established _enriched namespace (NOT a new _enrichment one),
  keyed 'mile_marker'. Same convention the supervisor's geocoder uses, same
  merge semantics apply_enrichment already supports. Absent when no match
  (no null placeholder) so subscribers can tell 'not mentioned' from
  'extraction found nothing'.
- Confidence tiers: 'high' (single unambiguous MP/milepost/MM match),
  'medium' (multiple matches like 'MP 80 to MP 81' -- first wins), 'low'
  (bare 'mile N' only; consumers can ignore).
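
Illustrative sketch of the extract contract and the three confidence tiers
described above. The regex patterns here are assumptions made for
illustration, not the patterns shipped in
src/central/enrichment/mile_marker.py; only the {value, source, confidence}
shape, the 'comment_regex' source label, and the tier rules come from this
commit.

```python
import re
from typing import Optional, TypedDict


class MileMarker(TypedDict):
    value: float
    source: str
    confidence: str


# Assumed "strong" keywords: MP, M.P., MM, milepost, mile post,
# milemarker, mile marker, each followed by a number.
_STRONG = re.compile(
    r"\b(?:m\.?p\.?|mm|mile\s*(?:post|marker))\s*#?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
# Assumed "weak" fallback: a bare "mile N".
_WEAK = re.compile(r"\bmile\s+(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract(text: Optional[str]) -> Optional[MileMarker]:
    """Return a mile-marker bundle, or None when nothing matches."""
    if not text:
        return None
    strong = _STRONG.findall(text)
    if strong:
        # One keyword match -> 'high'; several -> 'medium', first wins.
        confidence = "high" if len(strong) == 1 else "medium"
        return {
            "value": float(strong[0]),
            "source": "comment_regex",
            "confidence": confidence,
        }
    weak = _WEAK.search(text)
    if weak:
        # Bare 'mile N' only -> 'low'; consumers can ignore it.
        return {
            "value": float(weak.group(1)),
            "source": "comment_regex",
            "confidence": "low",
        }
    return None
```

Because every pattern requires a milepost keyword directly before the number,
phone numbers, project keys, and highway numbers fall through to None in this
sketch.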

Tests:
- tests/test_enrichment_mile_marker.py: 36 cases parametrized over the 15
  real ITD comments I pulled from CENTRAL_TRAFFIC, including the critical
  red-herring classes the regex must reject (phone numbers, project key
  numbers, state-highway numbers, date/time numbers). Crafted samples
  cover M.P. / MM / milemarker / bare-mile patterns not in live ITD data
  but required by spec for future DOT adapters.
- tests/test_itd_511.py: 2 integration tests confirming the bundle is
  attached on a milepost-bearing Comment and absent otherwise.

Pure enrichment, no schema-breaking changes; meshai's renderer picks it up
additively.
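
Illustrative sketch of how a subscriber might consume the field additively.
format_mile_marker and its include_low flag are hypothetical names, not
meshai's renderer; the only things taken from this commit are the
_enriched.mile_marker location, the bundle keys, and the rule that the key is
absent when nothing matched.

```python
from typing import Any, Mapping, Optional


def format_mile_marker(
    data: Mapping[str, Any], include_low: bool = False
) -> Optional[str]:
    """Render 'MP <n>' from an event's data dict, or None if unavailable."""
    # The key is simply absent when extraction found nothing, so a plain
    # .get() chain is enough; there is no null placeholder to special-case.
    bundle = data.get("_enriched", {}).get("mile_marker")
    if bundle is None:
        return None
    # 'low' means a bare 'mile N' match; skip it unless explicitly wanted.
    if bundle["confidence"] == "low" and not include_low:
        return None
    # ':g' drops a trailing '.0' so 54.0 renders as '54'.
    return f"MP {bundle['value']:g}"
```

Events from older versions, which carry no mile_marker key, return None here,
which is what makes the change additive for existing consumers.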

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
malice 2026-06-07 21:38:04 -06:00 committed by GitHub
commit e807750a72
4 changed files with 354 additions and 27 deletions

@@ -396,3 +396,54 @@ def test_tenacity_decorator_has_explicit_no_log_hooks():
    assert retrying.after is after_nothing
    assert retrying.before is before_nothing
    assert retrying.reraise is True


# --- v0.10.6: mile_marker enrichment on incident events ---------------------


def _rec_with_comment(comment: str | None) -> dict:
    """Minimal /get/event record with a settable Comment field."""
    return {
        "SourceId": "test-mm-1",
        "EventType": "accidentsAndIncidents",
        "Comment": comment,
        "Latitude": 43.6,
        "Longitude": -116.2,
        "Severity": "Minor",
    }


def test_build_event_attaches_mile_marker_when_comment_has_milepost(adapter):
    """Comment with a milepost keyword -> _enriched.mile_marker populated.

    v0.10.6: the adapter calls central.enrichment.mile_marker.extract on
    the Comment field; the result lands under the existing _enriched
    namespace (same convention the supervisor's geocoder uses), keyed by
    'mile_marker'. Asserts the bundle is present and matches the
    {value, source, confidence} contract.
    """
    rec = _rec_with_comment(
        "Crash on westbound I-84 at milepost 54. One right lane blocked."
    )
    e = adapter._build_event_record(rec)
    assert e is not None
    bundle = e.data.get("_enriched", {}).get("mile_marker")
    assert bundle is not None, "expected _enriched.mile_marker on milepost-bearing comment"
    assert bundle["value"] == 54.0
    assert bundle["source"] == "comment_regex"
    assert bundle["confidence"] == "high"


def test_build_event_omits_mile_marker_when_comment_has_none(adapter):
    """No MP/mile keyword -> _enriched.mile_marker ABSENT (no null placeholder).

    Subscribers can therefore distinguish 'no MP mentioned' from
    'extraction ran and found nothing'. Also covers the missing-Comment path.
    """
    no_match = adapter._build_event_record(_rec_with_comment("Bridge Repair"))
    assert no_match is not None
    assert "mile_marker" not in no_match.data.get("_enriched", {})

    missing = adapter._build_event_record(_rec_with_comment(None))
    assert missing is not None
    assert "mile_marker" not in missing.data.get("_enriched", {})