e807750a72
v0.10.6: extract mile_marker from itd_511 comment field as _enriched.mile_marker (#94)
itd_511's free-text Comment field carries a milepost in roughly a third of
the live samples ('milepost 32.5', 'MP 80 to MP 81', etc.). meshai's roads
integration needs that as a structured field; wzdx and tomtom_incidents
already speak in structured mile-post / from-to so itd_511 is the only
adapter that needs the regex extraction layer.

Design (per Step-0 review):
- Shared module src/central/enrichment/mile_marker.py exporting
  extract(text) -> {value, source, confidence} | None. Pure regex, no I/O,
  re-usable by future per-state-DOT adapters (Wyoming, Montana, ...).
- itd_511 calls extract on the Comment in _build_event_record; result lands
  under the established _enriched namespace (NOT a new _enrichment one),
  keyed 'mile_marker'. Same convention the supervisor's geocoder uses, same
  merge semantics apply_enrichment already supports. Absent when no match
  (no null placeholder), so subscribers test for the key's presence instead
  of handling a null value.
- Confidence tiers: 'high' (single unambiguous MP/milepost/MM match),
  'medium' (multiple matches like 'MP 80 to MP 81' -- first wins), 'low'
  (bare 'mile N' only; consumers can ignore).
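
A minimal sketch of the extract contract and confidence tiers described
above. The regexes, the float value type, and the 'regex' source label are
assumptions for illustration; the patterns actually shipped in
mile_marker.py are not shown in this message and may differ.

```python
import re
from typing import Optional

# Hypothetical patterns, not the ones shipped in mile_marker.py.
# Explicit markers: milepost / mile post / milemarker / M.P. / MP / MM.
_STRONG = re.compile(
    r"\b(?:mile\s?post|mile\s?marker|M\.P\.|MP|MM)\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
# Bare "mile N" with no marker word; only consulted as a fallback.
_BARE = re.compile(r"\bmile\s+(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract(text: Optional[str]) -> Optional[dict]:
    """Return {value, source, confidence} or None. Pure regex, no I/O."""
    if not text:
        return None
    strong = _STRONG.findall(text)
    if strong:
        # One explicit match is 'high'; several is 'medium', first wins.
        confidence = "high" if len(strong) == 1 else "medium"
        return {"value": float(strong[0]), "source": "regex",
                "confidence": confidence}
    bare = _BARE.search(text)
    if bare:
        return {"value": float(bare.group(1)), "source": "regex",
                "confidence": "low"}
    return None
```

Under these assumed patterns, 'MP 80 to MP 81' yields value 80.0 at
'medium', and a comment containing only a phone number or a state-highway
number yields None.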

Tests:
- tests/test_enrichment_mile_marker.py: 36 cases parametrized over the 15
  real ITD comments I pulled from CENTRAL_TRAFFIC, including the critical
  red-herring classes the regex must reject (phone numbers, project key
  numbers, state-highway numbers, date/time numbers). Crafted samples
  cover M.P. / MM / milemarker / bare-mile patterns not in live ITD data
  but required by spec for future DOT adapters.
- tests/test_itd_511.py: 2 integration tests confirming the bundle is
  attached on a milepost-bearing Comment and absent otherwise.
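
The attach-or-omit behaviour those integration tests check could look
roughly like this. attach_mile_marker, fake_extractor, and the record
shape are hypothetical stand-ins; the real logic lives in
_build_event_record and is not shown here.

```python
from typing import Callable, Optional


def attach_mile_marker(
    record: dict,
    comment: Optional[str],
    extractor: Callable[[str], Optional[dict]],
) -> dict:
    """Attach the extractor result under record['_enriched']['mile_marker'].

    The key is omitted entirely when nothing is extracted; no null
    placeholder is written. Existing _enriched entries are preserved.
    """
    result = extractor(comment) if comment else None
    if result is not None:
        record.setdefault("_enriched", {})["mile_marker"] = result
    return record


def fake_extractor(text: str) -> Optional[dict]:
    # Stand-in for the shared extract(); returns a fixed bundle for the demo.
    if "milepost" in text:
        return {"value": 32.5, "source": "regex", "confidence": "high"}
    return None


with_mp = attach_mile_marker({"id": "a"}, "Crash at milepost 32.5", fake_extractor)
without_mp = attach_mile_marker({"id": "b"}, "Expect delays", fake_extractor)
assert with_mp["_enriched"]["mile_marker"]["value"] == 32.5
assert "_enriched" not in without_mp
```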

Pure enrichment, no schema-breaking changes; meshai's renderer picks it up
additively.
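
On the consumer side, additive pickup might be sketched as below.
mile_marker_label and its min_confidence parameter are hypothetical, not
meshai's actual renderer code; the sketch only shows that events without
the key keep working and that the 'low' tier can be ignored.

```python
from typing import Optional

_RANK = {"low": 0, "medium": 1, "high": 2}


def mile_marker_label(event: dict, min_confidence: str = "medium") -> Optional[str]:
    """Return a short label such as 'MP 32.5', or None when the field is
    absent or its confidence is below the requested floor."""
    mm = event.get("_enriched", {}).get("mile_marker")
    if mm is None:
        return None
    if _RANK.get(mm.get("confidence"), -1) < _RANK[min_confidence]:
        return None
    # ':g' drops a trailing '.0', so 80.0 renders as '80'.
    return f"MP {mm['value']:g}"
```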

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:38:04 -06:00