mirror of https://github.com/zvx-echo6/central.git
synced 2026-06-10 20:04:43 +02:00
v0.10.6: extract mile_marker from itd_511 comment field as _enriched.mile_marker (#94)
itd_511's free-text Comment field carries a milepost in roughly a third of
the live samples ('milepost 32.5', 'MP 80 to MP 81', etc.). meshai's roads
integration needs that as a structured field; wzdx and tomtom_incidents
already speak in structured mile-post / from-to so itd_511 is the only
adapter that needs the regex extraction layer.
Design (per Step-0 review):
- Shared module src/central/enrichment/mile_marker.py exporting
extract(text) -> {value, source, confidence} | None. Pure regex, no I/O,
re-usable by future per-state-DOT adapters (Wyoming, Montana, ...).
- itd_511 calls extract on the Comment in _build_event_record; result lands
under the established _enriched namespace (NOT a new _enrichment one),
keyed 'mile_marker'. Same convention the supervisor's geocoder uses, same
merge semantics apply_enrichment already supports. Absent when no match
(no null placeholder) so subscribers can tell 'not mentioned' from
'extraction found nothing'.
- Confidence tiers: 'high' (single unambiguous MP/milepost/MM match),
'medium' (multiple matches like 'MP 80 to MP 81' -- first wins), 'low'
(bare 'mile N' only; consumers can ignore).
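A minimal sketch of what such an extractor could look like, for illustration only. The regexes below are assumptions, not the patterns shipped in src/central/enrichment/mile_marker.py; only the extract(text) name, the {value, source, confidence} shape, the 'comment_regex' source label, and the three confidence tiers come from this commit.

```python
import re
from typing import Optional, TypedDict


class MileMarker(TypedDict):
    value: float
    source: str
    confidence: str


# ASSUMED patterns, for illustration; the real module's regexes are not shown here.
# Keyworded forms: "milepost 32.5", "mile post 32", "mile marker 3", "M.P. 12", "MP 80", "MM 7".
_KEYWORD = re.compile(
    r"\b(?:mile\s*post|mile\s*marker|m\.p\.|mp|mm)\s*#?\s*(\d{1,3}(?:\.\d+)?)\b",
    re.IGNORECASE,
)
# Bare form: "mile 12". Weak evidence, so it only ever yields 'low'.
_BARE = re.compile(r"\bmile\s+(\d{1,3}(?:\.\d+)?)\b", re.IGNORECASE)


def extract(text: Optional[str]) -> Optional[MileMarker]:
    """Return a mile-marker bundle for `text`, or None when nothing matches."""
    if not text:
        return None
    keyword_hits = _KEYWORD.findall(text)
    if len(keyword_hits) == 1:
        # Single unambiguous keyworded match.
        return {"value": float(keyword_hits[0]), "source": "comment_regex", "confidence": "high"}
    if len(keyword_hits) > 1:
        # Ranges such as 'MP 80 to MP 81': the first match wins.
        return {"value": float(keyword_hits[0]), "source": "comment_regex", "confidence": "medium"}
    bare = _BARE.search(text)
    if bare:
        return {"value": float(bare.group(1)), "source": "comment_regex", "confidence": "low"}
    return None
```

Because a number only counts when it directly follows a milepost keyword, phone numbers, route numbers, and dates fall through to None in this sketch without needing explicit rejection rules.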
Tests:
- tests/test_enrichment_mile_marker.py: 36 cases parametrized over the 15
real ITD comments I pulled from CENTRAL_TRAFFIC, including the critical
red-herring classes the regex must reject (phone numbers, project key
numbers, state-highway numbers, date/time numbers). Crafted samples
cover M.P. / MM / milemarker / bare-mile patterns not in live ITD data
but required by spec for future DOT adapters.
- tests/test_itd_511.py: 2 integration tests confirming the bundle is
attached on a milepost-bearing Comment and absent otherwise.
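The presence/absence behaviour those two integration tests check could be sketched roughly as follows. attach_mile_marker is a hypothetical helper name used for illustration; the adapter's actual _build_event_record code is not part of this diff.

```python
from typing import Any, Optional


def attach_mile_marker(
    data: dict[str, Any], bundle: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Attach `bundle` under data['_enriched']['mile_marker'] only when present.

    Hypothetical helper: when `bundle` is None, no key is written at all,
    so no null placeholder ever appears in the event data.
    """
    if bundle is None:
        return data
    # Reuse the existing _enriched namespace; keys already there are kept.
    enriched = data.setdefault("_enriched", {})
    enriched["mile_marker"] = bundle
    return data
```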
Pure enrichment, no schema-breaking changes; meshai's renderer picks it up
additively.
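On the consumer side, an additive read might look like the sketch below. mile_marker_label and its min_confidence parameter are hypothetical names for illustration and do not describe meshai's actual renderer; only the _enriched.mile_marker key and the bundle fields are taken from this commit.

```python
from typing import Any, Optional

# Ordering of the confidence tiers named in this commit.
_RANK = {"low": 0, "medium": 1, "high": 2}


def mile_marker_label(
    data: dict[str, Any], min_confidence: str = "medium"
) -> Optional[str]:
    """Return a short 'MP <n>' label, or None if absent or below the threshold."""
    bundle = (data.get("_enriched") or {}).get("mile_marker")
    if not bundle:
        # Key absent: older events and non-matching comments render unchanged.
        return None
    if _RANK.get(bundle.get("confidence"), -1) < _RANK[min_confidence]:
        return None
    # :g drops a trailing '.0', so 54.0 renders as '54' and 32.5 stays '32.5'.
    return f"MP {bundle['value']:g}"
```

Events without the key take the same path as before, which is what keeps the change non-breaking for existing subscribers.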
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent b17d8bcd54
commit e807750a72
4 changed files with 354 additions and 27 deletions
@@ -396,3 +396,54 @@ def test_tenacity_decorator_has_explicit_no_log_hooks():
     assert retrying.after is after_nothing
     assert retrying.before is before_nothing
     assert retrying.reraise is True
+
+
+# --- v0.10.6: mile_marker enrichment on incident events ---------------------
+
+
+def _rec_with_comment(comment: str | None) -> dict:
+    """Minimal /get/event record with a settable Comment field."""
+    return {
+        "SourceId": "test-mm-1",
+        "EventType": "accidentsAndIncidents",
+        "Comment": comment,
+        "Latitude": 43.6,
+        "Longitude": -116.2,
+        "Severity": "Minor",
+    }
+
+
+def test_build_event_attaches_mile_marker_when_comment_has_milepost(adapter):
+    """Comment with a milepost keyword -> _enriched.mile_marker populated.
+
+    v0.10.6: the adapter calls central.enrichment.mile_marker.extract on
+    the Comment field; the result lands under the existing _enriched
+    namespace (same convention the supervisor's geocoder uses), keyed by
+    'mile_marker'. Asserts the bundle is present and matches the
+    {value, source, confidence} contract.
+    """
+    rec = _rec_with_comment(
+        "Crash on westbound I-84 at milepost 54. One right lane blocked."
+    )
+    e = adapter._build_event_record(rec)
+    assert e is not None
+    bundle = e.data.get("_enriched", {}).get("mile_marker")
+    assert bundle is not None, "expected _enriched.mile_marker on milepost-bearing comment"
+    assert bundle["value"] == 54.0
+    assert bundle["source"] == "comment_regex"
+    assert bundle["confidence"] == "high"
+
+
+def test_build_event_omits_mile_marker_when_comment_has_none(adapter):
+    """No MP/mile keyword -> _enriched.mile_marker ABSENT (no null placeholder).
+
+    Subscribers can therefore distinguish 'no MP mentioned' from
+    'extraction ran and found nothing'. Also covers the missing-Comment path.
+    """
+    no_match = adapter._build_event_record(_rec_with_comment("Bridge Repair"))
+    assert no_match is not None
+    assert "mile_marker" not in no_match.data.get("_enriched", {})
+
+    missing = adapter._build_event_record(_rec_with_comment(None))
+    assert missing is not None
+    assert "mile_marker" not in missing.data.get("_enriched", {})