meshai/MEMORY_IMPLEMENTATION_GUIDE.md
Matt fd3f995ebb Initial commit: MeshAI - LLM-powered Meshtastic assistant
2025-12-15 11:53:46 -07:00


Quick Implementation Guide: Rolling Summary Memory

TL;DR

Problem: Sending the full conversation history with every request wastes tokens and adds latency.

Solution: Rolling summary approach - keep the most recent messages verbatim and replace older messages with an LLM-generated summary.

Result: ~73% fewer tokens for long conversations (see Expected Results), no new dependencies, works with the current stack.


Architecture

SQLite History (per user)
    ↓
Messages 1-10: Summarized → "User asked about weather, discussed outdoor plans"
Messages 11-18: Sent raw  → Full context
    ↓
LLM receives: System prompt + Summary + Recent 8 messages
    ↓
Response generated
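
The split shown in the diagram can be sketched as a small pure function. This is only an illustration of the idea, not the code from meshai/memory.py: `split_history` is a hypothetical helper name, and `window_size` counts user/assistant pairs as it does elsewhere in this guide.

```python
def split_history(
    history: list[dict], window_size: int = 4
) -> tuple[list[dict], list[dict]]:
    """Split history into (old, recent); recent holds the last window_size pairs."""
    keep = window_size * 2  # one pair = one user message + one assistant reply
    if len(history) <= keep:
        # Short conversation: nothing to summarize, send everything raw.
        return [], history
    return history[:-keep], history[-keep:]


# 18 messages, alternating user/assistant, numbered from 1 as in the diagram.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i + 1}"}
    for i in range(18)
]
old, recent = split_history(history)
print(len(old), len(recent))   # 10 8
print(recent[0]["content"])    # message 11
```

With 18 messages and `window_size=4`, messages 1-10 go to the summarizer and messages 11-18 are sent raw, matching the diagram.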

Files to Create/Modify

1. Create meshai/memory.py

"""Lightweight rolling summary memory manager."""

import time
from dataclasses import dataclass
from typing import Optional

from openai import AsyncOpenAI


@dataclass
class ConversationSummary:
    """Summary of conversation history."""

    summary: str
    last_updated: float
    message_count: int


class RollingSummaryMemory:
    """Manages conversation summaries with recent message window.

    Strategy:
    - Keep last N message pairs (window_size) in full
    - Summarize everything before the window
    - Update summary when old messages accumulate

    Example (window_size=4):
        Messages 1-10: Summarized to "User discussed weather and plans"
        Messages 11-18: Kept in full (last 4 pairs)
        Context sent: [Summary] + [Messages 11-18]
    """

    def __init__(
        self,
        client: AsyncOpenAI,
        model: str,
        window_size: int = 4,
        summarize_threshold: int = 8,
    ):
        """Initialize rolling summary memory.

        Args:
            client: AsyncOpenAI client for generating summaries
            model: Model name to use for summarization
            window_size: Number of recent message pairs to keep in full
            summarize_threshold: Messages to accumulate before re-summarizing
        """
        self._client = client
        self._model = model
        self._window_size = window_size
        self._summarize_threshold = summarize_threshold

        # In-memory cache of summaries (loaded from DB on startup)
        self._summaries: dict[str, ConversationSummary] = {}

    async def get_context_messages(
        self,
        user_id: str,
        full_history: list[dict],
    ) -> tuple[Optional[str], list[dict]]:
        """Get optimized context: summary + recent messages.

        Args:
            user_id: User identifier
            full_history: Full message history from database

        Returns:
            Tuple of (summary_text, recent_messages)
            summary_text is None if conversation is short
        """
        # Short conversation - no summary needed
        if len(full_history) <= self._window_size * 2:
            return None, full_history

        # Split into old (to summarize) and recent (keep raw)
        split_point = -(self._window_size * 2)
        old_messages = full_history[:split_point]
        recent_messages = full_history[split_point:]

        # Get or create summary
        summary = await self._get_or_create_summary(user_id, old_messages)

        return summary.summary, recent_messages

    async def _get_or_create_summary(
        self,
        user_id: str,
        messages: list[dict],
    ) -> ConversationSummary:
        """Get cached summary or create new one."""
        # Check cache
        if user_id in self._summaries:
            cached = self._summaries[user_id]

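            # Reuse if message count is close. Until the next re-summarization,
            # messages between cached.message_count and the start of the recent
            # window are covered by neither the summary nor the window.
            if abs(cached.message_count - len(messages)) < self._summarize_threshold:
                return cached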

        # Generate new summary
        summary_text = await self._summarize(messages)

        summary = ConversationSummary(
            summary=summary_text,
            last_updated=time.time(),
            message_count=len(messages),
        )

        self._summaries[user_id] = summary
        return summary

    async def _summarize(self, messages: list[dict]) -> str:
        """Generate summary using LLM."""
        # Format conversation
        conversation = "\n".join(
            [f"{msg['role'].upper()}: {msg['content']}" for msg in messages]
        )

        prompt = f"""Summarize this conversation in 2-3 concise sentences. Focus on:
- Main topics discussed
- Important context or user preferences
- Key information to remember

Conversation:
{conversation}

Summary (2-3 sentences):"""

        try:
            response = await self._client.chat.completions.create(
                model=self._model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.3,
            )

            return response.choices[0].message.content.strip()

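        except Exception:
            # Fallback: return a generic placeholder so the request still succeeds.
            # Note that the caller caches this placeholder like a real summary.
            return f"Previous conversation: {len(messages)} messages about various topics."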

    def load_summary(self, user_id: str, summary: ConversationSummary) -> None:
        """Load summary from database into cache."""
        self._summaries[user_id] = summary

    def clear_summary(self, user_id: str) -> None:
        """Clear cached summary for user."""
        self._summaries.pop(user_id, None)
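
To get a feel for how often the reuse rule in `_get_or_create_summary` would trigger a new summary, the sketch below simulates a growing history. It assumes the call pattern from the responder section later in this guide (history is fetched right after the user message is stored, so its length is odd at request time). `needs_new_summary` and `simulate` are hypothetical names used only for this illustration.

```python
from typing import Optional


def needs_new_summary(
    cached_count: Optional[int], old_count: int, threshold: int = 8
) -> bool:
    """Mirror of the reuse rule: regenerate when there is no cached summary
    or the cached message count has drifted by at least `threshold`."""
    if cached_count is None:
        return True
    return abs(cached_count - old_count) >= threshold


def simulate(requests: int, window_size: int = 4, threshold: int = 8) -> list[int]:
    """Return the history lengths at which a summary would be (re)generated."""
    cached_count: Optional[int] = None
    triggers: list[int] = []
    for n in range(1, requests + 1):
        history_len = 2 * n - 1  # n user messages, n - 1 assistant replies
        old_count = history_len - window_size * 2
        if old_count <= 0:
            continue  # short conversation, no summary needed
        if needs_new_summary(cached_count, old_count, threshold):
            cached_count = old_count
            triggers.append(history_len)
    return triggers


print(simulate(15))  # [9, 17, 25]
```

Under these assumptions a summary call happens at history lengths 9, 17 and 25, that is, once every four exchanges. Between those points the cached summary lags behind: at length 15 it still covers only message 1 while the window holds messages 8-15, so messages 2-7 are not sent to the model.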

2. Modify meshai/history.py

Add summary storage methods:

# Add to ConversationHistory class
# (assumes `import time` and `from typing import Optional` at module level)

async def initialize(self) -> None:
    """Initialize database and create tables."""
    self._db = await aiosqlite.connect(self._db_path)

    # Existing conversations table
    await self._db.execute("""
        CREATE TABLE IF NOT EXISTS conversations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            timestamp REAL NOT NULL
        )
    """)

    await self._db.execute("""
        CREATE INDEX IF NOT EXISTS idx_user_timestamp
        ON conversations (user_id, timestamp)
    """)

    # NEW: Summaries table
    await self._db.execute("""
        CREATE TABLE IF NOT EXISTS conversation_summaries (
            user_id TEXT PRIMARY KEY,
            summary TEXT NOT NULL,
            message_count INTEGER NOT NULL,
            updated_at REAL NOT NULL
        )
    """)

    await self._db.commit()
    logger.info(f"Conversation history initialized at {self._db_path}")


async def store_summary(
    self, user_id: str, summary: str, message_count: int
) -> None:
    """Store conversation summary.

    Args:
        user_id: Node ID of user
        summary: Summary text
        message_count: Number of messages summarized
    """
    if not self._db:
        raise RuntimeError("Database not initialized")

    async with self._lock:
        await self._db.execute(
            """
            INSERT OR REPLACE INTO conversation_summaries
            (user_id, summary, message_count, updated_at)
            VALUES (?, ?, ?, ?)
            """,
            (user_id, summary, message_count, time.time()),
        )
        await self._db.commit()


async def get_summary(self, user_id: str) -> Optional[dict]:
    """Get conversation summary for user.

    Args:
        user_id: Node ID of user

    Returns:
        Dict with 'summary', 'message_count', 'updated_at' or None
    """
    if not self._db:
        raise RuntimeError("Database not initialized")

    async with self._lock:
        cursor = await self._db.execute(
            """
            SELECT summary, message_count, updated_at
            FROM conversation_summaries
            WHERE user_id = ?
            """,
            (user_id,),
        )
        row = await cursor.fetchone()

    if not row:
        return None

    return {
        "summary": row[0],
        "message_count": row[1],
        "updated_at": row[2],
    }


async def clear_summary(self, user_id: str) -> None:
    """Clear summary for user (e.g., on history reset).

    Args:
        user_id: Node ID of user
    """
    if not self._db:
        raise RuntimeError("Database not initialized")

    async with self._lock:
        await self._db.execute(
            "DELETE FROM conversation_summaries WHERE user_id = ?",
            (user_id,),
        )
        await self._db.commit()
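
The storage behaviour can be tried in isolation. The sketch below uses the standard-library `sqlite3` module with an in-memory database instead of `aiosqlite`, purely so it runs without extra packages; the table definition and the upsert statement are copied from above. `load_all_summaries` is a hypothetical helper (it is not part of the guide's code) showing one way the "loaded from DB on startup" step might read every stored row.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS conversation_summaries (
    user_id TEXT PRIMARY KEY,
    summary TEXT NOT NULL,
    message_count INTEGER NOT NULL,
    updated_at REAL NOT NULL
)
"""


def store_summary(
    db: sqlite3.Connection, user_id: str, summary: str, message_count: int
) -> None:
    """Upsert: user_id is the primary key, so a second write replaces the first."""
    db.execute(
        """
        INSERT OR REPLACE INTO conversation_summaries
        (user_id, summary, message_count, updated_at)
        VALUES (?, ?, ?, ?)
        """,
        (user_id, summary, message_count, time.time()),
    )
    db.commit()


def load_all_summaries(db: sqlite3.Connection) -> dict[str, dict]:
    """Hypothetical startup helper: read every stored summary, keyed by user_id."""
    rows = db.execute(
        "SELECT user_id, summary, message_count, updated_at "
        "FROM conversation_summaries"
    ).fetchall()
    return {
        user_id: {"summary": summary, "message_count": count, "updated_at": updated}
        for user_id, summary, count, updated in rows
    }


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
store_summary(db, "!test123", "User asked about weather.", 10)
store_summary(db, "!test123", "User asked about weather and outdoor plans.", 18)

cache = load_all_summaries(db)
print(len(cache), cache["!test123"]["message_count"])  # 1 18
```

Each value in `cache` has the same keys that `get_summary` returns, so it could be handed to a cache-loading method on the backend at startup.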

3. Modify meshai/backends/openai_backend.py

Integrate memory manager:

"""OpenAI-compatible LLM backend with rolling summary memory."""

import logging
from typing import Optional

from openai import AsyncOpenAI

from ..config import LLMConfig
from ..memory import RollingSummaryMemory
from .base import LLMBackend

logger = logging.getLogger(__name__)


class OpenAIBackend(LLMBackend):
    """OpenAI-compatible backend with intelligent memory management."""

    def __init__(self, config: LLMConfig, api_key: str):
        """Initialize OpenAI backend.

        Args:
            config: LLM configuration
            api_key: API key to use
        """
        self.config = config
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )

        # Initialize rolling summary memory
        self._memory = RollingSummaryMemory(
            client=self._client,
            model=config.model,
            window_size=4,  # Keep last 4 exchanges (8 messages)
            summarize_threshold=8,  # Re-summarize after 8 new messages
        )

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: Optional[str] = None,  # NEW: optional for backward compatibility
        max_tokens: int = 300,
    ) -> str:
        """Generate a response using OpenAI-compatible API.

        Args:
            messages: Conversation history
            system_prompt: System prompt
            user_id: User identifier (for memory management)
            max_tokens: Maximum tokens to generate

        Returns:
            Generated response
        """
        # If no user_id, use old behavior (send full history)
        if not user_id:
            full_messages = [{"role": "system", "content": system_prompt}]
            full_messages.extend(messages)
        else:
            # Use memory manager to optimize context
            summary, recent_messages = await self._memory.get_context_messages(
                user_id=user_id,
                full_history=messages,
            )

            # Build optimized message list
            if summary:
                # Long conversation: system + summary + recent
                enhanced_system = f"""{system_prompt}

Previous conversation summary: {summary}"""
                full_messages = [{"role": "system", "content": enhanced_system}]
                full_messages.extend(recent_messages)

                logger.debug(
                    f"Using summary + {len(recent_messages)} recent messages "
                    f"(total history: {len(messages)})"
                )
            else:
                # Short conversation: system + all messages
                full_messages = [{"role": "system", "content": system_prompt}]
                full_messages.extend(messages)

        try:
            response = await self._client.chat.completions.create(
                model=self.config.model,
                messages=full_messages,
                max_tokens=max_tokens,
                temperature=0.7,
            )

            content = response.choices[0].message.content
            return content.strip() if content else ""

        except Exception as e:
            logger.error(f"OpenAI API error: {e}")
            raise

    def load_summary_cache(self, user_id: str, summary_data: dict) -> None:
        """Load summary into memory cache (called on startup).

        Args:
            user_id: User identifier
            summary_data: Dict with 'summary', 'message_count', 'updated_at'
        """
        from ..memory import ConversationSummary

        summary = ConversationSummary(
            summary=summary_data["summary"],
            message_count=summary_data["message_count"],
            last_updated=summary_data["updated_at"],
        )
        self._memory.load_summary(user_id, summary)

    def clear_summary_cache(self, user_id: str) -> None:
        """Clear summary cache for user."""
        self._memory.clear_summary(user_id)

    # ... rest of methods unchanged ...
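
The message-building branch of `generate` is easy to check on its own when pulled out into a pure function. The sketch below is one way that might look; `build_messages` is a hypothetical helper name, and the summary is folded into the system prompt exactly as in the code above.

```python
from typing import Optional


def build_messages(
    system_prompt: str,
    summary: Optional[str],
    recent_messages: list[dict],
) -> list[dict]:
    """Return the message list for a chat completion request."""
    if summary:
        # Long conversation: fold the summary into the system prompt.
        system_prompt = f"{system_prompt}\n\nPrevious conversation summary: {summary}"
    return [{"role": "system", "content": system_prompt}, *recent_messages]


recent = [
    {"role": "user", "content": "Will it rain tomorrow?"},
    {"role": "assistant", "content": "Light showers are expected."},
]
with_summary = build_messages("You are helpful.", "User is planning a hike.", recent)
without_summary = build_messages("You are helpful.", None, recent)

print(with_summary[0]["content"])
print(len(with_summary), len(without_summary))  # 3 3
```

Keeping the summary inside the single system message means the request never contains more than one system entry, which matters for OpenAI-compatible servers that only honour the first one.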

4. Modify meshai/responder.py

Pass user_id to backend and persist summaries:

# In the generate_response method

async def generate_response(self, user_id: str, message: str) -> str:
    """Generate LLM response with optimized memory."""

    # Add user message to history
    await self.history.add_message(user_id, "user", message)

    # Get conversation history
    history = await self.history.get_history_for_llm(user_id)

    # Generate response with user_id for memory management
    response = await self.backend.generate(
        messages=history,
        system_prompt=self.system_prompt,
        user_id=user_id,  # NEW: enables memory optimization
        max_tokens=300,
    )

    # Add assistant response to history
    await self.history.add_message(user_id, "assistant", response)

    # Persist the cached summary, if any, so it survives a restart.
    # This upsert runs on every request, even when the summary was reused.
    summary_data = await self._get_current_summary(user_id)
    if summary_data:
        await self.history.store_summary(
            user_id,
            summary_data["summary"],
            summary_data["message_count"],
        )

    return response


async def _get_current_summary(self, user_id: str) -> Optional[dict]:
    """Get current summary from memory manager if it exists."""
    # Access the memory manager's cache
    if hasattr(self.backend, "_memory"):
        summary = self.backend._memory._summaries.get(user_id)
        if summary:
            return {
                "summary": summary.summary,
                "message_count": summary.message_count,
                "updated_at": summary.last_updated,
            }
    return None

5. Modify meshai/commands/reset.py

Clear summaries when resetting history:

async def execute(self, sender_id: str, args: list[str]) -> str:
    """Reset conversation history."""
    count = await self.responder.history.clear_history(sender_id)

    # NEW: Also clear summary
    await self.responder.history.clear_summary(sender_id)
    if hasattr(self.responder.backend, "clear_summary_cache"):
        self.responder.backend.clear_summary_cache(sender_id)

    return f"Cleared {count} messages from your history."

Configuration

Add to meshai/config.py:

@dataclass
class MemoryConfig:
    """Memory management configuration."""

    # Rolling summary settings
    window_size: int = 4  # Recent message pairs to keep
    summarize_threshold: int = 8  # Messages before re-summarizing

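    # When to enable summaries
    # NOTE: RollingSummaryMemory as written above does not read this value;
    # summarization starts once history exceeds window_size * 2 messages.
    min_messages_for_summary: int = 10  # Start summarizing after this many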

Testing

# Test script
import asyncio
from meshai.backends.openai_backend import OpenAIBackend
from meshai.config import LLMConfig

async def test():
    config = LLMConfig(
        backend="openai",
        base_url="http://192.168.1.239:8000/v1",
        model="gpt-4o-mini"
    )

    backend = OpenAIBackend(config, "your-key")

    # Simulate long conversation
    messages = []
    for i in range(20):
        messages.append({"role": "user", "content": f"Question {i}"})
        messages.append({"role": "assistant", "content": f"Answer {i}"})

    # Generate - should use summary
    response = await backend.generate(
        messages=messages,
        system_prompt="You are helpful.",
        user_id="!test123",
        max_tokens=100
    )

    print(f"Response: {response}")
    print(f"Sent {len(messages)} messages; only the last 8 plus a summary reach the model")

asyncio.run(test())
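
The script above needs a reachable OpenAI-compatible server. For offline checks, a small stub that mimics only the part of the client this guide uses (`chat.completions.create` returning an object with `choices[0].message.content`) may be enough. `FakeClient` and `FakeCompletions` are hypothetical names, and the stub makes no attempt to reproduce the real client beyond that one call.

```python
import asyncio
from types import SimpleNamespace


class FakeCompletions:
    """Records each call and returns a canned reply in the expected shape."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls: list[dict] = []

    async def create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class FakeClient:
    """Stand-in exposing client.chat.completions.create(...)."""

    def __init__(self, reply: str = "User asked 20 numbered questions."):
        self.chat = SimpleNamespace(completions=FakeCompletions(reply))


async def demo() -> tuple[str, int]:
    client = FakeClient()
    response = await client.chat.completions.create(
        model="fake-model",
        messages=[{"role": "user", "content": "Summarize this conversation."}],
        max_tokens=150,
        temperature=0.3,
    )
    return response.choices[0].message.content.strip(), len(client.chat.completions.calls)


text, call_count = asyncio.run(demo())
print(text, call_count)  # User asked 20 numbered questions. 1
```

Passing a `FakeClient()` as the `client` argument of `RollingSummaryMemory` and inspecting `client.chat.completions.calls` afterwards is one way to confirm that a cached summary is reused rather than regenerated.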

Expected Results

Token Usage Comparison

Before (full history):

Messages 1-40 (20 exchanges): ~2000 tokens
System prompt: ~50 tokens
Total: ~2050 tokens per request

After (with summary):

System prompt: ~50 tokens
Summary: ~100 tokens
Recent 8 messages: ~400 tokens
Total: ~550 tokens per request

Savings: ~73% token reduction
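
The savings figure follows directly from the two totals above. A short check of the arithmetic, using only the estimates listed in this section (`reduction` is a hypothetical helper name):

```python
def reduction(before: int, after: int) -> float:
    """Fraction of tokens saved when going from `before` to `after`."""
    return 1 - after / before


before = 2000 + 50        # full history + system prompt
after = 50 + 100 + 400    # system prompt + summary + recent 8 messages

print(before, after)                      # 2050 550
print(f"{reduction(before, after):.1%}")  # 73.2%
```

The token counts are rough estimates, so the result is best read as "roughly three quarters" rather than a precise figure.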

Performance Impact

  • Summary generation: ~1-2s every 8-10 messages (amortized)
  • Regular requests: No added latency
  • Storage: a few hundred bytes per summary in SQLite (2-3 sentences of text)

Tuning Parameters

window_size

  • Smaller (2-3): More aggressive summarization, max token savings
  • Larger (5-6): More context, less summarization
  • Recommended: 4 (last 4 exchanges = 8 messages)

summarize_threshold

  • Smaller (4-6): Frequent re-summarization, more current
  • Larger (10-12): Less summarization overhead
  • Recommended: 8 (re-summarize after 8 new messages)

For MeshAI specifically:

  • Messages are tiny (150 chars max)
  • window_size=4 gives ~600 chars of recent context
  • summarize_threshold=8 balances overhead vs accuracy
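
When choosing `window_size`, it can help to compute an upper bound on the raw context it keeps. The sketch below assumes every stored message is at most one 150-character chunk, which is the premise stated above; longer stored replies would raise the bound. `max_recent_chars` is a hypothetical helper name.

```python
def max_recent_chars(window_size: int, max_message_chars: int = 150) -> int:
    """Upper bound on characters kept verbatim: window_size pairs,
    two messages per pair, each at most max_message_chars long."""
    return window_size * 2 * max_message_chars


for window_size in (2, 4, 6):
    print(window_size, max_recent_chars(window_size))
# 2 600
# 4 1200
# 6 1800
```

Real conversations will usually sit well below these bounds, since few messages fill the whole 150 characters.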

Migration Path

  1. Phase 1: Add code, test with new users
  2. Phase 2: Run in parallel (old + new backend)
  3. Phase 3: Migrate existing users (generate summaries for existing history)
  4. Phase 4: Remove old full-history code path

No data loss: raw messages stay in the conversations table, so summaries can be regenerated at any time.
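
For Phase 3, a first step might be to list the users whose stored history is already longer than the recent window but who have no summary yet. The sketch below uses the standard-library `sqlite3` module and the two table definitions from this guide; `users_needing_summary` is a hypothetical helper, and the sample rows are made up for the demonstration.

```python
import sqlite3


def users_needing_summary(
    db: sqlite3.Connection, window_size: int = 4
) -> list[tuple[str, int]]:
    """Return (user_id, message_count) for users with more than
    window_size * 2 stored messages and no row in conversation_summaries."""
    rows = db.execute(
        """
        SELECT c.user_id, COUNT(*) AS n
        FROM conversations AS c
        LEFT JOIN conversation_summaries AS s ON s.user_id = c.user_id
        WHERE s.user_id IS NULL
        GROUP BY c.user_id
        HAVING COUNT(*) > ?
        ORDER BY n DESC
        """,
        (window_size * 2,),
    ).fetchall()
    return [(user_id, n) for user_id, n in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conversations (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "user_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL, "
    "timestamp REAL NOT NULL)"
)
db.execute(
    "CREATE TABLE conversation_summaries (user_id TEXT PRIMARY KEY, "
    "summary TEXT NOT NULL, message_count INTEGER NOT NULL, updated_at REAL NOT NULL)"
)

# Sample data: !aaa has 12 messages, !bbb has 4, !ccc has 20 plus a summary.
for user_id, count in (("!aaa", 12), ("!bbb", 4), ("!ccc", 20)):
    for i in range(count):
        role = "user" if i % 2 == 0 else "assistant"
        db.execute(
            "INSERT INTO conversations (user_id, role, content, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (user_id, role, f"message {i}", float(i)),
        )
db.execute(
    "INSERT INTO conversation_summaries VALUES (?, ?, ?, ?)",
    ("!ccc", "Existing summary.", 12, 0.0),
)
db.commit()

candidates = users_needing_summary(db)
print(candidates)  # [('!aaa', 12)]
```

Only `!aaa` qualifies here: `!bbb` still fits inside the window and `!ccc` already has a summary.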


Maintenance

Monitor summary quality:

-- Check summaries
SELECT user_id, summary, message_count, updated_at
FROM conversation_summaries
ORDER BY updated_at DESC;
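
One thing worth checking for is the placeholder text that `_summarize()` returns when the LLM call fails, since the code in this guide caches and stores it like any other summary. The sketch below looks for rows matching that fallback string; it uses the standard-library `sqlite3` module, and `find_fallback_summaries` is a hypothetical helper. If the fallback wording is changed, the pattern has to change with it.

```python
import sqlite3

# Matches "Previous conversation: <N> messages about various topics."
FALLBACK_PATTERN = "Previous conversation: % messages about various topics."


def find_fallback_summaries(db: sqlite3.Connection) -> list[str]:
    """Return user_ids whose stored summary is the generic fallback text."""
    rows = db.execute(
        "SELECT user_id FROM conversation_summaries "
        "WHERE summary LIKE ? ORDER BY user_id",
        (FALLBACK_PATTERN,),
    ).fetchall()
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conversation_summaries (user_id TEXT PRIMARY KEY, "
    "summary TEXT NOT NULL, message_count INTEGER NOT NULL, updated_at REAL NOT NULL)"
)
db.executemany(
    "INSERT INTO conversation_summaries VALUES (?, ?, ?, ?)",
    [
        ("!good", "User asked about weather and outdoor plans.", 10, 1.0),
        ("!bad", "Previous conversation: 10 messages about various topics.", 10, 2.0),
    ],
)
db.commit()

flagged = find_fallback_summaries(db)
print(flagged)  # ['!bad']
```

Users flagged this way are candidates for the regeneration steps in this section.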

Regenerate summary:

# Clear cache + DB, will regenerate on next request
await history.clear_summary(user_id)
backend.clear_summary_cache(user_id)

Adjust if summaries too short/long:

  • Modify prompt in _summarize()
  • Adjust max_tokens=150 for summaries
  • Change temperature (lower = more consistent)

Future Enhancements

  1. Hybrid approach: Summary + semantic search for very long histories
  2. User preferences: Store separate from summary (e.g., "likes weather in metric")
  3. Multi-level summaries: Summarize summaries for years-long conversations
  4. Summary quality scoring: Validate summaries maintain key information

But start simple - this gets 80% of the benefit with 20% of the complexity.