mirror of https://github.com/zvx-echo6/meshai.git synced 2026-05-21 23:24:44 +02:00

Matt fd3f995ebb Initial commit: MeshAI - LLM-powered Meshtastic assistant

Features:
- Multi-backend LLM support (OpenAI, Anthropic, Google)
- Rolling summary memory for token optimization (~70-80% reduction)
- Per-user conversation history with SQLite persistence
- Bang commands (!help, !ping, !reset, !status, !weather)
- Meshtastic integration via serial or TCP
- Message chunking for mesh network constraints (150 char limit)
- Rate limiting to prevent network congestion
- Rich TUI configurator
- Docker support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-15 11:53:46 -07:00


LLM Conversation Memory Research for MeshAI

Current Implementation Analysis

Current approach: MeshAI stuffs full conversation history into every LLM API call

Storage: SQLite via aiosqlite
Retrieval: get_history_for_llm() returns all messages (up to max_messages_per_user * 2)
Backend: OpenAI-compatible API (works with LiteLLM, local models)
Context: 150 char max per message, per-user conversations

Problem: Inefficient - sends entire history even when unnecessary, wastes tokens and latency.
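
To make the cost concrete, here is a rough sketch of how prompt size grows when every stored message is resent. It assumes about 4 characters per token (a rule of thumb, not a tokenizer measurement) and that every message hits the 150-character cap. The names `estimate_tokens` and `full_history_tokens` are illustrative and not part of MeshAI.

```python
import math

CHARS_PER_TOKEN = 4       # assumption: rough heuristic for English text
MAX_MESSAGE_CHARS = 150   # mesh message cap from the Context line above

def estimate_tokens(text: str) -> int:
    """Approximate token count; a real tokenizer will give different numbers."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def full_history_tokens(num_messages: int, chars_per_message: int = MAX_MESSAGE_CHARS) -> int:
    """Estimate tokens resent per request when all history is included."""
    return num_messages * estimate_tokens("x" * chars_per_message)

for n in (4, 20, 40):
    print(n, "messages ->", full_history_tokens(n), "tokens per request")
```

The estimate ignores the system prompt and per-message role overhead, and stored assistant replies may be longer than one 150-character chunk, so real totals can be higher.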

1. LangChain Memory Modules

Installation

pip install langchain langchain-community langchain-openai

A. ConversationBufferMemory (Simplest)

What it does: Stores raw messages in memory, returns all messages.

from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

# Initialize
llm = ChatOpenAI(
    base_url="http://192.168.1.239:8000/v1",  # LiteLLM
    api_key="your-key",
    model="gpt-4o-mini"
)

memory = ConversationBufferMemory()

chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

# Use it
response = chain.predict(input="What's the weather?")
print(response)

# Access history
print(memory.load_memory_variables({}))
# {'history': "Human: What's the weather?\nAI: ..."}

Integration with MeshAI:

# In meshai/backends/openai_backend.py
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

class OpenAIBackendWithMemory(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = ChatOpenAI(
            base_url=config.base_url,
            api_key=api_key,
            model=config.model,
            temperature=0.7,
            max_tokens=300
        )
        # Per-user memory storage
        self._user_memories: dict[str, ConversationBufferMemory] = {}

    def _get_memory(self, user_id: str) -> ConversationBufferMemory:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ConversationBufferMemory()
        return self._user_memories[user_id]

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,  # NEW: need user_id for memory
        max_tokens: int = 300,
    ) -> str:
        memory = self._get_memory(user_id)

        # Create chain with memory
        chain = ConversationChain(
            llm=self._llm,
            memory=memory,
            verbose=False
        )

        # Extract last user message
        last_msg = messages[-1]["content"]

        # Generate with memory
        response = await chain.apredict(input=last_msg)
        return response.strip()

Pros:

Dead simple, drop-in replacement
Works with any OpenAI-compatible API
No external services required (memory lives in-process)
LangChain handles message formatting

Cons:

Still sends full history (no real efficiency gain)
Stores everything in RAM (lost on restart)
Need to manage per-user memory dicts
Adds LangChain dependency (~50MB)

Verdict: Not worth it - adds complexity without solving core problem.

B. ConversationBufferWindowMemory (Better)

What it does: Only keeps last N messages in context.

from langchain.memory import ConversationBufferWindowMemory

# Keep only last 5 interactions (10 messages = 5 pairs)
memory = ConversationBufferWindowMemory(k=5)

chain = ConversationChain(
    llm=llm,
    memory=memory
)

# Only last 5 exchanges sent to LLM
response = chain.predict(input="Hello")

Integration:

class OpenAIBackendWithWindow(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = ChatOpenAI(
            base_url=config.base_url,
            api_key=api_key,
            model=config.model
        )
        # Per-user windowed memory
        self._user_memories: dict[str, ConversationBufferWindowMemory] = {}
        self._window_size = 5  # Last 5 exchanges

    def _get_memory(self, user_id: str) -> ConversationBufferWindowMemory:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ConversationBufferWindowMemory(
                k=self._window_size
            )
        return self._user_memories[user_id]

Pros:

Simple sliding window approach
Reduces token usage automatically
Works with any OpenAI-compatible API
Configurable window size

Cons:

Still in-memory only (lost on restart)
Forgets old context completely
Need to integrate with existing SQLite storage
Adds LangChain dependency

Verdict: Better than full buffer, but loses long-term context.
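
The sliding-window behavior itself does not need LangChain. A minimal sketch using `collections.deque`, assuming messages are plain role/content dicts; `SlidingWindowMemory` is a hypothetical name, and like the LangChain version it lives only in RAM.

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last `k` exchanges (user + assistant pairs) per user."""

    def __init__(self, k: int = 5):
        self._k = k
        self._windows: dict[str, deque] = {}

    def add(self, user_id: str, role: str, content: str) -> None:
        # maxlen makes the deque drop the oldest message automatically
        window = self._windows.setdefault(user_id, deque(maxlen=self._k * 2))
        window.append({"role": role, "content": content})

    def get(self, user_id: str) -> list[dict]:
        """Return the retained messages in chronological order."""
        return list(self._windows.get(user_id, ()))

memory = SlidingWindowMemory(k=2)
for i in range(5):
    memory.add("!abc123", "user", f"question {i}")
    memory.add("!abc123", "assistant", f"answer {i}")

window = memory.get("!abc123")
print(len(window), window[0]["content"])
```

With `k=2`, only the last two exchanges survive, so the oldest retained message is "question 3".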

C. ConversationSummaryMemory (Most Interesting)

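What it does: Uses the LLM to maintain a running summary of the conversation and sends that summary instead of the raw history. (The related ConversationSummaryBufferMemory additionally keeps the most recent messages verbatim.)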

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)

chain = ConversationChain(
    llm=llm,
    memory=memory
)

# After multiple messages, memory holds a running summary of the
# conversation rather than the raw messages
response = chain.predict(input="What did we talk about?")
# The AI answers from the summary; ConversationSummaryBufferMemory
# would also keep recent raw messages

Integration with SQLite persistence:

import time

from openai import AsyncOpenAI

class OpenAIBackendWithSummary(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str, history: ConversationHistory):
        self.config = config
        self.history = history  # Existing SQLite history

        # generate() and _get_summary() call the OpenAI-compatible API
        # directly, so this sketch needs an AsyncOpenAI client
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )

        # Per-user summary cache (persisted to SQLite by _store_summary)
        self._user_summaries: dict[str, str] = {}
        self._window_size = 4  # Keep last 4 exchanges (8 messages) raw

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        # Get full history from SQLite
        full_history = await self.history.get_history(user_id)

        if len(full_history) <= self._window_size * 2:
            # Small conversation: send raw messages, still prefixed with the system prompt
            context_messages = [{"role": "system", "content": system_prompt}, *messages]
        else:
            # Large conversation: summarize old + keep recent
            old_messages = full_history[:-self._window_size * 2]
            recent_messages = full_history[-self._window_size * 2:]

            # Get or create summary
            summary = await self._get_summary(user_id, old_messages)

            # Build context: system + summary + recent messages
            context_messages = [
                {"role": "system", "content": f"{system_prompt}\n\nConversation summary: {summary}"}
            ]
            context_messages.extend([
                {"role": msg.role, "content": msg.content}
                for msg in recent_messages
            ])

        # Generate response
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=context_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )

        return response.choices[0].message.content.strip()

    async def _get_summary(self, user_id: str, messages: list) -> str:
        """Summarize old messages using LLM."""
        if user_id in self._user_summaries:
            return self._user_summaries[user_id]

        # Create summary prompt
        conversation_text = "\n".join([
            f"{msg.role}: {msg.content}" for msg in messages
        ])

        summary_prompt = f"""Summarize this conversation in 2-3 sentences, focusing on key topics and user preferences:

{conversation_text}

Summary:"""

        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=[{"role": "user", "content": summary_prompt}],
            max_tokens=150,
            temperature=0.3,
        )

        summary = response.choices[0].message.content.strip()

        # Store in SQLite
        await self._store_summary(user_id, summary)
        self._user_summaries[user_id] = summary

        return summary

    async def _store_summary(self, user_id: str, summary: str):
        """Store summary in SQLite for persistence."""
        # Add new table for summaries
        await self.history._db.execute("""
            CREATE TABLE IF NOT EXISTS conversation_summaries (
                user_id TEXT PRIMARY KEY,
                summary TEXT NOT NULL,
                updated_at REAL NOT NULL
            )
        """)

        await self.history._db.execute("""
            INSERT OR REPLACE INTO conversation_summaries (user_id, summary, updated_at)
            VALUES (?, ?, ?)
        """, (user_id, summary, time.time()))

        await self.history._db.commit()

Pros:

Best balance: compact summary + recent context
Significantly reduces token usage for long conversations
Works with existing OpenAI-compatible APIs
Preserves long-term context
Can persist summaries in SQLite

Cons:

Costs extra tokens to generate summaries
Adds latency when summarizing
Need to decide when to re-summarize
Requires LangChain if ConversationSummaryMemory itself is used (the hand-rolled integration above calls the API directly)

Verdict: BEST LANGCHAIN OPTION for MeshAI - balances efficiency and context retention.
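
On the question of when to re-summarize: one option is to fold only the newly aged-out messages into the existing summary instead of re-summarizing everything. The sketch below only builds the prompt for such an update. The function name and prompt wording are assumptions for illustration, not LangChain's actual prompt.

```python
def build_progressive_summary_prompt(existing_summary: str, new_messages: list[dict]) -> str:
    """Build a prompt asking the LLM to extend a summary with newly aged-out messages."""
    new_lines = "\n".join(f"{m['role']}: {m['content']}" for m in new_messages)
    current = existing_summary or "(none yet)"
    return (
        "Update the running summary with the new messages. "
        "Keep it to 2-3 sentences.\n\n"
        f"Current summary:\n{current}\n\n"
        f"New messages:\n{new_lines}\n\n"
        "Updated summary:"
    )

prompt = build_progressive_summary_prompt(
    "User asked about weather in Seattle.",
    [
        {"role": "user", "content": "Should I bring an umbrella?"},
        {"role": "assistant", "content": "No need, it's clear!"},
    ],
)
print(prompt)
```

Because each update sees only the old summary plus a handful of new messages, the cost of a summarization call stays roughly constant as the conversation grows.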

2. LlamaIndex

Installation

pip install llama-index llama-index-llms-openai

Chat Memory

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

# Initialize
llm = OpenAI(
    api_base="http://192.168.1.239:8000/v1",
    api_key="your-key",
    model="gpt-4o-mini"
)

# Create memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Add messages
memory.put(ChatMessage(role="user", content="Hello"))
memory.put(ChatMessage(role="assistant", content="Hi there!"))

# Get messages for LLM
messages = memory.get()

# Generate with context
response = llm.chat(messages)

Integration:

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

class LlamaIndexBackend(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = OpenAI(
            api_base=config.base_url,
            api_key=api_key,
            model=config.model
        )

        # Per-user memory buffers
        self._user_memories: dict[str, ChatMemoryBuffer] = {}
        self._token_limit = 1500

    def _get_memory(self, user_id: str) -> ChatMemoryBuffer:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ChatMemoryBuffer.from_defaults(
                token_limit=self._token_limit
            )
        return self._user_memories[user_id]

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        memory = self._get_memory(user_id)

        # Add new message to memory
        user_msg = messages[-1]["content"]
        memory.put(ChatMessage(role="user", content=user_msg))

        # Get messages within token limit
        context_messages = memory.get()

        # Add system prompt
        full_messages = [ChatMessage(role="system", content=system_prompt)]
        full_messages.extend(context_messages)

        # Generate (achat is the async variant, so the event loop is not blocked)
        response = await self._llm.achat(full_messages)

        # Store assistant response
        memory.put(ChatMessage(role="assistant", content=response.message.content))

        return response.message.content

Pros:

Token-aware buffering (auto-prunes to stay under limit)
Simple API
Works with OpenAI-compatible backends
Better than manual message counting

Cons:

In-memory only (need custom persistence)
Heavy dependency (~100MB)
Overkill for simple chat
Less mature than LangChain

Verdict: Token limiting is nice, but not worth the dependency weight.
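
If only the token-limited pruning is wanted, the idea can be approximated without the dependency. This sketch walks the history from newest to oldest and stops once an estimated budget is exceeded. It assumes about 4 characters per token, whereas ChatMemoryBuffer counts with a real tokenizer; `prune_to_token_limit` is a hypothetical helper.

```python
import math

def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token; a real tokenizer will differ.
    return math.ceil(len(text) / 4)

def prune_to_token_limit(messages: list[dict], token_limit: int) -> list[dict]:
    """Return the newest messages whose estimated token total fits in token_limit."""
    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > token_limit:
            break
        kept.append(msg)
        total += cost
    kept.reverse()  # restore chronological order
    return kept

# Ten messages of 40 characters each, i.e. about 10 tokens apiece
history = [{"role": "user", "content": "a" * 40} for _ in range(10)]
print(len(prune_to_token_limit(history, token_limit=35)))
```

With a 35-token budget, three of the 10-token messages fit and the fourth would exceed the limit.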

3. MemGPT / Letta (Self-Editing Memory)

Installation

pip install letta

Usage

What it does: Agent manages its own memory, decides what to keep/forget/summarize.

from letta import create_client

client = create_client()

# Create agent with memory management
agent = client.create_agent(
    name="meshai_agent",
    llm_config={
        "model": "gpt-4o-mini",
        "model_endpoint": "http://192.168.1.239:8000/v1"
    },
    embedding_config={
        "embedding_endpoint_type": "openai",
        "embedding_model": "text-embedding-ada-002"
    }
)

# Agent manages memory automatically
response = client.send_message(
    agent_id=agent.id,
    message="What's the weather?",
    role="user"
)

print(response.messages[-1].text)

Architecture:

Core memory: Persistent facts the agent always sees
Recall memory: Searchable vector store of past conversations
Archival memory: Long-term storage

Pros:

Most sophisticated memory system
Agent decides what's important
Built-in vector search
Handles very long conversations

Cons:

HEAVY (~200MB+ with dependencies)
Requires vector embeddings (extra API calls/costs)
Complex setup and learning curve
Overkill for 150-char mesh messages
Opinionated architecture (hard to integrate)

Verdict: Way too heavy for MeshAI. Only worth it for complex, long-form agents.

4. Vector Stores (Semantic Memory)

ChromaDB (Simplest)

pip install chromadb

import time

import chromadb
from chromadb.config import Settings

# Initialize (PersistentClient writes to disk; requires chromadb >= 0.4)
client = chromadb.PersistentClient(
    path="/path/to/meshai/memory",
    settings=Settings(anonymized_telemetry=False),
)

# Create collection per user
collection = client.get_or_create_collection(
    name=f"user_{user_id}",
    metadata={"user_id": user_id}
)

# Add messages
collection.add(
    documents=["What's the weather in Seattle?"],
    metadatas=[{"role": "user", "timestamp": time.time()}],
    ids=["msg_1"]
)

# Semantic search for relevant past messages
results = collection.query(
    query_texts=["weather"],
    n_results=3
)

# Use retrieved messages as context
relevant_context = results['documents'][0]

Integration:

import time

import chromadb
from chromadb.config import Settings
from openai import AsyncOpenAI

class VectorMemoryBackend(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str, db_path: str):
        self.config = config
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )

        # ChromaDB for semantic memory (PersistentClient requires chromadb >= 0.4)
        self._chroma = chromadb.PersistentClient(
            path=db_path,
            settings=Settings(anonymized_telemetry=False),
        )

        self._window_size = 4  # Keep last 4 exchanges (8 messages) raw

    def _get_collection(self, user_id: str):
        return self._chroma.get_or_create_collection(
            name=f"user_{user_id.replace('!', '_')}"  # Sanitize ID
        )

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        collection = self._get_collection(user_id)

        # Get current query
        current_query = messages[-1]["content"]

        # Search for semantically similar past messages
        try:
            results = collection.query(
                query_texts=[current_query],
                n_results=3,
                where={"role": "assistant"}  # Get past responses
            )
            relevant_history = results['documents'][0] if results['documents'] else []
        except Exception:
            relevant_history = []

        # Build context: system + relevant history + recent messages
        context = system_prompt
        if relevant_history:
            context += "\n\nRelevant past exchanges:\n"
            context += "\n".join(relevant_history[:2])  # Top 2 relevant

        context_messages = [{"role": "system", "content": context}]
        context_messages.extend(messages[-self._window_size*2:])  # Recent messages

        # Generate
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=context_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )

        reply = response.choices[0].message.content.strip()

        # Store in vector DB
        msg_id = f"{user_id}_{int(time.time()*1000)}"
        collection.add(
            documents=[f"User: {current_query}\nAssistant: {reply}"],
            metadatas=[{"role": "assistant", "timestamp": time.time()}],
            ids=[msg_id]
        )

        return reply

Pros:

Semantic search - finds relevant past context
Works great for sparse conversations
Persistent storage
Lightweight (~20MB)
No extra API calls (uses local embeddings)

Cons:

Adds dependency
Embedding computation overhead
May surface irrelevant "similar" messages
Overkill for very short conversations

Verdict: Interesting for long-term memory, but maybe overkill for 150-char messages.

Qdrant (Production Alternative)

pip install qdrant-client

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

# Local on-disk mode; use QdrantClient(":memory:") for in-memory
# or QdrantClient(url=...) for a server
client = QdrantClient(path="/path/to/meshai/qdrant")

# Create collection
client.create_collection(
    collection_name="meshai_memory",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Store with embedding (from OpenAI or local model)
client.upsert(
    collection_name="meshai_memory",
    points=[
        PointStruct(
            id=msg_id,  # must be an unsigned integer or a UUID string
            vector=embedding,  # 1536-dim from text-embedding-ada-002
            payload={"user_id": user_id, "content": content, "role": role}
        )
    ]
)

# Search, restricted to one user's messages
results = client.search(
    collection_name="meshai_memory",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
    ),
    limit=3
)

Pros:

Production-ready, fast
Better than ChromaDB for scale
Rich filtering options
Can run in-memory or server mode

Cons:

More complex than ChromaDB
Still requires embeddings
Heavier dependency

Verdict: Better than ChromaDB for production, but still overkill for MeshAI's use case.
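
Both vector stores rank stored messages by embedding similarity, for example the cosine distance configured above. A minimal sketch of that ranking step in plain Python follows. The 3-dimensional vectors are invented for illustration, since real embeddings come from a model and have hundreds of dimensions, and `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: list[float], stored: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k stored texts most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" invented for illustration only
stored = {
    "What's the weather in Seattle?": [0.9, 0.1, 0.0],
    "How do I reset my node?": [0.0, 0.2, 0.9],
    "Will it rain tomorrow?": [0.8, 0.3, 0.1],
}
results = top_k([1.0, 0.0, 0.0], stored, k=2)
print([text for text, _ in results])
```

The two weather-related messages rank above the unrelated one, which is the behavior a vector store provides at scale with indexing and persistence on top.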

5. Simple Rolling Summary (RECOMMENDED)

The lightest, most practical approach for MeshAI.

Implementation

import asyncio
import time
from dataclasses import dataclass
from typing import Optional
from openai import AsyncOpenAI

@dataclass
class ConversationSummary:
    """Summary of conversation history."""
    summary: str
    last_updated: float
    message_count: int

class SimpleRollingSummary:
    """Lightweight rolling summary memory manager."""

    def __init__(
        self,
        client: AsyncOpenAI,
        model: str,
        window_size: int = 4,  # Recent exchanges (user + assistant pairs) to keep raw
        summarize_threshold: int = 10,  # Re-summarize once the old-message count shifts by this much
    ):
        self._client = client
        self._model = model
        self._window_size = window_size
        self._summarize_threshold = summarize_threshold

        # Per-user summaries (would be in SQLite in production)
        self._summaries: dict[str, ConversationSummary] = {}

    async def get_context_messages(
        self,
        user_id: str,
        full_history: list[dict],  # From SQLite
    ) -> list[dict]:
        """Get optimized context messages (summary + recent)."""

        # If conversation is short, just return it
        if len(full_history) <= self._window_size * 2:
            return full_history

        # Split into old and recent
        old_messages = full_history[:-self._window_size * 2]
        recent_messages = full_history[-self._window_size * 2:]

        # Get or create summary of old messages
        summary = await self._get_or_create_summary(user_id, old_messages)

        # Return summary as system message + recent raw messages
        context = [
            {"role": "system", "content": f"Previous conversation summary: {summary.summary}"}
        ]
        context.extend(recent_messages)

        return context

    async def _get_or_create_summary(
        self,
        user_id: str,
        messages: list[dict],
    ) -> ConversationSummary:
        """Get existing summary or create new one."""

        # Check if we have a recent summary
        if user_id in self._summaries:
            existing = self._summaries[user_id]

            # If summary covers roughly the same messages, reuse it
            if abs(existing.message_count - len(messages)) < self._summarize_threshold:
                return existing

        # Create new summary
        summary_text = await self._summarize(messages)

        summary = ConversationSummary(
            summary=summary_text,
            last_updated=time.time(),
            message_count=len(messages)
        )

        self._summaries[user_id] = summary
        return summary

    async def _summarize(self, messages: list[dict]) -> str:
        """Summarize a list of messages using the LLM."""

        # Format conversation
        conversation = "\n".join([
            f"{msg['role'].upper()}: {msg['content']}"
            for msg in messages
        ])

        prompt = f"""Summarize this conversation in 2-3 concise sentences. Focus on:
- Main topics discussed
- Any important user preferences or context
- Key information that should be remembered

Conversation:
{conversation}

Summary (2-3 sentences):"""

        try:
            response = await self._client.chat.completions.create(
                model=self._model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.3,
            )

            return response.choices[0].message.content.strip()

        except Exception:
            # Fallback: generic placeholder if summarization fails
            return f"Previous conversation covered {len(messages)} messages."
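
If the summarization call fails, a fallback that preserves some real content may be preferable to a generic placeholder. The sketch below keeps a character-capped extract of the most recent old messages. `truncation_fallback` and its 300-character default are assumptions, not part of the class above.

```python
def truncation_fallback(messages: list[dict], max_chars: int = 300) -> str:
    """Build a crude summary from the newest old messages, capped at max_chars."""
    parts: list[str] = []
    used = 0
    for msg in reversed(messages):  # prefer the most recent of the old messages
        line = f"{msg['role']}: {msg['content']}"
        if used + len(line) + 1 > max_chars:
            break
        parts.append(line)
        used += len(line) + 1  # +1 for the joining newline
    if not parts:
        # Nothing fits: fall back to the generic placeholder
        return f"Previous conversation covered {len(messages)} messages."
    return "\n".join(reversed(parts))

old = [
    {"role": "user", "content": "a" * 100},
    {"role": "assistant", "content": "b" * 100},
    {"role": "user", "content": "c" * 100},
]
extract = truncation_fallback(old)
print(len(extract))
```

Only whole messages are kept, so the extract never cuts a message mid-sentence.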

Integration with MeshAI

# In meshai/backends/openai_backend.py

class OpenAIBackend(LLMBackend):
    """OpenAI-compatible backend with rolling summary memory."""

    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )

        # Add rolling summary manager
        self._memory = SimpleRollingSummary(
            client=self._client,
            model=config.model,
            window_size=4,  # Keep last 4 exchanges (8 messages)
            summarize_threshold=10,  # Refresh the summary after ~10 more messages age out
        )

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,  # NEW: need user_id
        max_tokens: int = 300,
    ) -> str:
        """Generate with optimized context."""

        # Get optimized context (summary + recent)
        context_messages = await self._memory.get_context_messages(
            user_id=user_id,
            full_history=messages,
        )

        # Add system prompt
        full_messages = [{"role": "system", "content": system_prompt}]
        full_messages.extend(context_messages)

        # Generate
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=full_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )

        return response.choices[0].message.content.strip()

Persist Summaries in SQLite

# Add to meshai/history.py
# (needs `import time`, `from typing import Optional`, and the ConversationSummary dataclass)

async def store_summary(self, user_id: str, summary: str, message_count: int) -> None:
    """Store conversation summary."""
    if not self._db:
        raise RuntimeError("Database not initialized")

    async with self._lock:
        await self._db.execute("""
            CREATE TABLE IF NOT EXISTS conversation_summaries (
                user_id TEXT PRIMARY KEY,
                summary TEXT NOT NULL,
                message_count INTEGER NOT NULL,
                updated_at REAL NOT NULL
            )
        """)

        await self._db.execute("""
            INSERT OR REPLACE INTO conversation_summaries
            (user_id, summary, message_count, updated_at)
            VALUES (?, ?, ?, ?)
        """, (user_id, summary, message_count, time.time()))

        await self._db.commit()

async def get_summary(self, user_id: str) -> Optional[ConversationSummary]:
    """Retrieve conversation summary."""
    if not self._db:
        raise RuntimeError("Database not initialized")

    async with self._lock:
        cursor = await self._db.execute("""
            SELECT summary, message_count, updated_at
            FROM conversation_summaries
            WHERE user_id = ?
        """, (user_id,))

        row = await cursor.fetchone()

    if not row:
        return None

    return ConversationSummary(
        summary=row[0],
        message_count=row[1],
        last_updated=row[2]
    )
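
The SQL above can be checked in isolation with the standard-library `sqlite3` module. This is a synchronous, in-memory sketch for experimentation only: MeshAI itself uses aiosqlite with a lock, and the module-level `conn` here is purely for demonstration.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # in-memory DB, for demonstration only
conn.execute("""
    CREATE TABLE IF NOT EXISTS conversation_summaries (
        user_id TEXT PRIMARY KEY,
        summary TEXT NOT NULL,
        message_count INTEGER NOT NULL,
        updated_at REAL NOT NULL
    )
""")

def store_summary(user_id: str, summary: str, message_count: int) -> None:
    """Insert or overwrite the single summary row for a user."""
    conn.execute(
        "INSERT OR REPLACE INTO conversation_summaries "
        "(user_id, summary, message_count, updated_at) VALUES (?, ?, ?, ?)",
        (user_id, summary, message_count, time.time()),
    )
    conn.commit()

def get_summary(user_id: str):
    """Return (summary, message_count) or None if the user has no summary."""
    return conn.execute(
        "SELECT summary, message_count FROM conversation_summaries WHERE user_id = ?",
        (user_id,),
    ).fetchone()

store_summary("!abc123", "User asked about weather.", 12)
store_summary("!abc123", "User asked about weather and umbrellas.", 20)  # replaces the first row
print(get_summary("!abc123"))
```

Because `user_id` is the primary key, `INSERT OR REPLACE` keeps exactly one summary row per user.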

Pros:

No new external dependencies (reuses the existing AsyncOpenAI client)
Works with existing SQLite storage
Significantly reduces token usage
Simple to understand and maintain
Preserves recent context + summarized history
Configurable window and threshold

Cons:

Costs tokens to generate summaries
Slight latency when summarizing
Need to tune window/threshold params

Verdict: BEST OPTION for MeshAI - simple, effective, no dependencies.

Comparison Matrix

Approach	Dependencies	Complexity	Token Savings	Persistence	OpenAI-Compatible
LangChain BufferMemory	langchain (~50MB)	Low	None	No	Yes
LangChain WindowMemory	langchain (~50MB)	Low	Medium	No	Yes
LangChain SummaryMemory	langchain (~50MB)	Medium	High	No (DIY)	Yes
LlamaIndex	llama-index (~100MB)	Medium	Medium	No (DIY)	Yes
MemGPT/Letta	letta (~200MB)	Very High	Very High	Yes	Yes (complex)
ChromaDB	chromadb (~20MB)	Medium	Medium	Yes	Yes
Qdrant	qdrant-client (~30MB)	High	Medium	Yes	Yes
Rolling Summary (DIY)	None	Low	High	Yes (SQLite)	Yes

RECOMMENDATION

Use Simple Rolling Summary (Option 5) for MeshAI because:

Zero new dependencies - No LangChain, LlamaIndex, or vector stores
Works with current stack - Uses existing AsyncOpenAI client and SQLite
Significant efficiency gains - Keeps last 4-6 exchanges + summary of older messages
Persistent - Summaries stored in SQLite, survive restarts
Simple to tune - Two params: window_size and summarize_threshold
OpenAI-compatible - Works with LiteLLM, local models, anything
Lightweight - ~100 lines of code

Implementation Steps

Add SimpleRollingSummary class (shown above)
Add summary table to SQLite schema
Modify OpenAIBackend.generate() to use _memory.get_context_messages()
Add summary storage methods to ConversationHistory
Configure: window_size=4 (8 messages), summarize_threshold=10

Expected Performance

Before (full history):

20 message pairs = ~3000 tokens sent every request (rough estimate, about 75 tokens per message)
Higher latency and higher cost per request

After (rolling summary):

Summary (~100 tokens) + 4 recent pairs (~400 tokens) = ~500 tokens
~83% token reduction ((3000 - 500) / 3000) for long conversations, before counting the occasional summarization call
Faster responses, lower costs
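
The arithmetic behind these estimates, plus a rough way to account for the summarization call itself, can be written out as follows. The 3000, 100, and 400 token figures come from the estimates above. The 1000-token summarization cost and its reuse across 5 requests are assumptions chosen only to illustrate amortization.

```python
def token_reduction(before: float, after: float) -> float:
    """Fraction of prompt tokens saved per request."""
    return 1 - after / before

def amortized_after(after: float, summary_call_tokens: float, requests_per_summary: int) -> float:
    """Per-request cost once one summarization call is spread over several requests."""
    return after + summary_call_tokens / requests_per_summary

before = 3000        # estimate from the text: 20 message pairs
after = 100 + 400    # summary + 4 recent pairs

print(f"{token_reduction(before, after):.0%}")

# Assumption: one summarization call costs ~1000 tokens and is reused for 5 requests
print(f"{token_reduction(before, amortized_after(after, 1000, 5)):.0%}")
```

Under these assumptions the script prints 83% and then 77%, so the saving shrinks somewhat once summarization overhead is included.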

When to Consider Alternatives

Vector stores (ChromaDB): If you need semantic search across users or topics
LangChain SummaryMemory: If you want a batteries-included solution (accept dependency)
MemGPT: If conversations become complex multi-day dialogues (they won't on mesh)

Example Usage

# Initialize
backend = OpenAIBackend(config, api_key)

# Histories of 8 messages or fewer (window_size * 2) are sent in full;
# the longer history below takes the summary path
await backend.generate(
    messages=[
        {"role": "user", "content": "What's the weather?"},
        {"role": "assistant", "content": "It's sunny!"},
        {"role": "user", "content": "Should I bring an umbrella?"},
        {"role": "assistant", "content": "No need, it's clear!"},
        # ... 6 more exchanges ...
    ],
    system_prompt="You are a helpful assistant.",
    user_id="!abc123",
)

# Once history exceeds 8 messages - summary + recent sent
# Context sent to LLM (after the system prompt):
# [
#   {"role": "system", "content": "Previous conversation summary: User asked about weather and outdoor activities. Confirmed sunny weather, no rain expected."},
#   {"role": "user", "content": "Should I bring an umbrella?"},
#   {"role": "assistant", "content": "No need, it's clear!"},
#   ... (last 4 exchanges)
# ]

Code Files to Modify

meshai/memory.py (NEW) - Add SimpleRollingSummary class
meshai/history.py - Add summary storage methods + table schema
meshai/backends/openai_backend.py - Integrate memory manager
meshai/responder.py - Pass user_id to backend.generate()
meshai/config.py - Add config for window_size, summarize_threshold
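
For the config.py change, the two tunables could be grouped in a small validated dataclass. This is only a sketch: `MemoryConfig` and `raw_message_count` are hypothetical names, and how the values are loaded from MeshAI's existing config file is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical config section; fields mirror SimpleRollingSummary's parameters."""
    window_size: int = 4            # recent exchanges kept raw
    summarize_threshold: int = 10   # change in old-message count that triggers a new summary

    def __post_init__(self) -> None:
        # Reject values that would disable the window or re-summarize on every call
        if self.window_size < 1:
            raise ValueError("window_size must be at least 1")
        if self.summarize_threshold < 1:
            raise ValueError("summarize_threshold must be at least 1")

    @property
    def raw_message_count(self) -> int:
        """Number of individual messages kept verbatim (two per exchange)."""
        return self.window_size * 2

config = MemoryConfig()
print(config.raw_message_count)
```

Validating at construction time surfaces a bad value at startup instead of in the middle of a conversation.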

