# LLM Conversation Memory Research for MeshAI
## Current Implementation Analysis
**Current approach:** MeshAI stuffs full conversation history into every LLM API call
- Storage: SQLite via aiosqlite
- Retrieval: `get_history_for_llm()` returns all messages (up to `max_messages_per_user * 2`)
- Backend: OpenAI-compatible API (works with LiteLLM, local models)
- Context: 150 char max per message, per-user conversations
**Problem:** This is inefficient: the entire history is sent even when the latest message does not depend on it, which wastes tokens and adds latency.
---
## 1. LangChain Memory Modules
### Installation
```bash
pip install langchain langchain-community langchain-openai
```
### A. ConversationBufferMemory (Simplest)
**What it does:** Stores raw messages in memory, returns all messages.
```python
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
# Initialize
llm = ChatOpenAI(
    base_url="http://192.168.1.239:8000/v1",  # LiteLLM
    api_key="your-key",
    model="gpt-4o-mini",
)
memory = ConversationBufferMemory()
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,
)

# Use it
response = chain.predict(input="What's the weather?")
print(response)

# Access history
print(memory.load_memory_variables({}))
# {'history': "Human: What's the weather?\nAI: ..."}
```
**Integration with MeshAI:**
```python
# In meshai/backends/openai_backend.py
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI


class OpenAIBackendWithMemory(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = ChatOpenAI(
            base_url=config.base_url,
            api_key=api_key,
            model=config.model,
            temperature=0.7,
            max_tokens=300,
        )
        # Per-user memory storage
        self._user_memories: dict[str, ConversationBufferMemory] = {}

    def _get_memory(self, user_id: str) -> ConversationBufferMemory:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ConversationBufferMemory()
        return self._user_memories[user_id]

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,  # NEW: need user_id for memory
        max_tokens: int = 300,
    ) -> str:
        memory = self._get_memory(user_id)
        # Create chain with memory.
        # Note: system_prompt and max_tokens are not applied in this sketch;
        # ConversationChain uses its own default prompt and the LLM's fixed max_tokens.
        chain = ConversationChain(
            llm=self._llm,
            memory=memory,
            verbose=False,
        )
        # Extract last user message
        last_msg = messages[-1]["content"]
        # Generate with memory
        response = await chain.apredict(input=last_msg)
        return response.strip()
```
**Pros:**
- Dead simple, drop-in replacement
- Works with any OpenAI-compatible API
- No external services required (no vector DB or embedding API)
- LangChain handles message formatting
**Cons:**
- Still sends full history (no real efficiency gain)
- Stores everything in RAM (lost on restart)
- Need to manage per-user memory dicts
- Adds LangChain dependency (~50MB)
**Verdict:** Not worth it - adds complexity without solving core problem.
---
### B. ConversationBufferWindowMemory (Better)
**What it does:** Only keeps last N messages in context.
```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 5 interactions (5 pairs = 10 messages)
memory = ConversationBufferWindowMemory(k=5)
chain = ConversationChain(
    llm=llm,  # ChatOpenAI instance from the previous example
    memory=memory,
)
# Only the last 5 exchanges are sent to the LLM
response = chain.predict(input="Hello")
```
**Integration:**
```python
class OpenAIBackendWithWindow(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = ChatOpenAI(
            base_url=config.base_url,
            api_key=api_key,
            model=config.model,
        )
        # Per-user windowed memory
        self._user_memories: dict[str, ConversationBufferWindowMemory] = {}
        self._window_size = 5  # Last 5 exchanges

    def _get_memory(self, user_id: str) -> ConversationBufferWindowMemory:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ConversationBufferWindowMemory(
                k=self._window_size
            )
        return self._user_memories[user_id]
```
**Pros:**
- Simple sliding window approach
- Reduces token usage automatically
- Works with any OpenAI-compatible API
- Configurable window size
**Cons:**
- Still in-memory only (lost on restart)
- Forgets old context completely
- Need to integrate with existing SQLite storage
- Adds LangChain dependency
**Verdict:** Better than full buffer, but loses long-term context.
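The same sliding-window behaviour can be had without LangChain by slicing the history MeshAI already loads from SQLite. A minimal sketch, assuming messages are role/content dicts in chronological order; `last_exchanges` is a hypothetical helper, not part of MeshAI:

```python
def last_exchanges(history: list[dict], k: int = 5) -> list[dict]:
    """Return the most recent k exchanges (up to 2*k messages)."""
    if k <= 0:
        return []
    return history[-2 * k:]


# 14 alternating user/assistant messages: "msg 0" .. "msg 13"
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(14)
]
window = last_exchanges(history, k=5)
print(len(window))           # 10
print(window[0]["content"])  # msg 4
```

Because this works on the list returned from SQLite, nothing is lost on restart; the window only limits what is sent to the LLM.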
---
### C. ConversationSummaryMemory (Most Interesting)
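**What it does:** Uses the LLM to maintain a running summary of the conversation and sends that summary instead of the raw history. The related `ConversationSummaryBufferMemory` keeps the summary plus the most recent raw messages.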
```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(
    llm=llm,
    memory=memory,
)
# After each exchange the memory updates a running summary, so the prompt
# contains the summary rather than the raw messages.
# (ConversationSummaryBufferMemory additionally keeps recent raw messages.)
response = chain.predict(input="What did we talk about?")
# The AI answers from the summarized context
```
**Integration with SQLite persistence:**
```python
import time

from openai import AsyncOpenAI


class OpenAIBackendWithSummary(LLMBackend):
    """Hand-rolled summary memory.

    This version calls the API directly rather than going through
    ConversationSummaryMemory, so summaries can be persisted in SQLite.
    """

    def __init__(self, config: LLMConfig, api_key: str, history: ConversationHistory):
        self.config = config
        self.history = history  # Existing SQLite history
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )
        # Per-user summaries (cached here; also written to SQLite)
        self._user_summaries: dict[str, str] = {}
        self._window_size = 4  # Keep last 4 exchanges (8 messages) raw

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        # Get full history from SQLite
        full_history = await self.history.get_history(user_id)
        recent_count = self._window_size * 2
        if len(full_history) <= recent_count:
            # Small conversation: system prompt + raw messages
            context_messages = [{"role": "system", "content": system_prompt}, *messages]
        else:
            # Large conversation: summarize old + keep recent
            old_messages = full_history[:-recent_count]
            recent_messages = full_history[-recent_count:]
            # Get or create summary
            summary = await self._get_summary(user_id, old_messages)
            # Build context: system + summary + recent messages
            context_messages = [
                {"role": "system", "content": f"{system_prompt}\n\nConversation summary: {summary}"}
            ]
            context_messages.extend(
                {"role": msg.role, "content": msg.content} for msg in recent_messages
            )
        # Generate response
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=context_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )
        return response.choices[0].message.content.strip()

    async def _get_summary(self, user_id: str, messages: list) -> str:
        """Summarize old messages using the LLM."""
        # NOTE: a cached summary is reused as-is and never refreshed here;
        # a re-summarization policy still has to be chosen.
        if user_id in self._user_summaries:
            return self._user_summaries[user_id]
        # Create summary prompt
        conversation_text = "\n".join(f"{msg.role}: {msg.content}" for msg in messages)
        summary_prompt = (
            "Summarize this conversation in 2-3 sentences, "
            "focusing on key topics and user preferences:\n\n"
            f"{conversation_text}\n\n"
            "Summary:"
        )
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=[{"role": "user", "content": summary_prompt}],
            max_tokens=150,
            temperature=0.3,
        )
        summary = response.choices[0].message.content.strip()
        # Store in SQLite
        await self._store_summary(user_id, summary)
        self._user_summaries[user_id] = summary
        return summary

    async def _store_summary(self, user_id: str, summary: str) -> None:
        """Store summary in SQLite for persistence."""
        # Add new table for summaries
        await self.history._db.execute("""
            CREATE TABLE IF NOT EXISTS conversation_summaries (
                user_id TEXT PRIMARY KEY,
                summary TEXT NOT NULL,
                updated_at REAL NOT NULL
            )
        """)
        await self.history._db.execute("""
            INSERT OR REPLACE INTO conversation_summaries (user_id, summary, updated_at)
            VALUES (?, ?, ?)
        """, (user_id, summary, time.time()))
        await self.history._db.commit()
```
**Pros:**
- Best balance: compact summary + recent context
- Significantly reduces token usage for long conversations
- Works with existing OpenAI-compatible APIs
- Preserves long-term context
- Can persist summaries in SQLite
**Cons:**
- Costs extra tokens to generate summaries
- Adds latency when summarizing
- Need to decide when to re-summarize
- Still requires LangChain
**Verdict:** BEST LANGCHAIN OPTION for MeshAI - balances efficiency and context retention.
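One of the cons above is deciding when to re-summarize. A simple policy is to refresh the summary only after a fixed number of messages have aged out of the recent window since the last summary. A minimal sketch; `should_resummarize` and its parameters are hypothetical names, and the default threshold of 10 is an arbitrary starting point:

```python
def should_resummarize(
    summarized_count: int,  # how many old messages the stored summary covers
    old_count: int,         # how many messages are currently outside the recent window
    threshold: int = 10,    # refresh after this many additional old messages
) -> bool:
    """Return True when the stored summary is stale enough to regenerate."""
    if old_count <= 0:
        return False  # nothing outside the window, so no summary is needed
    if summarized_count == 0:
        return True   # no summary exists yet
    return old_count - summarized_count >= threshold


print(should_resummarize(0, 4))   # True: first summary
print(should_resummarize(4, 10))  # False: only 6 new old messages
print(should_resummarize(4, 14))  # True: 10 new old messages
```

A lower threshold keeps the summary fresher at the cost of more summarization calls.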
---
## 2. LlamaIndex
### Installation
```bash
pip install llama-index llama-index-llms-openai
```
### Chat Memory
```python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Initialize
llm = OpenAI(
    api_base="http://192.168.1.239:8000/v1",
    api_key="your-key",
    model="gpt-4o-mini",
)

# Create memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

# Add messages
memory.put(ChatMessage(role="user", content="Hello"))
memory.put(ChatMessage(role="assistant", content="Hi there!"))

# Get messages for LLM
messages = memory.get()

# Generate with context
response = llm.chat(messages)
```
**Integration:**
```python
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI


class LlamaIndexBackend(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._llm = OpenAI(
            api_base=config.base_url,
            api_key=api_key,
            model=config.model,
        )
        # Per-user memory buffers
        self._user_memories: dict[str, ChatMemoryBuffer] = {}
        self._token_limit = 1500

    def _get_memory(self, user_id: str) -> ChatMemoryBuffer:
        if user_id not in self._user_memories:
            self._user_memories[user_id] = ChatMemoryBuffer.from_defaults(
                token_limit=self._token_limit
            )
        return self._user_memories[user_id]

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        memory = self._get_memory(user_id)
        # Add new message to memory
        user_msg = messages[-1]["content"]
        memory.put(ChatMessage(role="user", content=user_msg))
        # Get messages within token limit
        context_messages = memory.get()
        # Add system prompt
        full_messages = [ChatMessage(role="system", content=system_prompt)]
        full_messages.extend(context_messages)
        # Generate (achat is the async variant; chat() would block the event loop)
        response = await self._llm.achat(full_messages)
        reply = response.message.content
        # Store assistant response
        memory.put(ChatMessage(role="assistant", content=reply))
        return reply
```
**Pros:**
- Token-aware buffering (auto-prunes to stay under limit)
- Simple API
- Works with OpenAI-compatible backends
- Better than manual message counting
**Cons:**
- In-memory only (need custom persistence)
- Heavy dependency (~100MB)
- Overkill for simple chat
- Less mature than LangChain
**Verdict:** Token limiting is nice, but not worth the dependency weight.
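Token-aware pruning can also be approximated without LlamaIndex. The sketch below walks the history from newest to oldest and stops once a budget is exhausted. It assumes roughly 4 characters per token, which is only a rule of thumb for English text; a real tokenizer would be more accurate. `estimate_tokens` and `trim_to_budget` are hypothetical helpers:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(history: list[dict], token_limit: int = 1500) -> list[dict]:
    """Keep the newest messages whose estimated tokens fit within token_limit."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(history):  # newest first
        cost = estimate_tokens(msg["content"])
        if used + cost > token_limit:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Ten messages of 400 characters each, i.e. ~100 estimated tokens per message
history = [{"role": "user", "content": "x" * 400} for _ in range(10)]
print(len(trim_to_budget(history, token_limit=350)))  # 3
```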
---
## 3. MemGPT / Letta (Self-Editing Memory)
### Installation
```bash
pip install letta
```
### Usage
**What it does:** Agent manages its own memory, decides what to keep/forget/summarize.
```python
from letta import create_client

client = create_client()

# Create agent with memory management
agent = client.create_agent(
    name="meshai_agent",
    llm_config={
        "model": "gpt-4o-mini",
        "model_endpoint": "http://192.168.1.239:8000/v1",
    },
    embedding_config={
        "embedding_endpoint_type": "openai",
        "embedding_model": "text-embedding-ada-002",
    },
)

# Agent manages memory automatically
response = client.send_message(
    agent_id=agent.id,
    message="What's the weather?",
    role="user",
)
print(response.messages[-1].text)
```
**Architecture:**
- Core memory: Persistent facts the agent always sees
- Recall memory: Searchable vector store of past conversations
- Archival memory: Long-term storage
**Pros:**
- Most sophisticated memory system
- Agent decides what's important
- Built-in vector search
- Handles very long conversations
**Cons:**
- HEAVY (~200MB+ with dependencies)
- Requires vector embeddings (extra API calls/costs)
- Complex setup and learning curve
- Overkill for 150-char mesh messages
- Opinionated architecture (hard to integrate)
**Verdict:** Way too heavy for MeshAI. Only worth it for complex, long-form agents.
---
## 4. Vector Stores (Semantic Memory)
### ChromaDB (Simplest)
```bash
pip install chromadb
```
```python
import time

import chromadb
from chromadb.config import Settings

# Initialize a persistent client (data is stored on disk under `path`)
client = chromadb.PersistentClient(
    path="/path/to/meshai/memory",
    settings=Settings(anonymized_telemetry=False),
)

user_id = "abc123"  # Example ID; collection names only allow [a-zA-Z0-9._-]

# Create collection per user
collection = client.get_or_create_collection(
    name=f"user_{user_id}",
    metadata={"user_id": user_id},
)

# Add messages
collection.add(
    documents=["What's the weather in Seattle?"],
    metadatas=[{"role": "user", "timestamp": time.time()}],
    ids=["msg_1"],
)

# Semantic search for relevant past messages
results = collection.query(
    query_texts=["weather"],
    n_results=3,
)

# Use retrieved messages as context
relevant_context = results["documents"][0]
```
**Integration:**
```python
import time

import chromadb
from chromadb.config import Settings
from openai import AsyncOpenAI


class VectorMemoryBackend(LLMBackend):
    def __init__(self, config: LLMConfig, api_key: str, db_path: str):
        self.config = config
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )
        # ChromaDB for semantic memory (persisted on disk under db_path)
        self._chroma = chromadb.PersistentClient(
            path=db_path,
            settings=Settings(anonymized_telemetry=False),
        )
        self._window_size = 4  # Keep last 4 exchanges (8 messages) raw

    def _get_collection(self, user_id: str):
        return self._chroma.get_or_create_collection(
            name=f"user_{user_id.replace('!', '_')}"  # Sanitize ID
        )

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,
        max_tokens: int = 300,
    ) -> str:
        # Note: ChromaDB calls are synchronous and briefly block the event loop.
        collection = self._get_collection(user_id)
        # Get current query
        current_query = messages[-1]["content"]
        # Search for semantically similar past exchanges
        try:
            results = collection.query(
                query_texts=[current_query],
                n_results=3,
                where={"role": "assistant"},  # Get past responses
            )
            relevant_history = results["documents"][0] if results["documents"] else []
        except Exception:
            # Retrieval is best-effort; fall back to recent messages only
            relevant_history = []
        # Build context: system + relevant history + recent messages
        context = system_prompt
        if relevant_history:
            context += "\n\nRelevant past exchanges:\n"
            context += "\n".join(relevant_history[:2])  # Top 2 relevant
        context_messages = [{"role": "system", "content": context}]
        context_messages.extend(messages[-self._window_size * 2:])  # Recent messages
        # Generate
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=context_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )
        reply = response.choices[0].message.content.strip()
        # Store the exchange in the vector DB
        msg_id = f"{user_id}_{int(time.time() * 1000)}"
        collection.add(
            documents=[f"User: {current_query}\nAssistant: {reply}"],
            metadatas=[{"role": "assistant", "timestamp": time.time()}],
            ids=[msg_id],
        )
        return reply
```
**Pros:**
- Semantic search - finds relevant past context
- Works great for sparse conversations
- Persistent storage
- Lightweight (~20MB)
- No extra API calls (uses local embeddings)
**Cons:**
- Adds dependency
- Embedding computation overhead
- May surface irrelevant "similar" messages
- Overkill for very short conversations
**Verdict:** Interesting for long-term memory, but maybe overkill for 150-char messages.
---
### Qdrant (Production Alternative)
```bash
pip install qdrant-client
```
```python
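from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    PointStruct,
    VectorParams,
)

# Local on-disk mode; QdrantClient(":memory:") runs in-memory, url=... targets a server
client = QdrantClient(path="/path/to/meshai/qdrant")

# Create collection
client.create_collection(
    collection_name="meshai_memory",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Store with embedding (from OpenAI or local model).
# msg_id, embedding, user_id, content, role are placeholders supplied by the caller.
client.upsert(
    collection_name="meshai_memory",
    points=[
        PointStruct(
            id=msg_id,  # must be an unsigned integer or a UUID string
            vector=embedding,  # 1536-dim from text-embedding-ada-002
            payload={"user_id": user_id, "content": content, "role": role},
        )
    ],
)

# Search, restricted to this user's messages
# (newer qdrant-client releases prefer query_points() over search())
results = client.search(
    collection_name="meshai_memory",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
    ),
    limit=3,
)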
```
**Pros:**
- Production-ready, fast
- Better than ChromaDB for scale
- Rich filtering options
- Can run in-memory or server mode
**Cons:**
- More complex than ChromaDB
- Still requires embeddings
- Heavier dependency
**Verdict:** Better than ChromaDB for production, but still overkill for MeshAI's use case.
---
## 5. Simple Rolling Summary (RECOMMENDED)
**The lightest, most practical approach for MeshAI.**
### Implementation
```python
import time
from dataclasses import dataclass

from openai import AsyncOpenAI


@dataclass
class ConversationSummary:
    """Summary of conversation history."""

    summary: str
    last_updated: float
    message_count: int


class SimpleRollingSummary:
    """Lightweight rolling summary memory manager."""

    def __init__(
        self,
        client: AsyncOpenAI,
        model: str,
        window_size: int = 4,  # Recent exchanges to keep raw (4 exchanges = 8 messages)
        summarize_threshold: int = 10,  # Additional old messages before the summary is refreshed
    ):
        self._client = client
        self._model = model
        self._window_size = window_size
        self._summarize_threshold = summarize_threshold
        # Per-user summaries (would be in SQLite in production)
        self._summaries: dict[str, ConversationSummary] = {}

    async def get_context_messages(
        self,
        user_id: str,
        full_history: list[dict],  # From SQLite
    ) -> list[dict]:
        """Get optimized context messages (summary + recent)."""
        recent_count = self._window_size * 2
        # If conversation is short, just return it
        if len(full_history) <= recent_count:
            return full_history
        # Split into old and recent
        old_messages = full_history[:-recent_count]
        recent_messages = full_history[-recent_count:]
        # Get or create summary of old messages
        summary = await self._get_or_create_summary(user_id, old_messages)
        # Return summary as system message + recent raw messages
        context = [
            {"role": "system", "content": f"Previous conversation summary: {summary.summary}"}
        ]
        context.extend(recent_messages)
        return context

    async def _get_or_create_summary(
        self,
        user_id: str,
        messages: list[dict],
    ) -> ConversationSummary:
        """Get existing summary or create new one."""
        # Check if we have a recent summary
        if user_id in self._summaries:
            existing = self._summaries[user_id]
            # If summary covers roughly the same messages, reuse it
            if abs(existing.message_count - len(messages)) < self._summarize_threshold:
                return existing
        # Create new summary
        summary_text = await self._summarize(messages)
        summary = ConversationSummary(
            summary=summary_text,
            last_updated=time.time(),
            message_count=len(messages),
        )
        self._summaries[user_id] = summary
        return summary

    async def _summarize(self, messages: list[dict]) -> str:
        """Summarize a list of messages using the LLM."""
        # Format conversation
        conversation = "\n".join(
            f"{msg['role'].upper()}: {msg['content']}" for msg in messages
        )
        prompt = (
            "Summarize this conversation in 2-3 concise sentences. Focus on:\n"
            "- Main topics discussed\n"
            "- Any important user preferences or context\n"
            "- Key information that should be remembered\n\n"
            f"Conversation:\n{conversation}\n\n"
            "Summary (2-3 sentences):"
        )
        try:
            response = await self._client.chat.completions.create(
                model=self._model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=150,
                temperature=0.3,
            )
            return response.choices[0].message.content.strip()
        except Exception:
            # Fallback: placeholder text if summarization fails
            return f"Previous conversation covered {len(messages)} messages."
```
### Integration with MeshAI
```python
# In meshai/backends/openai_backend.py
class OpenAIBackend(LLMBackend):
    """OpenAI-compatible backend with rolling summary memory."""

    def __init__(self, config: LLMConfig, api_key: str):
        self.config = config
        self._client = AsyncOpenAI(
            api_key=api_key,
            base_url=config.base_url,
        )
        # Add rolling summary manager
        self._memory = SimpleRollingSummary(
            client=self._client,
            model=config.model,
            window_size=4,  # Keep last 4 exchanges (8 messages) raw
            summarize_threshold=10,  # Refresh summary after 10 more messages age out
        )

    async def generate(
        self,
        messages: list[dict],
        system_prompt: str,
        user_id: str,  # NEW: need user_id
        max_tokens: int = 300,
    ) -> str:
        """Generate with optimized context."""
        # Get optimized context (summary + recent)
        context_messages = await self._memory.get_context_messages(
            user_id=user_id,
            full_history=messages,
        )
        # Add system prompt
        full_messages = [{"role": "system", "content": system_prompt}]
        full_messages.extend(context_messages)
        # Generate
        response = await self._client.chat.completions.create(
            model=self.config.model,
            messages=full_messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )
        return response.choices[0].message.content.strip()
```
### Persist Summaries in SQLite
```python
# Add to meshai/history.py (new methods on ConversationHistory)
import time
from typing import Optional

# Assumes ConversationSummary is defined in meshai/memory.py
from meshai.memory import ConversationSummary


class ConversationHistory:
    # ... existing methods ...

    async def store_summary(self, user_id: str, summary: str, message_count: int) -> None:
        """Store conversation summary."""
        if not self._db:
            raise RuntimeError("Database not initialized")
        async with self._lock:
            await self._db.execute("""
                CREATE TABLE IF NOT EXISTS conversation_summaries (
                    user_id TEXT PRIMARY KEY,
                    summary TEXT NOT NULL,
                    message_count INTEGER NOT NULL,
                    updated_at REAL NOT NULL
                )
            """)
            await self._db.execute("""
                INSERT OR REPLACE INTO conversation_summaries
                (user_id, summary, message_count, updated_at)
                VALUES (?, ?, ?, ?)
            """, (user_id, summary, message_count, time.time()))
            await self._db.commit()

    async def get_summary(self, user_id: str) -> Optional[ConversationSummary]:
        """Retrieve conversation summary."""
        if not self._db:
            raise RuntimeError("Database not initialized")
        async with self._lock:
            cursor = await self._db.execute("""
                SELECT summary, message_count, updated_at
                FROM conversation_summaries
                WHERE user_id = ?
            """, (user_id,))
            row = await cursor.fetchone()
        if not row:
            return None
        return ConversationSummary(
            summary=row[0],
            message_count=row[1],
            last_updated=row[2],
        )
```
**Pros:**
- No new external dependencies (reuses the existing `openai` client and SQLite)
- Works with existing SQLite storage
- Significantly reduces token usage
- Simple to understand and maintain
- Preserves recent context + summarized history
- Configurable window and threshold
**Cons:**
- Costs tokens to generate summaries
- Slight latency when summarizing
- Need to tune window/threshold params
**Verdict:** BEST OPTION for MeshAI - simple, effective, no dependencies.
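The split between summarized and raw messages can be checked offline by swapping the LLM call for a stub. A minimal sketch of the same logic as `get_context_messages`, using a hypothetical `build_context` function and a synchronous `summarize` callback standing in for the API call:

```python
from typing import Callable


def build_context(
    full_history: list[dict],
    window_size: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Return summary-of-old + recent raw messages, or the raw history if it is short."""
    recent_count = window_size * 2  # one exchange = user + assistant message
    if len(full_history) <= recent_count:
        return list(full_history)
    old, recent = full_history[:-recent_count], full_history[-recent_count:]
    summary = summarize(old)
    return [
        {"role": "system", "content": f"Previous conversation summary: {summary}"},
        *recent,
    ]


def stub_summarize(messages: list[dict]) -> str:
    # Stand-in for the LLM: just reports how many messages were folded away.
    return f"{len(messages)} earlier messages"


# 12 alternating messages: the oldest 4 get summarized, the newest 8 stay raw
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(12)
]
context = build_context(history, window_size=4, summarize=stub_summarize)
print(len(context))           # 9
print(context[0]["content"])  # Previous conversation summary: 4 earlier messages
```

Keeping the summarizer injectable like this also makes the window and threshold parameters easy to unit test without network access.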
---
## Comparison Matrix
| Approach | Dependencies | Complexity | Token Savings | Persistence | OpenAI-Compatible |
|----------|-------------|------------|---------------|-------------|-------------------|
| **LangChain BufferMemory** | langchain (~50MB) | Low | None | No | Yes |
| **LangChain WindowMemory** | langchain (~50MB) | Low | Medium | No | Yes |
| **LangChain SummaryMemory** | langchain (~50MB) | Medium | High | No (DIY) | Yes |
| **LlamaIndex** | llama-index (~100MB) | Medium | Medium | No (DIY) | Yes |
| **MemGPT/Letta** | letta (~200MB) | Very High | Very High | Yes | Yes (complex) |
| **ChromaDB** | chromadb (~20MB) | Medium | Medium | Yes | Yes |
| **Qdrant** | qdrant-client (~30MB) | High | Medium | Yes | Yes |
| **Rolling Summary (DIY)** | None | Low | High | Yes (SQLite) | Yes |
---
## RECOMMENDATION
**Use Simple Rolling Summary (Option 5)** for MeshAI because:
1. **Zero new dependencies** - No LangChain, LlamaIndex, or vector stores
2. **Works with current stack** - Uses existing AsyncOpenAI client and SQLite
3. **Significant efficiency gains** - Keeps last 4-6 exchanges + summary of older messages
4. **Persistent** - Summaries stored in SQLite, survive restarts
5. **Simple to tune** - Two params: `window_size` and `summarize_threshold`
6. **OpenAI-compatible** - Works with LiteLLM, local models, anything
7. **Lightweight** - ~100 lines of code
### Implementation Steps
1. Add `SimpleRollingSummary` class (shown above)
2. Add summary table to SQLite schema
3. Modify `OpenAIBackend.generate()` to use `_memory.get_context_messages()`
4. Add summary storage methods to `ConversationHistory`
5. Configure: `window_size=4` (8 messages), `summarize_threshold=10`
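The two parameters from step 5 could live in a small validated config object. A minimal sketch; `MemoryConfig` is a hypothetical name and does not reflect the actual layout of `meshai/config.py`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical config block for the rolling summary (not an existing MeshAI class)."""

    window_size: int = 4           # recent exchanges kept raw (4 exchanges = 8 messages)
    summarize_threshold: int = 10  # additional old messages before the summary is refreshed

    def __post_init__(self) -> None:
        # Reject values that would disable the recent window or re-summarize constantly
        if self.window_size < 1:
            raise ValueError("window_size must be at least 1")
        if self.summarize_threshold < 1:
            raise ValueError("summarize_threshold must be at least 1")

    @property
    def recent_message_count(self) -> int:
        """Number of raw messages sent alongside the summary."""
        return self.window_size * 2


cfg = MemoryConfig()
print(cfg.recent_message_count)  # 8
```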
### Expected Performance
**Before (full history):**
- 20 message pairs ≈ 3000 tokens sent with every request (rough estimate)
- Higher latency and cost per request
**After (rolling summary):**
- Summary (~100 tokens) + 4 recent pairs (~400 tokens) ≈ 500 tokens
- **~83% token reduction** for long conversations (500 vs. 3000 tokens)
- Faster responses and lower cost, minus the occasional extra summarization call
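The reduction figure follows directly from the two estimates. A small sketch of the arithmetic, using the rough token counts above as inputs (they are estimates, not measurements):

```python
def token_reduction(before_tokens: int, after_tokens: int) -> float:
    """Percentage of prompt tokens saved per request."""
    return (1 - after_tokens / before_tokens) * 100


full_history_tokens = 3000   # ~20 message pairs (estimate)
rolling_tokens = 100 + 400   # summary + 4 recent pairs (estimate)
print(f"{token_reduction(full_history_tokens, rolling_tokens):.0f}% fewer tokens")
# 83% fewer tokens
```

This counts only the per-request prompt; each summary refresh adds one extra request that sends the old messages once.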
### When to Consider Alternatives
- **Vector stores (ChromaDB)**: If you need semantic search across users or topics
- **LangChain SummaryMemory**: If you want a batteries-included solution (accept dependency)
- **MemGPT**: If conversations become complex multi-day dialogues (they won't on mesh)
---
## Example Usage
```python
# Initialize (run inside an async function)
backend = OpenAIBackend(config, api_key)

# With window_size=4, histories of up to 8 messages are sent in full.
# This call passes 8 exchanges (16 messages), so the oldest 8 are summarized.
await backend.generate(
    messages=[
        {"role": "user", "content": "What's the weather?"},
        {"role": "assistant", "content": "It's sunny!"},
        {"role": "user", "content": "Should I bring an umbrella?"},
        {"role": "assistant", "content": "No need, it's clear!"},
        # ... 6 more exchanges ...
    ],
    system_prompt="You are a helpful assistant.",
    user_id="!abc123",
)

# Context sent to LLM:
# [
#   {"role": "system", "content": "You are a helpful assistant."},
#   {"role": "system", "content": "Previous conversation summary: User asked about weather and outdoor activities. Confirmed sunny weather, no rain expected."},
#   ... (last 4 exchanges, raw)
# ]
```
---
## Code Files to Modify
1. **`meshai/memory.py`** (NEW) - Add `SimpleRollingSummary` class
2. **`meshai/history.py`** - Add summary storage methods + table schema
3. **`meshai/backends/openai_backend.py`** - Integrate memory manager
4. **`meshai/responder.py`** - Pass `user_id` to backend.generate()
5. **`meshai/config.py`** - Add config for window_size, summarize_threshold
Let me know if you want me to implement this!