Features: - Multi-backend LLM support (OpenAI, Anthropic, Google) - Rolling summary memory for token optimization (~70-80% reduction) - Per-user conversation history with SQLite persistence - Bang commands (!help, !ping, !reset, !status, !weather) - Meshtastic integration via serial or TCP - Message chunking for mesh network constraints (150 char limit) - Rate limiting to prevent network congestion - Rich TUI configurator - Docker support 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
29 KiB
LLM Conversation Memory Research for MeshAI
Current Implementation Analysis
Current approach: MeshAI stuffs full conversation history into every LLM API call
- Storage: SQLite via aiosqlite
- Retrieval:
get_history_for_llm()returns all messages (up tomax_messages_per_user * 2) - Backend: OpenAI-compatible API (works with LiteLLM, local models)
- Context: 150 char max per message, per-user conversations
Problem: Inefficient - sends entire history even when unnecessary, wastes tokens and latency.
1. LangChain Memory Modules
Installation
pip install langchain langchain-community langchain-openai
A. ConversationBufferMemory (Simplest)
What it does: Stores raw messages in memory, returns all messages.
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
# Initialize
llm = ChatOpenAI(
base_url="http://192.168.1.239:8000/v1", # LiteLLM
api_key="your-key",
model="gpt-4o-mini"
)
memory = ConversationBufferMemory()
chain = ConversationChain(
llm=llm,
memory=memory,
verbose=False
)
# Use it
response = chain.predict(input="What's the weather?")
print(response)
# Access history
print(memory.load_memory_variables({}))
# {'history': 'Human: What's the weather?\nAI: ...'}
Integration with MeshAI:
# In meshai/backends/openai_backend.py
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
class OpenAIBackendWithMemory(LLMBackend):
def __init__(self, config: LLMConfig, api_key: str):
self.config = config
self._llm = ChatOpenAI(
base_url=config.base_url,
api_key=api_key,
model=config.model,
temperature=0.7,
max_tokens=300
)
# Per-user memory storage
self._user_memories: dict[str, ConversationBufferMemory] = {}
def _get_memory(self, user_id: str) -> ConversationBufferMemory:
if user_id not in self._user_memories:
self._user_memories[user_id] = ConversationBufferMemory()
return self._user_memories[user_id]
async def generate(
self,
messages: list[dict],
system_prompt: str,
user_id: str, # NEW: need user_id for memory
max_tokens: int = 300,
) -> str:
memory = self._get_memory(user_id)
# Create chain with memory
chain = ConversationChain(
llm=self._llm,
memory=memory,
verbose=False
)
# Extract last user message
last_msg = messages[-1]["content"]
# Generate with memory
response = await chain.apredict(input=last_msg)
return response.strip()
Pros:
- Dead simple, drop-in replacement
- Works with any OpenAI-compatible API
- No external dependencies
- LangChain handles message formatting
Cons:
- Still sends full history (no real efficiency gain)
- Stores everything in RAM (lost on restart)
- Need to manage per-user memory dicts
- Adds LangChain dependency (~50MB)
Verdict: Not worth it - adds complexity without solving core problem.
B. ConversationBufferWindowMemory (Better)
What it does: Only keeps last N messages in context.
from langchain.memory import ConversationBufferWindowMemory
# Keep only last 5 interactions (10 messages = 5 pairs)
memory = ConversationBufferWindowMemory(k=5)
chain = ConversationChain(
llm=llm,
memory=memory
)
# Only last 5 exchanges sent to LLM
response = chain.predict(input="Hello")
Integration:
class OpenAIBackendWithWindow(LLMBackend):
def __init__(self, config: LLMConfig, api_key: str):
self.config = config
self._llm = ChatOpenAI(
base_url=config.base_url,
api_key=api_key,
model=config.model
)
# Per-user windowed memory
self._user_memories: dict[str, ConversationBufferWindowMemory] = {}
self._window_size = 5 # Last 5 exchanges
def _get_memory(self, user_id: str) -> ConversationBufferWindowMemory:
if user_id not in self._user_memories:
self._user_memories[user_id] = ConversationBufferWindowMemory(
k=self._window_size
)
return self._user_memories[user_id]
Pros:
- Simple sliding window approach
- Reduces token usage automatically
- Works with any OpenAI-compatible API
- Configurable window size
Cons:
- Still in-memory only (lost on restart)
- Forgets old context completely
- Need to integrate with existing SQLite storage
- Adds LangChain dependency
Verdict: Better than full buffer, but loses long-term context.
C. ConversationSummaryMemory (Most Interesting)
What it does: Uses LLM to summarize conversation, keeps summary + recent messages.
from langchain.memory import ConversationSummaryMemory
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(
llm=llm,
memory=memory
)
# After multiple messages, memory contains:
# - Summary of old conversation
# - Recent raw messages
response = chain.predict(input="What did we talk about?")
# AI can reference both summary and recent context
Integration with SQLite persistence:
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI
class OpenAIBackendWithSummary(LLMBackend):
def __init__(self, config: LLMConfig, api_key: str, history: ConversationHistory):
self.config = config
self.history = history # Existing SQLite history
self._llm = ChatOpenAI(
base_url=config.base_url,
api_key=api_key,
model=config.model
)
# Per-user summaries (load from DB)
self._user_summaries: dict[str, str] = {}
self._window_size = 4 # Keep last 4 messages raw
async def generate(
self,
messages: list[dict],
system_prompt: str,
user_id: str,
max_tokens: int = 300,
) -> str:
# Get full history from SQLite
full_history = await self.history.get_history(user_id)
if len(full_history) <= self._window_size * 2:
# Small conversation, just use raw messages
context_messages = messages
else:
# Large conversation: summarize old + keep recent
old_messages = full_history[:-self._window_size * 2]
recent_messages = full_history[-self._window_size * 2:]
# Get or create summary
summary = await self._get_summary(user_id, old_messages)
# Build context: system + summary + recent messages
context_messages = [
{"role": "system", "content": f"{system_prompt}\n\nConversation summary: {summary}"}
]
context_messages.extend([
{"role": msg.role, "content": msg.content}
for msg in recent_messages
])
# Generate response
response = await self._client.chat.completions.create(
model=self.config.model,
messages=context_messages,
max_tokens=max_tokens,
temperature=0.7,
)
return response.choices[0].message.content.strip()
async def _get_summary(self, user_id: str, messages: list) -> str:
"""Summarize old messages using LLM."""
if user_id in self._user_summaries:
return self._user_summaries[user_id]
# Create summary prompt
conversation_text = "\n".join([
f"{msg.role}: {msg.content}" for msg in messages
])
summary_prompt = f"""Summarize this conversation in 2-3 sentences, focusing on key topics and user preferences:
{conversation_text}
Summary:"""
response = await self._client.chat.completions.create(
model=self.config.model,
messages=[{"role": "user", "content": summary_prompt}],
max_tokens=150,
temperature=0.3,
)
summary = response.choices[0].message.content.strip()
# Store in SQLite
await self._store_summary(user_id, summary)
self._user_summaries[user_id] = summary
return summary
async def _store_summary(self, user_id: str, summary: str):
"""Store summary in SQLite for persistence."""
# Add new table for summaries
await self.history._db.execute("""
CREATE TABLE IF NOT EXISTS conversation_summaries (
user_id TEXT PRIMARY KEY,
summary TEXT NOT NULL,
updated_at REAL NOT NULL
)
""")
await self.history._db.execute("""
INSERT OR REPLACE INTO conversation_summaries (user_id, summary, updated_at)
VALUES (?, ?, ?)
""", (user_id, summary, time.time()))
await self.history._db.commit()
Pros:
- Best balance: compact summary + recent context
- Significantly reduces token usage for long conversations
- Works with existing OpenAI-compatible APIs
- Preserves long-term context
- Can persist summaries in SQLite
Cons:
- Costs extra tokens to generate summaries
- Adds latency when summarizing
- Need to decide when to re-summarize
- Still requires LangChain
Verdict: BEST LANGCHAIN OPTION for MeshAI - balances efficiency and context retention.
2. LlamaIndex
Installation
pip install llama-index llama-index-llms-openai
Chat Memory
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
# Initialize
llm = OpenAI(
api_base="http://192.168.1.239:8000/v1",
api_key="your-key",
model="gpt-4o-mini"
)
# Create memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
# Add messages
memory.put(ChatMessage(role="user", content="Hello"))
memory.put(ChatMessage(role="assistant", content="Hi there!"))
# Get messages for LLM
messages = memory.get()
# Generate with context
response = llm.chat(messages)
Integration:
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
class LlamaIndexBackend(LLMBackend):
def __init__(self, config: LLMConfig, api_key: str):
self.config = config
self._llm = OpenAI(
api_base=config.base_url,
api_key=api_key,
model=config.model
)
# Per-user memory buffers
self._user_memories: dict[str, ChatMemoryBuffer] = {}
self._token_limit = 1500
def _get_memory(self, user_id: str) -> ChatMemoryBuffer:
if user_id not in self._user_memories:
self._user_memories[user_id] = ChatMemoryBuffer.from_defaults(
token_limit=self._token_limit
)
return self._user_memories[user_id]
async def generate(
self,
messages: list[dict],
system_prompt: str,
user_id: str,
max_tokens: int = 300,
) -> str:
memory = self._get_memory(user_id)
# Add new message to memory
user_msg = messages[-1]["content"]
memory.put(ChatMessage(role="user", content=user_msg))
# Get messages within token limit
context_messages = memory.get()
# Add system prompt
full_messages = [ChatMessage(role="system", content=system_prompt)]
full_messages.extend(context_messages)
# Generate
response = self._llm.chat(full_messages)
# Store assistant response
memory.put(ChatMessage(role="assistant", content=response.message.content))
return response.message.content
Pros:
- Token-aware buffering (auto-prunes to stay under limit)
- Simple API
- Works with OpenAI-compatible backends
- Better than manual message counting
Cons:
- In-memory only (need custom persistence)
- Heavy dependency (~100MB)
- Overkill for simple chat
- Less mature than LangChain
Verdict: Token limiting is nice, but not worth the dependency weight.
3. MemGPT / Letta (Self-Editing Memory)
Installation
pip install letta
Usage
What it does: Agent manages its own memory, decides what to keep/forget/summarize.
from letta import create_client
client = create_client()
# Create agent with memory management
agent = client.create_agent(
name="meshai_agent",
llm_config={
"model": "gpt-4o-mini",
"model_endpoint": "http://192.168.1.239:8000/v1"
},
embedding_config={
"embedding_endpoint_type": "openai",
"embedding_model": "text-embedding-ada-002"
}
)
# Agent manages memory automatically
response = client.send_message(
agent_id=agent.id,
message="What's the weather?",
role="user"
)
print(response.messages[-1].text)
Architecture:
- Core memory: Persistent facts the agent always sees
- Recall memory: Searchable vector store of past conversations
- Archival memory: Long-term storage
Pros:
- Most sophisticated memory system
- Agent decides what's important
- Built-in vector search
- Handles very long conversations
Cons:
- HEAVY (~200MB+ with dependencies)
- Requires vector embeddings (extra API calls/costs)
- Complex setup and learning curve
- Overkill for 150-char mesh messages
- Opinionated architecture (hard to integrate)
Verdict: Way too heavy for MeshAI. Only worth it for complex, long-form agents.
4. Vector Stores (Semantic Memory)
ChromaDB (Simplest)
pip install chromadb
import chromadb
from chromadb.config import Settings
# Initialize
client = chromadb.Client(Settings(
persist_directory="/path/to/meshai/memory",
anonymized_telemetry=False
))
# Create collection per user
collection = client.get_or_create_collection(
name=f"user_{user_id}",
metadata={"user_id": user_id}
)
# Add messages
collection.add(
documents=["What's the weather in Seattle?"],
metadatas=[{"role": "user", "timestamp": time.time()}],
ids=["msg_1"]
)
# Semantic search for relevant past messages
results = collection.query(
query_texts=["weather"],
n_results=3
)
# Use retrieved messages as context
relevant_context = results['documents'][0]
Integration:
import chromadb
from chromadb.config import Settings
class VectorMemoryBackend(LLMBackend):
def __init__(self, config: LLMConfig, api_key: str, db_path: str):
self.config = config
self._client = AsyncOpenAI(
api_key=api_key,
base_url=config.base_url,
)
# ChromaDB for semantic memory
self._chroma = chromadb.Client(Settings(
persist_directory=db_path,
anonymized_telemetry=False
))
self._window_size = 4 # Keep last 4 messages raw
def _get_collection(self, user_id: str):
return self._chroma.get_or_create_collection(
name=f"user_{user_id.replace('!', '_')}" # Sanitize ID
)
async def generate(
self,
messages: list[dict],
system_prompt: str,
user_id: str,
max_tokens: int = 300,
) -> str:
collection = self._get_collection(user_id)
# Get current query
current_query = messages[-1]["content"]
# Search for semantically similar past messages
try:
results = collection.query(
query_texts=[current_query],
n_results=3,
where={"role": "assistant"} # Get past responses
)
relevant_history = results['documents'][0] if results['documents'] else []
except:
relevant_history = []
# Build context: system + relevant history + recent messages
context = system_prompt
if relevant_history:
context += "\n\nRelevant past exchanges:\n"
context += "\n".join(relevant_history[:2]) # Top 2 relevant
context_messages = [{"role": "system", "content": context}]
context_messages.extend(messages[-self._window_size*2:]) # Recent messages
# Generate
response = await self._client.chat.completions.create(
model=self.config.model,
messages=context_messages,
max_tokens=max_tokens,
temperature=0.7,
)
reply = response.choices[0].message.content.strip()
# Store in vector DB
msg_id = f"{user_id}_{int(time.time()*1000)}"
collection.add(
documents=[f"User: {current_query}\nAssistant: {reply}"],
metadatas=[{"role": "assistant", "timestamp": time.time()}],
ids=[msg_id]
)
return reply
Pros:
- Semantic search - finds relevant past context
- Works great for sparse conversations
- Persistent storage
- Lightweight (~20MB)
- No extra API calls (uses local embeddings)
Cons:
- Adds dependency
- Embedding computation overhead
- May surface irrelevant "similar" messages
- Overkill for very short conversations
Verdict: Interesting for long-term memory, but maybe overkill for 150-char messages.
Qdrant (Production Alternative)
pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
# Can run in-memory or with server
client = QdrantClient(path="/path/to/meshai/qdrant")
# Create collection
client.create_collection(
collection_name="meshai_memory",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# Store with embedding (from OpenAI or local model)
client.upsert(
collection_name="meshai_memory",
points=[
PointStruct(
id=msg_id,
vector=embedding, # 1536-dim from text-embedding-ada-002
payload={"user_id": user_id, "content": content, "role": role}
)
]
)
# Search
results = client.search(
collection_name="meshai_memory",
query_vector=query_embedding,
query_filter={"user_id": user_id},
limit=3
)
Pros:
- Production-ready, fast
- Better than ChromaDB for scale
- Rich filtering options
- Can run in-memory or server mode
Cons:
- More complex than ChromaDB
- Still requires embeddings
- Heavier dependency
Verdict: Better than ChromaDB for production, but still overkill for MeshAI's use case.
5. Simple Rolling Summary (RECOMMENDED)
The lightest, most practical approach for MeshAI.
Implementation
import asyncio
import time
from dataclasses import dataclass
from typing import Optional
from openai import AsyncOpenAI
@dataclass
class ConversationSummary:
"""Summary of conversation history."""
summary: str
last_updated: float
message_count: int
class SimpleRollingSummary:
"""Lightweight rolling summary memory manager."""
def __init__(
self,
client: AsyncOpenAI,
model: str,
window_size: int = 4, # Recent messages to keep raw
summarize_threshold: int = 10, # Messages before summarizing
):
self._client = client
self._model = model
self._window_size = window_size
self._summarize_threshold = summarize_threshold
# Per-user summaries (would be in SQLite in production)
self._summaries: dict[str, ConversationSummary] = {}
async def get_context_messages(
self,
user_id: str,
full_history: list[dict], # From SQLite
) -> list[dict]:
"""Get optimized context messages (summary + recent)."""
# If conversation is short, just return it
if len(full_history) <= self._window_size * 2:
return full_history
# Split into old and recent
old_messages = full_history[:-self._window_size * 2]
recent_messages = full_history[-self._window_size * 2:]
# Get or create summary of old messages
summary = await self._get_or_create_summary(user_id, old_messages)
# Return summary as system message + recent raw messages
context = [
{"role": "system", "content": f"Previous conversation summary: {summary.summary}"}
]
context.extend(recent_messages)
return context
async def _get_or_create_summary(
self,
user_id: str,
messages: list[dict],
) -> ConversationSummary:
"""Get existing summary or create new one."""
# Check if we have a recent summary
if user_id in self._summaries:
existing = self._summaries[user_id]
# If summary covers roughly the same messages, reuse it
if abs(existing.message_count - len(messages)) < self._summarize_threshold:
return existing
# Create new summary
summary_text = await self._summarize(messages)
summary = ConversationSummary(
summary=summary_text,
last_updated=time.time(),
message_count=len(messages)
)
self._summaries[user_id] = summary
return summary
async def _summarize(self, messages: list[dict]) -> str:
"""Summarize a list of messages using the LLM."""
# Format conversation
conversation = "\n".join([
f"{msg['role'].upper()}: {msg['content']}"
for msg in messages
])
prompt = f"""Summarize this conversation in 2-3 concise sentences. Focus on:
- Main topics discussed
- Any important user preferences or context
- Key information that should be remembered
Conversation:
{conversation}
Summary (2-3 sentences):"""
try:
response = await self._client.chat.completions.create(
model=self._model,
messages=[{"role": "user", "content": prompt}],
max_tokens=150,
temperature=0.3,
)
return response.choices[0].message.content.strip()
except Exception as e:
# Fallback: simple truncation if summarization fails
return f"Previous conversation covered {len(messages)} messages."
Integration with MeshAI
# In meshai/backends/openai_backend.py
class OpenAIBackend(LLMBackend):
"""OpenAI-compatible backend with rolling summary memory."""
def __init__(self, config: LLMConfig, api_key: str):
self.config = config
self._client = AsyncOpenAI(
api_key=api_key,
base_url=config.base_url,
)
# Add rolling summary manager
self._memory = SimpleRollingSummary(
client=self._client,
model=config.model,
window_size=4, # Keep last 4 exchanges (8 messages)
summarize_threshold=10, # Summarize after 10 messages
)
async def generate(
self,
messages: list[dict],
system_prompt: str,
user_id: str, # NEW: need user_id
max_tokens: int = 300,
) -> str:
"""Generate with optimized context."""
# Get optimized context (summary + recent)
context_messages = await self._memory.get_context_messages(
user_id=user_id,
full_history=messages,
)
# Add system prompt
full_messages = [{"role": "system", "content": system_prompt}]
full_messages.extend(context_messages)
# Generate
response = await self._client.chat.completions.create(
model=self.config.model,
messages=full_messages,
max_tokens=max_tokens,
temperature=0.7,
)
return response.choices[0].message.content.strip()
Persist Summaries in SQLite
# Add to meshai/history.py
async def store_summary(self, user_id: str, summary: str, message_count: int) -> None:
"""Store conversation summary."""
if not self._db:
raise RuntimeError("Database not initialized")
async with self._lock:
await self._db.execute("""
CREATE TABLE IF NOT EXISTS conversation_summaries (
user_id TEXT PRIMARY KEY,
summary TEXT NOT NULL,
message_count INTEGER NOT NULL,
updated_at REAL NOT NULL
)
""")
await self._db.execute("""
INSERT OR REPLACE INTO conversation_summaries
(user_id, summary, message_count, updated_at)
VALUES (?, ?, ?, ?)
""", (user_id, summary, message_count, time.time()))
await self._db.commit()
async def get_summary(self, user_id: str) -> Optional[ConversationSummary]:
"""Retrieve conversation summary."""
if not self._db:
raise RuntimeError("Database not initialized")
async with self._lock:
cursor = await self._db.execute("""
SELECT summary, message_count, updated_at
FROM conversation_summaries
WHERE user_id = ?
""", (user_id,))
row = await cursor.fetchone()
if not row:
return None
return ConversationSummary(
summary=row[0],
message_count=row[1],
last_updated=row[2]
)
Pros:
- NO external dependencies
- Works with existing SQLite storage
- Significantly reduces token usage
- Simple to understand and maintain
- Preserves recent context + summarized history
- Configurable window and threshold
Cons:
- Costs tokens to generate summaries
- Slight latency when summarizing
- Need to tune window/threshold params
Verdict: BEST OPTION for MeshAI - simple, effective, no dependencies.
Comparison Matrix
| Approach | Dependencies | Complexity | Token Savings | Persistence | OpenAI-Compatible |
|---|---|---|---|---|---|
| LangChain BufferMemory | langchain (~50MB) | Low | None | No | Yes |
| LangChain WindowMemory | langchain (~50MB) | Low | Medium | No | Yes |
| LangChain SummaryMemory | langchain (~50MB) | Medium | High | No (DIY) | Yes |
| LlamaIndex | llama-index (~100MB) | Medium | Medium | No (DIY) | Yes |
| MemGPT/Letta | letta (~200MB) | Very High | Very High | Yes | Yes (complex) |
| ChromaDB | chromadb (~20MB) | Medium | Medium | Yes | Yes |
| Qdrant | qdrant (~30MB) | High | Medium | Yes | Yes |
| Rolling Summary (DIY) | None | Low | High | Yes (SQLite) | Yes |
RECOMMENDATION
Use Simple Rolling Summary (Option 5) for MeshAI because:
- Zero dependencies - No LangChain, LlamaIndex, or vector stores
- Works with current stack - Uses existing AsyncOpenAI client and SQLite
- Significant efficiency gains - Keeps last 4-6 exchanges + summary of older messages
- Persistent - Summaries stored in SQLite, survive restarts
- Simple to tune - Two params:
window_sizeandsummarize_threshold - OpenAI-compatible - Works with LiteLLM, local models, anything
- Lightweight - ~100 lines of code
Implementation Steps
- Add
SimpleRollingSummaryclass (shown above) - Add summary table to SQLite schema
- Modify
OpenAIBackend.generate()to use_memory.get_context_messages() - Add summary storage methods to
ConversationHistory - Configure:
window_size=4(8 messages),summarize_threshold=10
Expected Performance
Before (full history):
- 20 message pairs = ~3000 tokens sent every request
- Latency: higher, costs more
After (rolling summary):
- Summary (~100 tokens) + 4 recent pairs (~400 tokens) = ~500 tokens
- 83% token reduction for long conversations
- Faster responses, lower costs
When to Consider Alternatives
- Vector stores (ChromaDB): If you need semantic search across users or topics
- LangChain SummaryMemory: If you want a batteries-included solution (accept dependency)
- MemGPT: If conversations become complex multi-day dialogues (they won't on mesh)
Example Usage
# Initialize
backend = OpenAIBackend(config, api_key)
# First few messages - full history sent
await backend.generate(
messages=[
{"role": "user", "content": "What's the weather?"},
{"role": "assistant", "content": "It's sunny!"},
{"role": "user", "content": "Should I bring an umbrella?"},
{"role": "assistant", "content": "No need, it's clear!"},
# ... 6 more exchanges ...
],
system_prompt="You are a helpful assistant.",
user_id="!abc123",
)
# After 10+ messages - summary + recent sent
# Context sent to LLM:
# [
# {"role": "system", "content": "Previous conversation summary: User asked about weather and outdoor activities. Confirmed sunny weather, no rain expected."},
# {"role": "user", "content": "Should I bring an umbrella?"},
# {"role": "assistant", "content": "No need, it's clear!"},
# ... (last 4 exchanges)
# ]
Code Files to Modify
meshai/memory.py(NEW) - AddSimpleRollingSummaryclassmeshai/history.py- Add summary storage methods + table schemameshai/backends/openai_backend.py- Integrate memory managermeshai/responder.py- Passuser_idto backend.generate()meshai/config.py- Add config for window_size, summarize_threshold
Let me know if you want me to implement this!