╔════════════════════════════════════════════════════════════════════════════════╗ ║ LLM MEMORY APPROACHES COMPARISON ║ ╚════════════════════════════════════════════════════════════════════════════════╝ ┌────────────────────────────────────────────────────────────────────────────────┐ │ 1. FULL HISTORY (Current MeshAI Implementation) │ ├────────────────────────────────────────────────────────────────────────────────┤ │ │ │ Request 1: [System] + [Msg1, Msg2] = 200 tokens │ │ Request 5: [System] + [Msg1...Msg10] = 1000 tokens │ │ Request 10: [System] + [Msg1...Msg20] = 2000 tokens │ │ Request 20: [System] + [Msg1...Msg40] = 4000 tokens │ │ │ │ ✓ Complete context │ │ ✗ Linear growth in tokens │ │ ✗ Expensive and slow for long conversations │ │ ✗ Redundant - most messages not relevant to current query │ │ │ └────────────────────────────────────────────────────────────────────────────────┘ ┌────────────────────────────────────────────────────────────────────────────────┐ │ 2. WINDOW MEMORY (Keep Last N Only) │ ├────────────────────────────────────────────────────────────────────────────────┤ │ │ │ Request 1: [System] + [Msg1, Msg2] = 200 tokens │ │ Request 5: [System] + [Msg7, Msg8, Msg9, Msg10] = 500 tokens │ │ Request 10: [System] + [Msg17, Msg18, Msg19, Msg20] = 500 tokens │ │ Request 20: [System] + [Msg37, Msg38, Msg39, Msg40] = 500 tokens │ │ │ │ ✓ Constant token usage │ │ ✓ Very fast and cheap │ │ ✗ Completely forgets old context │ │ ✗ Can't reference earlier conversation │ │ │ └────────────────────────────────────────────────────────────────────────────────┘ ┌────────────────────────────────────────────────────────────────────────────────┐ │ 3. ROLLING SUMMARY (RECOMMENDED) │ ├────────────────────────────────────────────────────────────────────────────────┤ │ │ │ Request 1-5: [System] + [Msg1...Msg10] = 1000 tokens │ │ (Short conversation - no summary yet) │ │ │ │ Request 10+: [System + Summary] + [Recent 8 msgs] = 600 tokens │ │ │ │ ┌─────────────────────────────────────┐ │ │ │ Summary: "User discussed weather │ │ │ │ and hiking. Mt Si is 4hr moderate │ │ │ │ hike, Rattlesnake is 2mi easier." │ (100 tokens) │ │ └─────────────────────────────────────┘ │ │ ↓ │ │ ┌─────────────────────────────────────┐ │ │ │ User: How crowded does it get? │ │ │ │ Assistant: Very crowded weekends │ │ │ │ User: Any other trails nearby? │ (400 tokens) │ │ │ Assistant: Rattlesnake is closer │ │ │ │ ... (last 4 exchanges) │ │ │ └─────────────────────────────────────┘ │ │ │ │ Request 20: [System + Summary] + [Recent 8 msgs] = 600 tokens │ │ (Summary updated every ~8 new messages) │ │ │ │ ✓ Balanced token usage (70-80% reduction) │ │ ✓ Preserves long-term context via summary │ │ ✓ Recent messages in full detail │ │ ✓ Scalable to very long conversations │ │ ✗ Small overhead for summary generation (1-2s every 8-10 msgs) │ │ │ └────────────────────────────────────────────────────────────────────────────────┘ ┌────────────────────────────────────────────────────────────────────────────────┐ │ 4. VECTOR STORE MEMORY (ChromaDB/Qdrant) │ ├────────────────────────────────────────────────────────────────────────────────┤ │ │ │ Current query: "What trails are nearby?" │ │ ↓ (embed and search) │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ Vector DB: Find semantically similar past messages │ │ │ │ - "Mt Si is a moderate 4-hour hike" (score: 0.89) │ │ │ │ - "Rattlesnake Ledge has lake views" (score: 0.85) │ │ │ │ - "Bring water and snacks" (score: 0.62) │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ ↓ │ │ [System + Top 3 relevant] + [Current query] = 500 tokens │ │ │ │ ✓ Semantic retrieval - finds relevant context │ │ ✓ Works for sparse conversations │ │ ✓ Enables cross-conversation search │ │ ✗ Requires embeddings (API calls or local model) │ │ ✗ Adds complexity (vector DB, indexing) │ │ ✗ May retrieve irrelevant "similar" messages │ │ │ └────────────────────────────────────────────────────────────────────────────────┘ ┌────────────────────────────────────────────────────────────────────────────────┐ │ 5. MEMGPT/LETTA (Self-Editing Memory) │ ├────────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────────────────────┐ │ │ │ Core Memory (always in context): │ │ │ │ - User: Matt │ (50 tokens) │ │ │ - Preferences: Metric units │ │ │ └───────────────────────────────────┘ │ │ ↓ │ │ ┌───────────────────────────────────┐ │ │ │ Recall Memory (vector search): │ │ │ │ - [Retrieved: 3 relevant msgs] │ (300 tokens) │ │ └───────────────────────────────────┘ │ │ ↓ │ │ ┌───────────────────────────────────┐ │ │ │ Archival Memory (long-term): │ │ │ │ - [Searchable but not loaded] │ │ │ └───────────────────────────────────┘ │ │ │ │ Agent decides what to remember/forget/search │ │ │ │ ✓ Most sophisticated - agent manages own memory │ │ ✓ Handles complex multi-day conversations │ │ ✗ Very heavy (200MB+ dependencies) │ │ ✗ Requires vector embeddings │ │ ✗ Overkill for simple chat │ │ ✗ Opinionated architecture (hard to integrate) │ │ │ └────────────────────────────────────────────────────────────────────────────────┘ ╔════════════════════════════════════════════════════════════════════════════════╗ ║ RECOMMENDATION MATRIX ║ ╚════════════════════════════════════════════════════════════════════════════════╝ ┌──────────────┬──────────────┬────────────┬──────────────┬──────────────────────┐ │ Approach │ Dependencies │ Tokens │ Complexity │ Use Case │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ Full History │ None │ High │ Low │ Don't use (baseline) │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ Window Only │ None │ Low │ Low │ Stateless chat bots │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ Rolling │ │ │ │ ✓ MESHAI │ │ Summary │ None │ Very Low │ Low │ ✓ Most projects │ │ (DIY) │ │ │ │ ✓ Best balance │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ LangChain │ ~50 MB │ Very Low │ Medium │ Want batteries- │ │ Summary │ │ │ │ included solution │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ Vector Store │ ~20 MB │ Low │ Medium │ Semantic search, │ │ (ChromaDB) │ │ │ │ long-term memory │ ├──────────────┼──────────────┼────────────┼──────────────┼──────────────────────┤ │ MemGPT/Letta │ ~200 MB │ Low │ Very High │ Complex multi-day │ │ │ │ │ │ agent workflows │ └──────────────┴──────────────┴────────────┴──────────────┴──────────────────────┘ ╔════════════════════════════════════════════════════════════════════════════════╗ ║ PERFORMANCE COMPARISON (20 messages) ║ ╚════════════════════════════════════════════════════════════════════════════════╝ Tokens Sent to LLM ↑ │ 4000│ ████████████████████████████████ Full History │ 3000│ │ 2000│ │ 1000│ │ 600│ ██████ Rolling Summary 500│ █████ Window Only │ █████ Vector Store 0└─────────────────────────────────────────────────────────→ 1 5 10 15 20 25 30 35 40 (Conversation length) Legend: ████ Full History (linear growth) ████ Rolling Summary (plateau after initial growth) ████ Window/Vector (constant) ╔════════════════════════════════════════════════════════════════════════════════╗ ║ IMPLEMENTATION COMPLEXITY ║ ╚════════════════════════════════════════════════════════════════════════════════╝ ┌─────────────────────────────────────────────────────────────────────────────┐ │ Simple ←───────────────────────────────────────────────────→ Complex │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ Window Only Rolling Summary LangChain MemGPT │ │ (20 lines) (100 lines) (10 lines (200+ lines │ │ + 50MB dep) + 200MB dep) │ │ │ │ ↑ ↑ ↑ ↑ │ │ No deps No deps Heavy deps Very heavy │ │ No persistence SQLite persist In-memory Built-in DB │ │ Loses old context Keeps summary Keeps summary Multi-tier │ │ │ │ ★ RECOMMENDED ★ │ └─────────────────────────────────────────────────────────────────────────────┘ ╔════════════════════════════════════════════════════════════════════════════════╗ ║ FOR MESHAI SPECIFICALLY ║ ╚════════════════════════════════════════════════════════════════════════════════╝ Current: - Messages: 150 chars max (very small) - Conversations: Per-user, linear - Backend: OpenAI-compatible (LiteLLM, local models) - Storage: SQLite + aiosqlite - Problem: Full history sent every time Constraints: - Lightweight (runs on mesh nodes potentially) - No heavy dependencies - Must work offline (local models) - Persistence required (survive restarts) Solution: Rolling Summary ✓ Zero dependencies (pure Python) ✓ Works with existing AsyncOpenAI client ✓ Persists in existing SQLite database ✓ ~100 lines of code (easy to maintain) ✓ 70-80% token reduction ✓ Tunable (window_size, summarize_threshold) Configuration: - window_size = 4 (keep last 4 exchanges = 8 messages) - summarize_threshold = 8 (re-summarize after 8 new messages) Expected savings: - 10 messages: 0% (no summary yet) - 20 messages: 66% token reduction - 30 messages: 75% token reduction - 50 messages: 84% token reduction Cost impact (at $0.50/1M tokens): - Before: $0.0012 per request (2400 tokens) - After: $0.0003 per request (600 tokens) - Savings: $27/month for 1000 requests/day ╔════════════════════════════════════════════════════════════════════════════════╗ ║ NEXT STEPS ║ ╚════════════════════════════════════════════════════════════════════════════════╝ 1. Read: MEMORY_SUMMARY.md (quick overview) 2. Study: MEMORY_RESEARCH.md (detailed analysis) 3. Test: python examples/memory_comparison.py (see it in action) 4. Build: MEMORY_IMPLEMENTATION_GUIDE.md (step-by-step) 5. Deploy: Monitor and tune based on real usage Files created: - /home/zvx/projects/meshai/MEMORY_SUMMARY.md - /home/zvx/projects/meshai/MEMORY_RESEARCH.md - /home/zvx/projects/meshai/MEMORY_IMPLEMENTATION_GUIDE.md - /home/zvx/projects/meshai/examples/memory_comparison.py Good luck! 🚀