Memory System Architecture
Umber's memory system captures, stores, retrieves, and consolidates knowledge about you.
Here's how the pieces fit together.
💬
Conversation
User + Umber exchange
→
🧠
LLM Extraction
Intelligent memory detection
→
💾
DynamoDB
Persistent storage
🔍
Retrieval
Fast path + semantic
←
🔗
Entity Linking
Knowledge graph
←
📊
Embeddings
Vector representations
🌙
Nightly Consolidation
Merge, link, strengthen
7
Memory Types
<150ms
Fast Path Latency
5
Consolidation Actions
💡
Fire-and-Forget Pattern
Memory extraction never blocks your conversation. When you chat with Umber,
responses come instantly while memory analysis happens in the background.
If extraction fails, you don't notice — the chat continues seamlessly.
Memory Data Model
Each memory is a rich document stored in DynamoDB with vectors, metadata, and entity links.
🗄 ai-pa-agent-memory-{env}
Primary Keys
userId (PK)
memoryId (SK)
Core Content
type
content
rawContent
contentHash
source
Scoring & Relevance
confidence
strength
importance
decayRate
accessCount
Vector Embeddings
embedding[]
embeddingModel
valueVector
valueWeights
Relationships
tags[]
associations[]
linkedEntityIds[]
metadata{}
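Put together, a stored memory item might look like the sketch below. Only the field names come from the table above; every value is illustrative, and the comments are one plausible reading of each field's purpose:

```javascript
// Hypothetical example item for the ai-pa-agent-memory-{env} table.
const exampleMemory = {
  // Primary keys
  userId: 'user-123',                 // partition key (PK)
  memoryId: 'mem-001',                // sort key (SK)

  // Core content
  type: 'fact',
  content: 'Lives near Boston',
  rawContent: 'Any good spots near Boston?',
  contentHash: 'a1b2c3d4',            // assumed: dedupe key over normalized content
  source: 'conversation',

  // Scoring & relevance
  confidence: 0.9,
  strength: 0.8,
  importance: 0.6,
  decayRate: 0.02,
  accessCount: 0,

  // Vector embeddings (truncated for illustration)
  embedding: [0.12, -0.04, 0.33],
  embeddingModel: 'example-embedding-model',
  valueVector: [0.5, 0.1],
  valueWeights: [1.0, 0.5],

  // Relationships
  tags: ['location'],
  associations: [],
  linkedEntityIds: ['entity-boston'],
  metadata: { extractedAt: '2025-01-01T00:00:00Z' },
};
```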
preference
Likes and dislikes, both explicit and implicit
"Coffee is my jam" → Likes coffee
fact
Factual information about you or your world
"I work at Acme Corp"
plan
Intentions, goals, and future activities
"Planning a trip to Japan next spring"
feeling
Emotional state, concerns, and sentiments
"Stressed about the deadline"
inferred_interest
Interests detected from questions (needs 2+ mentions)
Asking about sailing twice → Interest in sailing
conversation_topic
Summary of what was discussed
"Discussed vacation planning"
recommendation
What Umber suggested to you
"Recommended trying the new Thai place"
Memory Extraction
When you chat with Umber, an LLM analyzes both your message and the response to extract memories.
Intelligent Extraction
LLM-based, not regex patterns
Why LLM over regex? Natural language is varied. The LLM catches nuanced expressions like
"Coffee is my jam", "mornings are tough", or questions that reveal interests — things rigid patterns miss.
👤
User Message
"I've been really into sailing lately. Any good spots near Boston?"
→
U
Umber Response
"Great choice! Check out Boston Harbor or Marblehead for sailing..."
→
🧠
IntelligentMemoryExtractor
Analyzes both sides of conversation
Extracted Memories
From the conversation above
inferred_interest
Interested in sailing
fact
Lives near Boston
recommendation
Suggested Boston Harbor and Marblehead for sailing
// Fire-and-forget pattern: never blocks the response
Promise.race([
  extractor.extractFromExchange({ userId, userMessage, assistantResponse })
    .then(async (result) => {
      if (result?.memories?.length) {
        await memoryService.storeIntelligentMemories(userId, result.memories);
      }
    }),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('extraction timeout')), 15000))
]).catch(() => logger.warn('Memory extraction failed'));

return { reply: response }; // Response sent immediately
Memory Retrieval
Two paths: fast entity-based lookup for agent context, and full semantic search for deep queries.
System 1
Fast Path
<150ms latency budget
1. Extract entity "Bob" from query
2. GSI lookup by entity ID
3. Check 5-min retrieval cache
4. Return linked memories
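The fast path above can be sketched roughly as follows. The entity matcher and the in-memory index are stand-ins (real entity extraction and the DynamoDB GSI query are more involved); the 5-minute cache TTL comes from the steps above:

```javascript
// Hypothetical sketch of the fast path.
const CACHE_TTL_MS = 5 * 60 * 1000;      // 5-minute retrieval cache
const cache = new Map();                  // entityId -> { memories, expiresAt }

const entityIndex = new Map([             // stands in for the entity GSI
  ['entity-bob', [{ memoryId: 'mem-001', content: 'Bob works at Acme' }]],
]);
const knownEntities = new Map([['bob', 'entity-bob']]);

function fastPathRetrieve(query, now = Date.now()) {
  // 1. Extract a known entity mention from the query (toy matcher)
  const word = query.toLowerCase().split(/\W+/).find(w => knownEntities.has(w));
  if (!word) return [];
  const entityId = knownEntities.get(word);

  // 2. Serve from the 5-minute cache when fresh
  const hit = cache.get(entityId);
  if (hit && hit.expiresAt > now) return hit.memories;

  // 3. Otherwise "query the GSI" and cache the result
  const memories = entityIndex.get(entityId) ?? [];
  cache.set(entityId, { memories, expiresAt: now + CACHE_TTL_MS });
  return memories;
}
```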
System 2
Semantic Search
Full-featured retrieval
1. Convert query to embedding vector
2. Vector similarity + lexical match
3. Reciprocal Rank Fusion (RRF)
4. Persona-weighted relevance scoring
5. Apply temporal decay
6. Reinforce accessed memories
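Step 3, Reciprocal Rank Fusion, can be shown in miniature. This sketch fuses two hypothetical ranked lists (vector and lexical); k = 60 is the constant conventionally used with RRF, not necessarily the one Umber uses:

```javascript
// Reciprocal Rank Fusion: each item's fused score is the sum of
// 1 / (k + rank) across every ranked list it appears in.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'mem-2' ranks first in both lists, so it fuses to the top.
const fused = rrfFuse([
  ['mem-2', 'mem-1', 'mem-3'],   // vector similarity order
  ['mem-2', 'mem-3', 'mem-1'],   // lexical match order
]);
```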
🔄
Reinforcement on Access
Every time a memory is retrieved, it gets reinforced:
- accessCount increments
- strength boosts by +0.05 (capped at 1.0)
- confidence boosts by +0.02 (capped at 0.99)
- lastAccessed timestamp updates
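A minimal sketch of that update, using the boost values and caps stated above (the function name is illustrative):

```javascript
// Reinforcement on access: +0.05 strength (capped at 1.0),
// +0.02 confidence (capped at 0.99), bump accessCount, touch lastAccessed.
function reinforce(memory, now = new Date()) {
  return {
    ...memory,
    accessCount: memory.accessCount + 1,
    strength: Math.min(1.0, memory.strength + 0.05),
    confidence: Math.min(0.99, memory.confidence + 0.02),
    lastAccessed: now.toISOString(),
  };
}
```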
Nightly Consolidation
Every night, an LLM analyzes recent memories to merge duplicates, link related facts, and build the knowledge graph.
2:00 AM — Discovery Phase
Find Consolidation Candidates
Lambda queries for memories created in the last 24h that haven't been consolidated.
Candidates are scored by: recency, low associations, semantic similarity, entity overlap.
2:05 AM — Analysis Phase
LLM Analyzes Relationships
For each batch of 25 memories, an LLM examines relationships with chain-of-thought reasoning.
It considers subject/scope constraints — won't merge memories about different people.
2:30 AM — Execution Phase
Apply Consolidation Actions
Atomic transactions apply the changes: merging memories, linking entities, creating associations.
Snapshots saved for 30-day rollback capability.
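The discovery phase's candidate scoring might look like the sketch below. The four signals (recency, low associations, semantic similarity, entity overlap) and the 24-hour window come from the description above; the weights and the `maxSimilarity`/`entityOverlap` input fields are assumptions:

```javascript
// Hypothetical candidate scorer for the discovery phase.
function scoreCandidate(memory, now = Date.now()) {
  const ageHours = (now - memory.createdAt) / 3_600_000;
  const recency = Math.max(0, 1 - ageHours / 24);          // newer scores higher
  const sparsity = 1 / (1 + memory.associations.length);   // few links -> more to do
  return 0.4 * recency + 0.2 * sparsity
       + 0.2 * memory.maxSimilarity                        // best semantic match (0..1)
       + 0.2 * memory.entityOverlap;                       // shared entities (0..1)
}

// Keep only unconsolidated memories from the last 24h, best candidates first.
function findCandidates(memories, now = Date.now()) {
  return memories
    .filter(m => !m.consolidated && now - m.createdAt < 24 * 3_600_000)
    .sort((a, b) => scoreCandidate(b, now) - scoreCandidate(a, now));
}
```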
Consolidation Actions
MERGE
Combine two memories about the same fact into one
ASSOCIATE
Link related memories without merging
UPDATE_ENTITY
Add attributes or aliases to entities
LINK_ENTITIES
Connect entities (Fred = Frederick)
CREATE_ENTITY
Create new entity from analysis
"Bob works at Acme"
+
"Bob is in engineering"
=
"Bob works in engineering at Acme"
🛡
Safety Guardrails
The LLM respects subject/scope constraints. It won't merge "Bob likes coffee" with "Alice likes tea"
just because both are preferences. Before-state snapshots enable 30-day rollback if needed.
Decay & Reinforcement
Memories have a natural decay rate. Access reinforces them; neglect lets them fade.
Memory Strength Over Time
Simulated decay with and without access
+0.05
Strength boost per access
+0.02
Confidence boost per access
0.01-0.05
Typical decay rate
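The exact decay formula isn't given above, so here is one plausible reading: simple exponential decay at `decayRate` per day since last access, combined with the +0.05 access boost. A memory accessed once mid-way through a month ends stronger than one left untouched:

```javascript
// Assumed model: strength decays exponentially with days since last access.
function decayedStrength(strength, decayRate, daysSinceAccess) {
  return strength * Math.exp(-decayRate * daysSinceAccess);
}

// 30 days of neglect vs. a single access on day 15 (+0.05 boost, capped at 1.0)
const neglected = decayedStrength(0.8, 0.03, 30);
let accessed = Math.min(1.0, decayedStrength(0.8, 0.03, 15) + 0.05);
accessed = decayedStrength(accessed, 0.03, 15);   // decay resumes after the access
```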
Memory Compression
For old, infrequently accessed memories
Memories that are >7 days old and accessed <3 times
become candidates for compression. An LLM summarizes and merges them, reducing storage
while preserving the essence.
Supersession
When new information contradicts old
When a new memory contradicts an existing one, the old memory is marked as superseded:
SUPERSEDED
"Bob works at Acme Corp"
→
ACTIVE
"Bob now works at NewCo"
Old memories are retained with isActive: false for an audit trail.
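A sketch of the supersession update: `isActive` comes from the text above, while the `supersededBy`/`supersedes` cross-reference fields are assumptions added for illustration:

```javascript
// Mark the old memory inactive (kept for audit) and link both directions.
function supersede(oldMemory, newMemory) {
  return {
    old: { ...oldMemory, isActive: false, supersededBy: newMemory.memoryId },
    current: { ...newMemory, isActive: true, supersedes: oldMemory.memoryId },
  };
}
```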