Memory System Architecture
Umber's memory system captures, stores, retrieves, and consolidates knowledge about you.
Here's how the pieces fit together.
💬
Conversation
User + Umber exchange
→
🧠
LLM Extraction
Intelligent memory detection
→
💾
DynamoDB
Persistent storage
🔍
Retrieval
Fast path + semantic
←
🔗
Entity Linking
Knowledge graph
←
📊
Embeddings
Vector representations
🌙
Nightly Consolidation
Merge, link, strengthen
7
Memory Types
<150ms
Fast Path Latency
5
Consolidation Actions
💡
Fire-and-Forget Pattern
Memory extraction never blocks your conversation. When you chat with Umber,
responses come instantly while memory analysis happens in the background.
If extraction fails, you don't notice — the chat continues seamlessly.
Memory Data Model
Each memory is a rich document stored in DynamoDB with vectors, metadata, and entity links.
🗄 ai-pa-agent-memory-{env}
Primary Keys
userId (PK)
memoryId (SK)
Core Content
type
content
rawContent
contentHash
source
Scoring & Relevance
confidence
strength
importance
decayRate
accessCount
Vector Embeddings
embedding[]
embeddingModel
valueVector
valueWeights
Relationships
tags[]
associations[]
linkedEntityIds[]
metadata{}
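Put together, a stored memory item might look like the sketch below. Only the field names come from the table above; every value is illustrative, and the comments are one plausible reading of each field's purpose:

```javascript
// Hypothetical example item for the ai-pa-agent-memory-{env} table.
const exampleMemory = {
  // Primary keys
  userId: 'user-123',                 // partition key (PK)
  memoryId: 'mem-001',                // sort key (SK)

  // Core content
  type: 'fact',
  content: 'Lives near Boston',
  rawContent: 'Any good spots near Boston?',
  contentHash: 'a1b2c3d4',            // assumed: dedupe key over normalized content
  source: 'conversation',

  // Scoring & relevance
  confidence: 0.9,
  strength: 0.8,
  importance: 0.6,
  decayRate: 0.02,
  accessCount: 0,

  // Vector embeddings (truncated for illustration)
  embedding: [0.12, -0.04, 0.33],
  embeddingModel: 'example-embedding-model',
  valueVector: [0.5, 0.1],
  valueWeights: [1.0, 0.5],

  // Relationships
  tags: ['location'],
  associations: [],
  linkedEntityIds: ['entity-boston'],
  metadata: { extractedAt: '2025-01-01T00:00:00Z' },
};
```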
preference
Likes and dislikes, both explicit and implicit
"Coffee is my jam" → Likes coffee
fact
Factual information about you or your world
"I work at Acme Corp"
plan
Intentions, goals, and future activities
"Planning a trip to Japan next spring"
feeling
Emotional state, concerns, and sentiments
"Stressed about the deadline"
inferred_interest
Interests detected from questions (needs 2+ mentions)
Asking about sailing twice → Interest in sailing
conversation_topic
Summary of what was discussed
"Discussed vacation planning"
recommendation
What Umber suggested to you
"Recommended trying the new Thai place"
Memory Extraction
When you chat with Umber, an LLM analyzes both your message and the response to extract memories.
Intelligent Extraction
LLM-based, not regex patterns
Why LLM over regex? Natural language is varied. The LLM catches nuanced expressions like
"Coffee is my jam", "mornings are tough", or questions that reveal interests — things rigid patterns miss.
👤
User Message
"I've been really into sailing lately. Any good spots near Boston?"
→
U
Umber Response
"Great choice! Check out Boston Harbor or Marblehead for sailing..."
→
🧠
IntelligentMemoryExtractor
Analyzes both sides of conversation
Extracted Memories
From the conversation above
inferred_interest
Interested in sailing
fact
Lives near Boston
recommendation
Suggested Boston Harbor and Marblehead for sailing
// Fire-and-forget pattern: never blocks the response
Promise.race([
  extractor.extractFromExchange({ userId, userMessage, assistantResponse })
    .then(async (result) => {
      if (result?.memories?.length) {
        await memoryService.storeIntelligentMemories(userId, result.memories);
      }
    }),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('extraction timeout')), 15000))
]).catch(() => logger.warn('Memory extraction failed'));

return { reply: response }; // Response sent immediately
Memory Retrieval
Two paths: fast entity-based lookup for agent context, and full semantic search for deep queries.
System 1
Fast Path
<150ms latency budget
1. Extract entity "Bob" from query
2. GSI lookup by entity ID
3. Check 5-min retrieval cache
4. Return linked memories
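The fast path above can be sketched roughly as follows. The entity matcher and the in-memory index are stand-ins (real entity extraction and the DynamoDB GSI query are more involved); the 5-minute cache TTL comes from the steps above:

```javascript
// Hypothetical sketch of the fast path.
const CACHE_TTL_MS = 5 * 60 * 1000;      // 5-minute retrieval cache
const cache = new Map();                  // entityId -> { memories, expiresAt }

const entityIndex = new Map([             // stands in for the entity GSI
  ['entity-bob', [{ memoryId: 'mem-001', content: 'Bob works at Acme' }]],
]);
const knownEntities = new Map([['bob', 'entity-bob']]);

function fastPathRetrieve(query, now = Date.now()) {
  // 1. Extract a known entity mention from the query (toy matcher)
  const word = query.toLowerCase().split(/\W+/).find(w => knownEntities.has(w));
  if (!word) return [];
  const entityId = knownEntities.get(word);

  // 2. Serve from the 5-minute cache when fresh
  const hit = cache.get(entityId);
  if (hit && hit.expiresAt > now) return hit.memories;

  // 3. Otherwise "query the GSI" and cache the result
  const memories = entityIndex.get(entityId) ?? [];
  cache.set(entityId, { memories, expiresAt: now + CACHE_TTL_MS });
  return memories;
}
```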
System 2
Semantic Search
Full-featured retrieval
1. Convert query to embedding vector
2. Vector similarity + lexical match
3. Reciprocal Rank Fusion (RRF)
4. Persona-weighted relevance scoring
5. Apply temporal decay
6. Reinforce accessed memories
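Step 3, Reciprocal Rank Fusion, can be shown in miniature. This sketch fuses two hypothetical ranked lists (vector and lexical); k = 60 is the constant conventionally used with RRF, not necessarily the one Umber uses:

```javascript
// Reciprocal Rank Fusion: each item's fused score is the sum of
// 1 / (k + rank) across every ranked list it appears in.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// 'mem-2' ranks first in both lists, so it fuses to the top.
const fused = rrfFuse([
  ['mem-2', 'mem-1', 'mem-3'],   // vector similarity order
  ['mem-2', 'mem-3', 'mem-1'],   // lexical match order
]);
```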
🔄
Reinforcement on Access
Every time a memory is retrieved, it gets reinforced:
- accessCount increments
- strength boosts by +0.05 (capped at 1.0)
- confidence boosts by +0.02 (capped at 0.99)
- lastAccessed timestamp updates
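A minimal sketch of that update, using the boost values and caps stated above (the function name is illustrative):

```javascript
// Reinforcement on access: +0.05 strength (capped at 1.0),
// +0.02 confidence (capped at 0.99), bump accessCount, touch lastAccessed.
function reinforce(memory, now = new Date()) {
  return {
    ...memory,
    accessCount: memory.accessCount + 1,
    strength: Math.min(1.0, memory.strength + 0.05),
    confidence: Math.min(0.99, memory.confidence + 0.02),
    lastAccessed: now.toISOString(),
  };
}
```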
Nightly Consolidation
Every night, an LLM analyzes recent memories to merge duplicates, link related facts, and build the knowledge graph.
2:00 AM — Discovery Phase
Find Consolidation Candidates
Lambda queries for memories created in the last 24h that haven't been consolidated.
Candidates are scored by: recency, low associations, semantic similarity, entity overlap.
2:05 AM — Analysis Phase
LLM Analyzes Relationships
For each batch of 25 memories, an LLM examines relationships with chain-of-thought reasoning.
It considers subject/scope constraints — won't merge memories about different people.
2:30 AM — Execution Phase
Apply Consolidation Actions
Atomic transactions apply the changes: merging memories, linking entities, creating associations.
Snapshots saved for 30-day rollback capability.
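The discovery phase's candidate scoring might look like the sketch below. The four signals (recency, low associations, semantic similarity, entity overlap) and the 24-hour window come from the description above; the weights and the `maxSimilarity`/`entityOverlap` input fields are assumptions:

```javascript
// Hypothetical candidate scorer for the discovery phase.
function scoreCandidate(memory, now = Date.now()) {
  const ageHours = (now - memory.createdAt) / 3_600_000;
  const recency = Math.max(0, 1 - ageHours / 24);          // newer scores higher
  const sparsity = 1 / (1 + memory.associations.length);   // few links -> more to do
  return 0.4 * recency + 0.2 * sparsity
       + 0.2 * memory.maxSimilarity                        // best semantic match (0..1)
       + 0.2 * memory.entityOverlap;                       // shared entities (0..1)
}

// Keep only unconsolidated memories from the last 24h, best candidates first.
function findCandidates(memories, now = Date.now()) {
  return memories
    .filter(m => !m.consolidated && now - m.createdAt < 24 * 3_600_000)
    .sort((a, b) => scoreCandidate(b, now) - scoreCandidate(a, now));
}
```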
Consolidation Actions
MERGE
Combine two memories about the same fact into one
ASSOCIATE
Link related memories without merging
UPDATE_ENTITY
Add attributes or aliases to entities
LINK_ENTITIES
Connect entities (Fred = Frederick)
CREATE_ENTITY
Create new entity from analysis
"Bob works at Acme"
+
"Bob is in engineering"
=
"Bob works in engineering at Acme"
🛡
Safety Guardrails
The LLM respects subject/scope constraints. It won't merge "Bob likes coffee" with "Alice likes tea"
just because both are preferences. Before-state snapshots enable 30-day rollback if needed.
Decay & Reinforcement
Memories have a natural decay rate. Access reinforces them; neglect lets them fade.
Memory Strength Over Time
Simulated decay with and without access
+0.05
Strength boost per access
+0.02
Confidence boost per access
0.01-0.05
Typical decay rate
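The exact decay formula isn't given above, so here is one plausible reading: simple exponential decay at `decayRate` per day since last access, combined with the +0.05 access boost. A memory accessed once mid-way through a month ends stronger than one left untouched:

```javascript
// Assumed model: strength decays exponentially with days since last access.
function decayedStrength(strength, decayRate, daysSinceAccess) {
  return strength * Math.exp(-decayRate * daysSinceAccess);
}

// 30 days of neglect vs. a single access on day 15 (+0.05 boost, capped at 1.0)
const neglected = decayedStrength(0.8, 0.03, 30);
let accessed = Math.min(1.0, decayedStrength(0.8, 0.03, 15) + 0.05);
accessed = decayedStrength(accessed, 0.03, 15);   // decay resumes after the access
```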
Memory Compression
For old, infrequently accessed memories
Memories that are >7 days old and accessed <3 times
become candidates for compression. An LLM summarizes and merges them, reducing storage
while preserving the essence.
Supersession
When new information contradicts old
When a new memory contradicts an existing one, the old memory is marked as superseded:
SUPERSEDED
"Bob works at Acme Corp"
→
ACTIVE
"Bob now works at NewCo"
Old memories are retained with isActive: false for an audit trail.
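A sketch of the supersession update: `isActive` comes from the text above, while the `supersededBy`/`supersedes` cross-reference fields are assumptions added for illustration:

```javascript
// Mark the old memory inactive (kept for audit) and link both directions.
function supersede(oldMemory, newMemory) {
  return {
    old: { ...oldMemory, isActive: false, supersededBy: newMemory.memoryId },
    current: { ...newMemory, isActive: true, supersedes: oldMemory.memoryId },
  };
}
```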