Multi-Stage Intent Classification

Two-tier System 1/System 2 architecture for fast, accurate message routing

🧠 Architecture Overview

ADR-007, ADR-013

Example flow for the message "I want to track my expenses":

  • User Message → System 1 (Embedding Classifier, ~300ms warm)
  • Confidence ≥ 0.85 → YES → Fast Path: route directly (total ~300-500ms)
  • Confidence ≥ 0.85 → NO → System 2 (Parallel Classifiers, ~1200-1500ms) → Supervisor for complex routing (total ~2000-4000ms)
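The two-tier decision above can be sketched as a single routing function. This is an illustrative stub, not the real implementation: `system1Classify` and `system2Classify` are hypothetical names, and their bodies stand in for the embedding and LLM classifiers described below; only the 0.85 threshold and the latency figures come from the architecture.

```typescript
interface Classification {
  intent: string;
  confidence: number;
}

// Stub for the System 1 embedding classifier (~300ms warm).
function system1Classify(message: string): Classification {
  // The real system embeds the message, KNN-searches the corpus,
  // and computes a calibrated confidence score.
  return message.includes("track")
    ? { intent: "tracking_create", confidence: 0.91 }
    : { intent: "general_query", confidence: 0.4 };
}

// Stub for the System 2 parallel LLM classifiers (~1200-1500ms).
function system2Classify(message: string): Classification {
  return { intent: "general_query", confidence: 0.99 };
}

const CONFIDENCE_THRESHOLD = 0.85;

function route(message: string): { path: "fast" | "supervisor"; result: Classification } {
  const fast = system1Classify(message);
  if (fast.confidence >= CONFIDENCE_THRESHOLD) {
    return { path: "fast", result: fast }; // total ~300-500ms
  }
  // Low confidence: escalate to System 2, then the supervisor (total ~2000-4000ms).
  return { path: "supervisor", result: system2Classify(message) };
}
```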

System 1: Embedding Classifier

Fast Path

1 Generate Embedding

  • Normalize input text (lowercase, trim)
  • Call Amazon Titan Embeddings v2
  • Produce 1024-dimensional vector
  • Latency: ~50-100ms

2 KNN Search

  • Query OpenSearch intent corpus
  • HNSW approximate nearest neighbor
  • Retrieve top-K similar examples (K=5)
  • Latency: ~100-200ms (warm)

3 Calculate Confidence

  • Similarity score (0-1)
  • Margin to second-best intent
  • Agreement among top-K results
  • Multi-signal weighted formula

4 Route or Escalate

  • High confidence (≥0.85) → Fast path
  • Low confidence → System 2 fallback
  • Record escalation for learning

// Confidence Calculation Formula
const calibrated =
  similarity * 0.4 +
  margin * 0.3 +
  agreement * 0.3;

const isHighConfidence =
  calibrated >= 0.85 &&
  margin >= 0.15 &&
  similarity >= 0.75;
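The three signals in the formula above can be derived from the top-K KNN hits. A minimal sketch, assuming hits arrive sorted by descending score and carry `intent` and `score` fields (the field names are assumptions about the corpus documents; the weights mirror the formula above):

```typescript
interface KnnHit {
  intent: string;
  score: number; // cosine similarity in [0, 1]
}

// hits: top-K results from the OpenSearch KNN query, sorted descending.
function calibratedConfidence(hits: KnnHit[]): number {
  const top = hits[0];
  // Similarity: score of the best match.
  const similarity = top.score;
  // Margin: gap to the best hit with a *different* intent (1 if none).
  const rival = hits.find((h) => h.intent !== top.intent);
  const margin = rival ? top.score - rival.score : 1;
  // Agreement: fraction of top-K hits sharing the top intent.
  const agreement = hits.filter((h) => h.intent === top.intent).length / hits.length;
  return similarity * 0.4 + margin * 0.3 + agreement * 0.3;
}
```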
🔀 System 2: Parallel Intent Classifiers

LLM Fallback

When System 1 confidence is low, all classifiers run in parallel (~1200ms total instead of ~5000ms sequential). There are no short-circuits: running every classifier to completion preserves ground truth for learning.
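The fan-out can be sketched with `Promise.all`: total latency is the slowest classifier, not the sum, and every result is kept. The classifier names match the cards below; their bodies are stubs (a real classifier would await an LLM call).

```typescript
type ClassifierResult = { classifier: string; label: string };

// Stub; the real version awaits a per-classifier LLM call.
function stubClassifier(name: string, label: string): Promise<ClassifierResult> {
  return Promise.resolve({ classifier: name, label });
}

async function classifyAll(message: string): Promise<ClassifierResult[]> {
  // No short-circuits: all seven results are recorded as ground truth.
  return Promise.all([
    stubClassifier("mode", "fast"),
    stubClassifier("overlay", "none"),
    stubClassifier("brainstorm", "no"),
    stubClassifier("engagement", "new_topic"),
    stubClassifier("task", "none"),
    stubClassifier("tracking", "create"),
    stubClassifier("routine", "none"),
  ]);
}
```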

  • 🎯 Mode Classifier: fast vs. full processing
  • 🎨 Overlay Classifier: image, weather, shopping
  • 💭 Brainstorm Classifier: creative ideation
  • 🤝 Engagement Classifier: follow-up vs. new topic
  • 📋 Task Classifier: decompose, create, query
  • 📊 Tracking Classifier: create, entry, query
  • 🔄 Routine Classifier: completion, skip, query
Approach               Latency       Accuracy  Learning
Sequential (legacy)    ~5000ms       High      Partial (short-circuits)
Parallel (current)     ~1200ms       High      Complete (all classifiers)
Embedding + Parallel   ~300-1500ms   High      Continuous (distillation)
🏷️ Intent Types & Routing

ADR-008

Fast-Path Eligible (bypass supervisor)

  • fast_response: Greetings, chitchat, acknowledgments
  • overlay_image_search: Image/photo searches
  • overlay_video_search: Video searches
  • overlay_weather: Weather queries
  • overlay_shopping: Product searches
  • overlay_language_learning: Language practice requests

Supervisor-Routed (complex reasoning)

  • decompose_task: Complex goals needing breakdown
  • planning_request: Strategic/future planning
  • tracking_create: Start tracking something
  • tracking_entry: Log data to a tracker
  • tracking_query: Query tracked data
  • routine_completion: Completed a routine
  • routine_skip: Skipped a routine
  • routine_query: Ask about routines
  • brainstorm: Creative ideation sessions
  • general_query: General questions/requests
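The two lists imply a simple intent-to-path lookup. A minimal sketch; the set contents are taken directly from the fast-path list above, while the function name is illustrative:

```typescript
// Intents eligible to bypass the supervisor (per ADR-008).
const FAST_PATH_INTENTS = new Set([
  "fast_response",
  "overlay_image_search",
  "overlay_video_search",
  "overlay_weather",
  "overlay_shopping",
  "overlay_language_learning",
]);

function routingFor(intent: string): "fast-path" | "supervisor" {
  // Anything not fast-path eligible goes through the supervisor.
  return FAST_PATH_INTENTS.has(intent) ? "fast-path" : "supervisor";
}
```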
🔄 Continuous Learning Loop

Distillation

When System 1 escalates to System 2, the LLM's classification is recorded. Every 5 minutes, the distillation worker promotes high-confidence LLM results back to the corpus, so System 1 learns and handles similar messages directly next time.

1. Low Confidence → 2. LLM Classifies → 3. Record Escalation → 4. Distillation (5m) → 5. Corpus Updated
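The promotion step can be sketched as a pure function over escalation records. The record shape and the 0.9 promotion threshold are assumptions for illustration; in production the worker runs every 5 minutes and writes embeddings back to OpenSearch rather than an in-memory array.

```typescript
interface Escalation {
  message: string;
  llmIntent: string;
  llmConfidence: number;
}

interface CorpusExample {
  text: string;
  intent: string;
}

// Promote high-confidence LLM classifications into the corpus;
// returns how many examples were promoted.
function distill(
  escalations: Escalation[],
  corpus: CorpusExample[],
  minConfidence = 0.9
): number {
  let promoted = 0;
  for (const e of escalations) {
    if (e.llmConfidence >= minConfidence) {
      // In production: re-embed e.message and index it in OpenSearch.
      corpus.push({ text: e.message, intent: e.llmIntent });
      promoted++;
    }
  }
  return promoted;
}
```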
📚 Architecture Documentation

Key Implementation Files

📊 Performance Metrics

Metric                 System 1 (Embedding)   System 2 (LLM)   Target
Latency (warm)         ~300ms                 ~1200-1500ms     < 500ms for fast-path
Latency (cold)         5-45s                  ~2000ms          Mitigated by keep-alive
Confidence Threshold   ≥ 0.85                 —                Balance accuracy/speed
System 1 Hit Rate      ~21% (current)         —                60-80% after seeding
Intent Accuracy        ~95%                   —                ≥ 90%
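The hit-rate target matters because expected latency is a weighted average of the two paths. A back-of-envelope sketch using the table's numbers; the 400ms fast-path and 3000ms supervisor-path figures are midpoint assumptions for the ~300-500ms and ~2000-4000ms ranges:

```typescript
// Expected warm latency as a function of the System 1 hit rate.
function expectedLatencyMs(hitRate: number, fastMs = 400, slowMs = 3000): number {
  return hitRate * fastMs + (1 - hitRate) * slowMs;
}

// At the current ~21% hit rate vs. the 60-80% seeding target:
const current = expectedLatencyMs(0.21); // ≈ 2454ms
const target = expectedLatencyMs(0.7);   // ≈ 1180ms
```

Raising the hit rate from ~21% to ~70% roughly halves expected latency, which is why corpus seeding and distillation are on the critical path.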