Multi-Stage Intent Classification

Two-tier System 1/System 2 architecture for fast, accurate message routing

🧠 Architecture Overview

ADR-007, ADR-013

Example flow for the message "I want to track my expenses":

  • User Message → System 1 (Embedding Classifier, ~300ms warm)
  • Confidence ≥ 0.85 → YES → Fast Path: route directly (total ~300-500ms)
  • Confidence ≥ 0.85 → NO → System 2 (Parallel Classifiers, ~1200-1500ms) → Supervisor for complex routing (total ~2000-4000ms)
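The two-tier decision above can be sketched as a single routing function. This is an illustrative stub, not the real implementation: `system1Classify` and `system2Classify` are hypothetical names, and their bodies stand in for the embedding and LLM classifiers described below; only the 0.85 threshold and the latency figures come from the architecture.

```typescript
interface Classification {
  intent: string;
  confidence: number;
}

// Stub for the System 1 embedding classifier (~300ms warm).
function system1Classify(message: string): Classification {
  // The real system embeds the message, KNN-searches the corpus,
  // and computes a calibrated confidence score.
  return message.includes("track")
    ? { intent: "tracking_create", confidence: 0.91 }
    : { intent: "general_query", confidence: 0.4 };
}

// Stub for the System 2 parallel LLM classifiers (~1200-1500ms).
function system2Classify(message: string): Classification {
  return { intent: "general_query", confidence: 0.99 };
}

const CONFIDENCE_THRESHOLD = 0.85;

function route(message: string): { path: "fast" | "supervisor"; result: Classification } {
  const fast = system1Classify(message);
  if (fast.confidence >= CONFIDENCE_THRESHOLD) {
    return { path: "fast", result: fast }; // total ~300-500ms
  }
  // Low confidence: escalate to System 2, then the supervisor (total ~2000-4000ms).
  return { path: "supervisor", result: system2Classify(message) };
}
```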

System 1: Embedding Classifier

Fast Path

1 Generate Embedding

  • Normalize input text (lowercase, trim)
  • Call Amazon Titan Embeddings v2
  • Produce 1024-dimensional vector
  • Latency: ~50-100ms

2 KNN Search

  • Query OpenSearch intent corpus
  • HNSW approximate nearest neighbor
  • Retrieve top-K similar examples (K=5)
  • Latency: ~100-200ms (warm)

3 Calculate Confidence

  • Similarity score (0-1)
  • Margin to second-best intent
  • Agreement among top-K results
  • Multi-signal weighted formula

4 Route or Escalate

  • High confidence (≥0.85) → Fast path
  • Low confidence → System 2 fallback
  • Record escalation for learning

// Confidence Calculation Formula
const calibrated =
  similarity * 0.4 +
  margin * 0.3 +
  agreement * 0.3;

const isHighConfidence =
  calibrated >= 0.85 &&
  margin >= 0.15 &&
  similarity >= 0.75;
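The three signals in the formula above can be derived from the top-K KNN hits. A minimal sketch, assuming hits arrive sorted by descending score and carry `intent` and `score` fields (the field names are assumptions about the corpus documents; the weights mirror the formula above):

```typescript
interface KnnHit {
  intent: string;
  score: number; // cosine similarity in [0, 1]
}

// hits: top-K results from the OpenSearch KNN query, sorted descending.
function calibratedConfidence(hits: KnnHit[]): number {
  const top = hits[0];
  // Similarity: score of the best match.
  const similarity = top.score;
  // Margin: gap to the best hit with a *different* intent (1 if none).
  const rival = hits.find((h) => h.intent !== top.intent);
  const margin = rival ? top.score - rival.score : 1;
  // Agreement: fraction of top-K hits sharing the top intent.
  const agreement = hits.filter((h) => h.intent === top.intent).length / hits.length;
  return similarity * 0.4 + margin * 0.3 + agreement * 0.3;
}
```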
🔀 System 2: Parallel Intent Classifiers

LLM Fallback

When System 1 confidence is low, all classifiers run in parallel (~1200ms total instead of ~5000ms sequential). There are no short-circuits: running every classifier to completion preserves ground truth for learning.
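The fan-out can be sketched with `Promise.all`: total latency is the slowest classifier, not the sum, and every result is kept. The classifier names match the cards below; their bodies are stubs (a real classifier would await an LLM call).

```typescript
type ClassifierResult = { classifier: string; label: string };

// Stub; the real version awaits a per-classifier LLM call.
function stubClassifier(name: string, label: string): Promise<ClassifierResult> {
  return Promise.resolve({ classifier: name, label });
}

async function classifyAll(message: string): Promise<ClassifierResult[]> {
  // No short-circuits: all seven results are recorded as ground truth.
  return Promise.all([
    stubClassifier("mode", "fast"),
    stubClassifier("overlay", "none"),
    stubClassifier("brainstorm", "no"),
    stubClassifier("engagement", "new_topic"),
    stubClassifier("task", "none"),
    stubClassifier("tracking", "create"),
    stubClassifier("routine", "none"),
  ]);
}
```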

  • 🎯 Mode Classifier: fast vs. full processing
  • 🎨 Overlay Classifier: image, weather, shopping
  • 💭 Brainstorm Classifier: creative ideation
  • 🤝 Engagement Classifier: follow-up vs. new topic
  • 📋 Task Classifier: decompose, create, query
  • 📊 Tracking Classifier: create, entry, query
  • 🔄 Routine Classifier: completion, skip, query
Approach               Latency       Accuracy  Learning
Sequential (legacy)    ~5000ms       High      Partial (short-circuits)
Parallel (current)     ~1200ms       High      Complete (all classifiers)
Embedding + Parallel   ~300-1500ms   High      Continuous (distillation)
🏷️ Intent Types & Routing

ADR-008

Fast-Path Eligible (bypass supervisor)

  • fast_response: Greetings, chitchat, acknowledgments
  • overlay_image_search: Image/photo searches
  • overlay_video_search: Video searches
  • overlay_weather: Weather queries
  • overlay_shopping: Product searches
  • overlay_language_learning: Language practice requests

Supervisor-Routed (complex reasoning)

  • decompose_task: Complex goals needing breakdown
  • planning_request: Strategic/future planning
  • tracking_create: Start tracking something
  • tracking_entry: Log data to a tracker
  • tracking_query: Query tracked data
  • routine_completion: Completed a routine
  • routine_skip: Skipped a routine
  • routine_query: Ask about routines
  • brainstorm: Creative ideation sessions
  • general_query: General questions/requests
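The two lists imply a simple intent-to-path lookup. A minimal sketch; the set contents are taken directly from the fast-path list above, while the function name is illustrative:

```typescript
// Intents eligible to bypass the supervisor (per ADR-008).
const FAST_PATH_INTENTS = new Set([
  "fast_response",
  "overlay_image_search",
  "overlay_video_search",
  "overlay_weather",
  "overlay_shopping",
  "overlay_language_learning",
]);

function routingFor(intent: string): "fast-path" | "supervisor" {
  // Anything not fast-path eligible goes through the supervisor.
  return FAST_PATH_INTENTS.has(intent) ? "fast-path" : "supervisor";
}
```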
🔄 Continuous Learning Loop

Distillation

When System 1 escalates to System 2, the LLM's classification is recorded. Every 5 minutes, the distillation worker promotes high-confidence LLM results back to the corpus, so System 1 learns and handles similar messages directly next time.

1. Low Confidence → 2. LLM Classifies → 3. Record Escalation → 4. Distillation (5m) → 5. Corpus Updated
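The promotion step can be sketched as a pure function over escalation records. The record shape and the 0.9 promotion threshold are assumptions for illustration; in production the worker runs every 5 minutes and writes embeddings back to OpenSearch rather than an in-memory array.

```typescript
interface Escalation {
  message: string;
  llmIntent: string;
  llmConfidence: number;
}

interface CorpusExample {
  text: string;
  intent: string;
}

// Promote high-confidence LLM classifications into the corpus;
// returns how many examples were promoted.
function distill(
  escalations: Escalation[],
  corpus: CorpusExample[],
  minConfidence = 0.9
): number {
  let promoted = 0;
  for (const e of escalations) {
    if (e.llmConfidence >= minConfidence) {
      // In production: re-embed e.message and index it in OpenSearch.
      corpus.push({ text: e.message, intent: e.llmIntent });
      promoted++;
    }
  }
  return promoted;
}
```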
📚 Architecture Documentation

Key Implementation Files

📊 Performance Metrics

Metric                 System 1 (Embedding)   System 2 (LLM)   Target
Latency (warm)         ~300ms                 ~1200-1500ms     < 500ms for fast-path
Latency (cold)         5-45s                  ~2000ms          Mitigated by keep-alive
Confidence Threshold   ≥ 0.85                 —                Balance accuracy/speed
System 1 Hit Rate      ~21% (current)         —                60-80% after seeding
Intent Accuracy        ~95%                   —                ≥ 90%
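The hit-rate target matters because expected latency is a weighted average of the two paths. A back-of-envelope sketch using the table's numbers; the 400ms fast-path and 3000ms supervisor-path figures are midpoint assumptions for the ~300-500ms and ~2000-4000ms ranges:

```typescript
// Expected warm latency as a function of the System 1 hit rate.
function expectedLatencyMs(hitRate: number, fastMs = 400, slowMs = 3000): number {
  return hitRate * fastMs + (1 - hitRate) * slowMs;
}

// At the current ~21% hit rate vs. the 60-80% seeding target:
const current = expectedLatencyMs(0.21); // ≈ 2454ms
const target = expectedLatencyMs(0.7);   // ≈ 1180ms
```

Raising the hit rate from ~21% to ~70% roughly halves expected latency, which is why corpus seeding and distillation are on the critical path.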