Two-tier System 1/System 2 architecture for fast, accurate message routing
When System 1 confidence is low, ALL classifiers run in parallel, taking ~1200ms in total instead of ~5000ms when run sequentially. There are no short-circuits: every classifier runs to completion, so the complete classification is preserved as ground truth for learning.
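A minimal sketch of the parallel, no-short-circuit pattern, assuming an asyncio-based Python service. The classifier names (`intent`, `urgency`, `language`), their simulated delays, and the `Classification` type are hypothetical; the sleeps stand in for real classifier calls. The point is that `asyncio.gather` waits for every classifier, so wall-clock time tracks the slowest call rather than the sum, and no result is dropped.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Classification:
    classifier: str
    label: str
    confidence: float


async def run_classifier(name: str, message: str, delay_s: float,
                         label: str, confidence: float) -> Classification:
    # Hypothetical stand-in for a real classifier call (e.g. an LLM request).
    # The sleep simulates model latency; `message` is unused in this sketch.
    await asyncio.sleep(delay_s)
    return Classification(name, label, confidence)


async def classify_all(message: str) -> list[Classification]:
    # Start every classifier at once. gather() waits for all of them, so the
    # total time is roughly the slowest call, not the sum of all calls.
    # There is deliberately no early exit: every result is returned so the
    # complete classification can be logged for learning.
    return await asyncio.gather(
        run_classifier("intent", message, 0.12, "billing", 0.91),
        run_classifier("urgency", message, 0.10, "low", 0.80),
        run_classifier("language", message, 0.05, "en", 0.99),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(classify_all("Why was I charged twice?"))
    elapsed = time.perf_counter() - start
    # Roughly 0.12 s (the slowest classifier) rather than ~0.27 s (the sum).
    print([r.classifier for r in results], f"{elapsed:.2f}s")
```

Running the three simulated classifiers one after another would take about 0.27 s; in parallel the sketch finishes in about 0.12 s, which is the same effect as ~5000ms versus ~1200ms at production scale.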
| Approach | Latency | Accuracy | Learning |
|---|---|---|---|
| Sequential (legacy) | ~5000ms | High | Partial (short-circuits) |
| Parallel (current) | ~1200ms | High | Complete (all classifiers) |
| Embedding + Parallel | ~300-1500ms | High | Continuous (distillation) |
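As a rough check on the ~300-1500ms range for Embedding + Parallel, assume (a modeling assumption, not a measured figure) that a fast-path message costs only the System 1 lookup, $L_1 \approx 300$ ms, while an escalated message pays $L_1$ plus one parallel System 2 pass, $L_2 \approx 1200$ ms. With System 1 hit rate $h$, the mean latency is:

$$
\bar{L} = h\,L_1 + (1-h)\,(L_1 + L_2) = L_1 + (1-h)\,L_2
$$

$$
h = 0.21:\quad \bar{L} \approx 300 + 0.79 \times 1200 = 1248\ \text{ms}
$$

$$
h = 0.80:\quad \bar{L} \approx 300 + 0.20 \times 1200 = 540\ \text{ms}
$$

Under these assumptions, raising the hit rate from the current ~21% toward the 60-80% target (about 780 ms to 540 ms at the ends of that range) is what lowers the average, while any single escalated message still costs roughly 1500 ms.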
When System 1 escalates to System 2, the LLM's classification is recorded. Every 5 minutes, the distillation worker promotes high-confidence LLM results into the System 1 corpus, so System 1 can handle similar messages directly the next time they arrive.
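A minimal sketch of the distillation loop, assuming escalations are kept in an in-memory log. The `Escalation` and `Corpus` types, the `distill_once` and `run_worker` names, and the 0.9 promotion cutoff are all hypothetical (the text only says "high-confidence"); the 300 s interval matches the 5-minute cadence. A real corpus would also embed and index each promoted text.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Escalation:
    # One recorded System 2 (LLM) result from an escalated message.
    text: str
    label: str
    confidence: float
    promoted: bool = False


@dataclass
class Corpus:
    # Hypothetical stand-in for the System 1 example store.
    examples: list[tuple[str, str]] = field(default_factory=list)

    def add(self, text: str, label: str) -> None:
        self.examples.append((text, label))


def distill_once(log: list[Escalation], corpus: Corpus,
                 min_confidence: float = 0.9) -> int:
    """Promote not-yet-promoted, high-confidence LLM results; return the count."""
    promoted = 0
    for item in log:
        if not item.promoted and item.confidence >= min_confidence:
            corpus.add(item.text, item.label)
            item.promoted = True  # never promote the same result twice
            promoted += 1
    return promoted


def run_worker(log: list[Escalation], corpus: Corpus,
               interval_s: float = 300.0,
               max_cycles: Optional[int] = None) -> None:
    # 300 s = the 5-minute cadence; max_cycles=None runs indefinitely.
    cycles = 0
    while True:
        distill_once(log, corpus)
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break
        time.sleep(interval_s)
```

Low-confidence LLM results stay in the log but never reach the corpus, which keeps uncertain labels from being copied into the fast path.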
| Metric | System 1 (Embedding) | System 2 (LLM) | Target |
|---|---|---|---|
| Latency (warm) | ~300ms | ~1200-1500ms | < 500ms for fast-path |
| Latency (cold) | 5-45s | ~2000ms | Mitigated by keep-alive |
| Confidence Threshold | ≥ 0.85 | | Balance accuracy/speed |
| System 1 Hit Rate | ~21% (current) | | 60-80% after seeding |
| Intent Accuracy | ~95% | | ≥ 90% |
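A minimal sketch of the System 1 fast-path decision, assuming System 1 labels a message by its most similar corpus example and treats that cosine similarity as its confidence; both are assumptions, since the text does not specify the scoring. The toy 2-d vectors stand in for real embeddings, and `Route`, `cosine`, and `route` are hypothetical names. Scores at or above the 0.85 threshold are answered directly; anything lower is escalated to System 2.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

CONFIDENCE_THRESHOLD = 0.85  # the ≥ 0.85 System 1 threshold


@dataclass
class Route:
    label: Optional[str]  # None when the message must go to System 2
    score: float
    escalate: bool


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(query: Sequence[float],
          corpus: list[tuple[Sequence[float], str]],
          threshold: float = CONFIDENCE_THRESHOLD) -> Route:
    # Nearest neighbour over the corpus; similarity doubles as confidence.
    best_label, best_score = None, -1.0
    for vec, label in corpus:
        score = cosine(query, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return Route(best_label, best_score, escalate=False)
    # Below threshold (or empty corpus): escalate to System 2.
    return Route(None, best_score, escalate=True)
```

For example, with a corpus of `([1.0, 0.0], "billing")` and `([0.0, 1.0], "support")`, the query `[0.9, 0.1]` scores about 0.99 against "billing" and is handled by System 1, while `[1.0, 1.0]` scores about 0.71 against both and is escalated. Every example the distillation worker promotes adds another neighbour that can clear the threshold, which is how the hit rate is expected to rise.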