Quality Monitoring & Drift Detection
Three complementary systems monitor data quality, goal decomposition quality, and persona consistency across multi-turn conversations.
🐶 Data Validator · "Data Dog"
🏆 Decomposition Evaluator · Quality Gates
🧬 Persona Drift · Consistency
6 Validation Checks · 5 Quality Criteria · 5 Persona Traits · 3 EventBridge Events
🗂
Three Monitoring Systems
Each addresses a different quality dimension
Data Validator Agent
Production real-time validation
Validates tracking data before presentation. Detects duplicates, anomalies, calculation errors, gaps, and freshness issues.
Decomposition Evaluator
Goal breakdown quality
Evaluates task decomposition across 5 criteria. Uses Canvas-based collaboration for iterative refinement.
Persona Drift Analyzer
Research/testing tool
Measures persona consistency across multi-turn conversations. Tracks 5 traits with statistical analysis.
💡 LLM + Heuristic Hybrid
All three systems use a hybrid approach: fast heuristic checks run first for quick feedback, then LLM-based evaluation provides deeper analysis. This balances speed (~300ms for heuristics) with accuracy (LLM catches edge cases).
Data Validator Agent
The "Data Dog" validates tracking data before presenting it to users. Six modular checks run in parallel for performance, producing a confidence score.
📊 Tracking Data (up to 1,000 entries) → 🐶 Data Validator (parallel checks) → Confidence Level (high/medium/low)
🧮 Calculation Check (weight 2.0x)
Verifies math: unit price × qty = total, sum validations

📄 Duplicate Check (weight 1.5x)
Finds duplicate or similar entries with similarity scoring

📈 Anomaly Check (weight 1.2x)
Z-score statistical analysis for unusual patterns

📅 Completeness Check (weight 1.0x)
Detects missing data gaps in date ranges

🏷 Categorization Check (weight 1.0x)
Validates category confidence and consistency

Freshness Check (weight 0.8x)
Detects stale data, future dates, sync issues
Validation Modes
Choose speed vs. thoroughness
Full Validation: all 6 checks, detailed results (~1-2s)
Quick Check: duplicate + calculation only (~300-500ms)
Single Check: run a specific check in isolation (~100-200ms)
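A minimal sketch of how a mode parameter could map onto the check registry; the type and function names here are illustrative assumptions, not the agent's actual API:

// Sketch only: check names and the registry shape are assumptions
// based on the modes above, not the project's actual API.
type CheckName =
  | 'completeness' | 'duplicate' | 'categorization'
  | 'calculation' | 'freshness' | 'anomaly';

type ValidationMode =
  | { kind: 'full' }                       // all 6 checks, ~1-2s
  | { kind: 'quick' }                      // duplicate + calculation, ~300-500ms
  | { kind: 'single'; check: CheckName };  // one isolated check, ~100-200ms

function selectChecks(mode: ValidationMode, all: CheckName[]): CheckName[] {
  switch (mode.kind) {
    case 'full':   return all;
    case 'quick':  return ['duplicate', 'calculation'];
    case 'single': return [mode.check];
  }
}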
// Parallel execution of all checks
const checkResults = await Promise.allSettled([
  completenessCheck.run(entries, context),
  duplicateCheck.run(entries, context),
  categorizationCheck.run(entries, context),
  calculationCheck.run(entries, context),
  freshnessCheck.run(entries, context),
  anomalyCheck.run(entries, context),
]);

// Weighted score aggregation
const weights = {
  calculation: 2.0,   // Math errors are serious
  duplicate: 1.5,     // Duplicates affect totals
  anomaly: 1.2,       // Unusual patterns need attention
  completeness: 1.0,  // Standard importance
  categorization: 1.0,
  freshness: 0.8,     // Less critical
};
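The snippet above defines the weights but not the aggregation itself. A plausible reading of calculateWeightedScore, assuming each check reports a score in [0, 1] (the codebase's exact formula may differ):

// Assumed shape: each check reports a score in [0, 1].
interface CheckScore { check: string; score: number }

function calculateWeightedScore(
  results: CheckScore[],
  weights: Record<string, number>,
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const r of results) {
    const w = weights[r.check] ?? 1.0;  // unlisted checks get standard weight
    weightedSum += r.score * w;
    totalWeight += w;
  }
  // Normalize so the result stays in [0, 1] regardless of which checks ran
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}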
Decomposition Evaluator
Evaluates goal decomposition quality across 5 criteria. Uses Canvas-based collaboration for iterative refinement until quality thresholds are met.
Granularity 0.88 · 15 min-4 hr tasks
Realism 0.92 · feasible estimates
Coverage 0.75 · achieves goal
Progression 0.85 · learning curve
Actionability 0.90 · clear steps
🚦
Quality Thresholds
Default evaluation criteria
minOverallScore: 0.7 (minimum 70% quality)
maxHoursPerSubtask: 4 (no task longer than 4 hours)
minMinutesPerSubtask: 15 (no task shorter than 15 minutes)
minEstimatesCoverage: 0.8 (80% of tasks need time estimates)
maxParallelSubtasks: 3 (maximum 3 concurrent tasks)
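Expressed as a config object, these defaults might look like the sketch below; the values mirror the table above, but the actual shape in the codebase is an assumption:

const defaultThresholds = {
  minOverallScore: 0.7,       // reject decompositions scoring below 70%
  maxHoursPerSubtask: 4,      // no task longer than 4 hours
  minMinutesPerSubtask: 15,   // no task shorter than 15 minutes
  minEstimatesCoverage: 0.8,  // 80% of tasks need time estimates
  maxParallelSubtasks: 3,     // at most 3 concurrent tasks
};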
Issue Types Detected
Heuristic + LLM detection
compressed_time, vague_definition, missing_progression, too_large, too_small, missing_dependencies, incomplete_coverage, unrealistic_parallel, missing_milestone
🔄 Iterative Refinement via Canvas
When quality falls below threshold, the evaluator posts a QUESTION annotation to the Canvas requesting refinement. The TaskAgent sees the feedback, revises the decomposition, and the evaluator re-evaluates. This continues up to 5 iterations until quality passes, then an AGENT_INSIGHT approval is posted.
// Evaluation pipeline
async evaluate(decomposition: TaskDecomposition): Promise<EvaluationResult> {
  // Fast heuristics first (~50ms)
  const heuristicIssues = this.runHeuristicChecks(decomposition);

  // LLM evaluation for deeper analysis (~500ms)
  const llmAssessment = await this.llmEvaluate(decomposition, {
    model: modelTiers.getModelId('reasoning'),
    temperature: 0.3,  // Consistent evaluation
  });

  // Merge results and determine action
  return this.mergeAssessments(heuristicIssues, llmAssessment);
}
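Putting the Canvas loop and the evaluator together, the iteration described above might look like this sketch; every interface here is reconstructed from the prose and is not the actual project API:

// Sketch of the Canvas refinement loop; all interfaces are assumptions.
interface EvaluationResult { overallScore: number; feedback: string }
interface Canvas {
  post(a: { type: 'QUESTION' | 'AGENT_INSIGHT'; body: string }): Promise<void>;
}
interface Evaluator<T> { evaluate(d: T): Promise<EvaluationResult> }
interface TaskAgent<T> { awaitRevision(): Promise<T> }

async function refineUntilAccepted<T>(
  decomposition: T,
  evaluator: Evaluator<T>,
  canvas: Canvas,
  taskAgent: TaskAgent<T>,
  minOverallScore = 0.7,
  maxIterations = 5,
): Promise<T> {
  for (let i = 0; i < maxIterations; i++) {
    const result = await evaluator.evaluate(decomposition);
    if (result.overallScore >= minOverallScore) {
      // Quality passed: post approval and stop
      await canvas.post({ type: 'AGENT_INSIGHT', body: 'Decomposition approved' });
      return decomposition;
    }
    // Below threshold: request refinement, then wait for the revision
    await canvas.post({ type: 'QUESTION', body: result.feedback });
    decomposition = await taskAgent.awaitRevision();
  }
  return decomposition;  // best effort after maxIterations
}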
Persona Drift Analyzer
A research/testing tool that measures persona consistency across multi-turn conversations. Tracks 5 personality traits with statistical analysis to detect drift from baseline.
Warmth: Reserved ↔ Warm
Formality: Casual ↔ Formal
Brevity: Verbose ↔ Concise
Proactiveness: Passive ↔ Proactive
Empathy: Neutral ↔ Empathetic
Multi-Turn Drift Analysis
Turn 1 (baseline): 1.00
Turn 3: 0.92
Turn 5: 0.88
Turn 7: 0.82
Drift threshold: 0.85 (15% tolerance)
📊
Drift Detection Metrics
Statistical analysis per trait
Drift from Baseline: absolute difference from turn 1
Standard Deviation: consistency measure per trait
Maximum Drift: worst-case deviation observed
Overall Consistency: 0-1 score across all traits
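These metrics are standard statistics over per-turn trait scores. A sketch of the per-trait computation (the series shape and field names are assumptions):

// One score per turn for a single trait; turn 1 is the baseline.
interface TraitSeries { trait: string; scores: number[] }

function analyzeDrift(series: TraitSeries) {
  const baseline = series.scores[0];
  const drifts = series.scores.map(s => Math.abs(s - baseline));
  const mean = series.scores.reduce((a, b) => a + b, 0) / series.scores.length;
  const variance =
    series.scores.reduce((a, s) => a + (s - mean) ** 2, 0) / series.scores.length;
  return {
    driftFromBaseline: drifts[drifts.length - 1],  // latest turn vs. turn 1
    standardDeviation: Math.sqrt(variance),        // consistency per trait
    maxDrift: Math.max(...drifts),                 // worst-case deviation
  };
}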
📝 Reinforcement Recommendations
Based on drift analysis, the system recommends a persona reinforcement frequency (see the sketch after this list):
  • No drift: "Current implementation sufficient"
  • Moderate drift (8-15%): "Reinforce every 5 turns"
  • Significant drift (>15%): "Reinforce every 3 turns"
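As code, the mapping from measured drift to a recommendation could look like this sketch, with thresholds taken directly from the list above:

function recommendReinforcement(maxDrift: number): string {
  if (maxDrift > 0.15) return 'Reinforce every 3 turns';   // significant drift
  if (maxDrift >= 0.08) return 'Reinforce every 5 turns';  // moderate drift (8-15%)
  return 'Current implementation sufficient';              // no meaningful drift
}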
Confidence Scoring
Each system produces a confidence level through weighted aggregation of check results. The Data Validator uses a particularly sophisticated weighting scheme.
Example: Overall Confidence HIGH (score 0.91)

HIGH (≥ 0.9): no errors and ≤ 3 warnings
MEDIUM (0.7-0.9): score below 0.9 OR more than 3 warnings
LOW (< 0.7): any errors OR score below 0.7
Weighted Aggregation
Check importance varies by impact
Calculation 2.0x · Duplicate 1.5x · Anomaly 1.2x · Completeness 1.0x · Categorization 1.0x · Freshness 0.8x
Why different weights? Math errors (Calculation) directly affect totals shown to users. Duplicates inflate counts. Anomalies need investigation but may be legitimate. Freshness is less critical than accuracy.
// Confidence level determination
function determineConfidence(results: CheckResults): ConfidenceLevel {
  const errorCount = results.filter(r => r.severity === 'error').length;
  const warningCount = results.filter(r => r.severity === 'warning').length;
  const weightedScore = calculateWeightedScore(results);

  // Any errors = low confidence
  if (errorCount > 0) return 'low';

  // Score below threshold = low confidence
  if (weightedScore < 0.7) return 'low';

  // Many warnings or medium score = medium confidence
  if (warningCount > 3 || weightedScore < 0.9) return 'medium';

  // High score with few issues = high confidence
  return 'high';
}
Events & Alerts
All monitoring systems integrate with EventBridge for real-time alerting and downstream processing. Critical issues trigger immediate notifications.
validation.started
Emitted when validation begins. Includes trackingId, userId, entryCount, checkTypes.
validation.completed
Emitted when validation finishes. Includes confidence level, overall score, issue count, duration.
validation.issue.detected
Fired for each critical issue (severity='error'). Used for alerting and remediation workflows.
📡
Event Payload Structure
Standardized for downstream processing
interface ValidationEvent {
  source: 'data-validator-agent';
  detailType: 'validation.completed';
  detail: {
    trackingId: string;
    userId: string;
    correlationId: string;
    confidence: 'high' | 'medium' | 'low';
    overallScore: number;
    issueCount: {
      errors: number;
      warnings: number;
    };
    checksRun: string[];
    durationMs: number;
  };
}
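Publishing such an event might look like the following sketch, using the AWS SDK v3 EventBridge client; the bus name is an assumption:

import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const client = new EventBridgeClient({});

// Sketch: publish a validation.completed event with the payload above.
async function emitValidationCompleted(detail: ValidationEvent['detail']) {
  await client.send(new PutEventsCommand({
    Entries: [{
      EventBusName: 'default',  // assumption: the actual bus name may differ
      Source: 'data-validator-agent',
      DetailType: 'validation.completed',
      Detail: JSON.stringify(detail),
    }],
  }));
}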
🔔
Alerting Integration
SNS topics for notifications
SNS Topic: ai-pa-{env}-synthetics-alerts
Subscribe: ./ops alerts subscribe --env dev
Alert Triggers:
  • Low confidence result
  • Critical calculation error
  • Data freshness > 24h
  • Duplicate rate > 10%
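Composed as a predicate, the triggers above might look like this sketch (the summary fields are assumptions):

interface ValidationSummary {
  confidence: 'high' | 'medium' | 'low';
  hasCriticalCalculationError: boolean;
  dataAgeHours: number;
  duplicateRate: number;  // fraction in [0, 1]
}

function shouldAlert(s: ValidationSummary): boolean {
  return (
    s.confidence === 'low' ||          // low confidence result
    s.hasCriticalCalculationError ||   // critical calculation error
    s.dataAgeHours > 24 ||             // data freshness > 24h
    s.duplicateRate > 0.10             // duplicate rate > 10%
  );
}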
📊 Observability
All three systems use structured correlation logging:
  • Agent name and ID for filtering
  • Correlation ID for request tracing
  • Operation context and performance metrics
  • CloudWatch Metrics namespace: DataValidator
🗄
Key File Locations
Where to find these systems
Data Validator: agents/implementations/data-validator/
Decomposition Evaluator: agents/implementations/decomposition-evaluator/
Persona Drift: backend/src/services/persona/__tests__/persistence/
API Integration: backend/src/services/tracking/TrackingAggregationService.ts