Agent Registry - Umber Architecture Demo

Agent Registry Architecture

The Agent Registry is a three-layered system for managing agent lifecycle, discovery, and health monitoring across the platform.

🗄 Registry Management Layer: DynamicAgentRegistry, in-memory maps, DynamoDB persistence, EventBridge events

↓

🔎 Discovery & Routing Layer: AgentRegistry, DynamicAgentLoader, HotReloadManager, ManifestLoader

↓

📊 Monitoring & Capability Layer: HealthMonitor, CapabilityIndex, HeartbeatPublisher, ServiceRegistry

Registered agents: 55+
Discovery sources: 4
Index dimensions: 4
Hot reload interval: 30s

🏗 Key Design Patterns

Production-grade agent orchestration

💾 Dual Storage: hot (memory) + cold (DynamoDB)
🔍 Multi-Index: 4 views of capability data
📡 Event-Driven: reactive status propagation
💚 Health-Aware: route only to healthy agents

💡 Why a Registry?

The registry enables dynamic agent orchestration — agents can be added, removed, or updated without redeploying the platform. It provides fast capability lookup, health-aware routing, and real-time configuration updates via hot reload.

Agent Registration

Agents register via two approaches: programmatic registration at runtime, or manifest-based discovery from configuration files.

📝 Programmatic Registration

Runtime agent registration

Agent instances register themselves with the registry at startup. The registry stores the instance, persists metadata, and publishes lifecycle events.

const registry = new DynamicAgentRegistry(config);
await registry.registerAgent(agentInstance);

// Internally:
// 1. Store in Map<agentId, AgentBase>
// 2. Create RegisteredAgent metadata
// 3. Persist to DynamoDB
// 4. Publish 'AgentRegistered' event
// 5. Set up status/capacity listeners
// 6. Initialize the agent
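The six internal steps can be illustrated with a small in-memory sketch. It is not the platform's DynamicAgentRegistry: `SketchRegistry`, `AgentLike`, `MetadataStore`, and `EventPublisher` are hypothetical stand-ins for AgentBase, DynamoDB, and EventBridge, and the metadata is trimmed to a few fields.

```typescript
// Hypothetical minimal shapes standing in for AgentBase, DynamoDB and EventBridge.
interface AgentMetadata {
  id: string;
  name: string;
  capabilities: string[];
  status: "available" | "unavailable" | "degraded";
}

interface AgentLike {
  id: string;
  name: string;
  capabilities: string[];
  initialize(): Promise<void>;
  onStatusChange(listener: (status: AgentMetadata["status"]) => void): void;
}

interface MetadataStore {
  put(item: AgentMetadata): Promise<void>;
}

interface EventPublisher {
  publish(detailType: string, detail: object): Promise<void>;
}

class SketchRegistry {
  private readonly agents = new Map<string, AgentLike>();      // live instances
  private readonly metadata = new Map<string, AgentMetadata>(); // registry view
  private readonly store: MetadataStore;
  private readonly events: EventPublisher;

  constructor(store: MetadataStore, events: EventPublisher) {
    this.store = store;
    this.events = events;
  }

  async registerAgent(agent: AgentLike): Promise<void> {
    if (this.agents.has(agent.id)) {
      throw new Error(`Agent ${agent.id} is already registered`);
    }
    this.agents.set(agent.id, agent);                                   // 1. in-memory map
    const meta: AgentMetadata = {                                       // 2. metadata record
      id: agent.id,
      name: agent.name,
      capabilities: [...agent.capabilities],
      status: "available",
    };
    this.metadata.set(agent.id, meta);
    await this.store.put(meta);                                         // 3. persist
    await this.events.publish("AgentRegistered", { agentId: agent.id }); // 4. lifecycle event
    agent.onStatusChange((status) => {                                  // 5. status listener
      const m = this.metadata.get(agent.id);
      if (m) m.status = status;
      void this.events.publish("AgentStatusUpdated", { agentId: agent.id, status });
    });
    await agent.initialize();                                           // 6. initialize
  }

  getMetadata(id: string): AgentMetadata | undefined {
    return this.metadata.get(id);
  }
}
```

Rejecting duplicate IDs up front keeps the in-memory map and the persisted metadata from drifting apart; whether the real registry rejects or replaces duplicates is not stated here.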

📄 Manifest-Based Registration

Declarative agent configuration

Agents can also be discovered via manifests that declare capabilities, dependencies, configuration, and monitoring requirements.

id (string): Unique agent identifier
name (string): Human-readable name
type (enum): core | specialist | coordinator | utility
capabilities (object): Core capabilities + enhancements
dependencies (object): Required services and agents
configuration (object): Environment, features, limits
monitoring (object): Metrics and health checks
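A TypeScript shape for these fields, plus a small runtime check, might look like the sketch below. The nested object types are left loose because the page does not specify their contents, and `validateManifest` and `isManifest` are hypothetical helpers, not part of the ManifestLoader.

```typescript
type AgentType = "core" | "specialist" | "coordinator" | "utility";

interface AgentManifest {
  id: string;
  name: string;
  type: AgentType;
  capabilities: Record<string, unknown>;  // core capabilities + enhancements
  dependencies: Record<string, unknown>;  // required services and agents
  configuration: Record<string, unknown>; // environment, features, limits
  monitoring: Record<string, unknown>;    // metrics and health checks
}

const AGENT_TYPES: readonly AgentType[] = ["core", "specialist", "coordinator", "utility"];
const OBJECT_FIELDS = ["capabilities", "dependencies", "configuration", "monitoring"] as const;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the manifest has the expected shape.
function validateManifest(raw: unknown): string[] {
  if (!isPlainObject(raw)) return ["manifest must be a JSON object"];
  const errors: string[] = [];
  for (const key of ["id", "name"]) {
    if (typeof raw[key] !== "string" || raw[key] === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  if (!AGENT_TYPES.includes(raw["type"] as AgentType)) {
    errors.push(`type must be one of ${AGENT_TYPES.join(" | ")}`);
  }
  for (const key of OBJECT_FIELDS) {
    if (!isPlainObject(raw[key])) errors.push(`${key} must be an object`);
  }
  return errors;
}

// Type guard so callers can treat validated JSON as an AgentManifest.
function isManifest(raw: unknown): raw is AgentManifest {
  return validateManifest(raw).length === 0;
}
```

Collecting every problem instead of stopping at the first makes a rejected manifest easier to fix in one pass.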

📊 Registered Agent Metadata

What gets stored for each agent

interface RegisteredAgent {
  id: string;
  name: string;
  type: string;                // "supervisor", "general-agent"
  status: 'available' | 'unavailable' | 'degraded';
  capabilities: string[];       // Array of capability IDs
  config: AgentConfig;
  capacity: {
    maxConcurrent: number;
    current: number;
  };
  metrics: {
    requestsHandled: number;
    avgResponseTime: number;
    errorRate: number;
  };
  lastHealthCheck: Date;
  responseQueue: AgentQueueMetadata;
}
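The status and capacity fields are enough to support health-aware routing. The sketch below picks the least-loaded available agent for a capability; the selection rule (skip agents that are not 'available' or have no free slots, then prefer the lowest utilization) is an assumption, and `RoutableAgent` and `pickAgent` are hypothetical names for a trimmed view of RegisteredAgent.

```typescript
// Trimmed-down view of RegisteredAgent: only the fields routing needs.
interface RoutableAgent {
  id: string;
  status: "available" | "unavailable" | "degraded";
  capabilities: string[];
  capacity: { maxConcurrent: number; current: number };
}

// Choose the least-loaded available agent that offers the capability.
function pickAgent(agents: RoutableAgent[], capability: string): RoutableAgent | undefined {
  let best: RoutableAgent | undefined;
  let bestLoad = Infinity;
  for (const agent of agents) {
    if (agent.status !== "available") continue;          // health-aware: skip degraded/unavailable
    if (!agent.capabilities.includes(capability)) continue;
    const { maxConcurrent, current } = agent.capacity;
    if (maxConcurrent <= 0 || current >= maxConcurrent) continue; // no free slots
    const load = current / maxConcurrent;                   // utilization in [0, 1)
    if (load < bestLoad) {
      best = agent;
      bestLoad = load;
    }
  }
  return best;
}
```

Comparing utilization rather than raw `current` counts keeps a large agent from being passed over just because it is serving more requests in absolute terms.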

Capability Index

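The CapabilityIndex maintains 4 indices for fast lookup of agents by capability, domain, or action. The capability → agents and agent → capabilities maps mirror each other, which makes the index bidirectional: you can go from a capability to the agents that provide it, or from an agent to its capabilities.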

4 Bidirectional Indices

Capability → Agents: scheduling → calendar, task
Agent → Capabilities: calendar-agent → scheduling, analysis
Domain → Agents: calendar → calendar, scheduling
Action → Agents: schedule-meeting → calendar, meeting
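A minimal sketch of such an index, assuming each agent declares its capabilities, domains, and actions as plain string lists. `CapabilityIndexSketch`, `AgentDescriptor`, and the helper names are hypothetical; the sketch keeps each agent's descriptor so that removal knows exactly which sets to update, and it deletes sets that become empty.

```typescript
interface AgentDescriptor {
  agentId: string;
  capabilities: string[];
  domains: string[];
  actions: string[];
}

type SetIndex = Map<string, Set<string>>;

function addTo(index: SetIndex, key: string, value: string): void {
  let set = index.get(key);
  if (!set) {
    set = new Set<string>();
    index.set(key, set);
  }
  set.add(value);
}

function removeFrom(index: SetIndex, key: string, value: string): void {
  const set = index.get(key);
  if (!set) return;
  set.delete(value);
  if (set.size === 0) index.delete(key); // keep the maps free of empty entries
}

class CapabilityIndexSketch {
  private byCapability: SetIndex = new Map(); // capability → agents
  private byAgent: SetIndex = new Map();      // agent → capabilities
  private byDomain: SetIndex = new Map();     // domain → agents
  private byAction: SetIndex = new Map();     // action → agents
  private descriptors = new Map<string, AgentDescriptor>();

  addAgent(d: AgentDescriptor): void {
    this.removeAgent(d.agentId); // re-indexing replaces any previous entry
    this.descriptors.set(d.agentId, d);
    for (const c of d.capabilities) {
      addTo(this.byCapability, c, d.agentId);
      addTo(this.byAgent, d.agentId, c);
    }
    for (const dom of d.domains) addTo(this.byDomain, dom, d.agentId);
    for (const a of d.actions) addTo(this.byAction, a, d.agentId);
  }

  removeAgent(agentId: string): void {
    const d = this.descriptors.get(agentId);
    if (!d) return;
    for (const c of d.capabilities) removeFrom(this.byCapability, c, agentId);
    for (const dom of d.domains) removeFrom(this.byDomain, dom, agentId);
    for (const a of d.actions) removeFrom(this.byAction, a, agentId);
    this.byAgent.delete(agentId);
    this.descriptors.delete(agentId);
  }

  findByCapability(capability: string): string[] {
    return Array.from(this.byCapability.get(capability) ?? []);
  }

  findByDomain(domain: string): string[] {
    return Array.from(this.byDomain.get(domain) ?? []);
  }

  findByAction(action: string): string[] {
    return Array.from(this.byAction.get(action) ?? []);
  }

  capabilitiesOf(agentId: string): string[] {
    return Array.from(this.byAgent.get(agentId) ?? []);
  }
}
```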

🔎 Query Operations

Fast capability searches

// Find agents with single capability
findByCapability('scheduling'): string[]

// Find agents with ALL capabilities (intersection)
findByCapabilities(['scheduling', 'email']): string[]

// Complex search with modes
search({
  capabilities: ['travel-planning', 'booking'],
  domains: ['weather'],
  mode: 'any'  // 'any' = union, 'all' = intersection
}): string[]

// Get index statistics
getStats(): {
  totalAgents, totalCapabilities,
  totalActions, totalDomains,
  capabilityDistribution, actionDistribution
}

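⚡ O(1) Lookups

All indices are backed by Map<string, Set<string>> structures, so a single-key lookup (one capability, action, domain, or agent ID) takes constant time on average. Multi-key queries such as findByCapabilities or search then combine the returned sets, which costs time proportional to the sizes of the sets involved. The multi-index design allows queries from any dimension: find by capability, action, domain, or agent ID.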
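The 'any' and 'all' modes reduce to set union and intersection over those maps. This sketch assumes the Map<string, Set<string>> layout and treats every requested capability and every requested domain as one criterion: 'any' returns agents matching at least one criterion, 'all' returns agents matching every one. How the real `search` combines capabilities with domains is not specified on this page, so that part is an assumption, as are the `Index` and `SearchQuery` names.

```typescript
type Index = Map<string, Set<string>>;

interface SearchQuery {
  capabilities?: string[];
  domains?: string[];
  mode: "any" | "all"; // 'any' = union, 'all' = intersection
}

function search(byCapability: Index, byDomain: Index, q: SearchQuery): string[] {
  // Each requested capability or domain contributes one candidate set.
  const sets: Set<string>[] = [
    ...(q.capabilities ?? []).map((c) => byCapability.get(c) ?? new Set<string>()),
    ...(q.domains ?? []).map((d) => byDomain.get(d) ?? new Set<string>()),
  ];
  if (sets.length === 0) return [];

  if (q.mode === "any") {
    const union = new Set<string>();
    for (const s of sets) s.forEach((id) => union.add(id));
    return Array.from(union).sort();
  }

  // 'all': walk the smallest set and keep ids present in every other set.
  const [smallest, ...rest] = [...sets].sort((a, b) => a.size - b.size);
  return Array.from(smallest).filter((id) => rest.every((s) => s.has(id))).sort();
}
```

Fed a small hypothetical index in which travel-integrator offers travel-planning and booking, mobility-agent offers booking, and weather-agent covers the weather domain, the travel query with `mode: 'any'` returns all three agents. Starting the intersection from the smallest set bounds the 'all' work by that set's size.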

🧪 Example: Travel Query

Finding the right agents for a complex task

Query: "Plan my trip to Rome with flights and weather"

↓

Capability search: search({ capabilities: ['travel-planning', 'booking'], domains: ['weather'], mode: 'any' })

↓

Matched agents: travel-integrator, mobility-agent, weather-agent

Health Monitoring

The health checking system supports three kinds of check, one per deployment target: Lambda functions, HTTP endpoints, and container services.

1. Lambda Function Health: checks function state via AWS SDK.
   Pass: State = 'Active', LastUpdateStatus = 'Successful'

2. HTTP Endpoint Health: GET request to /health endpoint.
   Pass: status 200, timeout 10s

3. Container (ECS) Health: DescribeServicesCommand validation.
   Pass: running > 0, pending = 0, status = 'ACTIVE'
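The three pass conditions can be written as small predicates. This sketch leaves out the AWS SDK calls: it assumes the caller has already fetched the Lambda function configuration (`State`, `LastUpdateStatus`) and the ECS service description (`status`, `runningCount`, `pendingCount`). The HTTP probe is passed in as a function so the 10-second timeout can be shown without a real network call; `checkHttp` and the other helper names are hypothetical.

```typescript
// Field names as they appear in the Lambda function configuration and the ECS service description.
interface LambdaConfigSnapshot { State?: string; LastUpdateStatus?: string }
interface EcsServiceSnapshot { status?: string; runningCount?: number; pendingCount?: number }

function isLambdaHealthy(cfg: LambdaConfigSnapshot): boolean {
  return cfg.State === "Active" && cfg.LastUpdateStatus === "Successful";
}

function isEcsServiceHealthy(svc: EcsServiceSnapshot): boolean {
  return svc.status === "ACTIVE" && (svc.runningCount ?? 0) > 0 && (svc.pendingCount ?? 0) === 0;
}

interface HttpCheckResult { healthy: boolean; responseTime: number; error?: string }

// probe() performs GET /health and resolves with the HTTP status code.
async function checkHttp(probe: () => Promise<number>, timeoutMs = 10_000): Promise<HttpCheckResult> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    const status = await Promise.race([probe(), timeout]);
    return {
      healthy: status === 200,
      responseTime: Date.now() - started,
      error: status === 200 ? undefined : `HTTP ${status}`,
    };
  } catch (err) {
    // Network errors and the timeout both count as a failed check.
    return { healthy: false, responseTime: Date.now() - started, error: String(err) };
  } finally {
    clearTimeout(timer); // do not keep the process alive once the probe settles
  }
}
```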

✅ Healthy: 2+ consecutive successes
⚠ Degraded: partial failures detected
❌ Unhealthy: 3+ consecutive failures
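One way to turn consecutive counters into these three states is sketched below. The thresholds (2 successes, 3 failures) come from this page; treating one or two consecutive failures as Degraded, and requiring a second success before an agent that was not healthy is declared Healthy again, are assumptions about how "partial failures" is interpreted. `classifyHealth` is a hypothetical name.

```typescript
type HealthState = "healthy" | "degraded" | "unhealthy";

const HEALTHY_AFTER = 2;   // consecutive successes needed to be Healthy
const UNHEALTHY_AFTER = 3; // consecutive failures needed to be Unhealthy

function classifyHealth(
  consecutiveSuccesses: number,
  consecutiveFailures: number,
  previous: HealthState,
): HealthState {
  if (consecutiveFailures >= UNHEALTHY_AFTER) return "unhealthy";
  if (consecutiveFailures > 0) return "degraded"; // assumption: 1-2 failures = partial failure
  if (consecutiveSuccesses >= HEALTHY_AFTER) return "healthy";
  // A single success after trouble is not yet enough to declare recovery.
  return previous === "healthy" ? "healthy" : "degraded";
}
```

Requiring a streak in both directions adds hysteresis, so one flaky check does not flip an agent in and out of the routing pool.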

📊 Health Status Tracking

Per-agent health history

interface HealthStatus {
  agentId: string;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
  lastCheck: HealthCheckResult;
  history: HealthCheckResult[];  // Last 10 checks
}

interface HealthCheckResult {
  agentId: string;
  healthy: boolean;
  timestamp: number;
  responseTime?: number;
  error?: string;
  details?: Record<string, any>;
}
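Recording a result then means updating the two counters (a success resets the failure streak and vice versa) and trimming the history to its last 10 entries. A minimal sketch, with the two interfaces above restated so the block compiles on its own; `recordResult` is a hypothetical name.

```typescript
interface HealthCheckResult {
  agentId: string;
  healthy: boolean;
  timestamp: number;
  responseTime?: number;
  error?: string;
  details?: Record<string, any>;
}

interface HealthStatus {
  agentId: string;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
  lastCheck: HealthCheckResult;
  history: HealthCheckResult[]; // Last 10 checks
}

const HISTORY_LIMIT = 10;

// Returns a new HealthStatus; the previous value is not mutated.
function recordResult(prev: HealthStatus | undefined, result: HealthCheckResult): HealthStatus {
  const successes = result.healthy ? (prev?.consecutiveSuccesses ?? 0) + 1 : 0;
  const failures = result.healthy ? 0 : (prev?.consecutiveFailures ?? 0) + 1;
  const history = [...(prev?.history ?? []), result].slice(-HISTORY_LIMIT);
  return {
    agentId: result.agentId,
    consecutiveSuccesses: successes,
    consecutiveFailures: failures,
    lastCheck: result,
    history,
  };
}
```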

💓 Heartbeat Publishing

Agent liveness signals to DynamoDB

Agents publish heartbeats to the registry table, enabling distributed health tracking and CloudWatch metrics emission.

interface HeartbeatPayload {
  agentId: string;
  agentType: string;
  status: 'healthy' | 'unhealthy' | 'degraded';
  health?: {
    status: string;
    lastCheck?: string;
    checks?: Array<{ name: string; status: string; message: string }>;  // field types assumed
  };
  metrics?: Record<string, any>;
  ttlSeconds?: number;
}
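A heartbeat write typically becomes a registry item carrying an expiry attribute that DynamoDB's Time to Live (TTL) feature can use to drop stale entries; DynamoDB TTL expects that attribute as a Unix epoch timestamp in seconds. The item layout (`pk`, `lastHeartbeat`, `expiresAt`) and the 90-second default are assumptions, since the real table schema is not shown here, and the payload is trimmed (no `health` block). Because DynamoDB removes expired items in the background rather than at the exact expiry time, the sketch also checks expiry on read.

```typescript
interface HeartbeatPayload {
  agentId: string;
  agentType: string;
  status: "healthy" | "unhealthy" | "degraded";
  metrics?: Record<string, any>;
  ttlSeconds?: number;
}

interface HeartbeatItem {
  pk: string;            // hypothetical partition key format
  agentType: string;
  status: HeartbeatPayload["status"];
  lastHeartbeat: string; // ISO-8601 timestamp
  expiresAt: number;     // epoch seconds, usable as the table's TTL attribute
  metrics?: Record<string, any>;
}

const DEFAULT_TTL_SECONDS = 90; // assumption: tolerate a few missed heartbeats

function toHeartbeatItem(p: HeartbeatPayload, nowMs: number = Date.now()): HeartbeatItem {
  const ttl = p.ttlSeconds ?? DEFAULT_TTL_SECONDS;
  return {
    pk: `AGENT#${p.agentId}`,
    agentType: p.agentType,
    status: p.status,
    lastHeartbeat: new Date(nowMs).toISOString(),
    expiresAt: Math.floor(nowMs / 1000) + ttl,
    ...(p.metrics ? { metrics: p.metrics } : {}),
  };
}

// Expired items may still be returned until DynamoDB deletes them, so check on read.
function isAlive(item: HeartbeatItem, nowMs: number = Date.now()): boolean {
  return item.expiresAt > Math.floor(nowMs / 1000);
}
```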

Agent Discovery

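The DynamicAgentLoader pulls configurations from 4 sources and applies them in priority order, from Priority 1 to Priority 4. When the same agent appears in more than one source, the later (higher-numbered) source overrides the earlier ones.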

📦 S3 Manifests (Priority 1)

JSON manifest files stored in S3 bucket. Primary source for production agent configurations.

s3://agent-manifests/manifests/{agentId}/manifest.json

📁 Filesystem (Priority 2)

Agent implementations directory with pre-registered agent types for local development.

agents/implementations/{agentType}/

🗃 DynamoDB (Priority 3)

Registry table scan for enabled agents. Includes heartbeat and runtime status.

ai-pa-agent-registry-{env}

🌐 External API (Priority 4)

Remote registry API endpoint for cross-region or external agent discovery.

$AGENT_REGISTRY_API_ENDPOINT

🔄 Merge Strategy

Configurations are merged in priority order — API overrides DynamoDB, DynamoDB overrides Filesystem, Filesystem overrides S3. This allows runtime overrides without changing base manifests.
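The merge can be sketched as a fold over the sources in priority order, where each later source's fields replace earlier ones for the same agent ID. Whether the real loader merges field by field (as here, a shallow merge) or replaces whole configurations is not stated, so the shallow per-agent merge, the `PartialAgentConfig` shape, and `mergeSources` are assumptions.

```typescript
// Hypothetical partial config: any source may supply only some fields.
interface PartialAgentConfig {
  id: string;
  [field: string]: unknown;
}

type SourceName = "s3" | "filesystem" | "dynamodb" | "api";

// Lowest to highest precedence, matching Priority 1 → 4.
const MERGE_ORDER: SourceName[] = ["s3", "filesystem", "dynamodb", "api"];

function mergeSources(
  sources: Partial<Record<SourceName, PartialAgentConfig[]>>,
): Map<string, PartialAgentConfig> {
  const merged = new Map<string, PartialAgentConfig>();
  for (const name of MERGE_ORDER) {
    for (const cfg of sources[name] ?? []) {
      const earlier = merged.get(cfg.id);
      // Shallow merge: fields from the later source win, untouched fields survive.
      merged.set(cfg.id, { ...earlier, ...cfg });
    }
  }
  return merged;
}
```

With a shallow merge, a DynamoDB record that only flips `enabled` leaves the rest of the S3 manifest intact, which is what makes runtime overrides cheap; nested objects, however, are replaced wholesale rather than merged.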

📡 EventBridge Integration

Registry lifecycle events

AgentRegistered: New agent joined the registry
AgentUpdated: Agent config or status changed
AgentUnregistered: Agent removed from registry
AgentStatusUpdated: Health status transition
AgentMetricsUpdated: Performance metrics published
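Each lifecycle event maps onto an EventBridge PutEvents entry, whose `Source`, `DetailType`, `Detail` (a JSON string), `EventBusName`, and `Time` fields are part of the EventBridge API. The sketch only builds entry objects and batches them; sending would go through the AWS SDK's PutEventsCommand. The source name `agent.registry`, the detail payload, and the helper names are assumptions.

```typescript
type RegistryEventType =
  | "AgentRegistered"
  | "AgentUpdated"
  | "AgentUnregistered"
  | "AgentStatusUpdated"
  | "AgentMetricsUpdated";

// Same field names as an EventBridge PutEvents request entry.
interface PutEventsEntry {
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
  EventBusName?: string;
  Time?: Date;
}

const REGISTRY_SOURCE = "agent.registry"; // hypothetical source name

function toEntry(
  type: RegistryEventType,
  agentId: string,
  detail: Record<string, unknown> = {},
  eventBusName?: string,
): PutEventsEntry {
  return {
    Source: REGISTRY_SOURCE,
    DetailType: type,
    Detail: JSON.stringify({ agentId, ...detail }),
    EventBusName: eventBusName,
    Time: new Date(),
  };
}

// PutEvents accepts at most 10 entries per request, so batch before sending.
function toBatches<T>(entries: T[], size = 10): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < entries.length; i += size) batches.push(entries.slice(i, i + size));
  return batches;
}
```

Using the event name as `DetailType` lets consumers subscribe with an EventBridge rule that matches on `detail-type`, for example only `AgentStatusUpdated` transitions.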

Hot Reload

The HotReloadManager polls S3 every 30 seconds for manifest changes, enabling real-time configuration updates without restart.

1. List Manifests: List all manifest files in S3 bucket, capturing ETags for change detection.
2. Compare Snapshots: Compare current ETags against previous poll to detect added, updated, or removed files.
3. Fetch Changes: Download content for new or updated manifests from S3.
4. Emit Events: Publish 'added', 'updated', or 'removed' events for each detected change.
5. Update Registry: Registry listeners react to events, updating in-memory state and capability indices.
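Steps 2 and 4 amount to diffing two ETag snapshots. The sketch assumes each snapshot maps an S3 object key to its ETag (as listed for each object in ListObjectsV2 results) and derives the agent ID from the `manifests/{agentId}/manifest.json` key layout shown under Agent Discovery; listing, fetching, and emitting are left out, and `diffSnapshots` is a hypothetical name. Since S3 assigns a new ETag when an object is rewritten with different content, comparing ETags avoids downloading unchanged manifests.

```typescript
type Snapshot = Map<string, string>; // S3 key → ETag

interface ManifestChange {
  type: "added" | "updated" | "removed";
  agentId: string;
  key: string;
}

// Extracts {agentId} from keys shaped like manifests/{agentId}/manifest.json.
function agentIdFromKey(key: string): string | undefined {
  const match = /^manifests\/([^/]+)\/manifest\.json$/.exec(key);
  return match ? match[1] : undefined;
}

function diffSnapshots(previous: Snapshot, current: Snapshot): ManifestChange[] {
  const changes: ManifestChange[] = [];
  current.forEach((etag, key) => {
    const agentId = agentIdFromKey(key);
    if (!agentId) return; // ignore objects that are not manifests
    const before = previous.get(key);
    if (before === undefined) changes.push({ type: "added", agentId, key });
    else if (before !== etag) changes.push({ type: "updated", agentId, key });
  });
  previous.forEach((_etag, key) => {
    const agentId = agentIdFromKey(key);
    if (agentId && !current.has(key)) changes.push({ type: "removed", agentId, key });
  });
  return changes;
}
```

Only the 'added' and 'updated' entries need a follow-up GET in step 3; 'removed' changes can be emitted straight away.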

📢 Change Events

EventEmitter interface

// Subscribe to hot reload events
hotReloadManager.on('added', (change) => {
  console.log(`New agent: ${change.agentId}`);
});

hotReloadManager.on('updated', (change) => {
  console.log(`Updated: ${change.agentId}`);
  console.log(`Previous: ${change.previousVersion}`);
  console.log(`New: ${change.newVersion}`);
});

hotReloadManager.on('removed', (change) => {
  console.log(`Removed: ${change.agentId}`);
});

interface HotReloadChange {
  type: 'added' | 'updated' | 'removed';
  agentId: string;
  manifest?: AgentManifestContent;
  previousVersion?: string;
  newVersion?: string;
  timestamp: Date;
}

Poll interval: 30s
Change detection: ETag
Downtime: 0

🚀 Zero-Downtime Updates

Hot reload enables configuration changes without restarting the platform. Update an agent's capabilities, add new agents, or remove deprecated ones by modifying S3 manifests. Changes are picked up on the next poll, so they typically propagate within about 30 seconds.

📊 Status Tracking

Monitor hot reload health

getStatus(): {
  enabled: boolean;
  isPolling: boolean;
  lastPollAt?: Date;
  lastChangeAt?: Date;
  totalAgents: number;
  pollCount: number;
  errorCount: number;
}