Agent Registry Architecture
The Agent Registry is a three-layered system for managing agent lifecycle,
discovery, and health monitoring across the platform.
🗄 Registry Management Layer: DynamicAgentRegistry, In-Memory Maps, DynamoDB Persistence, EventBridge Events
↓
🔎 Discovery & Routing Layer: AgentRegistry, DynamicAgentLoader, HotReloadManager, ManifestLoader
↓
📊 Monitoring & Capability Layer: HealthMonitor, CapabilityIndex, HeartbeatPublisher, ServiceRegistry
Registered Agents: 55+
Discovery Sources: 4
Index Dimensions: 4
Hot Reload Interval: 30s
Key Design Patterns
Production-grade agent orchestration
💾 Dual Storage: Hot (memory) + Cold (DynamoDB)
🔍 Multi-Index: 4 views of capability data
📡 Event-Driven: Reactive status propagation
💚 Health-Aware: Route only to healthy agents
💡
Why a Registry?
The registry enables dynamic agent orchestration: agents can be
added, removed, or updated without redeploying the platform. It provides fast capability
lookup, health-aware routing, and near-real-time configuration updates via hot reload.
Agent Registration
Agents register via two approaches: programmatic registration at runtime,
or manifest-based discovery from configuration files.
Programmatic Registration
Runtime agent registration
Agent instances register themselves with the registry at startup. The registry
stores the instance, persists metadata, and publishes lifecycle events.
const registry = new DynamicAgentRegistry(config);
await registry.registerAgent(agentInstance);

// Internally:
// 1. Store in Map<agentId, AgentBase>
// 2. Create RegisteredAgent metadata
// 3. Persist to DynamoDB
// 4. Publish 'AgentRegistered' event
// 5. Set up status/capacity listeners
// 6. Initialize the agent
Manifest-Based Registration
Declarative agent configuration
Agents can also be discovered via manifests that declare capabilities,
dependencies, configuration, and monitoring requirements.
id (string): Unique agent identifier
name (string): Human-readable name
type (enum): core | specialist | coordinator | utility
capabilities (object): Core capabilities + enhancements
dependencies (object): Required services and agents
configuration (object): Environment, features, limits
monitoring (object): Metrics and health checks
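To make the manifest fields concrete, here is a minimal sketch of a manifest type with a small validator. Only the top-level field names and the four type values come from the field list above. The nested object shapes, the validateManifest helper, and the sample values are assumptions made for illustration, not the platform's actual schema.

```typescript
// Top-level fields follow the manifest field list; nested shapes are assumed.
type AgentType = 'core' | 'specialist' | 'coordinator' | 'utility';

interface AgentManifest {
  id: string;
  name: string;
  type: AgentType;
  capabilities: Record<string, unknown>;
  dependencies: Record<string, unknown>;
  configuration: Record<string, unknown>;
  monitoring: Record<string, unknown>;
}

const AGENT_TYPES: AgentType[] = ['core', 'specialist', 'coordinator', 'utility'];
const OBJECT_FIELDS = ['capabilities', 'dependencies', 'configuration', 'monitoring'];

// Hypothetical helper: returns a list of problems, empty when the manifest is valid.
function validateManifest(input: unknown): string[] {
  if (typeof input !== 'object' || input === null) {
    return ['manifest must be an object'];
  }
  const m = input as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof m.id !== 'string' || m.id.length === 0) {
    errors.push('id must be a non-empty string');
  }
  if (typeof m.name !== 'string' || m.name.length === 0) {
    errors.push('name must be a non-empty string');
  }
  if (typeof m.type !== 'string' || AGENT_TYPES.indexOf(m.type as AgentType) === -1) {
    errors.push('type must be one of: ' + AGENT_TYPES.join(' | '));
  }
  for (const field of OBJECT_FIELDS) {
    const value = m[field];
    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
      errors.push(field + ' must be an object');
    }
  }
  return errors;
}

// Sample manifest with invented values.
const sampleManifest: AgentManifest = {
  id: 'calendar-agent',
  name: 'Calendar Agent',
  type: 'specialist',
  capabilities: { core: ['scheduling'] },
  dependencies: {},
  configuration: {},
  monitoring: {},
};

const sampleErrors = validateManifest(sampleManifest);
```

Returning a list of errors rather than throwing on the first one lets a loader report every problem in a manifest at once.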
Registered Agent Metadata
What gets stored for each agent
interface RegisteredAgent {
  id: string;
  name: string;
  type: string; // "supervisor", "general-agent"
  status: 'available' | 'unavailable' | 'degraded';
  capabilities: string[]; // Array of capability IDs
  config: AgentConfig;
  capacity: {
    maxConcurrent: number;
    current: number;
  };
  metrics: {
    requestsHandled: number;
    avgResponseTime: number;
    errorRate: number;
  };
  lastHealthCheck: Date;
  responseQueue: AgentQueueMetadata;
}
Capability Index
The CapabilityIndex maintains 4 bidirectional indices for fast,
efficient lookup of agents by their capabilities.
4 Bidirectional Indices
Capability → Agents: scheduling → calendar, task
Agent → Capabilities: calendar-agent → scheduling, analysis
Domain → Agents: calendar → calendar, scheduling
Action → Agents: schedule-meeting → calendar, meeting
Query Operations
Fast capability searches
// Find agents with single capability
findByCapability('scheduling'): string[]

// Find agents with ALL capabilities (intersection)
findByCapabilities(['scheduling', 'email']): string[]

// Complex search with modes
search({
  capabilities: ['travel-planning', 'booking'],
  domains: ['weather'],
  mode: 'any' // 'any' = union, 'all' = intersection
}): string[]

// Get index statistics
getStats(): {
  totalAgents, totalCapabilities, totalActions,
  totalDomains, capabilityDistribution, actionDistribution
}
⚡
O(1) Lookups
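All indices are backed by Map<string, Set<string>> structures, so a
single-key lookup takes constant time on average. Multi-key queries add the cost of a
union or intersection over the matched sets. The multi-index design allows queries from any
dimension: find by capability, action, domain, or agent ID.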
Example: Travel Query
Finding the right agents for a complex task
Query
"Plan my trip to Rome with flights and weather"
↓
Capability Search
search({ capabilities: ['travel-planning', 'booking'], domains: ['weather'], mode: 'any' })
↓
Matched Agents
travel-integrator
mobility-agent
weather-agent
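The travel query above can be reproduced with a small index built on Map<string, Set<string>>. This is a sketch under assumptions rather than the CapabilityIndex implementation: the class name, the indexAgent method, and the capabilities and domains assigned to each agent are invented for illustration. Only the query method names and the 'any' / 'all' semantics follow the text.

```typescript
type SearchMode = 'any' | 'all';

interface SearchQuery {
  capabilities?: string[];
  domains?: string[];
  mode: SearchMode;
}

class CapabilityIndexSketch {
  private byCapability = new Map<string, Set<string>>();
  private byAgent = new Map<string, Set<string>>();
  private byDomain = new Map<string, Set<string>>();

  private static add(index: Map<string, Set<string>>, key: string, value: string): void {
    let bucket = index.get(key);
    if (!bucket) {
      bucket = new Set<string>();
      index.set(key, bucket);
    }
    bucket.add(value);
  }

  // Hypothetical entry point; the source does not show how agents are indexed.
  indexAgent(agentId: string, capabilities: string[], domains: string[]): void {
    for (const cap of capabilities) {
      CapabilityIndexSketch.add(this.byCapability, cap, agentId);
      CapabilityIndexSketch.add(this.byAgent, agentId, cap);
    }
    for (const domain of domains) {
      CapabilityIndexSketch.add(this.byDomain, domain, agentId);
    }
  }

  findByCapability(capability: string): string[] {
    const hit = this.byCapability.get(capability);
    return hit ? Array.from(hit) : [];
  }

  capabilitiesOf(agentId: string): string[] {
    const hit = this.byAgent.get(agentId);
    return hit ? Array.from(hit) : [];
  }

  findByCapabilities(capabilities: string[]): string[] {
    return this.combine(capabilities.map((c) => this.byCapability.get(c)), 'all');
  }

  search(query: SearchQuery): string[] {
    const sets = [
      ...(query.capabilities ?? []).map((c) => this.byCapability.get(c)),
      ...(query.domains ?? []).map((d) => this.byDomain.get(d)),
    ];
    return this.combine(sets, query.mode);
  }

  private combine(sets: Array<Set<string> | undefined>, mode: SearchMode): string[] {
    if (sets.length === 0) return [];
    if (mode === 'any') {
      // Union: collect every agent ID that appears in any matched set.
      const union = new Set<string>();
      sets.forEach((s) => {
        if (s) s.forEach((id) => union.add(id));
      });
      return Array.from(union);
    }
    // Intersection: an unknown key yields no set, so nothing can match.
    const first = sets[0];
    if (!first) return [];
    return Array.from(first).filter((id) => sets.every((s) => s !== undefined && s.has(id)));
  }
}

// Agent-to-capability assignments below are invented for the example.
const index = new CapabilityIndexSketch();
index.indexAgent('travel-integrator', ['travel-planning'], ['travel']);
index.indexAgent('mobility-agent', ['booking'], ['travel']);
index.indexAgent('weather-agent', ['forecast'], ['weather']);

const matched = index.search({
  capabilities: ['travel-planning', 'booking'],
  domains: ['weather'],
  mode: 'any',
});
// matched: travel-integrator, mobility-agent, weather-agent
```

With mode 'any' the three key lookups are unioned, which is why agents that each cover only part of the request are all returned. Switching to 'all' would return nothing here, since no single agent carries every requested key.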
Health Monitoring
A three-tier health checking system validates Lambda functions,
HTTP endpoints, and container services.
1. Lambda Function Health: checks function state via AWS SDK. Passes when State = 'Active', LastUpdateStatus = 'Successful'.
2. HTTP Endpoint Health: GET request to /health endpoint. Passes on status 200, timeout 10s.
3. Container (ECS) Health: DescribeServicesCommand validation. Passes when running > 0, pending = 0, status = 'ACTIVE'.
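The pass conditions for the three tiers can be written as pure predicates, which keeps them testable without AWS access. This is a sketch: the input interfaces are simplified assumptions carrying only the fields named above, and fetching those values (via the AWS SDK or an HTTP client) is left out.

```typescript
// Simplified inputs; in practice these values would come from SDK or HTTP responses.
interface LambdaConfigSnapshot {
  State?: string;
  LastUpdateStatus?: string;
}

interface HttpProbeResult {
  statusCode: number;
  elapsedMs: number;
}

interface EcsServiceSnapshot {
  status?: string;
  runningCount?: number;
  pendingCount?: number;
}

const HTTP_TIMEOUT_MS = 10000; // 10s timeout from the tier description

function isLambdaHealthy(cfg: LambdaConfigSnapshot): boolean {
  return cfg.State === 'Active' && cfg.LastUpdateStatus === 'Successful';
}

function isHttpHealthy(probe: HttpProbeResult): boolean {
  return probe.statusCode === 200 && probe.elapsedMs <= HTTP_TIMEOUT_MS;
}

function isEcsHealthy(svc: EcsServiceSnapshot): boolean {
  // Missing counts are treated as zero, so an empty snapshot is unhealthy.
  return (
    svc.status === 'ACTIVE' &&
    (svc.runningCount ?? 0) > 0 &&
    (svc.pendingCount ?? 0) === 0
  );
}

const lambdaOk = isLambdaHealthy({ State: 'Active', LastUpdateStatus: 'Successful' });
const ecsRollingOut = isEcsHealthy({ status: 'ACTIVE', runningCount: 2, pendingCount: 1 });
```

Note that under the stated rule an ECS service with pending tasks fails the check even while other tasks are running, so a rolling deployment would briefly report as failing.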
Healthy: 2+ consecutive successes
Degraded: Partial failures detected
Unhealthy: 3+ consecutive failures
Health Status Tracking
Per-agent health history
interface HealthStatus {
  agentId: string;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
  lastCheck: HealthCheckResult;
  history: HealthCheckResult[]; // Last 10 checks
}

interface HealthCheckResult {
  agentId: string;
  healthy: boolean;
  timestamp: number;
  responseTime?: number;
  error?: string;
  details?: Record<string, any>;
}
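The counters in HealthStatus map directly onto the status thresholds. The sketch below shows one way to update them and derive a status. The recordCheck and classify helpers are hypothetical names, and treating every state that is neither clearly healthy nor clearly unhealthy as degraded is an assumption, since the source only says "partial failures detected".

```typescript
interface HealthCheckResult {
  agentId: string;
  healthy: boolean;
  timestamp: number;
  responseTime?: number;
  error?: string;
}

interface HealthStatus {
  agentId: string;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
  lastCheck: HealthCheckResult;
  history: HealthCheckResult[];
}

type Health = 'healthy' | 'degraded' | 'unhealthy';

const HEALTHY_AFTER = 2;   // 2+ consecutive successes
const UNHEALTHY_AFTER = 3; // 3+ consecutive failures
const HISTORY_LIMIT = 10;  // keep the last 10 checks

// Returns a new status; a success resets the failure streak and vice versa.
function recordCheck(prev: HealthStatus | undefined, result: HealthCheckResult): HealthStatus {
  const previousHistory = prev ? prev.history : [];
  return {
    agentId: result.agentId,
    consecutiveFailures: result.healthy ? 0 : (prev ? prev.consecutiveFailures : 0) + 1,
    consecutiveSuccesses: result.healthy ? (prev ? prev.consecutiveSuccesses : 0) + 1 : 0,
    lastCheck: result,
    history: [...previousHistory, result].slice(-HISTORY_LIMIT),
  };
}

function classify(status: HealthStatus): Health {
  if (status.consecutiveFailures >= UNHEALTHY_AFTER) return 'unhealthy';
  if (status.consecutiveSuccesses >= HEALTHY_AFTER) return 'healthy';
  return 'degraded'; // assumed catch-all for in-between states
}
```

Requiring streaks in both directions adds hysteresis: a single failed probe moves an agent to degraded rather than unhealthy, and a single success after an outage does not immediately restore it to healthy.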
Heartbeat Publishing
Agent liveness signals to DynamoDB
Agents publish heartbeats to the registry table, enabling distributed health tracking
and CloudWatch metrics emission.
interface HeartbeatPayload {
  agentId: string;
  agentType: string;
  status: 'healthy' | 'unhealthy' | 'degraded';
  health?: {
    status: string;
    lastCheck?: string;
    // Field types are assumed; the original listed only the field names.
    checks?: Array<{ name: string; status: string; message: string }>;
  };
  metrics?: Record<string, any>;
  ttlSeconds?: number;
}
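One plausible use of ttlSeconds is to set an expiry attribute so stale heartbeats age out of the table. The sketch below converts a payload into a table item on that assumption. The attribute names (lastHeartbeat, expiresAt), the default TTL, and the toHeartbeatItem helper are all hypothetical; the one firm fact relied on is that DynamoDB's TTL feature expects a numeric attribute holding Unix epoch time in seconds.

```typescript
interface HeartbeatPayload {
  agentId: string;
  agentType: string;
  status: 'healthy' | 'unhealthy' | 'degraded';
  metrics?: Record<string, unknown>;
  ttlSeconds?: number;
}

const DEFAULT_TTL_SECONDS = 90; // assumed default, not from the source

// Hypothetical item shape; attribute names are illustrative only.
function toHeartbeatItem(payload: HeartbeatPayload, nowMs: number): Record<string, unknown> {
  const ttl = payload.ttlSeconds ?? DEFAULT_TTL_SECONDS;
  return {
    agentId: payload.agentId,
    agentType: payload.agentType,
    status: payload.status,
    metrics: payload.metrics ?? {},
    lastHeartbeat: new Date(nowMs).toISOString(),
    // DynamoDB TTL attributes hold epoch seconds, not milliseconds.
    expiresAt: Math.floor(nowMs / 1000) + ttl,
  };
}

const heartbeatItem = toHeartbeatItem(
  { agentId: 'weather-agent', agentType: 'specialist', status: 'healthy', ttlSeconds: 60 },
  1700000000000,
);
```

Passing the current time as a parameter instead of calling Date.now() inside the function keeps the conversion deterministic and easy to test.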
Agent Discovery
The DynamicAgentLoader pulls configurations from 4 sources,
with later sources overriding earlier ones.
S3 Manifests (Priority 1)
JSON manifest files stored in an S3 bucket. Primary source for production agent configurations.
s3://agent-manifests/manifests/{agentId}/manifest.json
Filesystem (Priority 2)
Agent implementations directory with pre-registered agent types for local development.
agents/implementations/{agentType}/
DynamoDB (Priority 3)
Registry table scan for enabled agents. Includes heartbeat and runtime status.
ai-pa-agent-registry-{env}
External API (Priority 4)
Remote registry API endpoint for cross-region or external agent discovery.
$AGENT_REGISTRY_API_ENDPOINT
🔄
Merge Strategy
Configurations are merged in ascending priority number, so the higher number wins:
API overrides DynamoDB, DynamoDB overrides Filesystem, Filesystem overrides S3. This allows runtime
overrides without changing base manifests.
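The override order can be sketched as a fold over the four sources, applied from S3 to API so that later sources win. The source keys, the AgentConfig shape, and the mergeConfigs helper are illustrative assumptions. In particular, the shallow per-agent merge shown here is a guess; the actual loader may merge nested settings differently.

```typescript
type AgentConfig = Record<string, unknown>;
type SourceName = 's3' | 'filesystem' | 'dynamodb' | 'api';

// Applied in this order, so later entries override earlier ones.
const MERGE_ORDER: SourceName[] = ['s3', 'filesystem', 'dynamodb', 'api'];

function mergeConfigs(
  bySource: Partial<Record<SourceName, Record<string, AgentConfig>>>,
): Record<string, AgentConfig> {
  const merged: Record<string, AgentConfig> = {};
  for (const source of MERGE_ORDER) {
    const configs = bySource[source];
    if (!configs) continue;
    for (const agentId of Object.keys(configs)) {
      // Shallow merge: keys from the later source replace earlier values.
      merged[agentId] = { ...merged[agentId], ...configs[agentId] };
    }
  }
  return merged;
}

// Invented values: a base manifest in S3 and a runtime override in DynamoDB.
const mergedConfigs = mergeConfigs({
  s3: { 'calendar-agent': { enabled: true, timeoutMs: 5000 } },
  dynamodb: { 'calendar-agent': { enabled: false } },
  api: { 'weather-agent': { enabled: true } },
});
```

In this example the DynamoDB entry disables calendar-agent at runtime while the timeout from the S3 manifest is preserved, which is the "override without changing base manifests" behavior described above.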
EventBridge Integration
Registry lifecycle events
AgentRegistered: New agent joined the registry
AgentUpdated: Agent config or status changed
AgentUnregistered: Agent removed from registry
AgentStatusUpdated: Health status transition
AgentMetricsUpdated: Performance metrics published
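As an illustration of how one of these lifecycle events might be shaped for EventBridge, the sketch below builds a PutEvents-style entry. The Source, DetailType, Detail, and EventBusName field names follow the EventBridge PutEvents request entry, where Detail is a JSON string. The source name 'agent.registry', the detail payload, and the toEventEntry helper are assumptions.

```typescript
type RegistryEventType =
  | 'AgentRegistered'
  | 'AgentUpdated'
  | 'AgentUnregistered'
  | 'AgentStatusUpdated'
  | 'AgentMetricsUpdated';

// Subset of the fields in an EventBridge PutEvents request entry.
interface RegistryEventEntry {
  Source: string;
  DetailType: string;
  Detail: string; // must be a JSON-encoded string
  EventBusName?: string;
}

const EVENT_SOURCE = 'agent.registry'; // hypothetical source name

function toEventEntry(
  type: RegistryEventType,
  agentId: string,
  detail: Record<string, unknown>,
  busName?: string,
): RegistryEventEntry {
  const entry: RegistryEventEntry = {
    Source: EVENT_SOURCE,
    DetailType: type,
    Detail: JSON.stringify({ agentId, ...detail }),
  };
  if (busName) entry.EventBusName = busName;
  return entry;
}

// Invented payload for a health transition.
const statusEvent = toEventEntry('AgentStatusUpdated', 'weather-agent', {
  previous: 'healthy',
  current: 'degraded',
});
```

Using the event name as DetailType lets EventBridge rules filter on it directly, so consumers can subscribe to status transitions without parsing every registry event.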
Hot Reload
The HotReloadManager polls S3 every 30 seconds for manifest changes,
enabling near-real-time configuration updates without a restart.
1. List Manifests: List all manifest files in the S3 bucket, capturing ETags for change detection
2. Compare Snapshots: Compare current ETags against the previous poll to detect added, updated, or removed files
3. Fetch Changes: Download content for new or updated manifests from S3
4. Emit Events: Publish 'added', 'updated', or 'removed' events for each detected change
5. Update Registry: Registry listeners react to events, updating in-memory state and capability indices
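The snapshot comparison in step 2 reduces to a diff between two key-to-ETag maps. Here is a minimal sketch of that step alone. The Snapshot and SnapshotChange types and the diffSnapshots helper are illustrative names, and listing objects from S3 is left out.

```typescript
type Snapshot = Map<string, string>; // manifest key -> ETag

interface SnapshotChange {
  type: 'added' | 'updated' | 'removed';
  key: string;
}

function diffSnapshots(previous: Snapshot, current: Snapshot): SnapshotChange[] {
  const changes: SnapshotChange[] = [];
  current.forEach((etag, key) => {
    const before = previous.get(key);
    if (before === undefined) {
      changes.push({ type: 'added', key });
    } else if (before !== etag) {
      changes.push({ type: 'updated', key });
    }
  });
  // Keys present last time but missing now were deleted from the bucket.
  previous.forEach((_etag, key) => {
    if (!current.has(key)) changes.push({ type: 'removed', key });
  });
  return changes;
}

// Invented keys and ETags for two consecutive polls.
const lastPoll: Snapshot = new Map([
  ['calendar-agent', 'etag-1'],
  ['weather-agent', 'etag-2'],
  ['legacy-agent', 'etag-3'],
]);
const thisPoll: Snapshot = new Map([
  ['calendar-agent', 'etag-1'],
  ['weather-agent', 'etag-9'],
  ['travel-integrator', 'etag-4'],
]);

const reloadChanges = diffSnapshots(lastPoll, thisPoll);
// reloadChanges: updated weather-agent, added travel-integrator, removed legacy-agent
```

Because only ETags are compared, unchanged manifests are never downloaded; content is fetched in step 3 for the added and updated keys only.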
Change Events
EventEmitter interface
// Subscribe to hot reload events
hotReloadManager.on('added', (change) => {
  console.log(`New agent: ${change.agentId}`);
});

hotReloadManager.on('updated', (change) => {
  console.log(`Updated: ${change.agentId}`);
  console.log(`Previous: ${change.previousVersion}`);
  console.log(`New: ${change.newVersion}`);
});

hotReloadManager.on('removed', (change) => {
  console.log(`Removed: ${change.agentId}`);
});

interface HotReloadChange {
  type: 'added' | 'updated' | 'removed';
  agentId: string;
  manifest?: AgentManifestContent;
  previousVersion?: string;
  newVersion?: string;
  timestamp: Date;
}
Poll Interval: 30s
Change Detection: ETag
Downtime: 0
🚀
Zero-Downtime Updates
Hot reload enables configuration changes without restarting the platform.
Update an agent's capabilities, add new agents, or remove deprecated ones,
all by modifying S3 manifests. Changes are picked up on the next poll, so they propagate within roughly 30 seconds.
Status Tracking
Monitor hot reload health
getStatus(): {
  enabled: boolean;
  isPolling: boolean;
  lastPollAt?: Date;
  lastChangeAt?: Date;
  totalAgents: number;
  pollCount: number;
  errorCount: number;
}