Browser Agent v0.1

Fetch public web content to help answer questions and complete tasks

User asks question → Supervisor routes → Browser Agent → Fetch page → Check auth → Extract content

What It Does (v0.1)

Public web content fetching for consumer tasks

  • Fetch and read public web pages
  • Extract relevant content (menus, hours, prices)
  • Detect when login is required and bail gracefully
  • Return structured data to other agents
  • Support common consumer scenarios
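
As a rough illustration of the "structured data" item above, the result handed back to other agents could be modeled as a tagged union. The type and field names here (`FetchResult`, `sections`, `alternatives`, `describeResult`) are assumptions made for this sketch, not names taken from the codebase.

```typescript
// Hypothetical result shape; none of these names come from the Browser Agent source.
type FetchResult =
  | { status: "ok"; url: string; title: string; sections: Record<string, string[]> }
  | { status: "auth_required"; url: string; alternatives: string[] }
  | { status: "error"; url: string; message: string };

// Turn a result into a one-line summary a calling agent could log or display.
function describeResult(result: FetchResult): string {
  switch (result.status) {
    case "ok":
      return `Fetched ${result.url}: ${Object.keys(result.sections).join(", ")}`;
    case "auth_required":
      return `Login required at ${result.url}; ${result.alternatives.length} alternative(s) offered`;
    case "error":
      return `Could not fetch ${result.url}: ${result.message}`;
  }
}

const menuResult: FetchResult = {
  status: "ok",
  url: "flourandwater.com/menu",
  title: "Menu",
  sections: { Pasta: ["Cacio e Pepe"], Hours: ["Dinner 5:30-10pm"] },
};
console.log(describeResult(menuResult));
```

A tagged union lets the calling agent handle the login-required case explicitly instead of treating it as a generic failure.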

How It's Used

Other agents consult Browser Agent when they need web info

// Travel Agent needs restaurant info
await this.consultExpert({
  domain: 'browser',
  request: 'FETCH_PAGE',
  url: 'yelp.com/...'
});
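
On the receiving side, Browser Agent would need to validate such a request before navigating. A minimal sketch follows, assuming a request object with the three fields shown above. `BrowserRequest`, `normalizeUrl`, and `validateRequest` are hypothetical names, and defaulting scheme-less URLs to https is an assumption rather than documented behavior.

```typescript
// Hypothetical request shape mirroring the fields in the consultExpert call.
interface BrowserRequest {
  domain: string;
  request: string;
  url: string;
}

// Add a scheme when one is missing and reject anything that is not http(s).
function normalizeUrl(raw: string): string {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw);
  const parsed = new URL(hasScheme ? raw : `https://${raw}`);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return parsed.toString();
}

// Accept only FETCH_PAGE requests addressed to the browser domain.
function validateRequest(req: BrowserRequest): string {
  if (req.domain !== "browser" || req.request !== "FETCH_PAGE") {
    throw new Error(`Unsupported request: ${req.domain}/${req.request}`);
  }
  return normalizeUrl(req.url);
}
```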

Consumer Scenarios

Real-world use cases that work with public web pages

🍽️
Restaurant Information
Get menus, hours, location, and reviews from restaurant websites and Yelp/Google listings
"What's on the menu at Pizzeria Delfina?"
PUBLIC - WORKS
✈️
Flight Status
Check real-time flight status, delays, and gate info from airline public pages
"Is my United flight UA 123 on time?"
PUBLIC - WORKS
🛒
Product Research
Look up prices, specs, and reviews from product pages
"What are the reviews for the Sony WH-1000XM5?"
PUBLIC - WORKS
🎬
Movie & Event Info
Find showtimes, venue details, and event schedules
"What time is Dune showing at AMC tonight?"
PUBLIC - WORKS
🍳
Recipe Lookup
Get ingredients and instructions from recipe websites
"Find me a good carbonara recipe"
PUBLIC - WORKS
📰
News & Articles
Read and summarize news articles and blog posts
"What does that TechCrunch article say about the new iPhone?"
PUBLIC - WORKS
🏨
Hotel & Venue Details
Get amenities, location info, and public reviews
"Does the Marriott downtown have a pool?"
PUBLIC - WORKS
🗓️
Business Hours
Check opening hours for stores, restaurants, services
"What time does the Apple Store close today?"
PUBLIC - WORKS

Scenario: Restaurant Research

Helping plan dinner by fetching menu and details

Fetching Info

U
Hey Umber
Online
We're thinking about Flour + Water for dinner Saturday. What's their menu like and do they have outdoor seating?
Consulting Browser Agent
Checking flourandwater.com...

Response with Web Data

We're thinking about Flour + Water for dinner Saturday. What's their menu like and do they have outdoor seating?
Flour + Water is known for handmade pasta. Here's what I found:
flourandwater.com/menu FETCHED
Current Menu Highlights
Pasta: Tagliatelle Bolognese, Cacio e Pepe, Squid Ink Chitarra

Mains: Wood-fired whole fish, Lamb shoulder

Hours: Dinner 5:30-10pm
They do have a small patio. Want me to help with a reservation?

Scenario: Flight Status Check

Real-time flight information from public airline pages

Is my flight to Chicago delayed? It's AA 1247 departing at 3pm.
Browser Agent checking AA flight status
Good news - your flight is on time.
aa.com/flight-status/AA1247 FETCHED
AA 1247 - SFO → ORD
Status: On Time
Gate: B42
Departs: 3:00 PM
Arrives: 9:15 PM
Boarding should start around 2:30. You're all set.

Scenario: Product Research

Getting specs and reviews for purchase decisions

I'm looking at the Dyson V15 vacuum. What do people say about it?
Browser Agent checking reviews
The Dyson V15 Detect is highly rated. Here's what I found:
amazon.com/.../reviews FETCHED
Dyson V15 Detect - Reviews
Rating: 4.6/5 (2,847 reviews)
Pros: Laser reveals hidden dust, great suction, LCD shows particles

Cons: Heavy, expensive, 60-minute battery life
Main trade-off is price ($750) vs performance. The laser dust detection seems to be the standout feature people love.

Auth Wall Detection

When a page requires login, Browser Agent detects it and bails gracefully

Detection Signals

How Browser Agent knows a page needs login

  • URL redirects to a /login, /signin, or /auth path
  • Page contains login form elements
  • Page title contains "Sign In" or "Log In"
  • HTTP 401 or 403 response code
  • Known auth wall patterns (OpenTable, airline accounts)
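
The signals above could be combined roughly as follows. This is a sketch under stated assumptions, not the logic in AuthDetector.ts: the `PageSignals` shape, the signal labels, and the rule "one strong signal or two weak ones" are all hypothetical. The known-pattern signal is left out because the source does not list the patterns.

```typescript
// Hypothetical inputs gathered after navigation.
interface PageSignals {
  requestedUrl: string;
  finalUrl: string;
  statusCode: number;
  title: string;
  hasPasswordField: boolean;
}

interface AuthCheck {
  requiresLogin: boolean;
  matched: string[];
}

const LOGIN_PATH = /\/(login|signin|auth)(\/|$)/i;
const LOGIN_TITLE = /\b(sign in|log in)\b/i;

function detectAuthWall(s: PageSignals): AuthCheck {
  const matched: string[] = [];
  const requestedPath = new URL(s.requestedUrl).pathname;
  const finalPath = new URL(s.finalUrl).pathname;

  // Strong signals. Note that a 403 can also mean bot blocking, not only a login wall.
  if (s.statusCode === 401 || s.statusCode === 403) matched.push("http-status");
  if (LOGIN_PATH.test(finalPath) && !LOGIN_PATH.test(requestedPath)) matched.push("login-redirect");

  // Weak signals: many public pages carry a login form or link in the header.
  if (LOGIN_TITLE.test(s.title)) matched.push("title");
  if (s.hasPasswordField) matched.push("login-form");

  const strong = matched.includes("http-status") || matched.includes("login-redirect");
  // Assumed policy: one strong signal, or at least two weak ones.
  return { requiresLogin: strong || matched.length >= 2, matched };
}
```

Requiring more than one weak signal is one way to avoid bailing on public pages that merely include a sign-in form in their header.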

Graceful Handling

Instead of failing, we explain and offer alternatives

🔒 Login Required
This page requires you to be logged in. I can't access your account on your behalf, but I can help you find public information about this instead.
Can you check my OpenTable reservation for Saturday?
Browser Agent checking OpenTable
🔒 Login Required
OpenTable reservations require you to be logged into your account. I can't access that on your behalf.
Here's what I can do instead:
  • Check the restaurant's public page for hours/menu
  • Look up the restaurant phone number to call
  • Check your email for the confirmation
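
The reply in this exchange could be assembled by a small formatter like the one below. The function name and signature are hypothetical; only the wording is modeled on the example above.

```typescript
// Hypothetical helper: formats the bail-out reply shown to the user.
function buildLoginRequiredReply(service: string, alternatives: string[]): string {
  const lines = [
    "Login Required",
    `${service} requires you to be logged into your account. I can't access that on your behalf.`,
  ];
  // Only offer a list when there is something useful to suggest.
  if (alternatives.length > 0) {
    lines.push("Here's what I can do instead:");
    for (const alt of alternatives) {
      lines.push(`  • ${alt}`);
    }
  }
  return lines.join("\n");
}

console.log(
  buildLoginRequiredReply("OpenTable", [
    "Check the restaurant's public page for hours/menu",
    "Look up the restaurant phone number to call",
  ]),
);
```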

Scenarios That Hit Auth Walls

🍽️
Reservation Details
OpenTable, Resy - viewing your booked reservations
AUTH REQUIRED - BAILS
✈️
Airline Account
Booking details, seat selection, frequent flyer miles
AUTH REQUIRED - BAILS
🏦
Banking Sites
Account balances, transactions, statements
AUTH REQUIRED - BAILS
📦
Order History
Amazon orders, shopping account details
AUTH REQUIRED - BAILS

Future: Authenticated Access

What could be possible with user-authorized account access

FUTURE IMPLEMENTATION: Not in v0.1 Scope
The following scenarios would require secure credential management and explicit user authorization. This is not part of the current project but shows what authenticated browser access could enable.
🍽️
Make Reservations
Actually book a table at OpenTable/Resy using saved credentials
"Book a table for 4 at Flour + Water Saturday at 7pm"
✈️
Manage Bookings
Change seats, upgrade, check in for flights
"Change my seat to a window on tomorrow's flight"
📦
Track Orders
Check delivery status on Amazon, track packages
"Where's my Amazon order? It was supposed to arrive today."
💳
Subscription Management
Check or cancel subscriptions, update payment methods
"Cancel my Netflix subscription"

Requirements for Future Auth Support

  • Secure credential storage (OAuth tokens, encrypted passwords)
  • Explicit user authorization flow per service
  • Session management and token refresh
  • Audit logging for all authenticated actions
  • Revocation capability for user control
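
To make three of these requirements concrete (per-service authorization, audit logging, revocation), here is a purely illustrative in-memory sketch. Nothing like it exists in v0.1. `AuthorizationRegistry` and its methods are hypothetical names, and credential storage and token refresh are deliberately omitted.

```typescript
interface AuditEntry {
  service: string;
  action: string;
  allowed: boolean;
  at: Date;
}

// Hypothetical registry: tracks which services the user has authorized.
class AuthorizationRegistry {
  private granted = new Set<string>();
  readonly audit: AuditEntry[] = [];

  // Explicit, per-service authorization by the user.
  authorize(service: string): void {
    this.granted.add(service);
  }

  // Revocation takes effect on the next attempted action.
  revoke(service: string): void {
    this.granted.delete(service);
  }

  // Every attempt is logged, whether or not it is allowed.
  perform(service: string, action: string): boolean {
    const allowed = this.granted.has(service);
    this.audit.push({ service, action, allowed, at: new Date() });
    return allowed;
  }
}
```

Logging denied attempts as well as allowed ones keeps the audit trail useful for spotting actions attempted after a revocation.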

Technical Implementation

Architecture and integration details

Architecture

Browser Agent uses Playwright MCP for browser control

// Agent consultation flow
TravelAgent
↓ consultExpert({ domain: 'browser', request: 'FETCH_PAGE', url })
BrowserAgent
↓ mcpClient.invoke('browser_navigate')
Playwright MCP Server
↓ browser_snapshot()
Page Content (accessibility tree)
↓ check auth signals
Return structured content
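
The flow above can be sketched in code as follows. The `McpClient` interface is an assumption: the diagram shows only `mcpClient.invoke(...)`, so the argument shape and the plain-string return value are simplifications for illustration, and `fakeClient` is a stub standing in for the Playwright MCP server.

```typescript
// Assumed minimal client interface; the real MCP client likely returns richer results.
interface McpClient {
  invoke(tool: string, args?: Record<string, unknown>): Promise<string>;
}

type FetchOutcome =
  | { status: "ok"; url: string; content: string }
  | { status: "auth_required"; url: string };

// Navigate, snapshot, check auth signals, then return structured content.
async function fetchPage(
  client: McpClient,
  url: string,
  looksLikeAuthWall: (snapshot: string) => boolean,
): Promise<FetchOutcome> {
  await client.invoke("browser_navigate", { url });
  const snapshot = await client.invoke("browser_snapshot");
  if (looksLikeAuthWall(snapshot)) {
    return { status: "auth_required", url };
  }
  return { status: "ok", url, content: snapshot };
}

// Stub client so the sketch runs without a browser.
const fakeClient: McpClient = {
  async invoke(tool) {
    return tool === "browser_snapshot" ? 'heading "Menu"\ntext "Cacio e Pepe"' : "";
  },
};

fetchPage(fakeClient, "https://flourandwater.com/menu", (s) => s.includes("Sign In")).then((r) =>
  console.log(r.status),
);
```

Passing the auth check in as a function keeps the navigation step independent of whichever detection rules are in use.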

Key Capabilities

MCP tools available to Browser Agent

browser_navigate Go to URL
browser_snapshot Get page content
browser_click Click elements
browser_wait_for Wait for content

Integration: Playwright MCP (@playwright/mcp-server)
Transport: stdio (local process communication)
Browser: Chromium (headless mode)
Timeout: 30 seconds (max page load time)
Pattern: Ask the Expert (agent-to-agent consultation)
Auth Detection: URL + DOM (multi-signal approach)
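
One way the 30-second limit could be enforced on the agent side is a generic timeout wrapper around each page load. This is a sketch only: `browserSettings` and `withTimeout` are hypothetical names, and the actual configuration lives in browser-connector.yaml, whose format is not shown here.

```typescript
// Hypothetical in-code mirror of the settings listed above.
const browserSettings = {
  transport: "stdio",
  browser: "chromium",
  headless: true,
  pageLoadTimeoutMs: 30000,
} as const;

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer on either outcome so it does not keep the process alive.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Usage: withTimeout(somePageLoadPromise, browserSettings.pageLoadTimeoutMs)
```

This wrapper only bounds how long the caller waits. It does not cancel the underlying navigation, which would need separate handling.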

Key Files

Implementation locations in the codebase

Agent: agents/implementations/browser-agent/src/BrowserAgent.ts
MCP Config: agents/implementations/browser-agent/src/browser-connector.yaml
Auth Detection: agents/implementations/browser-agent/src/services/AuthDetector.ts
Content Extractor: agents/implementations/browser-agent/src/services/ContentExtractor.ts