# MCP Bridge Chrome Extension

Persistent WebSocket bridge between Chrome and the MCP server. Tracks page visits, executes server-driven DOM commands, and streams extraction results back.

## Files

```
mcp-extension/
├── manifest.json    Manifest V3 config, permissions, entry points
├── background.js    Service worker — tab tracking, message routing, offscreen lifecycle
├── offscreen.html   Minimal HTML shell for the offscreen document
├── offscreen.js     Persistent WebSocket connection to wss://workingagents.ai/ws
├── content.js       DOM command executor, injected into every page
└── icons/           Toolbar icons (16/48/128px)
```

## Architecture

```
Server (WsHandler)
        ↕ WebSocket (wss://)
offscreen.js     Holds the WebSocket, relays JSON in both directions
        ↕ chrome.runtime.sendMessage
background.js    Routes messages, tracks tabs, manages offscreen lifecycle
        ↕ chrome.tabs.sendMessage(tabId)
content.js       Executes DOM commands on the actual page
```

Messages flow through Chrome's internal messaging API. The server never talks to content scripts directly — everything is relayed through background.js.

## Permissions (manifest.json)

| Permission | Why |
|---|---|
| `offscreen` | Create a hidden document to hold the WebSocket (service workers can't) |
| `tabs` | Monitor tab navigation and closures |
| `cookies` | Read the `user` auth cookie from workingagents.ai |
| `activeTab` + `scripting` | Interact with the active tab on toolbar icon click |
| `<all_urls>` host permission | Inject content.js into every page |

## offscreen.js — WebSocket Keeper

Chrome Manifest V3 kills service workers after ~30s of inactivity, which kills WebSocket connections with them. The offscreen document is an invisible HTML page Chrome keeps alive, providing a persistent home for the WebSocket.
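A minimal sketch of this keep-alive pattern, assuming the reconnect logic lives entirely in offscreen.js (function names like `connect` and `backoffDelay` are illustrative, not the actual source):

```javascript
// Sketch of the reconnect-with-exponential-backoff pattern offscreen.js uses.
// Names here are hypothetical; only the behavior mirrors the description above.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay doubles on each failed attempt: 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

function connect(token, attempt = 0) {
  const ws = new WebSocket(`wss://workingagents.ai/ws?token=${token}`);
  ws.onopen = () => { attempt = 0; }; // reset backoff once connected
  ws.onmessage = (event) => {
    // Relay server JSON to background.js, tagged with its origin.
    chrome.runtime.sendMessage({ source: "offscreen", data: JSON.parse(event.data) });
  };
  ws.onclose = () => {
    setTimeout(() => connect(token, attempt + 1), backoffDelay(attempt));
  };
  return ws;
}
```

Resetting the attempt counter on `onopen` keeps the delay short for the common case of a brief network blip while still backing off during an extended outage.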
**Responsibilities:**

- Connects to `wss://workingagents.ai/ws?token=`
- Receives JSON from the server → forwards to background.js with `source: "offscreen"`
- Receives JSON from background.js (tagged `target: "offscreen"`) → sends to the server via `ws.send`
- Reconnects with exponential backoff: 1s → 2s → 4s → ... → 30s max

**Auth flow:** background.js reads the `user` cookie → sends it to offscreen.js → offscreen.js appends it as a query parameter on the WebSocket URL.

## background.js — The Router

Service worker with four jobs:

### 1. Offscreen Lifecycle

Creates the offscreen document on install/startup. If it disappears (killed by Chrome, extension updated), recreates it on the next outbound message. Sends the auth token after a 500ms delay so listeners can initialize.

### 2. Page Tracking

Maintains a `tabUrls` Map of `tabId → URL`.

- **Tab navigation:** If the tab had a different URL before, sends `page_departure` for the old URL, then `page_visit` for the new one (with URL, title, and tab_id)
- **Tab close:** Sends `page_departure`, cleans up the map
- **Chrome internal pages:** Filtered out (`chrome://`, `chrome-extension://`)

The `tab_id` is included in every event so the server can route DOM commands back to the correct tab.

### 3. Message Routing

| Direction | Source | Routing |
|---|---|---|
| Server → browser | offscreen.js (`source: "offscreen"`) | `dom_command` → `chrome.tabs.sendMessage(tab_id)` to the content script. Other types (`welcome`, `rpc`, `notification`, `summary_ack`) are logged. |
| Browser → server | content.js (`sender.tab`) | `dom_result` and `dom_mutation` → forwarded to offscreen.js for relay |

If a `dom_command` can't reach its tab (closed, navigated away), background.js sends an error `dom_result` back so the server doesn't hang waiting.

### 4. Summarize Page

A toolbar icon click sends `summarize_page` with the page URL and title to the server, which triggers `Summary.request_summary`.

## content.js — DOM Executor

Injected into every page at `document_idle`.
Completely passive — does nothing until it receives a `dom_command` from background.js.

### Supported Commands

| Command | Description |
|---|---|
| `dom_query` | `querySelector` — returns one element's text, HTML, or attribute |
| `dom_query_all` | `querySelectorAll` — returns an array of element data |
| `dom_click` | Clicks an element matching a CSS selector |
| `dom_wait` | MutationObserver waits for a selector to appear, with configurable timeout |
| `dom_watch` | Persistent MutationObserver, streams `dom_mutation` events back to the server |
| `dom_unwatch` | Disconnects a watcher by selector |
| `dom_extract` | Multi-field structured extraction (the main workhorse) |

### Extraction Types

| Type | Returns |
|---|---|
| `text` | `el.textContent.trim()` |
| `html` | `el.innerHTML` |
| `attribute` | `el.getAttribute(name)` |
| `table` | `{headers: [...], rows: [[...]]}` from a `<table>` element |
| `list` | Array of text content from `querySelectorAll` |

### Message Handling

content.js does NOT use `sendResponse`. Results are sent as separate `chrome.runtime.sendMessage` calls. This prevents Chrome from keeping the message channel open — if the async DOM operation takes too long, Chrome auto-closes the channel and rejects the `tabs.sendMessage` promise in background.js, causing a duplicate error response.

Watchers are cleaned up on `beforeunload`. Page departure detection is handled by background.js (more reliable than content script unload events).
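The per-field dispatch behind `dom_extract` can be sketched as a pure function. This is a hypothetical helper (`extractField` and the `attribute` key on the spec are assumed names), and the real executor also implements the `table` and `list` types over live DOM collections:

```javascript
// Illustrative dispatch on extraction type. `extractField` is a hypothetical
// name; the spec shape mirrors the Extraction Types table above.
function extractField(el, spec) {
  switch (spec.type) {
    case "text":
      return el.textContent.trim();
    case "html":
      return el.innerHTML;
    case "attribute":
      // The key carrying the attribute name is assumed to be `attribute`.
      return el.getAttribute(spec.attribute);
    default:
      return null; // unknown type: nothing to extract
  }
}
```

Keeping each type a side-effect-free read is what lets `dom_extract` run several field specs against one page and report them in a single `dom_result`.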
## Server-Side Integration

### Modules

| Module | File | Role |
|---|---|---|
| `WsHandler` | `lib/ws_handler.ex` | WebSocket protocol handler, message routing, rate limiting (30 msg/s) |
| `WsPage` | `lib/ws_page.ex` | Per-user session state, page visit history |
| `PageScraperServer` | `lib/page_scraper_server.ex` | Per-{user, pattern} GenServer, orchestrates the DOM extraction lifecycle |
| `PageScraper` | `lib/page_scraper.ex` | URL pattern registry and generic extraction storage |
| `PageScraper.Handler` | `lib/page_scraper/handler.ex` | Behaviour defining handler callbacks |

### WebSocket Endpoint

```
GET /ws → WebSockAdapter.upgrade → WsHandler.init(%{user: user})
```

Registers `{:handler, username}` and `{:page, username}` in `WsRegistry`.

### Message Types (Server Receives)

| Type | From | Action |
|---|---|---|
| `page_visit` | Extension | Match URL against patterns → start PageScraperServers |
| `page_departure` | Extension | Notify all scrapers for the departing URL |
| `dom_result` | Extension | Route to PageScraperServer by `pattern_id` |
| `dom_mutation` | Extension | Route MutationObserver events to the scraper |
| `extension_connected` | Extension | Log connection |
| `summarize_page` | Extension | Trigger `Summary.request_summary` |

### PageScraperServer Lifecycle

1. `WsHandler` receives `page_visit` → `PageScraper.match_url(url)` finds matching patterns
2. `PageScraperServer.get_or_start` starts a GenServer per {username, pattern_id}
3. Calls `handler.on_match(url, captures)` → returns an extraction plan
4. If the plan has `:wait`, sends a `dom_wait` command (waits for the selector to appear)
5. Sends all `:extractions` specs concurrently as separate `dom_extract` commands
6. Each `dom_result` arrives via cast, routed to `handler.on_extraction(step, data, state)`
7. The handler returns an action: `:store`, `:next_step`, `:noop`, or `:stop`
8. `:store` calls `handler.store_result(url, captures, data)` to persist

**Timers:**

- Idle timeout: 30 minutes
- Departure grace period: 5 seconds
- Request timeout: 15 seconds

### Extraction Plan Structure

```elixir
%{
  wait: %{selector: "...", timeout: 10_000},
  extractions: [%{name: "title", selector: ".title", type: :text}, ...],
  watches: [%{selector: "...", attributes: true, children: true, subtree: true}],
  poll: %{interval: 30_000, extractions: [...]},
  sequence: [{:click, selector}, {:wait, sel, timeout}, {:extract, name, spec}, {:delay, ms}],
  handler_state: %{}
}
```

### Handler Behaviour Callbacks

| Callback | Purpose |
|---|---|
| `on_match(url, captures)` | Define the extraction plan for a matched URL |
| `on_extraction(step, data, state)` | Process results, decide the next action |
| `on_departure(state)` | Cleanup on page leave (optional) |
| `on_mutation(selector, data, state)` | Handle watched DOM changes (optional) |
| `setup_database()` | Create the handler's table (optional) |
| `url_exists?(url)` | Deduplication check (optional) |
| `store_result(url, captures, data)` | Persist to the handler's table (optional) |

### Example Handlers

**GithubPR** (`lib/page_scraper/handlers/github_pr.ex`):

- Pattern: `github.com/:owner/:repo/pull/:number`
- Wait: `.markdown-title` (10s)
- Extracts: title, state, author, body, files count, commits count
- Stores in the `github_prs` table

**CrunchbaseFunding** (`lib/page_scraper/handlers/crunchbase_funding.ex`):

- Pattern: `crunchbase.com/organization/:slug`
- Wait: `#company_funding table` (15s)
- Extracts: 13 fields across funding, investors, and company info
- Stores in the `crunchbase_companies` table

## DOM Command Round-Trip Example

The server wants to extract a PR title from a GitHub tab:

```
1. Server sends dom_command
   {tab_id: 42, command: "dom_extract",
    payload: {extractions: [{name: "title", selector: ".markdown-title", type: "text"}]},
    pattern_id: "github_pr", request_id: "abc"}
        ↓ WebSocket
2. offscreen.js receives JSON
   → chrome.runtime.sendMessage({source: "offscreen", data: ...})
        ↓
3. background.js sees dom_command
   → chrome.tabs.sendMessage(42, ...)
        ↓
4. content.js runs document.querySelector(".markdown-title").textContent.trim()
   → chrome.runtime.sendMessage({type: "dom_result", request_id: "abc", result: {title: "Fix bug"}})
        ↓
5. background.js sees dom_result from the tab → sendToOffscreen(message)
        ↓
6. offscreen.js sends over the WebSocket → server
```

## Why This Architecture

**Why an offscreen document for the WebSocket?** Chrome kills service workers after ~30s of inactivity. The offscreen document is the only Manifest V3 mechanism for persistent connections.

**Why not a WebSocket in content scripts?** Content scripts run in the page's origin. A WebSocket from `github.com` to `workingagents.ai` would hit CORS. The offscreen document runs in the extension's origin.

**Why route through background.js?** Only the service worker has `chrome.tabs.sendMessage(tabId)`. The offscreen document can only broadcast via `chrome.runtime.sendMessage`.

**Why fire-and-forget in content.js?** Using `sendResponse` keeps the message channel open. If the async DOM operation takes too long, Chrome auto-closes it and rejects the promise in background.js, causing a duplicate error result. Separate messages avoid this race.
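The routing rules background.js applies can be condensed into a pure function. This is a hypothetical sketch (`route` and the boolean `fromOffscreen` flag are assumed names; the real service worker calls `chrome.tabs.sendMessage` and relays to offscreen.js directly instead of returning a plan):

```javascript
// Hypothetical pure helper mirroring the message-routing table above.
// The real background.js performs the sends; this only decides the target.
function route(message, fromOffscreen) {
  if (fromOffscreen) {
    // Server → browser: only DOM commands are forwarded to a tab's
    // content script; everything else is logged.
    return message.type === "dom_command"
      ? { to: "tab", tabId: message.tab_id }
      : { to: "log" };
  }
  // Browser → server: content-script results relay through offscreen.js.
  return message.type === "dom_result" || message.type === "dom_mutation"
    ? { to: "offscreen" }
    : { to: "log" };
}
```

Factoring the decision out like this makes the asymmetry explicit: server-bound traffic has exactly one exit (the offscreen relay), while browser-bound traffic must be addressed to a specific `tab_id`.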