MCP Bridge: A Chrome Extension That Lets Your Server See What You're Browsing

By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — February 27, 2026, 15:45

What It Does

MCP Bridge is a Chrome extension that opens a persistent WebSocket connection between your browser and an MCP server. It does three things:

Page tracking — tells the server every URL you visit and leave, in real time
DOM commands — lets the server query, click, watch, and extract data from any page you have open
Summarize page — click the toolbar icon to request a server-side summary of the current page

The server can see your open tabs, read structured data from the DOM, click buttons, wait for elements to render, and set up live watchers that stream mutations back. All without you touching the browser console.

Architecture

Four files. Four roles. One message bus.

Server (Elixir/WsHandler)
    ↕ WebSocket (wss://)
offscreen.js          — holds the WebSocket, relays JSON both directions
    ↕ chrome.runtime.sendMessage
background.js         — routes messages, tracks tabs, manages lifecycle
    ↕ chrome.tabs.sendMessage
content.js            — executes DOM commands on the actual page

Messages flow through Chrome’s internal messaging API. The server never talks to content scripts directly — everything is relayed through background.js.

File by File

manifest.json

The extension’s identity card. Manifest V3 format.

Permissions:

offscreen — create a hidden document that holds the WebSocket (service workers can’t)
tabs — monitor tab navigation and closures
cookies — read the user authentication cookie from workingagents.ai
activeTab + scripting — interact with the current tab on icon click
<all_urls> host permission — inject content.js into every page

Entry points:

background.js as the service worker
content.js injected into all URLs at document_idle

offscreen.js — The WebSocket Keeper

Chrome’s Manifest V3 killed persistent background pages. Extension service workers are terminated after roughly 30 seconds of inactivity, and any WebSocket they hold dies with them.

The workaround: an offscreen document. It’s an invisible HTML page (offscreen.html) that Chrome keeps alive as long as the extension needs it. This is where the WebSocket lives.

What it does:

Connects to wss://workingagents.ai/ws with an auth token from the background worker
Receives JSON from the server, forwards it to background.js via chrome.runtime.sendMessage with source: "offscreen"
Receives JSON from background.js (tagged target: "offscreen"), sends it to the server via ws.send
Reconnects with exponential backoff (1s → 2s → 4s → … → 30s max) on disconnect
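
The reconnect schedule above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the names BASE_DELAY_MS, MAX_DELAY_MS, backoffDelay, and connectWithBackoff are hypothetical, and resetting the attempt counter once a connection opens is an assumption.

```javascript
// Assumed constants matching the 1s -> 30s schedule described above.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt n (0-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Opens a WebSocket and schedules a new attempt whenever it closes.
function connectWithBackoff(url, onMessage, attempt = 0) {
  const ws = new WebSocket(url);
  ws.onopen = () => {
    attempt = 0; // a successful connection resets the schedule (assumption)
  };
  ws.onmessage = (event) => onMessage(JSON.parse(event.data));
  ws.onclose = () => {
    const delay = backoffDelay(attempt);
    setTimeout(() => connectWithBackoff(url, onMessage, attempt + 1), delay);
  };
  return ws;
}
```

Keeping the delay calculation in its own pure function makes the schedule easy to check without opening a real socket.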

Auth flow: background.js reads the user cookie from workingagents.ai, sends it to offscreen.js, which appends it as a query parameter on the WebSocket URL. No token, no connection.
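
A minimal sketch of the URL-building step, under stated assumptions: the query parameter name "token" and the helper name buildSocketUrl are hypothetical, since the article does not name them. Only the endpoint URL comes from the text above.

```javascript
const WS_BASE_URL = "wss://workingagents.ai/ws";

// Returns the WebSocket URL with the token attached, or null when there is
// no token (in which case the caller should not connect at all).
function buildSocketUrl(token) {
  if (!token) return null; // no token, no connection
  const url = new URL(WS_BASE_URL);
  url.searchParams.set("token", token); // "token" is an assumed parameter name
  return url.toString();
}
```

Going through URL and searchParams rather than string concatenation means the token is percent-encoded correctly even if it contains characters such as & or spaces.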

background.js — The Router

The service worker. Three jobs:

1. Offscreen lifecycle

Creates the offscreen document on install and startup. If it disappears (Chrome killed it, extension updated), recreates it on the next outbound message. Sends the auth token after a 500ms delay to let the offscreen listener initialize.
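
The "recreate if missing" check might look roughly like this. It is a sketch, not the extension's code: the function name ensureOffscreen, the "WORKERS" reason, and the justification string are assumptions, and the api parameter stands in for the global chrome object so the logic can be exercised outside the browser.

```javascript
const OFFSCREEN_PATH = "offscreen.html";

// Creates the offscreen document only if none exists.
// Resolves to true when a document was created, false when one was already alive.
async function ensureOffscreen(api) {
  const contexts = await api.runtime.getContexts({
    contextTypes: ["OFFSCREEN_DOCUMENT"],
  });
  if (contexts.length > 0) return false;

  await api.offscreen.createDocument({
    url: OFFSCREEN_PATH,
    reasons: ["WORKERS"], // assumed; the article does not say which reason is declared
    justification: "Hold the WebSocket connection to the server", // assumed wording
  });
  return true;
}
```

In the service worker this would be called as ensureOffscreen(chrome) on install, on startup, and before each outbound message.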

2. Page tracking

Maintains a tabUrls Map of tabId → URL. On every tab navigation:

If the tab had a different URL before, sends page_departure for the old URL
Sends page_visit with the new URL, title, and tab ID
On tab close, sends page_departure and cleans up the map

The tab_id is included in every event so the server can route DOM commands back to the correct tab.
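
The tracking rules above reduce to two small functions over the tabUrls map. This is a sketch under assumptions: the function names trackNavigation and trackClose are hypothetical, and the event field names other than tab_id are guesses at the wire format.

```javascript
// Returns the events to send when tab `tabId` navigates to `url`.
function trackNavigation(tabUrls, tabId, url, title) {
  const events = [];
  const previous = tabUrls.get(tabId);
  if (previous !== undefined && previous !== url) {
    events.push({ type: "page_departure", url: previous, tab_id: tabId });
  }
  events.push({ type: "page_visit", url, title, tab_id: tabId });
  tabUrls.set(tabId, url);
  return events;
}

// Returns the events to send when tab `tabId` closes, and forgets the tab.
function trackClose(tabUrls, tabId) {
  const previous = tabUrls.get(tabId);
  tabUrls.delete(tabId);
  return previous === undefined
    ? []
    : [{ type: "page_departure", url: previous, tab_id: tabId }];
}
```

In the service worker these would plausibly be driven by chrome.tabs.onUpdated and chrome.tabs.onRemoved, with each returned event relayed to the offscreen document.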

3. Message routing

Two directions:

Server → browser: Messages from offscreen.js arrive with source: "offscreen". background.js dispatches by type: dom_command is forwarded to the target tab’s content script via chrome.tabs.sendMessage(tab_id, ...), while the other types (welcome, rpc, notification, summary_ack) are only logged.
Browser → server: Messages from content scripts arrive with sender.tab. Types dom_result and dom_mutation are forwarded to offscreen.js for relay to the server.

If a dom_command can’t reach its tab (closed, navigated away), background.js sends an error dom_result back so the server doesn’t hang waiting.
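
A sketch of the server-to-browser half of that routing, including the error fallback. The function name routeFromServer and the error field on the result are assumptions; sendToTab and sendToServer are injected stand-ins for chrome.tabs.sendMessage and the relay to offscreen.js.

```javascript
// Routes one message that arrived from the server.
// Resolves to "forwarded", "failed", or "logged" so callers can observe the outcome.
async function routeFromServer(msg, sendToTab, sendToServer) {
  if (msg.type !== "dom_command") {
    console.log("server message:", msg.type); // welcome, rpc, notification, ...
    return "logged";
  }
  try {
    await sendToTab(msg.tab_id, msg);
    return "forwarded";
  } catch (err) {
    // The tab is gone or has no content script: answer on its behalf
    // so the server is not left waiting on this request_id.
    sendToServer({
      type: "dom_result",
      request_id: msg.request_id,
      error: String(err && err.message ? err.message : err), // assumed field name
    });
    return "failed";
  }
}
```

The important property is that every dom_command yields exactly one dom_result for its request_id, whether it comes from the page or from this fallback.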

content.js — The DOM Executor

Injected into every page. Does nothing until it receives a dom_command message from background.js. Completely passive — no polling, no scanning, no side effects on load.

Supported commands:

Command	What it does
`dom_query`	`querySelector` — returns one element’s text, HTML, or attribute
`dom_query_all`	`querySelectorAll` — returns array of element data
`dom_click`	Clicks an element matching a selector
`dom_wait`	Waits (via MutationObserver) for a selector to appear, with timeout
`dom_watch`	Sets up a persistent MutationObserver, streams changes back as `dom_mutation` events
`dom_unwatch`	Disconnects a watcher
`dom_extract`	Multi-field structured extraction — the main workhorse

Extraction types:

text — el.textContent.trim()
html — el.innerHTML
attribute — el.getAttribute(name)
table — parses <table> into {headers: [...], rows: [[...]]}
list — querySelectorAll → array of text content
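
The extraction types above could be dispatched along these lines. This is a sketch, not the real content.js: the function names are hypothetical, the attribute field on the spec is an assumed name, trimming list items is an assumption, and the table parser takes the first row containing th cells as the header row.

```javascript
// Parses a <table> element into {headers: [...], rows: [[...]]}.
function parseTable(table) {
  const cells = (row, selector) =>
    Array.from(row.querySelectorAll(selector), (c) => c.textContent.trim());
  const rows = Array.from(table.querySelectorAll("tr"));
  const headerRow = rows.find((r) => r.querySelectorAll("th").length > 0);
  return {
    headers: headerRow ? cells(headerRow, "th") : [],
    rows: rows.filter((r) => r !== headerRow).map((r) => cells(r, "td")),
  };
}

// Extracts one field. `root` is anything with querySelector/querySelectorAll,
// normally `document`. Returns null when the selector matches nothing.
function extractField(root, spec) {
  if (spec.type === "list") {
    return Array.from(root.querySelectorAll(spec.selector), (el) =>
      el.textContent.trim()
    );
  }
  const el = root.querySelector(spec.selector);
  if (!el) return null;
  switch (spec.type) {
    case "text":
      return el.textContent.trim();
    case "html":
      return el.innerHTML;
    case "attribute":
      return el.getAttribute(spec.attribute); // `attribute` is an assumed field
    case "table":
      return parseTable(el);
    default:
      throw new Error(`unknown extraction type: ${spec.type}`);
  }
}

// Runs every extraction and keys the results by each spec's name.
function extractAll(root, extractions) {
  const result = {};
  for (const spec of extractions) {
    result[spec.name] = extractField(root, spec);
  }
  return result;
}
```

Called as extractAll(document, payload.extractions), this returns one object per command, which is what makes dom_extract convenient for pulling several fields in a single round trip.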

Message handling detail: content.js intentionally does NOT return true from the onMessage listener and does NOT use sendResponse. Instead it sends results via chrome.runtime.sendMessage as a separate message. Returning true would tell Chrome to hold the message channel open for a later reply; if that channel closed before the reply arrived, the tabs.sendMessage promise in background.js would reject and trigger a duplicate error response.
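
A sketch of a listener written in that style. The names makeListener and handlers are hypothetical, the error field is an assumed name, and reply stands in for chrome.runtime.sendMessage so the listener can be run outside Chrome.

```javascript
// Builds an onMessage listener that never uses sendResponse.
// `handlers` maps command names (e.g. "dom_query") to functions of the payload.
function makeListener(handlers, reply) {
  return function onMessage(msg, _sender, _sendResponse) {
    if (!msg || msg.type !== "dom_command") return;

    Promise.resolve()
      .then(() => handlers[msg.command](msg.payload))
      .then(
        (result) =>
          reply({ type: "dom_result", request_id: msg.request_id, result }),
        (err) =>
          reply({
            type: "dom_result",
            request_id: msg.request_id,
            error: String(err.message || err), // assumed field name
          })
      );
    // Deliberately no `return true`: the listener returns undefined, so Chrome
    // closes the message channel immediately and the result travels separately.
  };
}
```

In content.js this would be registered with chrome.runtime.onMessage.addListener(makeListener(handlers, (m) => chrome.runtime.sendMessage(m))). An unknown command falls into the error branch, so the server still gets an answer.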

Watchers are cleaned up on beforeunload to prevent MutationObserver leaks. Page departure detection is handled by background.js (more reliable than content script unload events).

offscreen.html

Three lines of HTML. Loads offscreen.js. That’s it. Chrome requires an actual HTML document for offscreen — you can’t just run a script.

How a DOM Command Round-Trips

Say the server wants to extract a PR title from a GitHub tab:

Server sends {type: "dom_command", tab_id: 42, command: "dom_extract", payload: {extractions: [{name: "title", selector: ".js-issue-title", type: "text"}]}, request_id: "abc"} via WebSocket
offscreen.js receives the JSON, forwards it to background.js via chrome.runtime.sendMessage({source: "offscreen", data: ...})
background.js sees type: "dom_command", calls chrome.tabs.sendMessage(42, ...) to route it to tab 42’s content script
content.js in tab 42 receives the message, runs document.querySelector(".js-issue-title").textContent.trim(), sends {type: "dom_result", request_id: "abc", result: {title: "Fix login bug"}} via chrome.runtime.sendMessage
background.js sees type: "dom_result" from a tab, forwards to offscreen via sendToOffscreen
offscreen.js sends it over the WebSocket to the server

Total hops: 6. Added latency: negligible, since every hop except the two WebSocket legs is local Chrome message passing.

Why This Architecture

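Why not WebSocket in the service worker? Chrome kills service workers after ~30 seconds of inactivity, and WebSocket connections die with them. Since Chrome 116, WebSocket traffic resets that idle timer, but a quiet connection still needs a keepalive to survive. The offscreen document sidesteps the problem by holding the connection outside the service worker.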

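Why not inject the WebSocket into content scripts? Content scripts run in the context of the page they are injected into. WebSockets are not subject to CORS, but a connection opened from a github.com tab would carry that page’s origin in its handshake, so the server would have to accept connections originating from every site, and every tab would hold its own connection. The offscreen document runs in the extension’s origin and keeps a single connection, avoiding this entirely.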

Why route through background.js instead of offscreen → content directly? Only background.js has access to chrome.tabs.sendMessage(tabId, ...). The offscreen document can’t target specific tabs — it can only broadcast via chrome.runtime.sendMessage, which background.js receives.

Why fire-and-forget in content.js? Replying through sendResponse after asynchronous work requires returning true from the listener, which keeps the message channel open. If that channel closes before the reply is sent (a slow DOM operation can outlive it), the promise in background.js rejects, and background.js then sends a spurious error result. Sending results as separate messages avoids this race entirely.