How Claude in Chrome Works

Claude in Chrome has two working modes. One is an independent AI conversation mode where users can directly issue tasks in the extension. The other is a collaborative mode with Claude Code, where users can use the browser capabilities exposed by the extension in Claude Code to complete additional tasks using their browser.

Let's start by introducing the principles of the first mode.

Independent Conversation Mode

Reverse engineering the extension's working method is straightforward. Open the Claude Chrome extension, right-click to inspect and monitor the extension's network requests, as shown in the figure.

capture

We discovered that during task execution, it continuously requests Claude's /v1/messages endpoint, which is Claude's official LLM API (similar to ChatGPT's /chat/completions endpoint). From the request body, we can extract the system prompt and tools definitions. I've open-sourced and organized this at https://github.com/AIPexStudio/system-prompts-of-claude-chrome

The specific system prompt is quite extensive (40+ kb), so I won't go into detail here. It mainly includes:

Security Protection

Prevent prompt injection attacks
Isolate instructions from web page content, requiring explicit user confirmation
Prevent social engineering attacks (impersonating administrators, urgent requests, etc.)
Protect user privacy and data security

Behavioral Guidelines

Refuse to process harmful content (violence, pornography, malicious code, etc.)
Conversation style and format requirements
User health care (mental health support)

Operation Classification

Prohibited operations: Handling banking information, downloading untrusted files, permanent deletion, modifying security permissions, etc.
Operations requiring explicit permission: Downloading files, purchasing, entering financial data, accepting agreements, etc.
Regular operations: Operations that can be automatically executed

Copyright Protection

Do not copy copyrighted content
Citation limits (maximum 15 words per citation)
Do not provide copyrighted materials such as song lyrics

Tool Usage Requirements

Use read_page to get page element references
Prefer DOM element references over coordinates
Efficiently read long page content

Runtime Environment Related

System information
Keyboard shortcuts

Multi-Tab Management

How to get tab information (tabs_context)
How to work across multiple tabs
Create and manage new tabs
Tab state management

Response Flow Control

Call turn_answer_start before outputting responses
Ensure correct response flow

The tools include the following, categorized by function:

Tool Name	Description
`read_page`	Get the Accessibility Tree representation of page elements
`find`	Find page elements using natural language queries
`get_page_text`	Extract raw text content from the page
`navigate`	Navigate to a specified URL or browser history

2. Interaction & Automation

Tool Name	Description
`computer`	Mouse and keyboard interaction, as well as screenshot functionality
`form_input`	Set values in form elements
`javascript_tool`	Execute JavaScript code in the page context

3. Tab Management

Tool Name	Description
`tabs_create`	Create new browser tabs
`tabs_context`	Get tab context information

4. Media & Files

Tool Name	Description
`upload_image`	Upload images to file input fields or drop targets
`gif_creator`	Record and export browser operations as animated GIFs

5. Debugging & Monitoring

Tool Name	Description
`read_console_messages`	Read browser console messages
`read_network_requests`	Read HTTP network request information

6. Utilities

Tool Name	Description
`resize_window`	Resize browser window dimensions

7. Custom Tools

Tool Name	Description
`update_plan`	Update and display plans to users for approval
`turn_answer_start`	Mark the start of a response round

The Claude Chrome extension combines these tools responsible for page understanding, clicking, form filling, screenshots, and debugging to complete complex tasks. Among them, update_plan and turn_answer_start are custom tools that have unique UI displays in the extension. update_plan requests users to "Approve Plan" or "Make Changes", and turn_answer_start requests users to confirm task start. The figure shows the UI display of update_plan.

update_plan

Collaborative Mode with Claude Code

Claude Chrome requests the nativeMessaging permission, which allows the extension to communicate bidirectionally with local applications. Through this permission, Claude Chrome can establish connections with Claude Code (or other applications supporting Native Messaging) installed on the user's computer.

How Native Messaging Works

Permission Request: The extension declares the nativeMessaging permission in manifest.json
Local Application Registration: Claude Code registers as a Native Messaging Host in the system
Establish Connection: The extension communicates with the local application through standard input/output (stdin/stdout)
Message Passing: Uses JSON-formatted messages for bidirectional data exchange

Collaboration Flow with Claude Code

┌─────────────────┐         ┌──────────────────┐         ┌──────────────┐
│  Claude Code    │ ◄─────► │  Native Messaging │ ◄─────► │ Claude Chrome│
│  (Local App)    │   JSON   │  Host (Bridge)   │   JSON   │  (Extension) │
└─────────────────┘         └──────────────────┘         └──────────────┘

nativeMessaging

Workflow:

Claude Code Initiates Request: User requests browser operations in Claude Code
Message Passing: Claude Code sends requests to Claude Chrome extension via Native Messaging
Extension Execution: Claude Chrome extension executes corresponding operations in the browser (such as clicking, filling forms, etc.)
Result Return: Operation results are returned to Claude Code via Native Messaging
Display Results: Claude Code displays results to the user

At this point, the Claude Chrome extension is no longer an Agent, but rather acts as an MCP server exposing browser tools to Claude Code, which then uses them.

tools

Context Efficiency

I tried running the official example with Claude in Chrome:

Navigate to Zillow and find me a 2-bedroom apartment in San Francisco under $4000/month. Filter for places available within the next 30 days and show me the top 3 options with photos.

I found it took 10 minutes to complete execution, so I investigated where the problem occurred. I discovered that the Claude Chrome extension primarily uses two modes to understand pages: one uses the page's accessibility tree, and when the accessibility tree cannot achieve the goal, it uses screenshots.

Many people may not be familiar with the accessibility tree, so let me briefly introduce it.

The Accessibility Tree is a page structure representation built by browsers to support assistive technologies (such as screen readers). It contains semantic information about all interactive elements on the page, such as:

Element Roles: Buttons, links, input fields, headings, etc.
Element Attributes: Names, descriptions, states (whether disabled, selected, etc.)
Hierarchical Relationships: Parent-child relationships and order between elements
Text Content: Readable text of elements

Compared to directly parsing the DOM tree, the accessibility tree provides a more concise and semantic page representation. For AI, the accessibility tree is easier to understand than raw HTML because it filters out a large amount of style and layout-related information, retaining only functional semantic information.

However, the accessibility tree has limitations. It depends on whether the website's accessibility tree is well-constructed. If a website's accessibility tree is not well-constructed, it may lead to inaccurate accessibility tree representations, thus affecting AI understanding.

Here I discovered a problem with Claude Chrome: for multiple snapshots of the same page, they are all saved in the context, but actually, past snapshots are meaningless and can be discarded. Additionally, Claude in Chrome puts the entire page's accessibility tree into the context, causing the context to explode rapidly when working on long pages.

Another problem is that when the accessibility tree fails, Claude in Chrome uses screenshots. However, how large is one screenshot? 100kb-500kb, varying. In my task, the screenshot was 335kb. Such screenshot information will persist in subsequent contexts, causing context explosion which leads to slow, ineffective, and expensive tasks.

Actually, there's no need to put screenshot information into the context as-is. To understand pages, we can first reduce the resolution, process it, and then put it into the context. AIPex's experience shows it can compress 10-20 times.

Of course, Claude was the first to propose context engineering. I believe the extension's context optimization algorithms will be launched soon. Stay tuned.

User Experience

Claude in Chrome's UI is very comfortable. Users can record workflows and edit them, then directly invoke them with commands next time. One selling point of Claude in Chrome is that it claims to support background operation, but actually, all operations of Claude in Chrome depend on the debugger permission. When the debugger permission is being used, the browser will definitely acquire user focus and force display at the forefront, making "background operation" a fallacy.

There's a workaround of not using the debugger, but that would require abandoning the accessibility tree and using other methods to complete operations like clicking and input.

I originally used and reverse-engineered the Claude in Chrome extension with a learning mindset, but after running the first use case, I understood that this extension currently has significant context efficiency problems, which would lead to slow and inefficient tasks.

AIPex has already navigated these pitfalls and done a lot of work on context engineering. For the same task, AIPex can be at least 2x faster than Claude in Chrome, and the more complex the task, the more obvious the difference. AIPex's core context practices have been organized at https://www.claudechrome.com/zh/blog/ai-browser-automation-challenges.

AIPex also supports BYOK (Bring Your Own Key), so you can try it for free with your own LLM API key.

References

https://x.com/pk_iv/status/2005694082627297735

How Claude for Chrome Works

How Claude in Chrome Works

Independent Conversation Mode

1. Page Reading & Navigation

2. Interaction & Automation

3. Tab Management

4. Media & Files

5. Debugging & Monitoring

6. Utilities

7. Custom Tools

Collaborative Mode with Claude Code

How Native Messaging Works

Collaboration Flow with Claude Code

Context Efficiency

User Experience

References

Категории

Больше статей

Why AIPex is the Game Changer for AI Browser Automation

Core Challenges in AI Browser Automation and How AIPex Solves Them

Aipex Performance Optimization: Making AI Smarter at Understanding Web Pages

Рассылка

Explore More