How Claude for Chrome Works
2025/12/31

How Claude for Chrome Works

Explore how Claude for Chrome integrates AI conversation capabilities into your browser. Learn about its architecture, content analysis features, and how it differs from browser automation tools.

How Claude in Chrome Works

Claude in Chrome has two working modes. One is an independent AI conversation mode where users can directly issue tasks in the extension. The other is a collaborative mode with Claude Code, where users can use the browser capabilities exposed by the extension in Claude Code to complete additional tasks using their browser.

Let's start by introducing the principles of the first mode.

Independent Conversation Mode

Reverse engineering the extension's working method is straightforward. Open the Claude Chrome extension, right-click to inspect and monitor the extension's network requests, as shown in the figure.

capture

We discovered that during task execution, it continuously requests Claude's /v1/messages endpoint, which is Claude's official LLM API (similar to ChatGPT's /chat/completions endpoint). From the request body, we can extract the system prompt and tools definitions. I've open-sourced and organized this at https://github.com/AIPexStudio/system-prompts-of-claude-chrome

The specific system prompt is quite extensive (40+ kb), so I won't go into detail here. It mainly includes:

  1. Security Protection
  • Prevent prompt injection attacks
  • Isolate instructions from web page content, requiring explicit user confirmation
  • Prevent social engineering attacks (impersonating administrators, urgent requests, etc.)
  • Protect user privacy and data security
  1. Behavioral Guidelines
  • Refuse to process harmful content (violence, pornography, malicious code, etc.)
  • Conversation style and format requirements
  • User health care (mental health support)
  1. Operation Classification
  • Prohibited operations: Handling banking information, downloading untrusted files, permanent deletion, modifying security permissions, etc.
  • Operations requiring explicit permission: Downloading files, purchasing, entering financial data, accepting agreements, etc.
  • Regular operations: Operations that can be automatically executed
  1. Copyright Protection
  • Do not copy copyrighted content
  • Citation limits (maximum 15 words per citation)
  • Do not provide copyrighted materials such as song lyrics
  1. Tool Usage Requirements
  • Use read_page to get page element references
  • Prefer DOM element references over coordinates
  • Efficiently read long page content
  1. Runtime Environment Related
  • System information
  • Keyboard shortcuts
  1. Multi-Tab Management
  • How to get tab information (tabs_context)
  • How to work across multiple tabs
  • Create and manage new tabs
  • Tab state management
  1. Response Flow Control
  • Call turn_answer_start before outputting responses
  • Ensure correct response flow

The tools include the following, categorized by function:

1. Page Reading & Navigation

Tool NameDescription
read_pageGet the Accessibility Tree representation of page elements
findFind page elements using natural language queries
get_page_textExtract raw text content from the page
navigateNavigate to a specified URL or browser history

2. Interaction & Automation

Tool NameDescription
computerMouse and keyboard interaction, as well as screenshot functionality
form_inputSet values in form elements
javascript_toolExecute JavaScript code in the page context

3. Tab Management

Tool NameDescription
tabs_createCreate new browser tabs
tabs_contextGet tab context information

4. Media & Files

Tool NameDescription
upload_imageUpload images to file input fields or drop targets
gif_creatorRecord and export browser operations as animated GIFs

5. Debugging & Monitoring

Tool NameDescription
read_console_messagesRead browser console messages
read_network_requestsRead HTTP network request information

6. Utilities

Tool NameDescription
resize_windowResize browser window dimensions

7. Custom Tools

Tool NameDescription
update_planUpdate and display plans to users for approval
turn_answer_startMark the start of a response round

The Claude Chrome extension combines these tools responsible for page understanding, clicking, form filling, screenshots, and debugging to complete complex tasks. Among them, update_plan and turn_answer_start are custom tools that have unique UI displays in the extension. update_plan requests users to "Approve Plan" or "Make Changes", and turn_answer_start requests users to confirm task start. The figure shows the UI display of update_plan.

update_plan

Collaborative Mode with Claude Code

Claude Chrome requests the nativeMessaging permission, which allows the extension to communicate bidirectionally with local applications. Through this permission, Claude Chrome can establish connections with Claude Code (or other applications supporting Native Messaging) installed on the user's computer.

How Native Messaging Works

  1. Permission Request: The extension declares the nativeMessaging permission in manifest.json
  2. Local Application Registration: Claude Code registers as a Native Messaging Host in the system
  3. Establish Connection: The extension communicates with the local application through standard input/output (stdin/stdout)
  4. Message Passing: Uses JSON-formatted messages for bidirectional data exchange

Collaboration Flow with Claude Code

┌─────────────────┐         ┌──────────────────┐         ┌──────────────┐
│  Claude Code    │ ◄─────► │  Native Messaging │ ◄─────► │ Claude Chrome│
│  (Local App)    │   JSON   │  Host (Bridge)   │   JSON   │  (Extension) │
└─────────────────┘         └──────────────────┘         └──────────────┘

nativeMessaging

Workflow:

  1. Claude Code Initiates Request: User requests browser operations in Claude Code
  2. Message Passing: Claude Code sends requests to Claude Chrome extension via Native Messaging
  3. Extension Execution: Claude Chrome extension executes corresponding operations in the browser (such as clicking, filling forms, etc.)
  4. Result Return: Operation results are returned to Claude Code via Native Messaging
  5. Display Results: Claude Code displays results to the user

At this point, the Claude Chrome extension is no longer an Agent, but rather acts as an MCP server exposing browser tools to Claude Code, which then uses them.

tools

Context Efficiency

I tried running the official example with Claude in Chrome:

Navigate to Zillow and find me a 2-bedroom apartment in San Francisco under $4000/month. Filter for places available within the next 30 days and show me the top 3 options with photos.

I found it took 10 minutes to complete execution, so I investigated where the problem occurred. I discovered that the Claude Chrome extension primarily uses two modes to understand pages: one uses the page's accessibility tree, and when the accessibility tree cannot achieve the goal, it uses screenshots.

Many people may not be familiar with the accessibility tree, so let me briefly introduce it.

The Accessibility Tree is a page structure representation built by browsers to support assistive technologies (such as screen readers). It contains semantic information about all interactive elements on the page, such as:

  • Element Roles: Buttons, links, input fields, headings, etc.
  • Element Attributes: Names, descriptions, states (whether disabled, selected, etc.)
  • Hierarchical Relationships: Parent-child relationships and order between elements
  • Text Content: Readable text of elements

Compared to directly parsing the DOM tree, the accessibility tree provides a more concise and semantic page representation. For AI, the accessibility tree is easier to understand than raw HTML because it filters out a large amount of style and layout-related information, retaining only functional semantic information.

However, the accessibility tree has limitations. It depends on whether the website's accessibility tree is well-constructed. If a website's accessibility tree is not well-constructed, it may lead to inaccurate accessibility tree representations, thus affecting AI understanding.

Here I discovered a problem with Claude Chrome: for multiple snapshots of the same page, they are all saved in the context, but actually, past snapshots are meaningless and can be discarded. Additionally, Claude in Chrome puts the entire page's accessibility tree into the context, causing the context to explode rapidly when working on long pages.

Another problem is that when the accessibility tree fails, Claude in Chrome uses screenshots. However, how large is one screenshot? 100kb-500kb, varying. In my task, the screenshot was 335kb. Such screenshot information will persist in subsequent contexts, causing context explosion which leads to slow, ineffective, and expensive tasks.

Actually, there's no need to put screenshot information into the context as-is. To understand pages, we can first reduce the resolution, process it, and then put it into the context. AIPex's experience shows it can compress 10-20 times.

Of course, Claude was the first to propose context engineering. I believe the extension's context optimization algorithms will be launched soon. Stay tuned.

User Experience

Claude in Chrome's UI is very comfortable. Users can record workflows and edit them, then directly invoke them with commands next time. One selling point of Claude in Chrome is that it claims to support background operation, but actually, all operations of Claude in Chrome depend on the debugger permission. When the debugger permission is being used, the browser will definitely acquire user focus and force display at the forefront, making "background operation" a fallacy.

There's a workaround of not using the debugger, but that would require abandoning the accessibility tree and using other methods to complete operations like clicking and input.

Why I Recommend Trying AIPex?

I originally used and reverse-engineered the Claude in Chrome extension with a learning mindset, but after running the first use case, I understood that this extension currently has significant context efficiency problems, which would lead to slow and inefficient tasks.

AIPex has already navigated these pitfalls and done a lot of work on context engineering. For the same task, AIPex can be at least 2x faster than Claude in Chrome, and the more complex the task, the more obvious the difference. AIPex's core context practices have been organized at https://www.claudechrome.com/zh/blog/ai-browser-automation-challenges.

AIPex also supports BYOK (Bring Your Own Key), so you can try it for free with your own LLM API key.

References

  1. https://x.com/pk_iv/status/2005694082627297735

Категории

Рассылка

Присоединяйтесь к сообществу

Подпишитесь на нашу рассылку, чтобы получать последние новости и обновления