
How Claude for Chrome Works
Explore how Claude for Chrome integrates AI conversation capabilities into your browser. Learn about its architecture, content analysis features, and how it differs from browser automation tools.
How Claude in Chrome Works
Claude in Chrome has two working modes. One is an independent AI conversation mode where users can directly issue tasks in the extension. The other is a collaborative mode with Claude Code, where users can use the browser capabilities exposed by the extension in Claude Code to complete additional tasks using their browser.
Let's start by introducing the principles of the first mode.
Independent Conversation Mode
Reverse engineering the extension's working method is straightforward. Open the Claude Chrome extension, right-click to inspect and monitor the extension's network requests, as shown in the figure.

We discovered that during task execution, it continuously requests Claude's /v1/messages endpoint, which is Claude's official LLM API (similar to ChatGPT's /chat/completions endpoint). From the request body, we can extract the system prompt and tools definitions. I've open-sourced and organized this at https://github.com/AIPexStudio/system-prompts-of-claude-chrome
The specific system prompt is quite extensive (40+ kb), so I won't go into detail here. It mainly includes:
- Security Protection
- Prevent prompt injection attacks
- Isolate instructions from web page content, requiring explicit user confirmation
- Prevent social engineering attacks (impersonating administrators, urgent requests, etc.)
- Protect user privacy and data security
- Behavioral Guidelines
- Refuse to process harmful content (violence, pornography, malicious code, etc.)
- Conversation style and format requirements
- User health care (mental health support)
- Operation Classification
- Prohibited operations: Handling banking information, downloading untrusted files, permanent deletion, modifying security permissions, etc.
- Operations requiring explicit permission: Downloading files, purchasing, entering financial data, accepting agreements, etc.
- Regular operations: Operations that can be automatically executed
- Copyright Protection
- Do not copy copyrighted content
- Citation limits (maximum 15 words per citation)
- Do not provide copyrighted materials such as song lyrics
- Tool Usage Requirements
- Use
read_pageto get page element references - Prefer DOM element references over coordinates
- Efficiently read long page content
- Runtime Environment Related
- System information
- Keyboard shortcuts
- Multi-Tab Management
- How to get tab information (
tabs_context) - How to work across multiple tabs
- Create and manage new tabs
- Tab state management
- Response Flow Control
- Call
turn_answer_startbefore outputting responses - Ensure correct response flow
The tools include the following, categorized by function:
1. Page Reading & Navigation
| Tool Name | Description |
|---|---|
read_page | Get the Accessibility Tree representation of page elements |
find | Find page elements using natural language queries |
get_page_text | Extract raw text content from the page |
navigate | Navigate to a specified URL or browser history |
2. Interaction & Automation
| Tool Name | Description |
|---|---|
computer | Mouse and keyboard interaction, as well as screenshot functionality |
form_input | Set values in form elements |
javascript_tool | Execute JavaScript code in the page context |
3. Tab Management
| Tool Name | Description |
|---|---|
tabs_create | Create new browser tabs |
tabs_context | Get tab context information |
4. Media & Files
| Tool Name | Description |
|---|---|
upload_image | Upload images to file input fields or drop targets |
gif_creator | Record and export browser operations as animated GIFs |
5. Debugging & Monitoring
| Tool Name | Description |
|---|---|
read_console_messages | Read browser console messages |
read_network_requests | Read HTTP network request information |
6. Utilities
| Tool Name | Description |
|---|---|
resize_window | Resize browser window dimensions |
7. Custom Tools
| Tool Name | Description |
|---|---|
update_plan | Update and display plans to users for approval |
turn_answer_start | Mark the start of a response round |
The Claude Chrome extension combines these tools responsible for page understanding, clicking, form filling, screenshots, and debugging to complete complex tasks. Among them, update_plan and turn_answer_start are custom tools that have unique UI displays in the extension. update_plan requests users to "Approve Plan" or "Make Changes", and turn_answer_start requests users to confirm task start. The figure shows the UI display of update_plan.

Collaborative Mode with Claude Code
Claude Chrome requests the nativeMessaging permission, which allows the extension to communicate bidirectionally with local applications. Through this permission, Claude Chrome can establish connections with Claude Code (or other applications supporting Native Messaging) installed on the user's computer.
How Native Messaging Works
- Permission Request: The extension declares the
nativeMessagingpermission inmanifest.json - Local Application Registration: Claude Code registers as a Native Messaging Host in the system
- Establish Connection: The extension communicates with the local application through standard input/output (stdin/stdout)
- Message Passing: Uses JSON-formatted messages for bidirectional data exchange
Collaboration Flow with Claude Code
┌─────────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Claude Code │ ◄─────► │ Native Messaging │ ◄─────► │ Claude Chrome│
│ (Local App) │ JSON │ Host (Bridge) │ JSON │ (Extension) │
└─────────────────┘ └──────────────────┘ └──────────────┘
Workflow:
- Claude Code Initiates Request: User requests browser operations in Claude Code
- Message Passing: Claude Code sends requests to Claude Chrome extension via Native Messaging
- Extension Execution: Claude Chrome extension executes corresponding operations in the browser (such as clicking, filling forms, etc.)
- Result Return: Operation results are returned to Claude Code via Native Messaging
- Display Results: Claude Code displays results to the user
At this point, the Claude Chrome extension is no longer an Agent, but rather acts as an MCP server exposing browser tools to Claude Code, which then uses them.

Context Efficiency
I tried running the official example with Claude in Chrome:
Navigate to Zillow and find me a 2-bedroom apartment in San Francisco under $4000/month. Filter for places available within the next 30 days and show me the top 3 options with photos.I found it took 10 minutes to complete execution, so I investigated where the problem occurred. I discovered that the Claude Chrome extension primarily uses two modes to understand pages: one uses the page's accessibility tree, and when the accessibility tree cannot achieve the goal, it uses screenshots.
Many people may not be familiar with the accessibility tree, so let me briefly introduce it.
The Accessibility Tree is a page structure representation built by browsers to support assistive technologies (such as screen readers). It contains semantic information about all interactive elements on the page, such as:
- Element Roles: Buttons, links, input fields, headings, etc.
- Element Attributes: Names, descriptions, states (whether disabled, selected, etc.)
- Hierarchical Relationships: Parent-child relationships and order between elements
- Text Content: Readable text of elements
Compared to directly parsing the DOM tree, the accessibility tree provides a more concise and semantic page representation. For AI, the accessibility tree is easier to understand than raw HTML because it filters out a large amount of style and layout-related information, retaining only functional semantic information.
However, the accessibility tree has limitations. It depends on whether the website's accessibility tree is well-constructed. If a website's accessibility tree is not well-constructed, it may lead to inaccurate accessibility tree representations, thus affecting AI understanding.
Here I discovered a problem with Claude Chrome: for multiple snapshots of the same page, they are all saved in the context, but actually, past snapshots are meaningless and can be discarded. Additionally, Claude in Chrome puts the entire page's accessibility tree into the context, causing the context to explode rapidly when working on long pages.
Another problem is that when the accessibility tree fails, Claude in Chrome uses screenshots. However, how large is one screenshot? 100kb-500kb, varying. In my task, the screenshot was 335kb. Such screenshot information will persist in subsequent contexts, causing context explosion which leads to slow, ineffective, and expensive tasks.
Actually, there's no need to put screenshot information into the context as-is. To understand pages, we can first reduce the resolution, process it, and then put it into the context. AIPex's experience shows it can compress 10-20 times.
Of course, Claude was the first to propose context engineering. I believe the extension's context optimization algorithms will be launched soon. Stay tuned.
User Experience
Claude in Chrome's UI is very comfortable. Users can record workflows and edit them, then directly invoke them with commands next time. One selling point of Claude in Chrome is that it claims to support background operation, but actually, all operations of Claude in Chrome depend on the debugger permission. When the debugger permission is being used, the browser will definitely acquire user focus and force display at the forefront, making "background operation" a fallacy.
There's a workaround of not using the debugger, but that would require abandoning the accessibility tree and using other methods to complete operations like clicking and input.
Why I Recommend Trying AIPex?
I originally used and reverse-engineered the Claude in Chrome extension with a learning mindset, but after running the first use case, I understood that this extension currently has significant context efficiency problems, which would lead to slow and inefficient tasks.
AIPex has already navigated these pitfalls and done a lot of work on context engineering. For the same task, AIPex can be at least 2x faster than Claude in Chrome, and the more complex the task, the more obvious the difference. AIPex's core context practices have been organized at https://www.claudechrome.com/zh/blog/ai-browser-automation-challenges.
AIPex also supports BYOK (Bring Your Own Key), so you can try it for free with your own LLM API key.
References
カテゴリー
さらに投稿を見る

How to Record User Manual Guides in AIPex: AI-Powered Documentation Made Simple
Learn how to create comprehensive user manual guides effortlessly with AIPex's recording feature. Record your actions and let AI generate professional documentation automatically.

How AI Browser Automation Works: Uncovering the Principles Behind AI Browsers
Deep dive into the four levels of browser automation, analyze the principles and trade-offs of different technical approaches, and reveal how AI Browsers achieve efficient automation through accessibility trees, CDP protocol, and intelligent snapshots.

なぜAIPexがAIブラウザのゲームチェンジャーなのか
AIPexの独自の優位性が、AIブラウザ自動化のゲームチェンジャーとなっています。
ニュースレター
コミュニティに参加
最新のニュースとアップデートを受け取るためにニュースレターを購読