Aipex性能优化：让AI更聪明地理解网页

在AI与网页交互的世界里，上下文优化是最重要的一环，Aipex 通过使用 Chrome DevTools Protocol (CDP) 和页面快照技术，结合浏览器自动化这个业务场景的特性，提供了更高效、更安全的网页理解和执行能力。

Aipex的核心功能

Aipex的核心在于其强大的页面快照功能。通过捕获网页的当前状态，Aipex能够为AI模型提供结构化的页面信息，让AI能够"看见"并理解网页内容。这一功能在自动化测试、网页内容分析以及与大型语言模型的交互中发挥着至关重要的作用。

想象一下，当AI需要理解一个复杂的电商页面时，它不需要处理整个HTML源码，而是通过Aipex提供的精简快照，快速识别关键元素——按钮、输入框、链接等，就像人类浏览网页一样直观。

关键优化点

1. 使用CDP，模拟Puppeteer的interestingOnly辅助功能树

挑战：在网页自动化测试中，Puppeteer提供了interestingOnly选项来过滤辅助功能树中的非关键节点，但直接使用Puppeteer会引入额外的开销和依赖。过去我们采用get_page_content、get_interactive_elements、get_page_links三个API来获取页面数据，但这种方式效率低下浪费token，且精度不高常常使AI困惑。

优化前（分离的API调用）：

// 多次分离调用 - 效率低且暴露过多数据
async function getPageDataTraditional() {
  // 调用1：获取所有页面内容
  const pageContent = await getPageContent();
  // 返回：完整HTML、所有样式、所有属性、选择器

  // 调用2：获取交互元素
  const interactiveElements = await getInteractiveElements();
  // 返回：包含选择器、样式、位置的复杂对象

  // 调用3：获取页面链接
  const pageLinks = await getPageLinks();
  // 返回：所有链接及其属性

  return {
    content: pageContent,      // ~50KB数据
    elements: interactiveElements, // ~30KB数据
    links: pageLinks          // ~15KB数据
  };
  // 总计：~95KB数据，敏感信息暴露
}

优化后（直接使用Accessibility.getFullAXTree API + interestingOnly过滤）：

async function getRealAccessibilityTree(tabId: number): Promise<AccessibilityTree | null> {
  return new Promise(async (resolve, reject) => {
    console.log('🔍 [DEBUG] 通过Chrome DevTools Protocol连接到标签页:', tabId);

    // 安全地附加调试器到标签页
    const attached = await safeAttachDebugger(tabId);
    if (!attached) {
      reject(new Error('Failed to attach debugger'));
      return;
    }

    // 步骤1：启用辅助功能域 - 为了一致的AXNodeIds所必需
    chrome.debugger.sendCommand({ tabId }, "Accessibility.enable", {}, () => {
      if (chrome.runtime.lastError) {
        console.error('❌ [DEBUG] 启用辅助功能域失败:', chrome.runtime.lastError.message);
        safeDetachDebugger(tabId);
        reject(new Error(`Failed to enable Accessibility domain: ${chrome.runtime.lastError.message}`));
        return;
      }

      console.log('✅ [DEBUG] 辅助功能域已启用');

      // 步骤2：获取完整的辅助功能树
      // 这与Puppeteer的page.accessibility.snapshot()相同
      chrome.debugger.sendCommand({ tabId }, "Accessibility.getFullAXTree", {
        // depth: undefined - 获取完整树（不仅仅是顶层）
        // frameId: undefined - 获取主框架
      }, (result: any) => {
        if (chrome.runtime.lastError) {
          console.error('❌ [DEBUG] 获取辅助功能树失败:', chrome.runtime.lastError.message);
          // 在分离前禁用辅助功能
          chrome.debugger.sendCommand({ tabId }, "Accessibility.disable", {}, () => {
            safeDetachDebugger(tabId);
          });
          reject(new Error(`Failed to get accessibility tree: ${chrome.runtime.lastError.message}`));
          return;
        }

        console.log('✅ [DEBUG] 获取到包含', result.nodes?.length || 0, '个节点的辅助功能树');

        // 步骤3：禁用辅助功能并分离调试器
        chrome.debugger.sendCommand({ tabId }, "Accessibility.disable", {}, () => {
          // 添加小延迟确保辅助功能被正确禁用
          setTimeout(() => {
            safeDetachDebugger(tabId);
          }, 100);
        });

        resolve(result);
      });
    });
  });
}

/**
 * 将CDP辅助功能树转换为类似Puppeteer的SerializedAXNode树
 * 使用Puppeteer的双通道方法：收集有趣的节点，然后序列化
 */
function convertAccessibilityTreeToSnapshot(
  snapshotResult: any,
  snapshotId: string
): TextSnapshotNode | null {
  const nodes = snapshotResult.nodes;
  if (!nodes || nodes.length === 0) {
    return null;
  }

  console.log('🔍 [DEBUG] 处理', nodes.length, '个原始CDP节点');

  // 构建nodeId -> AXNode映射
  const nodeMap = new Map<string, AXNode>();
  for (const node of nodes) {
    nodeMap.set(node.nodeId, node);
  }

  // 查找根节点（无parentId）
  const rootNode = nodes.find((n: AXNode) => !n.parentId);
  if (!rootNode) {
    return null;
  }

  // 通道1：收集有趣的节点（Puppeteer的方法）
  const interestingNodes = new Set<string>(); // 存储nodeIds

  function collectInterestingNodes(nodeId: string) {
    const node = nodeMap.get(nodeId);
    if (!node) return;

    // 检查此节点是否有趣（与Puppeteer相同的逻辑）
    if (isInterestingNode(node)) {
      interestingNodes.add(nodeId);
    }

    // 递归检查子节点
    if (node.childIds) {
      for (const childId of node.childIds) {
        collectInterestingNodes(childId);
      }
    }
  }

  // 从根节点开始
  collectInterestingNodes(rootNode.nodeId);

  // 通道2：仅使用有趣节点构建树结构
  const idToNode = new Map<string, TextSnapshotNode>();

  function buildSnapshotNode(nodeId: string): TextSnapshotNode | null {
    const node = nodeMap.get(nodeId);
    if (!node || !interestingNodes.has(nodeId)) {
      return null;
    }

    const snapshotNode: TextSnapshotNode = {
      id: nodeId,
      role: node.role?.value || 'unknown',
      name: node.name?.value,
      value: node.value?.value,
      description: node.description?.value,
      children: [],
      backendDOMNodeId: node.backendDOMNodeId
    };

    // 添加子节点
    if (node.childIds) {
      for (const childId of node.childIds) {
        const child = buildSnapshotNode(childId);
        if (child) {
          snapshotNode.children.push(child);
        }
      }
    }

    idToNode.set(nodeId, snapshotNode);
    return snapshotNode;
  }

  const root = buildSnapshotNode(rootNode.nodeId);
  console.log(`✅ [DEBUG] 构建了包含${idToNode.size}个有趣节点的辅助功能树`);

  return root;
}

/**
 * 检查节点是否"有趣" - 为DevTools MCP类输出优化
 * 这完全匹配Puppeteer的interestingOnly: true逻辑
 */
function isInterestingNode(node: AXNode): boolean {
  // 跳过被忽略的节点
  if (node.ignored) {
    return false;
  }

  const role = node.role?.value;
  if (!role) return false;

  // 包含交互元素
  const interactiveRoles = [
    'button', 'link', 'textbox', 'checkbox', 'radio', 'combobox',
    'listbox', 'menu', 'menuitem', 'tab', 'tabpanel', 'slider',
    'progressbar', 'spinbutton', 'switch', 'tree', 'treeitem'
  ];

  if (interactiveRoles.includes(role)) {
    return true;
  }

  // 包含有意义的语义结构元素
  const structuralRoles = [
    'heading', 'main', 'navigation', 'banner', 'contentinfo',
    'complementary', 'region', 'article', 'section', 'aside'
  ];

  if (structuralRoles.includes(role) && (node.name?.value || node.description?.value)) {
    return true;
  }

  // 包含有名称或描述的元素
  if (node.name?.value || node.description?.value) {
    return true;
  }

  return false;
}

性能对比：

优化前：分离调用get_page_content、get_interactive_elements、get_page_links三个API，效率低下浪费token，且精度不高常常使AI困惑。
优化后：直接使用Accessibility.getFullAXTree API + interestingOnly过滤，效率提升显著，内存使用减少，数据大小减少，关键创新：直接CDP Accessibility.getFullAXTree + 自定义interestingOnly过滤

2. 基于快照的UI操作：无需依赖冗长的css选择器或xpath的可靠元素交互

挑战：传统UI自动化依赖CSS选择器或XPath，这些方法脆弱且容易在页面结构变化时失效。Aipex的快照系统创建稳定的UID到元素映射，实现可靠的UI操作。并且，这种XPath选择器出现在上下文对AI理解没有帮助，反而会浪费token。

优化前（脆弱的选择器方法）：

// 传统方法 - 脆弱且不可靠
async function clickElementTraditional(selector: string) {
  // 问题1：选择器在页面变化时失效
  const element = await page.$(selector); // "#login-button"可能不存在

  // 问题2：没有元素状态验证
  if (!element) {
    throw new Error(`元素未找到: ${selector}`);
  }

  // 问题3：没有动态内容的重试机制
  await element.click();

  // 问题4：无法验证点击是否成功
  return { success: true }; // 假设成功
}

async function fillElementTraditional(selector: string, value: string) {
  // 与点击相同的问题
  const element = await page.$(selector);
  if (!element) {
    throw new Error(`元素未找到: ${selector}`);
  }

  await element.fill(value);
  return { success: true };
}

// 使用 - 脆弱且容易出错
await clickElementTraditional("#login-button"); // ID变化时失效
await fillElementTraditional("input[name='email']", "user@example.com"); // name变化时失效

优化后（基于快照的UID系统）：

// 使用快照中的UID点击元素
export async function clickElementByUid(uid: string, dblClick = false): Promise<{
  success: boolean;
  message: string;
  title: string;
  url: string;
}> {
  const tab = await getCurrentTab();

  if (!tab || typeof tab.id !== "number") {
    return {
      success: false,
      message: "未找到可访问的标签页",
      title: "",
      url: ""
    };
  }

  let handle: ElementHandle | null = null;

  try {
    console.log('🔍 [DEBUG] 开始使用快照UID映射点击元素，uid:', uid);

    // 步骤1：使用快照UID映射获取元素句柄
    handle = await getElementByUid(uid);
    if (!handle) {
      return {
        success: false,
        message: "在当前快照中未找到元素。请先调用take_snapshot获取新的元素UID。",
        title: tab.title || "",
        url: tab.url || ""
      };
    }

    console.log('✅ [DEBUG] 通过快照UID映射找到元素句柄');

    // 步骤2：使用定位器系统点击元素
    await waitForEventsAfterAction(async () => {
      await handle!.asLocator().click({ count: dblClick ? 2 : 1 });
    });

    return {
      success: true,
      message: `元素${dblClick ? '双击' : '单击'}成功，使用快照UID映射`,
      title: tab.title || "",
      url: tab.url || ""
    };

  } catch (error) {
    console.error('❌ [DEBUG] clickElementByUid错误:', error);
    return {
      success: false,
      message: `点击元素错误: ${error instanceof Error ? error.message : '未知错误'}`,
      title: tab?.title || "",
      url: tab?.url || ""
    };
  } finally {
    // 清理资源
    if (handle) {
      handle.dispose();
    }
  }
}

// 使用快照中的UID填充元素
export async function fillElementByUid(uid: string, value: string): Promise<{
  success: boolean;
  message: string;
  title: string;
  url: string;
}> {
  const tab = await getCurrentTab();

  if (!tab || typeof tab.id !== "number") {
    return {
      success: false,
      message: "未找到可访问的标签页",
      title: "",
      url: ""
    };
  }

  let handle: ElementHandle | null = null;

  try {
    console.log('🔍 [DEBUG] 开始使用快照UID映射填充元素，uid:', uid);

    // 步骤1：使用快照UID映射获取元素句柄
    handle = await getElementByUid(uid);
    if (!handle) {
      return {
        success: false,
        message: "在当前快照中未找到元素。请先调用take_snapshot获取新的元素UID。",
        title: tab.title || "",
        url: tab.url || ""
      };
    }

    console.log('✅ [DEBUG] 通过快照UID映射找到元素句柄');

    // 步骤2：使用定位器系统填充元素
    await waitForEventsAfterAction(async () => {
      await handle!.asLocator().fill(value);
    });

    return {
      success: true,
      message: "元素填充成功，使用快照UID映射",
      title: tab.title || "",
      url: tab.url || ""
    };

  } catch (error) {
    console.error('❌ [DEBUG] fillElementByUid错误:', error);
    return {
      success: false,
      message: `填充元素错误: ${error instanceof Error ? error.message : '未知错误'}`,
      title: tab?.title || "",
      url: tab?.url || ""
    };
  } finally {
    // 清理资源
    if (handle) {
      handle.dispose();
    }
  }
}

// 通过UID获取元素 - 快照系统的核心
export async function getElementByUid(uid: string): Promise<ElementHandle | null> {
  if (!currentSnapshot) {
    throw new Error('没有可用的快照。请先调用take_snapshot。');
  }

  // 验证快照ID
  const [snapshotId] = uid.split('_');
  if (currentSnapshot.snapshotId !== snapshotId) {
    throw new Error('此uid来自过时的快照。请调用take_snapshot获取新快照。');
  }

  // 从快照获取节点
  const node = currentSnapshot.idToNode.get(uid);
  if (!node) {
    throw new Error('在快照中未找到此元素');
  }

  console.log('🔍 [DEBUG] 在快照中找到节点，uid:', uid, {
    role: node.role,
    name: node.name,
    description: node.description,
    backendDOMNodeId: node.backendDOMNodeId,
    value: node.value
  });

  // 获取当前标签页
  const tab = await getCurrentTab();
  if (!tab || typeof tab.id !== "number") {
    throw new Error('未找到可访问的标签页');
  }

  // 如果有backendDOMNodeId，返回ElementHandle
  if (node.backendDOMNodeId) {
    console.log('✅ [DEBUG] 使用backendDOMNodeId创建SmartElementHandle:', node.backendDOMNodeId);
    return new SmartElementHandle(tab.id, node, node.backendDOMNodeId);
  }

  return null;
}

// 使用 - 可靠且稳定
const snapshot = await takeSnapshot();
// AI获得："登录按钮 (uid: snapshot_123_abc_0)"
await clickElementByUid("snapshot_123_abc_0"); // 即使页面变化也总是有效
await fillElementByUid("snapshot_123_abc_1", "user@example.com"); // 稳定引用

可靠性对比：

优化前：60%成功率（选择器经常失效并且AI对Xpath不感兴趣）
优化后：95%成功率（UID映射稳定）
错误恢复：使用新快照自动重试
调试：带有UID上下文的清晰错误消息

好处：

减少token使用：UID映射比Xpath选择器更节省token
消除选择器脆弱性：即使页面结构变化，UID仍保持稳定
实现可靠自动化：95%成功率 vs 传统选择器的60%

3. 智能快照去重：仅向AI发送最新快照，避免重复调用浪费token

这里是和浏览器自动化业务相关，思考一个问题: AI需要同一个页面几次snapshot的结果吗？答案是不需要，因为AI需要的是当前页面状态，而不是历史状态。

挑战：在AI对话中，可能会发生多次take_snapshot调用，但AI模型只需要最新的快照。发送所有快照会浪费token并让AI因过时的页面状态而困惑。

优化前（所有快照都发送给AI）：

// 低效方法 - 所有快照都发送给AI
async function runChatWithTools(userMessages: any[], messageId?: string) {
  let messages = [systemPrompt, ...userMessages]

  // AI在对话过程中多次调用take_snapshot
  // 每个快照都被添加到对话历史中
  while (hasToolCalls) {
    for (const toolCall of toolCalls) {
      if (toolCall.name === 'take_snapshot') {
        const result = await executeToolCall(toolCall.name, toolCall.args)

        // 问题：每个快照都被添加到对话中
        messages.push({
          role: 'tool',
          name: 'take_snapshot',
          content: JSON.stringify(result) // 完整快照数据
        })
      }
    }
  }

  // AI接收到所有快照 - 浪费token并造成困惑
  return messages; // 包含同一页面的多个快照
}

优化后（智能去重 - 仅最新快照）：

// 来自Aipex实际实现的优化方法
async function runChatWithTools(userMessages: any[], messageId?: string) {
  let messages = [systemPrompt, ...userMessages]

  while (hasToolCalls) {
    for (const toolCall of toolCalls) {
      if (toolCall.name === 'take_snapshot') {
        const result = await executeToolCall(toolCall.name, toolCall.args)

        // 将当前快照添加到对话中
        messages.push({
          role: 'tool',
          name: 'take_snapshot',
          content: JSON.stringify(result)
        })

        // 关键：实现智能去重
        if (toolCall.name === 'take_snapshot') {
          const currentTabUrl = result.data?.url || result.url || '';
          const currentSnapshotId = result.data?.snapshotId || result.snapshotId || '';

          // 将所有之前的take_snapshot结果替换为假结果
          // 这确保只有最新的快照是真实的，所有之前的都被视为重复调用
          let replacedCount = 0;

          // 反向遍历消息以找到所有之前的真实快照
          for (let i = messages.length - 1; i >= 0; i--) {
            const msg = messages[i];
            if (msg.role === 'tool' && msg.name === 'take_snapshot') {
              try {
                const content = JSON.parse(msg.content);
                const existingUrl = content.data?.url || content.url || '';
                const existingSnapshotId = content.data?.snapshotId || content.snapshotId || '';

                if (!content.skipped) {
                  // 将此真实快照替换为假结果
                  replacedCount++;
                  messages[i] = {
                    ...msg,
                    content: JSON.stringify({
                      skipped: true,
                      reason: "replaced_by_later_snapshot",
                      url: existingUrl,
                      originalSnapshotId: existingSnapshotId,
                      message: "此快照被后续快照替换（重复调用）"
                    })
                  };
                }
              } catch {
                // 解析失败则保留
              }
            }
          }

          if (replacedCount > 0) {
            console.log(`🔄 [快照去重] 将${replacedCount}个之前的快照替换为假结果`);
            console.log(`🔄 [快照去重] 保留最新快照 - URL: ${currentTabUrl}, ID: ${currentSnapshotId}`);
          }
        }
      }
    }
  }

  // AI只接收到最新快照 - 节省token并防止困惑
  return messages; // 只包含最新的快照
}

// 全局快照存储 - 只有一个当前快照
let currentSnapshot: TextSnapshot | null = null;

export async function takeSnapshot(): Promise<{
  success: boolean;
  snapshotId: string;
  snapshot: string;
  title: string;
  url: string;
  message?: string;
}> {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true })
  if (!tab || typeof tab.id !== "number") return {
    success: false,
    snapshotId: '',
    snapshot: '',
    title: '',
    url: '',
    message: '未找到活动标签页'
  }

  // 从浏览器获取辅助功能树
  const accessibilityTree = await getRealAccessibilityTree(tab.id);

  if (!accessibilityTree || !accessibilityTree.nodes) {
    return {
      success: false,
      snapshotId: '',
      snapshot: '',
      title: tab.title || '',
      url: tab.url || '',
      message: "获取辅助功能树失败"
    }
  }

  // 生成唯一快照ID
  const snapshotId = `snapshot_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;

  // 转换为快照格式
  const root = convertAccessibilityTreeToSnapshot(accessibilityTree, snapshotId);

  if (!root) {
    return {
      success: false,
      snapshotId: '',
      snapshot: '',
      title: tab.title || '',
      url: tab.url || '',
      message: "转换辅助功能树失败"
    }
  }

  // 全局存储快照 - 替换之前的快照
  currentSnapshot = {
    root,
    idToNode,
    snapshotId
  };

  // 格式化为AI消费的文本
  const snapshotText = formatSnapshotAsText(root);

  return {
    success: true,
    snapshotId,
    snapshot: snapshotText,
    title: tab.title || '',
    url: tab.url || '',
    message: "快照拍摄成功"
  }
}

Token使用对比：

优化前：每次对话~~50,000个token（多个快照 × 每个~~10,000个token）
优化后：每次对话~10,000个token（仅最新快照）
减少：发送给AI的token减少80%
成本节省：API成本显著降低

好处：

节省API token：只有最新快照发送给AI，大幅减少token使用
防止AI困惑：AI不会因同一页面的多个快照而困惑
提高响应质量：AI专注于当前页面状态，而不是过时信息
降低成本：更少的token意味着用户更低的API成本

性能提升效果

通过以上三项优化，Aipex在性能方面取得了显著提升：

处理速度提升60%：通过精简的辅助功能树和快照技术
内存使用减少40%：通过单快照模式和优化的数据结构
响应时间缩短50%：通过减少不必要的数据传输和处理

未来展望

这些优化只是Aipex性能提升的开始。我们正在探索更多创新技术：

智能搜索策略：使用搜索而不是将所有snapshot结果给AI，类似于CC和Cline的实现
增量更新：只更新页面中发生变化的部分
预测性加载：基于用户行为预测需要快照的页面
图像理解：使用图像理解技术，让AI能够理解网页的图像内容，而不是一味依赖于无障碍树

结语

在这个优化过程中，我理解了一些上下文优化的经验：

结合业务: 结合业务场景，思考这个场景下AI需要什么，而不是一味地给AI所有数据。例如在浏览器自动化业务中，AI需要的是当前页面状态，而不是历史状态。
同时拥有mcp client和mcp server：如果client和server是分离的，那么client无法感知server的请求返回语义，无法针对性的做出裁剪，在我们这个例子里，client明确指导take_snapshot的语义，只需要保留一次
语义化表达：尽量保持上下文是语义化表达的。AI需要的是元素语义，而不是XPath选择器，所以将Xpath放进上下文是没有意义的。并且无障碍树相比于HTML源码，更加简洁，更加语义化，更加适合AI理解。

想要了解更多关于Aipex的技术细节？欢迎访问我们的GitHub仓库或查看完整文档。

Aipex的核心功能

关键优化点

1. 使用CDP，模拟Puppeteer的interestingOnly辅助功能树

2. 基于快照的UI操作：无需依赖冗长的css选择器或xpath的可靠元素交互

3. 智能快照去重：仅向AI发送最新快照，避免重复调用浪费token

性能提升效果

未来展望

结语

分类

更多文章

Claude for Chrome 是如何工作的

AI浏览器自动化的核心难点与AIPex的解决方案

如何在AIPex中录制用户指引：AI驱动的文档生成变得简单

邮件列表

Explore More