Getting Started with Stagehand - AI browser automation, and the era of writing E2E tests in natural language

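Introduction

Ever wished browser automation could be less of a chore?

As someone who has written E2E tests with Playwright and Selenium, I find the moment a test breaks because a selector changed genuinely tiresome. I've lost count of how many times "a button's class name changed and every test failed" has happened to me.

That's where Stagehand comes in. It has been growing fast lately: it's an AI browser automation framework from Browserbase, and it has already passed 19,000 stars on GitHub. It lets you drive the browser with natural-language instructions, which feels pleasantly futuristic.

Personally, my reaction after trying Stagehand was "oh, this is going to change the norms of browser automation." This article walks through why.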

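What is Stagehand

Stagehand is a browser automation framework built around AI and natural language. The official site bills it as "The AI Browser Automation Framework"; it builds on Playwright while adding flexible, AI-driven operations.

Its main features are:

Natural-language browser control: an instruction like "click the login button" is enough
Fully Playwright-compatible: can be added to existing Playwright scripts as-is
Self-healing: adapts automatically to UI changes on the site
Structured data extraction: pull type-safe data from a page using a Zod schema

For anyone who has wrestled with Selenium or Playwright, being able to write "click this button" without thinking about selectors is a big deal.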

Features and benefits

1. Drive the browser with natural language

Stagehand's headline feature is that you describe browser actions in natural language.

await page.act("click the login button");
await page.act("type 'test@example.com' in the email field");

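What you used to write as page.getByRole('button', { name: 'ログイン' }).click() (where 'ログイン' is the Japanese label for "log in") can now be written in plain words. That matters more than it might seem.

The reason is that the script becomes more resilient to selector changes. Even if the button text changes from "ログイン" to "Log in", the AI reads the context and finds the right element.

2. Self-healing makes maintenance easier

The most tedious part of traditional E2E testing is fixing tests after the site changes. A CSS class changes, or the element hierarchy changes, and each time the selectors have to be updated.

Stagehand remembers past actions and adapts automatically when the site changes, so the cost of maintaining tests can drop considerably.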

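3. A hybrid of code and AI

Plenty of people are uneasy about leaving everything to AI. I was one of them.

What I like about Stagehand is that it supports a hybrid approach: write the parts that must behave deterministically as ordinary Playwright code, and use AI only where you need flexibility.

// Use ordinary code when you want to be certain you land on this URL
await page.goto('https://example.com/login');

// Use AI for the parts that should tolerate UI changes
await page.act("fill in the login form with test credentials");
await page.act("submit the form");

That balance is what makes it practical for real work, in my view.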
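One way to picture the hybrid idea is a small wrapper that tries the deterministic step first and only hands over to the AI step when that fails. This is a minimal sketch, not part of Stagehand: `withAiFallback`, `Step`, and `StepResult` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: try a deterministic step first, then fall back to an AI step.
type Step = () => Promise<void>;

interface StepResult {
  usedFallback: boolean;
  error?: unknown; // the error thrown by the deterministic step, if it failed
}

async function withAiFallback(deterministic: Step, aiFallback: Step): Promise<StepResult> {
  try {
    await deterministic();
    return { usedFallback: false };
  } catch (error) {
    // The selector-based step failed (for example, the markup changed),
    // so hand the same intent to the natural-language step.
    await aiFallback();
    return { usedFallback: true, error };
  }
}
```

In a real script, the first argument might wrap the page.getByRole(...).click() call and the second the matching page.act(...) instruction, so the model is only consulted when the selector stops working.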

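4. Caching to cut costs

AI calls cost tokens. It's natural to worry that querying the model on every run would make costs balloon.

Stagehand has a caching feature that saves actions so identical ones can be reused. Once an action has been resolved it can be replayed repeatedly, which keeps running costs down.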
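To make the cost argument concrete, here is a rough sketch of how an action cache with self-healing could work in principle. It is not Stagehand's actual implementation: `createActionCache`, `Resolver`, and `Runner` are hypothetical, and the cache is keyed by the instruction text only, which is a simplifying assumption.

```typescript
// Hypothetical sketch: not Stagehand's actual cache implementation.
type Resolver = (instruction: string) => Promise<string>; // asks the model for a selector
type Runner = (selector: string) => Promise<void>;        // performs the action in the browser

function createActionCache(resolve: Resolver, run: Runner) {
  const selectors = new Map<string, string>();
  let modelCalls = 0;

  async function act(instruction: string): Promise<void> {
    const cached = selectors.get(instruction);
    if (cached !== undefined) {
      try {
        await run(cached); // cache hit: no model call needed
        return;
      } catch {
        selectors.delete(instruction); // stale entry: drop it and re-resolve below
      }
    }
    modelCalls += 1;
    const selector = await resolve(instruction);
    await run(selector);
    selectors.set(instruction, selector); // remember only selectors that worked
  }

  return { act, modelCalls: () => modelCalls };
}
```

With this shape, repeating the same instruction costs one model call in total, and a second call is only made once the cached selector stops working, which is the same trade-off the caching and self-healing features describe.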

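Installation

If you have Node.js installed, you can get started right away.

For a new project

npx create-browser-app

This single command sets up an environment for using Stagehand. It prompts you interactively for the project name and LLM provider.

Adding to an existing project

npm install @browserbasehq/stagehand

Because it is built on Playwright, it is easy to add to an existing Playwright project.

Building from source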

git clone https://github.com/browserbase/stagehand.git
cd stagehand
pnpm install
pnpm run build

Basic usage

Initial setup

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  env: "LOCAL", // or "BROWSERBASE"
  modelName: "gpt-4o",
  modelClientOptions: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.page;

act() - run an action from natural language

Use this to perform a single action.

// Click a button
await page.act("click on the login button");

// Type text
await page.act("type 'hello@example.com' in the email input");

// Submit a form
await page.act("submit the contact form");

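extract() - extracting structured data

With a Zod schema, you can extract data from the page in a type-safe way.

import { z } from "zod";

const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  description: z.string(),
});

// The page-based API takes the instruction and schema as one options object
const product = await page.extract({
  instruction: "extract the product information",
  schema: productSchema,
});

console.log(product.name);  // type inference works
console.log(product.price); // typed as number

This is very handy for scraping. Because the result is typed, downstream code can use it with confidence.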
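As a small illustration of what "downstream code can rely on the types" means, the sketch below post-processes extracted products without any defensive parsing. The `Product` interface mirrors the fields of the schema above, while `withinBudget` is a hypothetical helper that is not part of Stagehand or Zod.

```typescript
// Mirrors the fields of the product schema in the example above.
interface Product {
  name: string;
  price: number;
  description: string;
}

// Hypothetical helper: keep products at or under a budget, cheapest first.
function withinBudget(products: Product[], maxPrice: number): Product[] {
  return products
    .filter((p) => p.price <= maxPrice) // price is already a number, no parsing needed
    .sort((a, b) => a.price - b.price);
}
```

Because price is known to be a number, the comparison and sort need no string cleanup or NaN checks, which is the practical payoff of schema-validated extraction.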

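observe() - discovering available actions

It automatically detects actions that can be performed on the page.

const actions = await page.observe("find all clickable buttons");

console.log(actions);
// Illustrative output; the exact fields depend on the Stagehand version
// [
//   { description: "login button", method: "click", selector: "xpath=..." },
//   { description: "signup button", method: "click", selector: "xpath=..." },
//   ...
// ]

When dealing with dynamic pages, it helps you understand what can be done on the current page.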
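Once you have a list of candidate actions, a typical next step is to pick one by its description before acting on it. The sketch below assumes each observed action carries a `description` and a `selector`; those field names, the `ObservedAction` type, and the `findAction` helper are assumptions for illustration, so check what observe() actually returns in your Stagehand version.

```typescript
// Assumed shape for illustration: a human-readable description plus a selector.
interface ObservedAction {
  description: string;
  selector: string;
}

// Hypothetical helper: return the first action whose description mentions the keyword.
function findAction(actions: ObservedAction[], keyword: string): ObservedAction | undefined {
  const needle = keyword.toLowerCase();
  return actions.find((a) => a.description.toLowerCase().includes(needle));
}
```

Returning undefined when nothing matches lets the caller decide whether to fail the test or fall back to a broader instruction.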

agent() - automating complex workflows

You can also hand multi-step tasks to an AI agent.

const agent = stagehand.agent({
  model: "gpt-4o",
});

await agent.execute("search for 'TypeScript tutorial' and open the first result");

Practical use cases

Collecting product information from an e-commerce site

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
const page = stagehand.page;

await page.goto("https://example-shop.com/products");

// Wrap the list in an object; this assumes extract() needs an object schema at the top level
const productsSchema = z.object({
  products: z.array(z.object({
    name: z.string(),
    price: z.number(),
    rating: z.number().optional(),
  })),
});

const { products } = await page.extract({
  instruction: "extract all product names, prices, and ratings from this page",
  schema: productsSchema,
});

console.log(`Found ${products.length} products`);
products.forEach(p => {
  console.log(`${p.name}: ¥${p.price}`);
});

await stagehand.close();

Automated form filling

await page.goto("https://example.com/contact");

await page.act("fill the name field with 'Taro Yamada'");
await page.act("fill the email field with 'taro@example.com'");
await page.act("select 'Technical Support' from the category dropdown");
await page.act("type 'I have a question about your product' in the message textarea");
await page.act("check the privacy policy checkbox");
await page.act("click the submit button");

// Confirm that the submission completed
const result = await page.extract({
  instruction: "extract the confirmation message",
  schema: z.object({ message: z.string() }),
});
console.log(result.message);

Testing a login flow

import { test, expect } from "@playwright/test";
import { Stagehand } from "@browserbasehq/stagehand";

test("login completes successfully", async () => {
  const stagehand = new Stagehand({ env: "LOCAL" });
  await stagehand.init();

  try {
    const page = stagehand.page;
    await page.goto("https://example.com/login");

    // Perform the login with natural-language instructions
    await page.act("enter 'test@example.com' in the email field");
    await page.act("enter 'password123' in the password field");
    await page.act("click the login button");

    // Verify the result with a conventional assertion
    await expect(page).toHaveURL(/dashboard/);
  } finally {
    // Close the browser even if a step or assertion fails
    await stagehand.close();
  }
});