iris
Your AI writes the code. Iris tells it whether the code actually works - with evidence, not screenshots.
Overview
Iris
Start by detecting which mode to run:
# Is Iris already set up in this project?
cat .iris.json 2>/dev/null || echo "NOT_FOUND"
- `.iris.json` not found → run Setup (below)
- `.iris.json` found → run Test (further below)
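The branch above can be written as a small pure function. This is a minimal sketch: `detectMode` is a hypothetical helper, not part of the Iris SDK, and it assumes the caller has already captured the output of the shell check (or `null` when nothing could be read).

```typescript
type IrisMode = "setup" | "test";

// Hypothetical helper: maps the result of the `.iris.json` check to a mode.
// Pass null when the file could not be read at all.
function detectMode(irisJson: string | null): IrisMode {
  // The shell check prints the sentinel "NOT_FOUND" when the file is missing,
  // so the sentinel and a missing file are treated the same way.
  if (irisJson === null || irisJson.trim() === "NOT_FOUND") {
    return "setup";
  }
  // Any other output means the file exists, which selects Test mode.
  return "test";
}
```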
SETUP MODE
Run this once per project. Writes config files, installs the SDK, and validates the
connection. After setup, every subsequent `/iris` goes straight to Test mode.
Step 0 — Ask these questions before doing anything
Ask ALL of them in a single message. Do not start installing until you have the answers.
Before asking Q7, run the detection commands below to pre-fill a suggestion — but always confirm with the user, because they may plan to use a tool that isn't installed yet.
which claude 2>/dev/null && echo "claude-code"
which opencode 2>/dev/null && echo "opencode"
which codex 2>/dev/null && echo "codex"
[ -d ~/.cursor ] && echo "cursor"
[ -d ~/.codeium/windsurf ] && echo "windsurf"
[ -d .vscode ] && echo "vscode"
which zed 2>/dev/null && echo "zed"
1. What framework/stack is this app?
a) Vite + React (specify React 18 or 19)
b) Next.js (specify version + app/pages router)
c) Vite + Vue
d) Vite + Svelte
e) SvelteKit
f) Remix
g) Plain HTML / vanilla JS / other
2. What package manager are you using?
npm | pnpm | yarn | bun
3. What port does your dev server normally run on?
(e.g. 3000, 5173, 8080 — just the number, not the full URL)
4. Do you already have data-testid attributes on your key elements?
(If yes, Iris reuses them. If no, we'll add a handful to the most important elements.)
5. How do you want to use Iris?
a) Quick spot-check — verify a specific thing the agent just built.
b) Pair programming — present mode on, watch the agent work in the browser.
c) Full automation — record flows, replay in CI, catch regressions.
d) All of the above.
6. Do you want to see the browser while the agent tests?
a) Yes — show me a real browser window (headed mode).
b) No — run silently in the background (headless, default).
Save as IRIS_HEADED (true / false).
7. Which AI coding tool(s) will you use this project with?
(I detected: <list from detection above, or "none found">)
a) Claude Code b) OpenCode c) Codex CLI
d) Cursor e) Windsurf f) VS Code + GitHub Copilot g) Zed
h) Multiple — list them
Save as IRIS_HARNESSES.
Step 0b — Pick a dedicated Iris testing port
Iris runs its own dev server instance so it never collides with the user's browser session.
Default: port 4310. Check if it's free:
lsof -ti :4310 2>/dev/null | head -1
No output = free. If busy:
lsof -i :4310 2>/dev/null | head -5
Ask the user: "Port 4310 is in use by <process>. Use 4311 instead, or kill that process?" Never silently pick an occupied port. Save the confirmed port as IRIS_PORT.
Also check the user's regular dev port (from Q3) isn't occupied by something unexpected.
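The port check can be sketched as two pure helpers: one that proposes the next free candidate and one that words the question for the user. Both names (`suggestIrisPort`, `portQuestion`) are hypothetical, and the set of busy ports is assumed to come from the `lsof` checks above. Nothing here picks a port on its own; the result is only a suggestion to confirm.

```typescript
// Hypothetical helper: returns the first port at or above `preferred` that is
// not in `busyPorts`, or null if none of the next `maxTries` ports is free.
function suggestIrisPort(
  busyPorts: ReadonlySet<number>,
  preferred = 4310,
  maxTries = 10,
): number | null {
  for (let offset = 0; offset < maxTries; offset++) {
    const candidate = preferred + offset;
    if (!busyPorts.has(candidate)) return candidate;
  }
  return null;
}

// Hypothetical helper: builds the confirmation question quoted in the text.
function portQuestion(busyPort: number, processName: string, alternative: number): string {
  return `Port ${busyPort} is in use by ${processName}. Use ${alternative} instead, or kill that process?`;
}
```

Keeping the suggestion separate from the question makes it harder to skip the confirmation step by accident.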
Step 1 — Configure the MCP server
There is no single MCP config file all tools share. Each harness has its own file and
schema. Write only the ones in IRIS_HARNESSES.
| Tool | File | Root key | Command format | type needed? |
|---|---|---|---|---|
| Claude Code | .mcp.json | mcpServers | "command" + "args" split | no |
| OpenCode | opencode.json | mcp | "command" flat array | "local" required |
| Codex CLI | .codex/config.toml | [mcp_servers.iris] | TOML command + args | no |
| Cursor | .cursor/mcp.json | mcpServers | "command" + "args" split | no |
| Windsurf | ~/.codeium/windsurf/mcp_config.json | mcpServers | "command" + "args" split | no |
| VS Code | .vscode/mcp.json | "servers" | "command" + "args" split | no |
| Zed | ~/.config/zed/settings.json | context_servers | "command" + "args" split | no |
Claude Code — `.mcp.json`
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
Headed: append "--headed" to args. Tell the user to reload Claude Code (/mcp to refresh).
OpenCode — `opencode.json` (type:"local" required; command is one flat array, no args)
{
  "mcp": {
    "iris": {
      "type": "local",
      "command": ["npx", "@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
Verify with `opencode mcp list`.
Codex CLI — `.codex/config.toml` (TOML, not JSON)
[mcp_servers.iris]
command = "npx"
args = ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
Cursor — `.cursor/mcp.json` (same schema as Claude Code, different path)
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
Windsurf — `~/.codeium/windsurf/mcp_config.json` (global; create if missing)
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
VS Code — `.vscode/mcp.json` ("servers" not "mcpServers" — most common mistake)
{
  "servers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
MCP tools only appear in Copilot Agent mode.
Zed — `~/.config/zed/settings.json` (context_servers not mcpServers)
{
  "context_servers": {
    "iris": {
      "command": "npx",
      "args": ["@syrin/iris", "mcp", "--drive", "http://localhost:4310"]
    }
  }
}
Replace 4310 with IRIS_PORT in all configs above.
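The per-harness differences in the table can be captured in one function. The sketch below is illustrative only: `buildMcpFile` and `irisArgs` are hypothetical names, harness identifiers follow the detection output, and Codex CLI is left out because its config is TOML rather than JSON. It also assumes `--headed` is appended the same way for every harness, which the text only states for Claude Code.

```typescript
type JsonHarness = "claude-code" | "opencode" | "cursor" | "windsurf" | "vscode" | "zed";

interface McpFile {
  path: string;                      // where the config should be written
  content: Record<string, unknown>;  // JSON-serializable body
}

// Arguments passed to npx, with IRIS_PORT substituted for 4310.
function irisArgs(port: number, headed: boolean): string[] {
  const args = ["@syrin/iris", "mcp", "--drive", `http://localhost:${port}`];
  // Assumption: the headed flag is appended identically for all harnesses.
  return headed ? [...args, "--headed"] : args;
}

function buildMcpFile(harness: JsonHarness, port: number, headed = false): McpFile {
  const args = irisArgs(port, headed);
  const split = { command: "npx", args }; // the "command" + "args" split form
  switch (harness) {
    case "claude-code":
      return { path: ".mcp.json", content: { mcpServers: { iris: split } } };
    case "cursor":
      return { path: ".cursor/mcp.json", content: { mcpServers: { iris: split } } };
    case "windsurf":
      return { path: "~/.codeium/windsurf/mcp_config.json", content: { mcpServers: { iris: split } } };
    case "vscode":
      return { path: ".vscode/mcp.json", content: { servers: { iris: split } } };
    case "zed":
      return { path: "~/.config/zed/settings.json", content: { context_servers: { iris: split } } };
    case "opencode":
      // OpenCode needs type "local" and a single flat command array.
      return { path: "opencode.json", content: { mcp: { iris: { type: "local", command: ["npx", ...args] } } } };
  }
}
```

Serializing with `JSON.stringify(file.content, null, 2)` yields strict JSON with no trailing commas. The Windsurf and Zed paths are global files that may already hold other settings, so the `iris` entry should be merged into the existing file rather than overwriting it.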
Step 1b — Register the stop hook (Claude Code only)
Write or merge into .claude/settings.json:
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [{ "type": "command", "command": "npx @syrin/iris stop --quiet" }]
      }
    ]
  }
}
Step 2 — Install the SDK
npm install --save-dev @syrin/iris # swap npm for pnpm/yarn/bun per Q2
Step 3 — Wire up the SDK
Add to your app's dev entry point (inside a DEV guard — never runs in production):
Vite + React
// src/iris-dev.ts (import in main.tsx inside import.meta.env.DEV check)
import { install } from '@syrin/iris/react';
import { iris, registerCapabilities } from '@syrin/iris';

if (import.meta.env.DEV) {
  install();
  iris.connect({ session: 'my-app' });
  registerCapabilities({ testids: [], signals: [], stores: [] });
}
Next.js (App Router)
// app/iris-dev.tsx (import in layout.tsx inside a 'use client' + dev check)
'use client';
import { useEffect } from 'react';

export function IrisDev() {
  useEffect(() => {
    if (process.env.NODE_ENV !== 'development') return;
    import('@syrin/iris').then(({ iris, registerCapabilities }) => {
      import('@syrin/iris/react').then(({ install }) => {
        install();
        iris.connect({ session: 'my-app' });
        registerCapabilities({ testids: [], signals: [], stores: [] });
      });
    });
  }, []);
  return null;
}
Add @syrin/iris/next → withIris to next.config.mjs for source mapping.
Other frameworks follow the same pattern: import iris and call iris.connect() inside a dev guard. The Vue and Svelte adapters follow the same shape.
Step 3b — Add dev:iris script
Add to package.json scripts so Iris has its own dev server on IRIS_PORT:
| Framework | dev:iris value |
|---|---|
| Vite | "vite --port 4310" |
| Next.js | "next dev --port 4310" |
| Create React App | "PORT=4310 react-scripts start" |
| SvelteKit | "vite dev --port 4310" |
| Remix | "remix dev --port 4310" |
Replace 4310 with IRIS_PORT.
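The table above maps directly onto a lookup that substitutes IRIS_PORT. A minimal sketch follows; the function names and the short framework keys (`"cra"`, `"next"`, and so on) are hypothetical labels chosen for this example, and only the command strings come from the table.

```typescript
type Framework = "vite" | "next" | "cra" | "sveltekit" | "remix";

// Returns the dev:iris command for a framework, with the port filled in.
function devIrisScript(framework: Framework, port: number): string {
  switch (framework) {
    case "vite":
      return `vite --port ${port}`;
    case "next":
      return `next dev --port ${port}`;
    case "cra":
      return `PORT=${port} react-scripts start`;
    case "sveltekit":
      return `vite dev --port ${port}`;
    case "remix":
      return `remix dev --port ${port}`;
  }
}

interface PackageJson {
  scripts?: Record<string, string>;
  [key: string]: unknown;
}

// Returns a copy of package.json with scripts["dev:iris"] set and every
// other field and script left untouched.
function withDevIrisScript(pkg: PackageJson, framework: Framework, port: number): PackageJson {
  return {
    ...pkg,
    scripts: { ...(pkg.scripts ?? {}), "dev:iris": devIrisScript(framework, port) },
  };
}
```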
Step 4 — Save config and validate
Write .iris.json to the project root (commit this):
{
  "port": 4310,
  "headed": false,
  "framework": "vite-react",
  "harnesses": ["claude-code"]
}
Fill in IRIS_PORT, IRIS_HEADED, framework from Q1, IRIS_HARNESSES from Q7.
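Because Test mode later reads IRIS_PORT back from `.iris.json`, it can help to validate the file when loading it. The sketch below assumes the four fields shown above are the whole schema; `IrisConfig` and `parseIrisConfig` are hypothetical names, not part of the SDK.

```typescript
interface IrisConfig {
  port: number;
  headed: boolean;
  framework: string;
  harnesses: string[];
}

// Parses and validates the text of .iris.json. JSON.parse is strict, so
// malformed input (including trailing commas) throws before validation runs.
function parseIrisConfig(text: string): IrisConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error(".iris.json must contain a JSON object");
  }
  const c = raw as Record<string, unknown>;
  if (typeof c.port !== "number" || !Number.isInteger(c.port) || c.port < 1 || c.port > 65535) {
    throw new Error('"port" must be an integer between 1 and 65535');
  }
  if (typeof c.headed !== "boolean") {
    throw new Error('"headed" must be true or false');
  }
  if (typeof c.framework !== "string" || c.framework === "") {
    throw new Error('"framework" must be a non-empty string');
  }
  if (!Array.isArray(c.harnesses) || !c.harnesses.every((h) => typeof h === "string")) {
    throw new Error('"harnesses" must be an array of strings');
  }
  return {
    port: c.port,
    headed: c.headed,
    framework: c.framework,
    harnesses: c.harnesses as string[],
  };
}
```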
Tell the user: "Run `npm run dev:iris` to start the Iris testing server."
Once they confirm it's running, call iris_sessions(). You should see a session at http://localhost:<IRIS_PORT>/. If the URL shows a different port, another app connected first — call iris_end_session() and navigate: iris_navigate({ url: "http://localhost:<IRIS_PORT>" }).
When a session is confirmed, tell the user:
"Iris is set up. Type `/iris` anytime to verify the app after a change."
Setup complete — stop here. Do not proceed to Test mode.
TEST MODE
Runs automatically when `.iris.json` exists. Connects to the running app, exercises flows, asserts outcomes, and reports what passed and what broke.
Phase 1 — Connect
Call iris_sessions(). Three possible states:
A. One session → proceed.
B. No sessions: Read IRIS_PORT from .iris.json. Tell the user:
"No app connected. Run `npm run dev:iris` first, then try `/iris` again."
Stop here.
C. Multiple sessions — ask:
"I see [N] sessions connected: [list sessionId + url]. Which should I test?"
Pin sessionId for every subsequent call.
Phase 2 — Orient
Call these in parallel:
iris_snapshot({ sessionId, maxDepth: 3 })
iris_capabilities({ sessionId })
iris_network({ sessionId, limit: 10 })
iris_console({ sessionId, limit: 20 })
Build a mental model:
- Route/screen: where is the app right now?
- Testids: what interactive elements are registered?
- Signals: what domain events does the app emit?
- Console state: any errors already present before touching anything?
Pre-existing console errors → call them out immediately before testing.
Phase 3 — Decide what to test
Pick a mode based on the context:
| Context | Mode |
|---|---|
| User says "test X" or names a flow | Targeted — focus on that feature |
| User says "everything" or "smoke test" | Smoke — exercise every registered testid |
| Recent git diff shows a specific component | Targeted — that component and adjacent flows |
| No clear signal | Smoke |
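The decision table can be read as a short priority chain. The sketch below is one possible encoding, not the tool's actual logic: `pickTestMode` and `TestContext` are hypothetical, and it simplifies "names a flow" to "any non-empty request that does not ask for everything or a smoke test".

```typescript
type TestMode = "targeted" | "smoke";

interface TestContext {
  userRequest?: string;     // what the user asked for, if anything
  changedFiles?: string[];  // hypothetical: components touched in a recent git diff
}

function pickTestMode(ctx: TestContext): { mode: TestMode; focus?: string } {
  const request = (ctx.userRequest ?? "").trim();
  // An explicit "everything" or "smoke test" wins over anything else.
  if (/\b(everything|smoke test)\b/i.test(request)) {
    return { mode: "smoke" };
  }
  // Any other request is treated as naming a feature or flow.
  if (request !== "") {
    return { mode: "targeted", focus: request };
  }
  // No request: fall back to what the recent diff touched.
  if (ctx.changedFiles && ctx.changedFiles.length > 0) {
    return { mode: "targeted", focus: ctx.changedFiles.join(", ") };
  }
  // No clear signal.
  return { mode: "smoke" };
}
```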
Phase 4 — Run the tests
Targeted
- Navigate if needed: `iris_navigate({ sessionId, url })`
- Snapshot to confirm correct state
- Act on controls using testids:
`` iris_act({ sessionId, ref, action: "click" }) ``
- Assert, always using `since` from the act result:
`` iris_assert({ sessionId, since, timeout_ms: 5000, predicate: { allOf: [ { kind: "net", method: "POST", urlContains: "/api/...", status: 200 }, { kind: "element", query: { role: "...", name: "..." }, state: "visible" }, { kind: "signal", name: "..." }, { kind: "console", level: "error", absent: true } ]}}) ``
- Record: ✅ pass / ❌ fail / ⚠️ partial
Smoke
Walk every testid in capabilities.testids. For each one that is visible and interactable:
iris_query({ sessionId, by: "testid", value: testid })
→ iris_act({ sessionId, ref, action: "click" })
→ iris_assert({ sessionId, since, predicate: { kind: "console", level: "error", absent: true } })
Flag anything that throws a console error or triggers a network call with status >= 400.
Phase 5 — Report
## Iris — <route or feature>
**Result: ✅ PASS / ❌ FAIL / ⚠️ PARTIAL**
| Flow | Result | Evidence |
|---|---|---|
| Login → dashboard | ✅ | POST /api/login 200, route /dashboard |
| Click "Deploy" | ❌ | POST /api/deploy 401 — missing auth header |
| Sidebar nav | ✅ | 4 items, no console errors |
**Console errors:** none / <list>
**Failed requests:** none / <list>
**Fix at:** src/lib/api.ts:65 ← from iris_inspect on the failing element
If something failed, call iris_inspect({ sessionId, ref }) on the failing element to get the file:line, and include it in the report.
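The result line and flow table of the report template can be generated from recorded outcomes. In this sketch, `renderReport` and `overallOutcome` are hypothetical helpers, and the roll-up rule (any failure gives FAIL, all passes give PASS, anything else gives PARTIAL) is an assumption, since the template does not say how the overall result is derived.

```typescript
type Outcome = "pass" | "fail" | "partial";

interface FlowResult {
  flow: string;
  outcome: Outcome;
  evidence: string;
}

const ICON: Record<Outcome, string> = { pass: "✅", fail: "❌", partial: "⚠️" };

// Assumed roll-up rule: one failure fails the run; only all-pass is a pass.
function overallOutcome(results: FlowResult[]): Outcome {
  if (results.some((r) => r.outcome === "fail")) return "fail";
  if (results.length > 0 && results.every((r) => r.outcome === "pass")) return "pass";
  return "partial";
}

// Renders the result line followed by the markdown flow table.
function renderReport(results: FlowResult[]): string {
  const overall = overallOutcome(results);
  const rows = results.map((r) => `| ${r.flow} | ${ICON[r.outcome]} | ${r.evidence} |`);
  return [
    `**Result: ${ICON[overall]} ${overall.toUpperCase()}**`,
    "| Flow | Result | Evidence |",
    "|---|---|---|",
    ...rows,
  ].join("\n");
}
```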
Rules (always apply in Test mode)
- Always pass `since` in `iris_assert`: it scopes the check to post-action events and prevents stale buffered events from faking a pass.
- Always assert `{ kind: "console", level: "error", absent: true }`: silent errors are the most common thing agents miss.
- Batch net + element + signal + console into one `allOf`; don't call `iris_assert` four times.
- Never assert on pixels. Use predicates, not `iris_screenshot` (screenshots are for genuinely visual checks only).
- If the session disconnects mid-test (navigation creates a new session ID), call `iris_sessions()` again and continue.
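Several of these rules can be enforced mechanically when building the arguments for `iris_assert`. The sketch below is a hypothetical builder: the `Predicate` type only mirrors the predicate shapes that appear in the examples above and is not the full Iris schema, and `since` is passed through untyped because the text does not say what type the act result returns.

```typescript
// Predicate shapes copied from the examples in this document (not exhaustive).
type Predicate =
  | { kind: "net"; method: string; urlContains: string; status: number }
  | { kind: "element"; query: { role: string; name: string }; state: string }
  | { kind: "signal"; name: string }
  | { kind: "console"; level: "error"; absent: true };

interface AssertArgs {
  sessionId: string;
  since: unknown; // opaque value taken from the preceding iris_act result
  timeout_ms: number;
  predicate: { allOf: Predicate[] };
}

const NO_CONSOLE_ERRORS: Predicate = { kind: "console", level: "error", absent: true };

// Hypothetical builder: requires `since`, batches every check into a single
// allOf, and appends the console-error check when the caller left it out.
function buildAssertArgs(
  sessionId: string,
  since: unknown,
  checks: Predicate[],
  timeoutMs = 5000,
): AssertArgs {
  const hasConsoleCheck = checks.some((p) => p.kind === "console");
  return {
    sessionId,
    since,
    timeout_ms: timeoutMs,
    predicate: { allOf: hasConsoleCheck ? checks : [...checks, NO_CONSOLE_ERRORS] },
  };
}
```

Making `since` a required positional parameter means a call that forgets it fails at compile time rather than passing against stale events.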
Install & Usage
mkdir -p .claude/skills
Add the configuration to .claude/skills/iris.md
/iris
Frequently Asked Questions
What is iris?
Your AI writes the code. Iris tells it whether the code actually works - with evidence, not screenshots.
How to install iris?
To install iris, create the skills directory (mkdir -p .claude/skills), then add the config to .claude/skills/iris.md. Finally, run /iris in Claude Code.
What is iris best for?
iris is a community skill categorized under General. Created by syrin-labs.