Closes the visual feedback loop for AI coding agents:
render → perceive → report → fix — before claiming done.
A machine-graded cycle the agent consumes to catch and fix visual bugs autonomously.
Every issue is anchored to a real DOM source — no hallucinated pixel boxes.
Precise element boxes via getBoundingClientRect + scroll offset. Coordinate-grounded overflow and clip detection.
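As a rough illustration of coordinate-grounded overflow detection, the sketch below converts a viewport-relative rectangle (the shape `getBoundingClientRect` returns) into page coordinates by adding the scroll offset, then measures how far a child box sticks out of its container. The `Box` class and both helper functions are hypothetical names for this example, not part of AgentVision's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page (document) coordinates, in CSS pixels."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def to_page_box(rect: dict, scroll_x: float, scroll_y: float) -> Box:
    """Shift a viewport-relative rect by the scroll offset to get page coordinates."""
    return Box(rect["x"] + scroll_x, rect["y"] + scroll_y,
               rect["width"], rect["height"])


def overflow_px(child: Box, container: Box) -> dict:
    """Pixels by which `child` extends past `container` on each side (0 if inside)."""
    return {
        "left": max(0.0, container.left - child.left),
        "top": max(0.0, container.top - child.top),
        "right": max(0.0, child.right - container.right),
        "bottom": max(0.0, child.bottom - container.bottom),
    }


# Hypothetical data: a 320 px wide label inside a 300 px wide card,
# with the page scrolled down by 400 px.
container = to_page_box({"x": 0, "y": 100, "width": 300, "height": 50}, 0, 400)
label = to_page_box({"x": 20, "y": 110, "width": 320, "height": 20}, 0, 400)
print(overflow_px(label, container))
```

Any non-zero side in the result marks an overflow on that edge, and the box itself points back at a concrete element rather than a guessed pixel region.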
Contrast ratios computed from getComputedStyle colors; degrades honestly over gradients and images rather than reporting a misleading number.
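The text does not name the formula, so the following is a sketch under the assumption that the ratios are WCAG contrast ratios computed from the `rgb(...)` strings that `getComputedStyle` returns. The function names and the `background_image` parameter are hypothetical; returning `None` for a non-solid background is one possible way to "degrade honestly", not necessarily what the tool does.

```python
import re
from typing import Optional


def parse_rgb(css_color: str) -> tuple:
    """Parse 'rgb(r, g, b)' or 'rgba(r, g, b, a)'; the alpha channel is ignored."""
    m = re.match(r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)", css_color)
    if m is None:
        raise ValueError(f"unsupported color syntax: {css_color!r}")
    return tuple(int(g) for g in m.groups())


def relative_luminance(rgb: tuple) -> float:
    """WCAG relative luminance of an sRGB color with 0-255 channels."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color: str, background_color: str,
                   background_image: str = "none") -> Optional[float]:
    """Return the WCAG contrast ratio, or None when the background is not a solid color."""
    if background_image != "none":
        # Assumption: over gradients or images no single ratio is meaningful.
        return None
    l1 = relative_luminance(parse_rgb(color))
    l2 = relative_luminance(parse_rgb(background_color))
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


print(contrast_ratio("rgb(0, 0, 0)", "rgb(255, 255, 255)"))
```

WCAG AA asks for at least 4.5:1 for normal body text, so a checker could compare the returned value against that threshold and skip elements where the result is `None`.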
Tesseract-powered text locations for clipped labels, overflowing content, and garbled rendering.
Catches 4xx errors and console warnings — the #1 “looks fine in code, broken live” cause.
Claude / OpenAI / Gemini add semantic critique on top. Their pixel boxes are flagged bbox_precise:false.
Contact sheets across 375, 768, 1280, 1920 px breakpoints to surface layout bugs at every viewport.
Named baseline snapshots + diff scoring so agents know exactly what regressed between iterations.
Launches a real Chromium, then checks Tesseract, poppler, and other system dependencies and reports any that are missing.
Swap backends via --backend flag or env var. Uniform API surface across all vision providers.
watch samples frames over time to verify playback, loading & liveness — is the video actually playing, did the spinner clear, are captions on. Deterministic <video> state + pixel liveness. For streaming UIs & live dashboards.
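A minimal sketch of the pixel-liveness idea: sample frames over time and call the content live if any two consecutive frames differ by more than a threshold. Frames are modeled here as flat lists of grayscale values, and the function names and the default threshold are assumptions for illustration, not AgentVision's implementation.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_live(frames: Sequence[Sequence[float]], threshold: float = 2.0) -> bool:
    """True if any pair of consecutive frames differs by more than `threshold`."""
    return any(
        mean_abs_diff(prev, cur) > threshold
        for prev, cur in zip(frames, frames[1:])
    )


# Hypothetical 4-pixel frames: a frozen player versus one that is changing.
frozen = [[12, 12, 200, 200]] * 3
playing = [[12, 12, 200, 200], [40, 90, 150, 30], [5, 220, 60, 180]]
print(is_live(frozen), is_live(playing))
```

A pixel check like this complements the deterministic `<video>` state: the element can report that it is playing while the rendered frames never change.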
Large artifacts are tiled at full resolution (overview + detail), so no small text or chart data is lost to downscaling. Pixel-based & source-agnostic — HTML, images, PDFs, canvas alike.
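One way to tile a large artifact at full resolution is to cover it with fixed-size, slightly overlapping boxes so that text on a tile boundary appears whole in at least one tile. The sketch below only computes the crop boxes; the function name, tile size, and overlap are assumptions chosen for illustration.

```python
from typing import List, Tuple


def tile_boxes(width: int, height: int,
               tile: int = 1024, overlap: int = 64) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap

    def starts(length: int) -> List[int]:
        # A single tile is enough when the image fits along this axis.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        # The last tile is aligned to the far edge so nothing is cut off.
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical 2000 x 500 px screenshot.
for box in tile_boxes(2000, 500):
    print(box)
```

Each box can be cropped from the original image without resizing and sent alongside a downscaled overview, which matches the overview + detail split described above.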
AgentVision is the eyes. Pair it with Verel — the brain — where nothing is “done” until a grader returns a verdict. The eyes perceive and grade intent; the brain decides with attestation and compounds only verified work into memory. Then the eyes look again.

New in v0.9.1: intent conformance (does it match what you set out to build?), a generative loop for AI images/infographics, and the eyes→brain Handoff signal.
Use AgentVision from any surface that fits your workflow.
| Surface | Entry point | Who it’s for |
|---|---|---|
| Library | import agentvision | Python apps and custom agent harnesses |
| CLI | agentvision analyze / loop / sheet | Any agent that can run a shell command; CI pipelines |
| Skill | Claude Code Skill | Claude agents; auto-invokes the loop before claiming done |
| MCP | agentvision-mcp | Cursor, Claude Desktop, any MCP-capable host |
| REST | agentvision-serve | Non-MCP / networked / CI agents via HTTP |
Switch backends via the --backend flag or the AGENTVISION_VISION_BACKEND environment variable.
Claude: default model claude-haiku-4-5, upgradable to Sonnet / Opus.
OpenAI: GPT-4o and compatible vision models.
Gemini: Google Gemini vision via google-genai.
Local: CV + OCR heuristics. No API key, no egress; well suited to CI and air-gapped environments.
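For readers wiring this into their own harness, the flag-or-environment selection can be sketched as below. Only `AGENTVISION_VISION_BACKEND` and the backend name `local` come from the text; the precedence order (flag, then environment, then default), the default name `claude`, and the `resolve_backend` helper are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AGENTVISION_VISION_BACKEND"
DEFAULT_BACKEND = "claude"  # assumed name; the source only states the default model


def resolve_backend(cli_value: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a backend: explicit flag first, then the environment, then the default."""
    env = os.environ if env is None else env
    return cli_value or env.get(ENV_VAR) or DEFAULT_BACKEND


print(resolve_backend("local"))
print(resolve_backend(None, {ENV_VAR: "local"}))
print(resolve_backend(None, {}))
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.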
No API key required for the demo.
```shell
# install with rendering
pip install "agentvision[render]"

# install Chromium
playwright install chromium

# run the demo (no API key)
agentvision demo

# check system deps
agentvision doctor
```
```shell
# analyze a file / URL
agentvision analyze ./index.html --backend local

# self-correcting loop
agentvision loop ./dashboard.html --max-iter 3

# responsive contact sheet
agentvision sheet ./index.html --breakpoints 375,768,1280

# visual regression
agentvision baseline ./index.html --name home
agentvision regress ./index.html --name home
```
```shell
# everything
pip install "agentvision[all]"

# render + Claude
pip install "agentvision[render,anthropic]"

# MCP server
pip install "agentvision[render,mcp]"

# REST service
pip install "agentvision[render,serve]"
```
```python
import asyncio

from agentvision import load_settings
from agentvision.core.loop import LoopSession


async def main():
    # Local backend: CV + OCR heuristics, no API key
    settings = load_settings(vision_backend="local")
    session = LoopSession("examples/broken.html", settings=settings)
    result = await session.iterate()
    print(result.report.verdict)


asyncio.run(main())
```
CI gate in one step (uses: amitpatole/agent-vision@v0.9.1), a pre-commit hook, or shell out anywhere (exit codes 0/2/3). For agents: the Claude Code Skill, MCP tools, or a drop-in agent contract.
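When shelling out from a Python harness, the exit code can be turned into a pass/fail gate. The source lists exit codes 0/2/3 without defining them, so this sketch assumes only the usual convention that 0 means success and treats every other code as a failure; `run_gate` is a hypothetical helper.

```python
import subprocess
import sys
from typing import List, Tuple


def run_gate(cmd: List[str]) -> Tuple[bool, int]:
    """Run a command and return (passed, exit_code); only exit code 0 passes."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode == 0, completed.returncode


# Stand-in command that exits with code 2, so the sketch runs without AgentVision.
passed, code = run_gate([sys.executable, "-c", "raise SystemExit(2)"])
print(passed, code)
```

In a real pipeline the stand-in would be replaced by a command from the CLI examples, for instance `["agentvision", "analyze", "./index.html", "--backend", "local"]`, and the returned code could be logged so the specific non-zero value is not lost.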