AgentVision v0.9.1 is live on PyPI

Coding agents are blind.
Now they can see.

Closes the visual feedback loop for AI coding agents:
render → perceive → report → fix — before claiming done.

View on GitHub PyPI Package Docs
$pip install"agentvision[render]"
How it works

The Self-Correcting Visual Loop

A machine-graded cycle the agent consumes to catch and fix visual bugs autonomously.

Render
Perceive
Report
Agent Fixes
Re-render
Diff
✓ PASS ⚠ WARN ✗ FAIL
Trustworthy by design

Grounded findings, not guesses

Every issue is anchored to a real DOM source — no hallucinated pixel boxes.

📐

DOM Geometry

Precise element boxes via getBoundingClientRect + scroll offset. Coordinate-grounded overflow and clip detection.

🎨

Real WCAG Contrast

Ratios from getComputedStyle — degrades honestly over gradients and images rather than lying.

🔤

OCR Word Boxes

Tesseract-powered text locations for clipped labels, overflowing content, and garbled rendering.

🌐

Network & Console

Catches 4xx errors and console warnings — the #1 “looks fine in code, broken live” cause.

🤖

Vision LLM Critique

Claude / OpenAI / Gemini add semantic critique on top. Pixel boxes flagged bbox_precise:false.

📱

Responsive Sheets

Contact sheets across 375, 768, 1280, 1920 px breakpoints to surface layout bugs at every viewport.

📸

Visual Regression

Named baseline snapshots + diff scoring so agents know exactly what regressed between iterations.

🩺

Doctor Command

Runs a real Chromium launch, checks Tesseract, poppler, and every missing system dependency.

🔌

Provider-Agnostic API

Swap backends via --backend flag or env var. Uniform API surface across all vision providers.

🎞️

Temporal Verification

watch samples frames over time to verify playback, loading & liveness — is the video actually playing, did the spinner clear, are captions on. Deterministic <video> state + pixel liveness. For streaming UIs & live dashboards.

🧩

Full-Coverage Vision

Large artifacts are tiled at full resolution (overview + detail), so no small text or chart data is lost to downscaling. Pixel-based & source-agnostic — HTML, images, PDFs, canvas alike.

Eyes & Brain

The eyes for a brain that decides

AgentVision is the eyes. Pair it with Verel — the brain — where nothing is “done” until a grader returns a verdict. The eyes perceive and grade intent; the brain decides with attestation and compounds only verified work into memory. Then the eyes look again.

Eyes and Brain - AgentVision perceives and grades intent; Verel decides and compounds verified work into memory

New in v0.9.1: intent conformance (does it match what you set out to build?), a generative loop for AI images/infographics, and the eyes→brain Handoff signal. Verel on GitHub →

Flexible integration

Many faces, one core

Use AgentVision from any surface that fits your workflow.

SurfaceWho it’s for
Library  import agentvisionPython apps and custom agent harnesses
CLI  agentvision analyze / loop / sheetAny agent that can run a shell command; CI pipelines
Skill  Claude Code SkillClaude agents — auto-invokes the loop before claiming done
MCP  agentvision-mcpCursor, Claude Desktop, any MCP-capable host
REST  agentvision-serveNon-MCP / networked / CI agents via HTTP
Vision backends

Pluggable & provider-agnostic

Switch via --backend or AGENTVISION_VISION_BACKEND

🟣

Anthropic

Default: claude-haiku-4-5
Upgradable to Sonnet / Opus

🟢

OpenAI

GPT-4o and compatible vision models

🔵

Gemini

Google Gemini vision via google-genai

Local (no key)

CV + OCR heuristics. No API key, no egress — great for CI & air-gapped envs.

Get started

Running in 60 seconds

No API key required for the demo.

Quick start
# install with rendering
pip install "agentvision[render]"

# install Chromium
playwright install chromium

# run the demo (no API key)
agentvision demo

# check system deps
agentvision doctor
CLI usage
# analyze a file / URL
agentvision analyze ./index.html --backend local

# self-correcting loop
agentvision loop ./dashboard.html --max-iter 3

# responsive contact sheet
agentvision sheet ./index.html --breakpoints 375,768,1280

# visual regression
agentvision baseline ./index.html --name home
agentvision regress  ./index.html --name home
Install extras
# everything
pip install "agentvision[all]"

# render + Claude
pip install "agentvision[render,anthropic]"

# MCP server
pip install "agentvision[render,mcp]"

# REST service
pip install "agentvision[render,serve]"
Python API
import asyncio
from agentvision import load_settings
from agentvision.core.loop import LoopSession

async def main():
    settings = load_settings(
        vision_backend="local"
    )
    session = LoopSession(
        "examples/broken.html",
        settings=settings
    )
    result = await session.iterate()
    print(result.report.verdict)

asyncio.run(main())

What we do not claim

Adopt

Drop it into your workflow & agents

CI gate in one step (uses: amitpatole/agent-vision@v0.9.1), a pre-commit hook, or shell out anywhere (exit codes 0/2/3). For agents: the Claude Code Skill, MCP tools, or a drop-in agent contract.