Condor Eye — AI Vision for Your Screen

"No Claude, look here — it's not working!"

You're staring at a broken UI. Claude is staring at your code. You describe the bug. Claude guesses. You paste a screenshot into the chat. Claude says "I can't view images in this interface."

The problem is simple: your AI agent is blind. It can read your code, run your tests, and edit your files — but it can't see what you see on screen.

Condor Eye fixes that. One overlay, one command, and your agent can see exactly what's in front of you.

claude-code

you     the chart is rendering wrong
claude  I'll check the render logic...
claude  The code looks correct to me.
you     no, LOOK at it. the bars are clipped
───────── before condor eye ─────────
you     /screen chrome
claude  I can see it — the volume bars are clipped at the bottom. The SVG viewBox height is 200 but the bars extend to y=235. Fixing...
────────── after condor eye ──────────

Capabilities

Everything your agent needs to see

A lightweight overlay that turns any screen into a data source for AI agents.

◼

Screen Capture

Capture any region via the transparent overlay frame. Win32 GDI for native speed. DPI-aware with automatic scaling.

⍑

AI Vision

Screenshots sent to Claude for visual analysis. Get structured JSON extractions or natural language descriptions.

⇄

HTTP API

Three endpoints on port 9050. Any tool that speaks HTTP can capture screens, locate windows, and check status.

◈

MCP Server

Four Claude Code tools via Model Context Protocol. Your agent can see the screen without leaving the terminal.

✎

Drawing Pen

Freehand annotations with pressure simulation. Circle areas, draw arrows, write notes — included in screenshots.

▣

Focus Box

Draggable highlight region that tells the AI exactly where to look. Resize and position to isolate any UI element.

Focus Box

"Look at this part"

Without focus, the AI describes everything on screen. With the focus box, it zeroes in on exactly what you're pointing at.

Without focus box

AAPL 189.42 +1.23

MSFT 412.87 +0.54

NVDA 891.03 -12.40

TSLA 178.55 +3.21

AMZN 182.10 -0.89

META 501.22 +2.15

AI Response

I see a stock watchlist showing 6 tickers with prices and daily changes. There's a sidebar with market statistics. AAPL is at 189.42 up 1.23, MSFT at 412.87...

▣ With focus box

AAPL 189.42 +1.23

MSFT 412.87 +0.54

NVDA 891.03 -12.40

TSLA 178.55 +3.21

AMZN 182.10 -0.89

META 501.22 +2.15

FOCUS

AI Response

NVDA is down $12.40 (−1.37%) at 891.03 — the only significant red in the list. This is a sharp single-session drop. Previous close was ~903.43.
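
The figures in the focused response can be re-derived from the watchlist row itself. A minimal sketch of that arithmetic, assuming the third column is the absolute daily change in dollars (the variable names are illustrative only):

```python
price = 891.03    # NVDA last price from the watchlist row
change = -12.40   # NVDA daily change from the watchlist row

# Previous close is the current price with the daily change undone.
previous_close = price - change

# Percent change is measured against the previous close, not the current price.
percent_change = change / previous_close * 100

print(f"{previous_close:.2f} {percent_change:.2f}%")  # → 903.43 -1.37%
```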

Architecture

How it works

Rust backend with an axum HTTP server, WebView2 frontend, and a Node.js MCP bridge.

architecture.rs

┌─────────────────────────────────────────────────────────────┐
│  WebView2 Frontend  (src/)                                  │
│                                                             │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐       │
│  │ Capture  │ │ Focus    │ │ Draw     │ │ Results   │       │
│  │ Frame    │ │ Box      │ │ Canvas   │ │ Panel     │       │
│  └──────────┘ └──────────┘ └──────────┘ └───────────┘       │
├─────────────────────────────────────────────────────────────┤
│  Rust Backend  (src-tauri/src/)                             │
│                                                             │
│  capture.rs  ── Win32 GDI screen capture                    │
│  claude.rs   ── Anthropic Vision API                        │
│  http_api.rs ── axum server (:9050)                         │
│  compare.rs  ── diff engine                                 │
│  truth.rs    ── Redis ground truth                          │
│  windows.rs  ── Win32 window enumeration                    │
├─────────────────────────────────────────────────────────────┤
│  MCP Server  (mcp/)                                         │
│                                                             │
│  Node.js stdio transport ── wraps HTTP API as               │
│  Claude Code tools for terminal integration                 │
└─────────────────────────────────────────────────────────────┘

HTTP API

Three endpoints. Zero dependencies.

An axum server on port 9050. Anything that speaks HTTP can capture screens.

POST /api/capture

Capture a screen region and get an AI-powered description or structured extraction.

curl -X POST http://localhost:9050/api/capture \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Describe what you see."}'

POST /api/locate

Full-screen capture to find a window or UI element by description. Returns pixel bounds.

curl -X POST http://localhost:9050/api/locate \
  -H "Content-Type: application/json" \
  -d '{"query": "the Chrome browser"}'
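
The two POST calls above can also be made from a script. The following is a rough sketch using only the Python standard library. It reuses the endpoint paths, port, and JSON fields shown in the curl examples. The helper names `build_post` and `send` are hypothetical, and because this page does not document the response schema, the response is returned as raw text rather than parsed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9050"  # default port listed on this page


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Condor Eye endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send a request and return the raw response body as text.

    The response format is not specified here, so nothing is parsed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Same payloads as the curl examples above.
capture_request = build_post("/api/capture", {"prompt": "Describe what you see."})
locate_request = build_post("/api/locate", {"query": "the Chrome browser"})
```

With the Condor Eye app running locally, `print(send(capture_request))` would show whatever the server returns. Building and sending are kept separate so the request can be inspected without a live server.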

GET /api/status

Health check. Returns version, API key status, model, and current configuration.

curl http://localhost:9050/api/status
{"running":true,"version":"0.1.0","api_key_configured":true,"model":"claude-haiku-4-5-20251001"}
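
A script can use this response as a preflight check before requesting a capture. The sketch below parses the sample output shown above and reports anything that would block a capture. The function name `readiness_problems` is hypothetical, and only the fields visible in the sample are assumed to exist.

```python
import json

# Sample response copied from the curl output above.
SAMPLE_STATUS = (
    '{"running":true,"version":"0.1.0",'
    '"api_key_configured":true,"model":"claude-haiku-4-5-20251001"}'
)


def readiness_problems(status_json: str) -> list:
    """Return human-readable problems; an empty list means ready to capture."""
    status = json.loads(status_json)
    problems = []
    if not status.get("running", False):
        problems.append("app is not running")
    if not status.get("api_key_configured", False):
        problems.append("ANTHROPIC_API_KEY is not configured")
    return problems


print(readiness_problems(SAMPLE_STATUS))  # → []
```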

MCP Integration

Built for Claude Code

Register globally

# One-time setup
claude mcp add --scope user condor-eye \
  -- node ./mcp/index.js

# Verify
claude mcp list

Environment

Variable	Required	Default
ANTHROPIC_API_KEY	Yes	—
CLAUDE_MODEL	No	claude-haiku-4-5
CONDOR_EYE_PORT	No	9050
REDIS_URL	No	redis://127.0.0.1:6379
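
To make the table concrete, here is a small sketch of how these variables and defaults could be resolved. It illustrates the table only and is not the app's actual configuration code. The `resolve_config` helper is hypothetical, and converting the port to an integer is an assumption.

```python
import os

# Defaults copied from the table above; None marks a required variable.
DEFAULTS = {
    "ANTHROPIC_API_KEY": None,
    "CLAUDE_MODEL": "claude-haiku-4-5",
    "CONDOR_EYE_PORT": "9050",
    "REDIS_URL": "redis://127.0.0.1:6379",
}


def resolve_config(env=None) -> dict:
    """Apply the table's defaults to an environment mapping."""
    env = os.environ if env is None else env
    config = {}
    for name, default in DEFAULTS.items():
        value = env.get(name, default)
        if value is None:
            raise KeyError(f"{name} is required but not set")
        config[name] = value
    # Assumption: the port is used as a number, so convert it once here.
    config["CONDOR_EYE_PORT"] = int(config["CONDOR_EYE_PORT"])
    return config


# Only the required key is supplied; everything else falls back to defaults.
example = resolve_config({"ANTHROPIC_API_KEY": "sk-ant-..."})
print(example["CONDOR_EYE_PORT"], example["CLAUDE_MODEL"])  # → 9050 claude-haiku-4-5
```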

◼

condor_eye_capture

Capture a region + AI description. Supports prompts, regions, window targeting, and key combos.

⌖

condor_eye_locate

Full-screen scan to find a window or element. Returns title, handle, and pixel bounds.

☰

condor_eye_windows

List visible windows instantly. No API call, no cost — just Win32 enumeration.

⚡

condor_eye_status

Health check. Confirms the app is running and API key is configured.
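
Since the MCP server is described as a wrapper around the HTTP API, the relationship between tools and endpoints can be pictured as a routing table. The mapping below is an assumption inferred from matching names on this page and is not confirmed by the source. `TOOL_ROUTES` and `route_for` are hypothetical names. No HTTP endpoint is documented here for `condor_eye_windows`, so it is deliberately left unmapped.

```python
# Assumed tool-to-endpoint mapping, inferred from matching names only.
TOOL_ROUTES = {
    "condor_eye_capture": ("POST", "/api/capture"),
    "condor_eye_locate": ("POST", "/api/locate"),
    "condor_eye_status": ("GET", "/api/status"),
    # condor_eye_windows: no matching endpoint is documented on this page.
}


def route_for(tool_name: str, port: int = 9050) -> tuple:
    """Return (HTTP method, URL) for a tool under the assumed mapping."""
    method, path = TOOL_ROUTES[tool_name]
    return method, f"http://localhost:{port}{path}"


print(route_for("condor_eye_status"))  # → ('GET', 'http://localhost:9050/api/status')
```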

Slash Command

Just type /screen

The repo ships with a Claude Code custom command. Clone it — and /screen works automatically.

/screen

Capture whatever's visible through the overlay frame. Returns an AI description + inline screenshot.

/screen firefox

Find Firefox by title, bring it to the foreground, capture its bounds, and describe the content.

/screen chrome tab 3

Focus Chrome, switch to tab 3 with Ctrl+3, then capture and analyze.

/screen full

Capture the entire screen at native resolution. Good for getting spatial context.

Located at .claude/commands/screen.md — automatically available when you clone the repo.

Quick Start

Up and running in 3 steps

Requires Windows with a Rust toolchain, Node.js for the MCP server, and an Anthropic API key.

Clone & Build

Clone the repo, then build and run it in development mode with the Tauri CLI.

git clone https://github.com/CondorCommodore/condor-eye
cd condor-eye
cargo tauri dev

Configure

Create a .env file with your API key.

echo "ANTHROPIC_API_KEY=sk-ant-..." > .env

Register MCP

Give Claude Code access to your screen.

claude mcp add --scope user \
  condor-eye -- node ./mcp/index.js
