Open Source · MIT License

CONDOR
EYE

Your AI agent can read code, run tests, edit files — but it can't see your screen. Now it can.

Get Started View on GitHub
Tauri 2 Rust WebView2 Anthropic API MCP

"No Claude, look here — it's not working!"

You're staring at a broken UI. Claude is staring at your code. You describe the bug. Claude guesses. You paste a screenshot into the chat. Claude says "I can't view images in this interface."

The problem is simple: your AI agent is blind. It can read your code, run your tests, and edit your files — but it can't see what you see on screen.

Condor Eye fixes that. One overlay, one command, and your agent can see exactly what's in front of you.

claude-code
you the chart is rendering wrong claude I'll check the render logic... claude The code looks correct to me. you no, LOOK at it. the bars are clipped ───────── before condor eye ─────────   you /screen chrome claude I can see it — the volume bars are         clipped at the bottom. The SVG         viewBox height is 200 but the         bars extend to y=235. Fixing... ────────── after condor eye ──────────
Condor Eye _ x
FOCUS
breakout?
What to look for...
[ ]
Capture
API ✓ Captured in 1.2s · $0.003 Ctrl+Shift+C

Transparent overlay with capture frame, focus box, and pen annotations

Everything your agent needs to see

A lightweight overlay that turns any screen into a data source for AI agents.

Screen Capture

Capture any region via the transparent overlay frame. Win32 GDI for native speed. DPI-aware with automatic scaling.

AI Vision

Screenshots sent to Claude for visual analysis. Get structured JSON extractions or natural language descriptions.

HTTP API

Three endpoints on port 9050. Any tool that speaks HTTP can capture screens, locate windows, and check status.

MCP Server

Four Claude Code tools via Model Context Protocol. Your agent can see the screen without leaving the terminal.

Drawing Pen

Freehand annotations with pressure simulation. Circle areas, draw arrows, write notes — included in screenshots.

Focus Box

Draggable highlight region that tells the AI exactly where to look. Resize and position to isolate any UI element.

"Look at this part"

Without focus, the AI describes everything on screen. With the focus box, it zeroes in on exactly what you're pointing at.

Without focus box
AAPL  189.42  +1.23
MSFT  412.87  +0.54
NVDA  891.03  -12.40
TSLA  178.55  +3.21
AMZN  182.10  -0.89
META  501.22  +2.15
Vol: 2.1M
P/E: 28.4
Mkt: 2.9T
52w H: 199
52w L: 124
AI Response

I see a stock watchlist showing 6 tickers with prices and daily changes. There's a sidebar with market statistics. AAPL is at 189.42 up 1.23, MSFT at 412.87...

▣ With focus box
AAPL  189.42  +1.23
MSFT  412.87  +0.54
NVDA  891.03  -12.40
TSLA  178.55  +3.21
AMZN  182.10  -0.89
META  501.22  +2.15
Vol: 2.1M
P/E: 28.4
Mkt: 2.9T
52w H: 199
52w L: 124
FOCUS
AI Response

NVDA is down $12.40 (−1.37%) at 891.03 — the only significant red in the list. This is a sharp single-session drop. Previous close was ~903.43.

How it works

Rust backend with an axum HTTP server, WebView2 frontend, and a Node.js MCP bridge.

architecture.rs
┌─────────────────────────────────────────────────────────────┐
  WebView2 Frontend  (src/)                                  
                                                             
  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐  
   Capture    Focus      Draw       Results     
   Frame      Box        Canvas     Panel       
  └──────────┘ └──────────┘ └──────────┘ └───────────┘  
├─────────────────────────────────────────────────────────────┤
  Rust Backend  (src-tauri/src/)                              
                                                             
  capture.rs  ── Win32 GDI screen capture                   
  claude.rs   ── Anthropic Vision API                       
  http_api.rs ── axum server (:9050)                        
  compare.rs  ── diff engine                                
  truth.rs    ── Redis ground truth                        
  windows.rs  ── Win32 window enumeration                  
├─────────────────────────────────────────────────────────────┤
  MCP Server  (mcp/)                                        
                                                             
  Node.js stdio transport ── wraps HTTP API as               
  Claude Code tools for terminal integration                 
└─────────────────────────────────────────────────────────────┘

Three endpoints. Zero dependencies.

An axum server on port 9050. Anything that speaks HTTP can capture screens.

POST /api/capture
Capture a screen region and get an AI-powered description or structured extraction.
curl -X POST http://localhost:9050/api/capture \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Describe what you see."}'
POST /api/locate
Full-screen capture to find a window or UI element by description. Returns pixel bounds.
curl -X POST http://localhost:9050/api/locate \
  -H "Content-Type: application/json" \
  -d '{"query": "the Chrome browser"}'
GET /api/status
Health check. Returns version, API key status, model, and current configuration.
curl http://localhost:9050/api/status
{"running":true,"version":"0.1.0","api_key_configured":true,"model":"claude-haiku-4-5-20251001"}

Built for Claude Code

Register once. Your agent gains four tools for screen interaction — without leaving the terminal.

Register globally

# One-time setup claude mcp add --scope user condor-eye \ -- node ./mcp/index.js # Verify claude mcp list

Environment

VariableRequiredDefault
ANTHROPIC_API_KEYYes
CLAUDE_MODELNoclaude-haiku-4-5
CONDOR_EYE_PORTNo9050
REDIS_URLNoredis://127.0.0.1:6379
condor_eye_capture
Capture a region + AI description. Supports prompts, regions, window targeting, and key combos.
condor_eye_locate
Full-screen scan to find a window or element. Returns title, handle, and pixel bounds.
condor_eye_windows
List visible windows instantly. No API call, no cost — just Win32 enumeration.
condor_eye_status
Health check. Confirms the app is running and API key is configured.

Just type /screen

The repo ships with a Claude Code custom command. Clone it — and /screen works automatically.

/ /screen
Capture whatever's visible through the overlay frame. Returns an AI description + inline screenshot.
/ /screen firefox
Find Firefox by title, bring it to the foreground, capture its bounds, and describe the content.
/ /screen chrome tab 3
Focus Chrome, switch to tab 3 with Ctrl+3, then capture and analyze.
/ /screen full
Capture the entire screen at native resolution. Good for getting spatial context.

Located at .claude/commands/screen.md — automatically available when you clone the repo.

Up and running in 3 steps

Requires a Rust toolchain on Windows and an Anthropic API key.

Clone & Build

Clone the repo and build with Tauri CLI.

git clone https://github.com/CondorCommodore/condor-eye cd condor-eye cargo tauri dev

Configure

Create a .env file with your API key.

echo "ANTHROPIC_API_KEY=sk-ant-..." > .env

Register MCP

Give Claude Code access to your screen.

claude mcp add --scope user \ condor-eye -- node ./mcp/index.js

Uses claude-haiku-4-5 by default — the fastest, cheapest Claude model.
A typical capture costs ~$0.003. That's roughly 300 captures per dollar.