Multimodal AI vision MCP server for image, video, and object detection analysis. Enables UI/UX evaluation, visual regression testing, and interface understanding using Google Gemini and Vertex AI.
A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.
You can use either the google provider or the vertex_ai provider. For simplicity, the google provider is recommended.
Below are the environment variables you need to set for your selected provider. (Note: it is recommended to raise your MCP client's timeout configuration to more than 5 minutes.)
(i) Using Google AI Studio Provider
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
Get your Google AI Studio API key here.
(ii) Using Vertex AI Provider
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="[email protected]"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
Refer to the guide here on how to set this up.
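As a quick sanity check before launching the server, a small POSIX-sh helper can confirm that the variables required by the selected provider are present. The variable names are those listed above; the helper itself is a sketch, not part of this package:

```shell
# Sketch: fail fast if the environment variables required by the chosen
# provider are missing. Variable names match the configuration above;
# check_provider_env is a hypothetical helper, not part of ai-vision-mcp.
check_provider_env() {
  case "$1" in
    google)    set -- GEMINI_API_KEY ;;
    vertex_ai) set -- VERTEX_CLIENT_EMAIL VERTEX_PRIVATE_KEY VERTEX_PROJECT_ID GCS_BUCKET_NAME ;;
    *) echo "unknown provider: $1"; return 1 ;;
  esac
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      return 1
    fi
  done
  echo "ok"
}

GEMINI_API_KEY="your-gemini-api-key"
check_provider_env google
```

Running this before the server starts surfaces a missing credential immediately, rather than as an opaque timeout inside the MCP client.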
Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.
Add to your Claude Desktop configuration:
(i) Using Google AI Studio Provider
{
"mcpServers": {
"ai-vision-mcp": {
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "google",
"VIDEO_PROVIDER": "google",
"GEMINI_API_KEY": "your-gemini-api-key"
}
}
}
}
(ii) Using Vertex AI Provider
{
"mcpServers": {
"ai-vision-mcp": {
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "vertex_ai",
"VIDEO_PROVIDER": "vertex_ai",
"VERTEX_CLIENT_EMAIL": "[email protected]",
"VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"VERTEX_PROJECT_ID": "your-gcp-project-id",
"GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
}
}
}
}
(i) Using Google AI Studio Provider
claude mcp add ai-vision-mcp \
-e IMAGE_PROVIDER=google \
-e VIDEO_PROVIDER=google \
-e GEMINI_API_KEY=your-gemini-api-key \
-- npx ai-vision-mcp
(ii) Using Vertex AI Provider
claude mcp add ai-vision-mcp \
-e IMAGE_PROVIDER=vertex_ai \
-e VIDEO_PROVIDER=vertex_ai \
-e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
-e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
-e VERTEX_PROJECT_ID=your-gcp-project-id \
-e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
-- npx ai-vision-mcp
Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating ~/.claude/settings.json as follows:
{
"env": {
"MCP_TIMEOUT": "60000",
"MCP_TOOL_TIMEOUT": "300000"
}
}
Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server
Pasting the following configuration into your Cursor ~/.cursor/mcp.json file is the recommended approach. You may also install in a specific project by creating .cursor/mcp.json in your project folder. See Cursor MCP docs for more info.
(i) Using Google AI Studio Provider
{
"mcpServers": {
"ai-vision-mcp": {
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "google",
"VIDEO_PROVIDER": "google",
"GEMINI_API_KEY": "your-gemini-api-key"
}
}
}
}
(ii) Using Vertex AI Provider
{
"mcpServers": {
"ai-vision-mcp": {
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "vertex_ai",
"VIDEO_PROVIDER": "vertex_ai",
"VERTEX_CLIENT_EMAIL": "[email protected]",
"VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"VERTEX_PROJECT_ID": "your-gcp-project-id",
"GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
}
}
}
}
Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:
(i) Using Google AI Studio Provider
{
"mcpServers": {
"ai-vision-mcp": {
"timeout": 300,
"type": "stdio",
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "google",
"VIDEO_PROVIDER": "google",
"GEMINI_API_KEY": "your-gemini-api-key"
}
}
}
}
(ii) Using Vertex AI Provider
{
"mcpServers": {
"ai-vision-mcp": {
"timeout": 300,
"type": "stdio",
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "vertex_ai",
"VIDEO_PROVIDER": "vertex_ai",
"VERTEX_CLIENT_EMAIL": "[email protected]",
"VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"VERTEX_PROJECT_ID": "your-gcp-project-id",
"GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
}
}
}
}
The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:
npx ai-vision-mcp
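Concretely, the stdio transport is line-oriented: one JSON-RPC message per line on stdin/stdout. A minimal, illustrative handshake message looks like this; real clients manage the handshake for you, but piping such lines into the command above would drive the server directly:

```shell
# Sketch of the newline-delimited JSON-RPC framing used by MCP's stdio transport.
# The initialize request shape follows the MCP specification; protocolVersion and
# clientInfo values here are illustrative.
init_msg='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"demo-client","version":"0.0.1"}}}'
printf '%s\n' "$init_msg"
# To talk to the server directly (illustrative):
#   printf '%s\n' "$init_msg" | npx ai-vision-mcp
```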
The server provides five main MCP tools:
analyze_image
Analyzes an image using AI and returns a detailed description.
Parameters:
- imageSource (string): URL, base64 data, or file path to the image
- prompt (string): Question or instruction for the AI
- mode (string, optional): Analysis mode - one of:
  - general (default) - General image analysis
  - palette - Extract design tokens (colors, spacing, typography)
  - hierarchy - Analyze visual hierarchy and eye flow
  - components - Catalog UI components and design system maturity
- options (object, optional): Analysis options including temperature and max tokens
Examples:
{
"imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
"prompt": "What is this image about? Describe what you see in detail."
}
{
"imageSource": "https://example.com/design.png",
"prompt": "Extract all design tokens from this screenshot",
"mode": "palette"
}
{
"imageSource": "C:\\Users\\username\\Downloads\\ui_mockup.png",
"prompt": "Analyze the visual hierarchy and eye flow",
"mode": "hierarchy"
}
{
"imageSource": "https://example.com/design-system.png",
"prompt": "List all UI components and evaluate design system maturity",
"mode": "components"
}
compare_images
Compares multiple images using AI and returns a detailed comparison analysis.
Parameters:
- imageSources (array): Array of image sources (URLs, base64 data, or file paths) - minimum 2, maximum 4 images
- prompt (string): Question or instruction for comparing the images
- options (object, optional): Analysis options including temperature and max tokens
Examples:
{
"imageSources": [
"https://example.com/image1.jpg",
"https://example.com/image2.jpg"
],
"prompt": "Compare these two images and tell me the differences"
}
{
"imageSources": [
"https://example.com/image1.jpg",
"C:\\\\Users\\\\username\\\\Downloads\\\\image2.jpg",
"data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
],
"prompt": "Which image has the best lighting quality?"
}
detect_objects_in_image
Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with coordinates and saves the annotated image either to a specified file or to a temporary directory.
Parameters:
- imageSource (string): URL, base64 data, or file path to the image
- prompt (string): Custom detection prompt describing what to detect or recognize in the image
- outputFilePath (string, optional): Explicit output path for the annotated image
Configuration:
This function uses optimized default parameters for object detection and does not accept a runtime options parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:
# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0 # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95 # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30 # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192 # High token limit for JSON
File Handling Logic:
Response Types:
- file object when an explicit outputFilePath is provided
- tempFile object when no explicit outputFilePath is provided, so the annotated image is auto-saved to a temporary folder
- detections array with detected objects and coordinates
- summary with percentage-based coordinates for browser automation
Examples:
{
"imageSource": "https://example.com/image.jpg",
"prompt": "Detect all objects in this image"
}
{
"imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
"outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
{
"imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
"prompt": "Detect and label all electronic devices in this image"
}
audit_design
Audits UI/UX design compliance with pixel-level analysis and AI critique.
This tool provides automated design compliance auditing using pure TypeScript/JavaScript pixel analysis combined with Gemini Vision API critique. It extracts dominant colors, detects visual complexity, validates WCAG contrast ratios, and generates actionable design recommendations.
Inspired by: Automating UX/UI Design Analysis with Python, Machine Learning, and LLMs by Jade Graham
Parameters:
- imageSource (string): URL, base64 data, or file path to the design image
- prompt (string, optional): Custom audit context or focus areas
- options (object, optional): Analysis options including temperature and max tokens
Examples:
{
"imageSource": "https://example.com/design.png",
"prompt": "Audit this design for accessibility and visual hierarchy"
}
{
"imageSource": "C:\\Users\\username\\Downloads\\ui_design.png",
"prompt": "Check WCAG AA compliance"
}
analyze_video
Analyzes a video using AI and returns a detailed description.
Parameters:
- videoSource (string): YouTube URL, GCS URI, or local file path to the video
- prompt (string): Question or instruction for the AI
- options (object, optional): Analysis options including temperature and max tokens
Supported video sources:
- YouTube URLs (e.g., https://www.youtube.com/watch?v=...)
- Local file paths (e.g., C:\Users\username\Downloads\video.mp4)
Examples:
{
"videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
"prompt": "What is this video about? Describe what you see in detail."
}
{
"videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
"prompt": "What is this video about? Describe what you see in detail."
}
Note: Only YouTube URLs are supported for public video URLs. Other public video URLs are not currently supported.
For basic setup, you only need to configure the provider selection and required credentials:
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-gemini-api-key"
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="[email protected]"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
For comprehensive environment variable documentation, 👉 see the Environment Variable Guide.
The server uses a hierarchical configuration system where more specific settings override general ones:
1. Function-specific settings (TEMPERATURE_FOR_ANALYZE_IMAGE, etc.)
2. Task-specific settings (TEMPERATURE_FOR_IMAGE, etc.)
3. General settings (TEMPERATURE, etc.)
Basic Optimization:
# General settings
export TEMPERATURE=0.7
export MAX_TOKENS=1500
# Task-specific optimization
export TEMPERATURE_FOR_IMAGE=0.2 # More precise for images
export TEMPERATURE_FOR_VIDEO=0.5 # More creative for videos
Function-specific Optimization:
# Optimize individual functions
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.1
export TEMPERATURE_FOR_COMPARE_IMAGES=0.3
export TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0 # Deterministic
export MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192 # High token limit
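The override order amounts to a fallback chain. This can be sketched in a few lines of shell; resolve_temperature is a hypothetical helper, but the variable names and precedence follow the hierarchy described above:

```shell
# Sketch of the documented precedence: function-specific beats task-specific,
# which beats the general setting. resolve_temperature is a hypothetical helper.
resolve_temperature() {
  # $1 = function name (e.g. ANALYZE_IMAGE), $2 = task name (e.g. IMAGE)
  eval "fn_val=\${TEMPERATURE_FOR_$1:-}"
  eval "task_val=\${TEMPERATURE_FOR_$2:-}"
  echo "${fn_val:-${task_val:-${TEMPERATURE:-}}}"
}

TEMPERATURE=0.7
TEMPERATURE_FOR_IMAGE=0.2
resolve_temperature ANALYZE_IMAGE IMAGE   # task-specific value wins: 0.2
TEMPERATURE_FOR_ANALYZE_IMAGE=0.1
resolve_temperature ANALYZE_IMAGE IMAGE   # function-specific value now wins: 0.1
```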
Model Selection:
# Choose models per function
export ANALYZE_IMAGE_MODEL="gemini-2.5-flash-lite"
export COMPARE_IMAGES_MODEL="gemini-2.5-flash"
export ANALYZE_VIDEO_MODEL="gemini-2.5-flash-pro"
If you see errors like:
tools/call failed: Transport closed
Common causes:
A) Image annotation dependency failed to load
This server uses imagescript for image annotation/dimension extraction.
Verify it loads:
npm run doctor
# or
npm run check:imagescript
B) stdout logs corrupt stdio MCP framing
This server uses the MCP stdio transport (newline-delimited JSON-RPC over stdout).
- Write diagnostic logs to stderr (console.error)
- Never use console.log in stdio MCP servers
If stdout is polluted, clients (Codex/Claude Code) may disconnect and report Transport closed.
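The rule can be demonstrated in isolation: only protocol frames may go to file descriptor 1, and everything diagnostic goes to descriptor 2. The JSON frame and helper names below are illustrative:

```shell
# Sketch: in a stdio MCP server, stdout carries only protocol JSON;
# diagnostics must go to stderr. The frame and helper names are illustrative.
emit_frame() { printf '%s\n' '{"jsonrpc":"2.0","id":1,"result":{}}'; }
log_debug()  { printf '%s\n' "debug: $*" >&2; }

emit_frame                       # -> stdout: the client parses this
log_debug "handled tools/call"   # -> stderr: invisible to the framing
```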
# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp
# Install dependencies
npm install
# Build the project
npm run build
# Start development server
npm run dev
- npm run build - Build the TypeScript project
- npm run dev - Start development server with watch mode
- npm run lint - Run ESLint
- npm run format - Format code with Prettier
- npm start - Start the built server
The project follows a modular architecture:
src/
├── providers/ # AI provider implementations
│ ├── gemini/ # Google Gemini provider
│ ├── vertexai/ # Vertex AI provider
│ └── factory/ # Provider factory
├── services/ # Core services
│ ├── ConfigService.ts
│ └── FileService.ts
├── storage/ # Storage implementations
├── file-upload/ # File upload strategies
├── types/ # TypeScript type definitions
├── utils/ # Utility functions
└── server.ts # Main MCP server
The server includes comprehensive error handling.
1. Create your feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add amazing feature')
3. Push to the branch (git push origin feature/amazing-feature)
This project is licensed under the MIT License - see the LICENSE file for details.