An MCP server that gives AI agents the ability to listen to and understand music/audio files, enabling semantic analysis, stem separation, lyrics transcription, and signal processing via tool calls.
Give your AI agent the ability to listen to and understand music/audio files — works with ANY MCP client.
Audio intelligence MCP server. Semantic understanding, stem separation, lyrics transcription, signal analysis — all as tool calls.
# Your agent asks:
"Analyze this track — what's the genre, tempo, key, and mood?"
"Separate the vocals from the instrumental"
"Transcribe the lyrics from this song"
"Generate a trap beat at 140 BPM"
Claud-Ear connects your AI agent (Hermes Agent, Claude Code, Codex CLI, etc.) to a full audio intelligence pipeline. Drop in an MP3, WAV, FLAC, OGG, M4A, or OPUS file and your agent can analyze, separate, transcribe, and understand it.
I have a music library of ~5,000 tracks. I wanted my agent to understand them like I do — not just "this is a 3-minute MP3" but "this is a melancholic D minor indie rock track with a prominent bass line and lyrics about loss."
Existing tools were either:
Claud-Ear is an MCP server because my agent should be able to say "analyze this track" and get back structured data — genre, tempo, key, stems, lyrics, mood — as a tool call result. Not me manually running 4 different CLI tools and copy-pasting the output.
The autonomous agent mode exists because I don't want to manually trigger analysis on 5,000 tracks. It should run overnight, unsupervised, and finish the job.
| Capability | Model/Tool | What It Gives You |
|---|---|---|
| 🔍 Semantic understanding | CLAP (LAION/CLAP Music & Speech) | Genre, mood, instruments, era classification |
| 🎛️ Source separation | Demucs HT | Isolate vocals, drums, bass, other as separate files |
| 📝 Lyrics transcription | Whisper large-v3 | Transcribe lyrics from isolated vocals |
| 📊 Signal analysis | librosa | Tempo, key, chords, structure, rhythm |
| ⬇️ Audio downloading | yt-dlp | Download from YouTube, Spotify, etc. |
| 🏥 Audio surgery | sonic_surgery | EQ, stem manipulation, dynamics processing |
| 🎹 Beat production | beat_studio + MIDI | Generate beats, chord progressions, melodies |
Default LLM backend: Ollama (configurable to any OpenAI-compatible API).
These are the battles I'm actively fighting:
server.py is 104K lines — This started as a clean MCP server and became a monolith. CLAP loading, Demucs inference, Whisper transcription, librosa analysis, caching, disk eviction, schema versioning — all in one file. It needs to be split into modules but I keep adding features instead of refactoring.
8GB VRAM means one model at a time — CLAP, Demucs, and Whisper all want GPU. I can't run them simultaneously. The "deep_listen" tool has to load/unload models in sequence, which turns a 2-minute analysis into a 10-minute analysis. I have a GPU lock system but it's a hack.
Cache invalidation is hard — I built LRU memory + disk cache with schema versioning. When I change the output format, old cache entries auto-invalidate. But the cache key logic is fragile — same file, same analysis, different day = cache miss because the schema version bumped. I'm over-engineering caching.
yt-dlp breaks monthly — YouTube changes their frontend, yt-dlp needs an update, and the search_and_download tool stops working until I manually update. This is not the tool's fault but it's a maintenance burden I didn't anticipate.
15-minute max duration is arbitrary — Set to 900 seconds because longer tracks OOM on 8GB VRAM. A 20-minute ambient piece or live set gets truncated. The limit should be dynamic based on available memory, not hardcoded.
Autonomous agent gets stuck — The batch analysis agent runs overnight but sometimes hangs on one track (corrupted file, unsupported codec, Demucs crash). There's no timeout per-track, so one bad file blocks the whole queue. I need per-track error isolation.
Billboard/Spotify integrations are brittle — charts.py and discovery.py depend on third-party APIs with rate limits and breaking changes. The Billboard scraper broke twice in 3 months. These are nice-to-have features that cost more maintenance than value.
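For the stuck-agent problem in the list above, per-track isolation could look roughly like the sketch below. It is not the project's code: `run_isolated` and `drain_queue` are hypothetical helpers, the 1800-second default is an arbitrary placeholder, and it assumes each track can be analyzed by a separate command, so a hang or crash stays inside a child process that the parent can kill.

```python
import subprocess
import sys


def run_isolated(cmd, timeout_s):
    """Run one analysis command in a child process; never raise on failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return {"status": "timeout", "detail": f"exceeded {timeout_s}s"}
    except OSError as exc:
        # The command could not be started at all (missing binary, etc.).
        return {"status": "error", "detail": str(exc)}
    if proc.returncode != 0:
        # Keep only the tail of stderr so one noisy crash cannot flood the log.
        return {"status": "error", "detail": proc.stderr.strip()[-500:]}
    return {"status": "ok", "detail": proc.stdout.strip()}


def drain_queue(tracks, build_cmd, timeout_s=1800):
    """Process every track; a bad file is recorded and skipped, not fatal."""
    results = {}
    for track in tracks:
        results[track] = run_isolated(build_cmd(track), timeout_s)
    return results
```

Here `build_cmd` is whatever turns a track path into an argument list for a single-track analysis. Running each track in its own process also means a Demucs crash or an out-of-memory kill cannot take down the queue runner itself.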
Split server.py into modules, one file per capability (clap.py, demucs.py, whisper.py, librosa.py, cache.py)
# Clone
git clone https://github.com/Null-Phnix/claud-ear.git
cd claud-ear
# Install with uv
uv sync
# Test the LLM backend
uv run python llm_backend.py
# Run the MCP server
uv run claud-ear
By default, Claud-Ear connects to Ollama at http://localhost:11434 using llama3.1:8b. To customize:
export AUDIO_LLM_MODEL=llama3.1:8b # model name
export AUDIO_LLM_HOST=http://localhost:11434 # API endpoint
export AUDIO_LLM_PROVIDER=ollama # or "openai" for OpenAI-compatible APIs
For OpenAI-compatible providers (vLLM, TGI, LiteLLM, etc.):
export AUDIO_LLM_PROVIDER=openai
export AUDIO_LLM_HOST=http://localhost:8000
export AUDIO_LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
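As a rough illustration of how these three variables fit together, here is a minimal sketch of reading them with the documented defaults. How llm_backend.py actually parses its configuration is not shown in this README, so `LLMConfig` and `load_llm_config` are hypothetical names, and the validation of the provider value is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    host: str
    model: str


def load_llm_config(env=None):
    """Read the AUDIO_LLM_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    provider = env.get("AUDIO_LLM_PROVIDER", "ollama").lower()
    # Assumption: only the two provider values named in this README are valid.
    if provider not in ("ollama", "openai"):
        raise ValueError(f"unsupported AUDIO_LLM_PROVIDER: {provider!r}")
    return LLMConfig(
        provider=provider,
        # Drop a trailing slash so later URL joins do not produce "//".
        host=env.get("AUDIO_LLM_HOST", "http://localhost:11434").rstrip("/"),
        model=env.get("AUDIO_LLM_MODEL", "llama3.1:8b"),
    )
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.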
Hermes Agent (or any MCP client) — add to your MCP config:
{
"mcpServers": {
"claud-ear": {
"command": "uv",
"args": ["run", "claud-ear"]
}
}
}
Or for Claude Code:
claude mcp add claud-ear -- uv run claud-ear
deep_listen(file_path): Full analysis pipeline (semantic understanding, source separation, transcription, and signal analysis) in one call. This is the main tool.
analyze_audio(file_path): Quick analysis of genre, mood, instruments, tempo, and key. Lighter than deep_listen.
separate_stems(file_path): Isolate vocals, drums, bass, and other stems from a track as separate audio files.
transcribe_lyrics(file_path): Extract and transcribe lyrics from vocals.
search_and_download(query): Search for and download audio from YouTube and other platforms via yt-dlp.
sonic_surgery(file_path, operation, **params): EQ adjustments, stem manipulation, dynamics processing.
generate_beat(genre, bpm, bars): Generate a beat with chord progressions, melodies, and drum patterns as MIDI.
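An MCP client normally builds tool calls for you, but it can help to see what goes over the wire. The sketch below constructs the JSON-RPC 2.0 `tools/call` request that MCP defines for invoking a tool. The tool and argument names come from the list above; `tool_call_request` is a hypothetical helper, the file path is a placeholder, and the shape of the result each tool returns is not documented here.

```python
import json
from itertools import count

# Each JSON-RPC request on a connection needs its own id.
_request_ids = count(1)


def tool_call_request(name, **arguments):
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Placeholder path; substitute a real audio file.
request = tool_call_request("analyze_audio", file_path="/music/track.flac")
print(json.dumps(request, indent=2))
```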
claud-ear/
├── server.py # MCP server (FastMCP) — main entry point
├── llm_backend.py # Configurable LLM API client (Ollama/OpenAI)
├── agent.py # Autonomous batch analysis agent
├── beat_studio.py # Beat production engine
├── quality.py # Audio quality assessment
├── discovery.py # Music discovery tools
├── song_db.py # Track metadata & lyrics database
├── sonic_surgery.py # Audio repair & enhancement
├── extractor.py # Feature extraction pipeline
├── download_playlists.py # Bulk downloader
├── analyze_bass.py # Bass frequency analysis
├── analyze_bitter.py # Mood/valence classifier
├── charts.py # Billboard chart integration
├── power.py # Energy/sleep scheduling
├── dashboard.py # Web dashboard
├── query.py # Natural language music search
├── start_agent.sh # Start autonomous agent
├── stop_agent.sh # Stop autonomous agent
├── pause_at_130.sh # Pause agent during peak hours
└── docs/ # Design docs & implementation plans
Run the autonomous music intelligence agent to batch-analyze your library:
# Analyze one song (test mode)
uv run python agent.py --one
# Run in continuous loop
./start_agent.sh
# Stop
./stop_agent.sh
The agent scans ~/Documents/music/music data/, finds pending tracks, analyzes them using the configured LLM backend, and writes full analysis documents to ~/Documents/music/analyses/.
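The "find pending tracks" step might be sketched as below. How agent.py actually tracks which files are done is not described here, so this assumes one analysis document per track, named after the track's file stem with a `.md` extension. `find_pending` is a hypothetical helper; the extension list mirrors the formats named earlier in this README.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".opus"}


def find_pending(music_dir, analyses_dir):
    """Return audio files under music_dir that have no analysis document yet."""
    music_dir, analyses_dir = Path(music_dir), Path(analyses_dir)
    # Assumption: analyses are stored as "<track stem>.md" in analyses_dir.
    done = {p.stem for p in analyses_dir.glob("*.md")} if analyses_dir.is_dir() else set()
    return sorted(
        p
        for p in music_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS and p.stem not in done
    )
```

Matching on the stem alone treats two files with the same name in different folders as one track. A real implementation would probably key on the relative path or a content hash instead.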
MIT — use it, fork it, vibe with it.