Intelligent CLI tool with AI-powered model selection that analyzes your hardware and recommends optimal LLM models for your system

Intelligent Ollama Model Selector
AI-powered CLI that analyzes your hardware and recommends optimal LLM models.
Deterministic scoring across 200+ Ollama models and 7k+ variants with a packaged SQLite catalog, live sync, and hardware-calibrated memory estimation.
Choosing the right LLM for your hardware is complex. With thousands of model variants, quantization levels, and hardware configurations, finding the optimal model requires understanding memory bandwidth, VRAM limits, and performance characteristics.
LLM Checker solves this. It analyzes your system, scores every compatible model across four dimensions (Quality, Speed, Fit, Context), and delivers actionable recommendations in seconds.
| Highlight | Feature | Description |
|---|---|---|
| 200+ | Packaged Model Catalog | Ships with a synced Ollama SQLite catalog and can refresh from Ollama on demand |
| 4D | Scoring Engine | Quality, Speed, Fit, Context — weighted by use case |
| Multi-GPU | Hardware Detection | Apple Silicon, NVIDIA CUDA, AMD ROCm, Intel Arc, CPU, integrated/dedicated inventory visibility |
| Calibrated | Memory Estimation | Bytes-per-parameter formula validated against real Ollama sizes |
| Zero | Native Dependencies | Pure JavaScript — works on any Node.js 16+ system |
| Live | AI Run Metrics | ai-run shows response speed in tokens/sec next to model output |
LLM Checker and llmfit solve related but different problems:
| Tool | Primary Focus | Typical Output |
|---|---|---|
| LLM Checker | Hardware-aware model selection for local inference | Ranked recommendations, compatibility scores, pull/run commands |
| llmfit | LLM workflow support and model-fit evaluation from another angle | Different optimization workflow and selection heuristics |
If your goal is: "What should I run on this exact machine right now?", use LLM Checker first.
If your goal is broader experimentation across custom pipelines, using both tools can be complementary.
# Install globally
npm install -g llm-checker
# Or run directly with npx
npx llm-checker hw-detect
Termux (Android):
pkg update
pkg install ollama
npm install -g llm-checker
Requirements:
The package includes a prebuilt model catalog and declares sql.js as an optional dependency for SQLite-powered commands. If your package manager skips optional dependencies and database commands report sql.js missing, reinstall with optional dependencies enabled:
npm install -g llm-checker --include=optional
If you are new, use this exact flow:
# 1) Install
npm install -g llm-checker
# 2) Detect your hardware
llm-checker hw-detect
# 3) Get recommendations by category
llm-checker recommend --category coding
# 4) Refresh the catalog when you want current Ollama references
llm-checker sync
# 5) Run with auto-selection and tokens/sec metrics
llm-checker ai-run --category coding --prompt "Write a hello world in Python"
If you already calibrated routing:
llm-checker ai-run --calibrated --category coding --prompt "Refactor this function"
LLM Checker is published through several channels, including npm (llm-checker) and a scoped GitHub Packages mirror (@pavelevich/llm-checker).
If you need the newest release, install from npm (llm-checker), not the scoped GitHub Packages mirror.
If you installed @pavelevich/llm-checker and version looks old:
npm uninstall -g @pavelevich/llm-checker
npm install -g llm-checker@latest
hash -r
llm-checker --version
- last_updated metadata
- sync refreshes the local SQLite catalog from Ollama; recommend, list-models, ai-run, and ai-check now prefer that synced catalog instead of stale scraper cache data.
- pulls: 0 for the full catalog after sync
- 335m style tags are treated as millions, not billions
- latest, small, medium, and large are not guessed into fake parameter counts
- ai-run streams model responses through Ollama and appends measured tokens/sec so users can compare installed models by real local speed.
- recommend and ai-run:
  - --calibrated [file] support with default discovery path.
  - --policy > --calibrated > deterministic fallback.
- calibrate --policy-out ... → recommend --calibrated ...
- docs/ with clearer onboarding paths.

Use this only if you explicitly need GitHub Packages. It may not match npm latest.
# 1) Configure registry + token (PAT with read:packages)
echo "@pavelevich:registry=https://npm.pkg.github.com" >> ~/.npmrc
echo "//npm.pkg.github.com/:_authToken=${GITHUB_TOKEN}" >> ~/.npmrc
# 2) Install
npm install -g @pavelevich/llm-checker@latest
# 1. Detect your hardware capabilities
llm-checker hw-detect
# 2. Get full analysis with compatible models
llm-checker check
# 3. Get intelligent recommendations by category
llm-checker recommend
# 4. Refresh the catalog when you want current Ollama metadata
llm-checker sync
# 5. Search the catalog for a specific model family
llm-checker search qwen --use-case coding
This path produces both calibration artifacts and verifies calibrated routing in one pass.
cp ./docs/fixtures/calibration/sample-suite.jsonl ./sample-suite.jsonl
mkdir -p ./artifacts
llm-checker calibrate \
--suite ./sample-suite.jsonl \
--models qwen2.5-coder:7b llama3.2:3b \
--runtime ollama \
--objective balanced \
--dry-run \
--output ./artifacts/calibration-result.json \
--policy-out ./artifacts/calibration-policy.yaml
Artifacts created:
- ./artifacts/calibration-result.json (calibration contract)
- ./artifacts/calibration-policy.yaml (routing policy for runtime commands)

Then verify calibrated routing with the generated policy:
llm-checker recommend --calibrated ./artifacts/calibration-policy.yaml --category coding
llm-checker ai-run --calibrated ./artifacts/calibration-policy.yaml --category coding --prompt "Refactor this function"
Notes:
- --policy <file> has precedence over --calibrated [file].
- If --calibrated has no path, discovery uses ~/.llm-checker/calibration-policy.{yaml,yml,json}.
- --mode full currently requires --runtime ollama.
- ./docs/fixtures/calibration/sample-generated-policy.yaml shows the expected policy structure.

LLM Checker includes a built-in Model Context Protocol (MCP) server, allowing Claude Code and other MCP-compatible AI assistants to analyze your hardware and manage local models directly.
# Install globally first
npm install -g llm-checker
# Add to Claude Code
claude mcp add llm-checker -- llm-checker-mcp
Or generate the exact command directly from the CLI:
llm-checker mcp-setup
Or with npx (no global install needed):
claude mcp add llm-checker -- npx llm-checker-mcp
Restart Claude Code and you're done.
Once connected, Claude can use these tools:
Core Analysis:
| Tool | Description |
|---|---|
| hw_detect | Detect your hardware (CPU, GPU, RAM, acceleration backend) |
| check | Full compatibility analysis with all models ranked by score |
| recommend | Top model picks by category (coding, reasoning, multimodal, etc.) |
| installed | Rank your already-downloaded Ollama models |
| search | Search the Ollama model catalog with filters |
| smart_recommend | Advanced recommendations using the full scoring engine |
| ollama_plan | Build a capacity plan for local models with recommended context/parallel/memory settings |
| ollama_plan_env | Return ready-to-paste export ... env vars from the recommended or fallback plan profile |
| policy_validate | Validate a policy file against the v1 schema and return structured validation output |
| audit_export | Run policy compliance export (json/csv/sarif/all) for check or recommend flows |
| calibrate | Generate calibration artifacts from a prompt suite with typed MCP inputs |
Ollama Management:
| Tool | Description |
|---|---|
| ollama_list | List all downloaded models with params, quant, family, and size |
| ollama_pull | Download a model from the Ollama registry |
| ollama_run | Run a prompt against a local model (with tok/s metrics) |
| ollama_remove | Delete a model to free disk space |
Advanced (MCP-exclusive):
| Tool | Description |
|---|---|
| ollama_optimize | Generate optimal Ollama env vars for your hardware (NUM_GPU, PARALLEL, FLASH_ATTENTION, etc.) |
| benchmark | Benchmark a model with 3 standardized prompts; measures tok/s, load time, prompt eval |
| compare_models | Head-to-head comparison of two models on the same prompt with speed + response side-by-side |
| cleanup_models | Analyze installed models: find redundancies, cloud-only models, oversized models, and upgrade candidates |
| project_recommend | Scan a project directory (languages, frameworks, size) and recommend the best model for that codebase |
| ollama_monitor | Real-time system status: RAM usage, loaded models, memory headroom analysis |
| cli_help | List all allowlisted CLI commands exposed through MCP |
| cli_exec | Execute any allowlisted llm-checker CLI command with custom args (policy/audit/calibrate/sync/ai-run/etc.) |
After setup, you can ask Claude things like:
Claude will automatically call the right tools and give you actionable results.
Running llm-checker with no arguments now opens an interactive panel (TTY terminals):
- Press / to open all commands
- Press Enter to execute
- Optional flags can be added before running (for example, --json --limit 5)

For scripting and automation, direct command invocation remains unchanged:
llm-checker check --use-case coding --limit 3
llm-checker search "qwen coder" --json
| Command | Description |
|---|---|
| hw-detect | Detect GPU/CPU capabilities, memory, backends |
| check | Full system analysis with compatible models and recommendations |
| recommend | Intelligent recommendations by category (coding, reasoning, multimodal, etc.) |
| calibrate | Generate calibration result + routing policy artifacts from a JSONL prompt suite |
| installed | Rank your installed Ollama models by compatibility |
| list-models | List the synced Ollama catalog by popularity, category, size, or JSON output |
| ollama-plan | Compute safe Ollama runtime env vars (NUM_CTX, NUM_PARALLEL, MAX_LOADED_MODELS) for selected local models |
| mcp-setup | Print/apply Claude MCP setup command and config snippet (--apply, --json, --npx) |
| gpu-plan | Multi-GPU placement advisor with single/pooled model-size envelopes |
| verify-context | Verify practical context-window limits for a local model |
| amd-guard | AMD/Windows reliability guard with mitigation hints |
| toolcheck | Test tool-calling compatibility for local models |
| Command | Description |
|---|---|
| sync | Refresh the local SQLite model catalog from Ollama |
| search <query> | Search the synced catalog with filters and intelligent scoring |
| smart-recommend | Advanced recommendations using the full scoring engine |
| Command | Description |
|---|---|
| policy init | Generate a policy.yaml template for enterprise governance |
| policy validate | Validate a policy file and return non-zero on schema errors |
| audit export | Evaluate policy outcomes and export compliance reports (json, csv, sarif) |
Both check and recommend support --policy <file>.
- In audit mode, policy violations are reported but the command exits with 0.
- In enforce mode, blocking violations return non-zero (default 1).
- The non-zero exit code can be changed with enforcement.exit_code in policy.yaml.

Examples:
llm-checker check --policy ./policy.yaml
llm-checker check --policy ./policy.yaml --use-case coding --runtime vllm
llm-checker recommend --policy ./policy.yaml --category coding
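
The exit-code rules above can be summarized in a small sketch. It is illustrative only: the function name resolveExitCode and the policy object shape (a mode string plus an optional enforcement.exit_code) are assumptions made for this example, not LLM Checker's actual internals.

```javascript
// Hypothetical helper: maps a policy mode and a violation list to a process exit code.
// The policy shape ({ mode, enforcement: { exit_code } }) is assumed for illustration.
function resolveExitCode(policy, violations) {
  const blocking = violations.filter((v) => v.blocking);
  // audit mode reports violations but never fails the command
  if (policy.mode !== "enforce" || blocking.length === 0) {
    return 0;
  }
  // enforce mode: non-zero exit, default 1, overridable via enforcement.exit_code
  const configured = policy.enforcement && policy.enforcement.exit_code;
  return Number.isInteger(configured) && configured !== 0 ? configured : 1;
}

const violations = [{ rule: "license", blocking: true }];
console.log(resolveExitCode({ mode: "audit" }, violations));
console.log(resolveExitCode({ mode: "enforce" }, violations));
console.log(resolveExitCode({ mode: "enforce", enforcement: { exit_code: 3 } }, violations));
```

In a CI job this is the value a wrapper would pass to process.exit, so audit runs stay green while enforce runs fail the step.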
recommend and ai-run now support calibration routing policies generated by calibrate --policy-out.
- --calibrated [file]: if file is omitted, discovery defaults to ~/.llm-checker/calibration-policy.{yaml,yml,json}.
- --policy <file> takes precedence over --calibrated for routing resolution.
- Resolution order:
  1. --policy (explicit)
  2. --calibrated (explicit file or default discovery)
  3. deterministic fallback
- Output reports the routing source (--policy, --calibrated, or default discovery) and the selected route/model.

Examples:
llm-checker recommend --calibrated --category coding
llm-checker recommend --calibrated ./calibration-policy.yaml --category reasoning
llm-checker ai-run --calibrated --category coding --prompt "Refactor this function"
llm-checker ai-run --policy ./calibration-policy.yaml --prompt "Summarize this report"
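
The routing precedence described above can be sketched as a small resolver. This is a minimal illustration, not the CLI's real code: resolveRoutingSource, the opts object, and the injected exists callback are all hypothetical names chosen for the example. Only the precedence order and the default discovery paths come from this README.

```javascript
const os = require("os");
const path = require("path");

// Default discovery candidates: ~/.llm-checker/calibration-policy.{yaml,yml,json}
function defaultPolicyCandidates(home = os.homedir()) {
  return ["yaml", "yml", "json"].map((ext) =>
    path.join(home, ".llm-checker", `calibration-policy.${ext}`)
  );
}

// Hypothetical resolver. opts.policy is a path or undefined; opts.calibrated is a
// path, true (flag given without a path), or undefined. `exists` is injected so the
// sketch can run without touching the real filesystem.
function resolveRoutingSource(opts, exists) {
  if (opts.policy) {
    return { source: "--policy", file: opts.policy };
  }
  if (typeof opts.calibrated === "string") {
    return { source: "--calibrated", file: opts.calibrated };
  }
  if (opts.calibrated === true) {
    const found = defaultPolicyCandidates(opts.home).find((p) => exists(p));
    if (found) {
      return { source: "default discovery", file: found };
    }
  }
  return { source: "deterministic fallback", file: null };
}

console.log(resolveRoutingSource({ policy: "./p.yaml", calibrated: "./c.yaml" }, () => true));
console.log(resolveRoutingSource({ calibrated: true }, () => false));
```

Passing fs.existsSync as the exists argument would make the same function check real files.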
Use audit export when you need machine-readable compliance evidence for CI/CD gates, governance reviews, or security tooling.
# Single report format
llm-checker audit export --policy ./policy.yaml --command check --format json --out ./reports/check-policy.json
# Export all configured formats (json, csv, sarif)
llm-checker audit export --policy ./policy.yaml --command check --format all --out-dir ./reports
- --command check|recommend chooses the candidate source.
- --format all honors reporting.formats in your policy (falls back to json,csv,sarif).
- In enforce mode with blocking violations, reports are still written before non-zero exit.

# CI artifact (JSON) for post-processing in pipeline jobs
llm-checker audit export --policy ./policy.yaml --command check --format json --out ./reports/policy-report.json
# Flat CSV for SIEM ingestion (Splunk/ELK/DataDog pipelines)
llm-checker audit export --policy ./policy.yaml --command check --format csv --out ./reports/policy-report.csv
# SARIF for security/code-scanning tooling integrations
llm-checker audit export --policy ./policy.yaml --command check --format sarif --out ./reports/policy-report.sarif
name: Policy Gate
on: [pull_request]
jobs:
  policy-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: node bin/enhanced_cli.js check --policy ./policy.yaml --runtime ollama --no-verbose
      - if: always()
        run: node bin/enhanced_cli.js audit export --policy ./policy.yaml --command check --format all --runtime ollama --no-verbose --out-dir ./policy-reports
      - if: always()
        uses: actions/upload-artifact@v4
        with:
          name: policy-audit-reports
          path: ./policy-reports
check, recommend, and audit export outputs include normalized model provenance fields:
- source
- registry
- version
- license
- digest

If a field is unavailable from model metadata, outputs use "unknown" instead of omitting the field. This keeps downstream parsers deterministic.
License values are canonicalized for policy checks (for example MIT License -> mit, Apache 2.0 -> apache-2.0).
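
A minimal sketch of this normalization, under stated assumptions: the function names and the two-entry alias table are invented for illustration and only cover the two license examples given above. The real canonicalization rules are not documented here.

```javascript
const PROVENANCE_FIELDS = ["source", "registry", "version", "license", "digest"];

// Illustrative alias table covering only the two examples from this README.
const LICENSE_ALIASES = new Map([
  ["mit license", "mit"],
  ["apache 2.0", "apache-2.0"],
]);

// Lowercases and trims the value, then maps known aliases to a canonical id.
function canonicalizeLicense(value) {
  const key = String(value).trim().toLowerCase();
  return LICENSE_ALIASES.get(key) || key;
}

// Always returns all five fields; missing values become "unknown".
function normalizeProvenance(meta) {
  const out = {};
  for (const field of PROVENANCE_FIELDS) {
    const raw = meta[field];
    const missing = raw === undefined || raw === null || raw === "";
    out[field] = missing ? "unknown" : String(raw);
  }
  if (out.license !== "unknown") {
    out.license = canonicalizeLicense(out.license);
  }
  return out;
}

console.log(normalizeProvenance({ source: "ollama", license: "MIT License" }));
```

Because every field is always present, a downstream parser can read record.digest without first checking that the key exists.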
| Command | Description |
|---|---|
| ai-check | AI-powered model evaluation with meta-analysis |
| ai-run | AI-powered model selection and execution with live tokens/sec output |
ai-run: Auto-Select and Run
llm-checker ai-run --category coding --prompt "Write a file parser in Node.js"
llm-checker ai-run --benchmark --category general
llm-checker ai-run --reference-only --category reasoning
ai-run chooses the best installed model for the requested category, falls back to the best local alternative when the top catalog pick is not installed, and streams through Ollama directly.
When a response completes, the CLI appends measured local speed:
>>> hi
Hello! How can I help you today?
[42.8 tokens/sec]
Use --reference-only when you only want the recommendation card and pull command without starting a chat. Use --benchmark for a quick measured speed check on the selected local model.
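
For reference, a tokens/sec figure like the one above can be derived from the final object of an Ollama /api/generate or /api/chat response, which reports eval_count (generated tokens) and eval_duration (nanoseconds). Whether ai-run computes its number exactly this way is an assumption; the helper names below are hypothetical.

```javascript
// Converts Ollama's eval_count / eval_duration (nanoseconds) into tokens per second.
function tokensPerSecond(finalChunk) {
  const count = finalChunk.eval_count;
  const durationNs = finalChunk.eval_duration;
  if (!count || !durationNs) return null; // not enough data to report a speed
  return (count / durationNs) * 1e9;
}

// Formats the speed in the bracketed style shown in the sample output above.
function formatSpeed(finalChunk) {
  const tps = tokensPerSecond(finalChunk);
  return tps === null ? "[speed unavailable]" : `[${tps.toFixed(1)} tokens/sec]`;
}

// Illustrative values: 214 tokens generated in 5 seconds.
console.log(formatSpeed({ eval_count: 214, eval_duration: 5e9 }));
```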
hw-detect: Hardware Analysis
llm-checker hw-detect
Summary:
Apple M4 Pro (24GB Unified Memory)
Tier: MEDIUM HIGH
Max model size: 15GB
Best backend: metal
CPU:
Apple M4 Pro
Cores: 12 (12 physical)
SIMD: NEON
Metal:
GPU Cores: 16
Unified Memory: 24GB
Memory Bandwidth: 273GB/s
On hybrid or integrated-only systems, hw-detect now also surfaces GPU topology explicitly:
Dedicated GPUs: NVIDIA GeForce RTX 4060
Integrated GPUs: Intel Iris Xe Graphics
Assist path: Integrated/shared-memory GPU detected, runtime remains CPU
This makes integrated GPUs visible even when the selected runtime backend is still CPU.
recommend: Category Recommendations
llm-checker recommend
Use recommend as the canonical model-picking command. Other recommendation-like
commands have narrower roles:
| Command | Role | Expected to match recommend? |
|---|---|---|
| recommend | Canonical deterministic model recommendations by category | Yes, this is the reference output |
| check | Hardware compatibility report with a compatibility-oriented recommendation card | Not exactly; it prioritizes fit/reporting context |
| smart-recommend | Experimental alternate scoring engine used while scoring ideas are evaluated | No; it may differ until it is unified or retired |
Use optimization profiles to steer ranking by intent:
llm-checker recommend --optimize balanced
llm-checker recommend --optimize speed
llm-checker recommend --optimize quality
llm-checker recommend --optimize context
llm-checker recommend --optimize coding
INTELLIGENT RECOMMENDATIONS BY CATEGORY
Hardware Tier: HIGH | Models Analyzed: 205
Coding:
qwen2.5-coder:14b (14B)
Score: 78/100
Fine-tuning: LoRA+QLoRA
Command: ollama pull qwen2.5-coder:14b
Reasoning:
deepseek-r1:14b (14B)
Score: 86/100
Fine-tuning: QLoRA
Command: ollama pull deepseek-r1:14b
Multimodal:
llama3.2-vision:11b (11B)
Score: 83/100
Fine-tuning: LoRA+QLoRA
Command: ollama pull llama3.2-vision:11b
check, recommend, and ai-check include a fine-tuning suitability label in output to help choose between Full FT, LoRA, and QLoRA paths.
search: Model Search
llm-checker search llama -l 5
llm-checker search coding --use-case coding
llm-checker search qwen --quant Q4_K_M --max-size 8
| Option | Description |
|---|---|
| -l, --limit <n> | Number of results (default: 10) |
| -u, --use-case <type> | Optimize for: general, coding, chat, reasoning, creative, fast |
| --max-size <gb> | Maximum model size in GB |
| --quant <type> | Filter by quantization: Q4_K_M, Q8_0, FP16, etc. |
| --family <name> | Filter by model family |
LLM Checker ships with a pre-synced SQLite snapshot of the Ollama catalog. On first run, that snapshot is copied to ~/.llm-checker/models.db, so recommendations and catalog search work immediately after npm install.
The packaged snapshot currently includes:
Refresh it any time:
llm-checker sync
For release maintainers, the packaged seed can be regenerated from the synced local DB:
npm run sync:seed
recommend, list-models, ai-run, and ai-check prefer the synced SQLite catalog. If the SQLite catalog is unavailable, LLM Checker falls back to the scraped cache and then to the curated catalog.
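
The fallback order just described (synced SQLite catalog, then scraped cache, then curated catalog) follows a common "first available source wins" pattern. The sketch below shows that pattern only; the loader functions are stand-ins invented for the example and do not reflect the project's real module APIs.

```javascript
// Tries each source in order and returns the first one that yields models.
// Each `load` is a hypothetical stand-in that returns an array of models,
// returns null, or throws when its source is unavailable.
function loadCatalog(sources) {
  for (const { name, load } of sources) {
    try {
      const models = load();
      if (Array.isArray(models) && models.length > 0) {
        return { source: name, models };
      }
    } catch (err) {
      // Source unavailable: fall through to the next one in the chain.
    }
  }
  return { source: "none", models: [] };
}

const result = loadCatalog([
  { name: "sqlite", load: () => { throw new Error("models.db missing"); } },
  { name: "scraped-cache", load: () => null },
  { name: "curated", load: () => [{ name: "qwen2.5-coder:7b" }] },
]);
console.log(result.source, result.models.length);
```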
The curated fallback catalog includes 35+ models from the most popular Ollama families:
| Family | Models | Best For |
|---|---|---|
| Qwen 2.5/3 | 7B, 14B, Coder 7B/14B/32B, VL 3B/7B | Coding, general, vision |
| Llama 3.x | 1B, 3B, 8B, Vision 11B | General, chat, multimodal |
| DeepSeek | R1 8B/14B/32B, Coder V2 16B | Reasoning, coding |
| Phi-4 | 14B | Reasoning, math |
| Gemma 2 | 2B, 9B | General, efficient |
| Mistral | 7B, Nemo 12B | Creative, chat |
| CodeLlama | 7B, 13B | Coding |
| LLaVA | 7B, 13B | Vision |
| Embeddings | nomic-embed-text, mxbai-embed-large, bge-m3, all-minilm | RAG, search |
All available models are automatically combined with locally installed Ollama models for scoring. Ambiguous tags such as latest, cloud-only variants, and aliases without reliable size metadata are kept out of local recommendations unless they can be resolved to concrete parameters or artifact sizes.
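
To illustrate why ambiguous tags are excluded, here is a hedged sketch of size-tag parsing. The function parseParamsB is hypothetical: it treats an m suffix as millions and a b suffix as billions, and returns null for tags such as latest or large rather than guessing. Composite tags such as 8x7b are deliberately not handled in this sketch.

```javascript
// Returns the parameter count in billions, or null when the tag carries no
// reliable size information (for example "latest", "small", "medium", "large").
function parseParamsB(modelRef) {
  const tag = modelRef.includes(":") ? modelRef.split(":").pop() : modelRef;
  // A number followed by m or b, not followed by another letter or digit.
  const match = /^(\d+(?:\.\d+)?)([mb])(?![a-z0-9])/i.exec(tag);
  if (!match) return null;
  const value = Number(match[1]);
  // "335m" means 335 million, that is 0.335 billion.
  return match[2].toLowerCase() === "m" ? value / 1000 : value;
}

for (const ref of ["qwen2.5-coder:14b", "mxbai-embed-large:335m", "llama3.2:latest"]) {
  console.log(ref, parseParamsB(ref));
}
```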
Models are evaluated across four dimensions, weighted by use case:
| Dimension | Description |
|---|---|
| Q Quality | Model family reputation + parameter count + quantization penalty |
| S Speed | Estimated tokens/sec based on hardware backend and model size |
| F Fit | Memory utilization efficiency (how well it fits in available RAM) |
| C Context | Context window capability vs. target context length |
Three scoring systems are available, each optimized for different workflows:
Deterministic Selector (primary — used by check and recommend):
| Category | Quality | Speed | Fit | Context |
|---|---|---|---|---|
| general | 45% | 35% | 15% | 5% |
| coding | 55% | 20% | 15% | 10% |
| reasoning | 60% | 10% | 20% | 10% |
| multimodal | 50% | 15% | 20% | 15% |
Scoring Engine (experimental — used by smart-recommend and search):
| Use Case | Quality | Speed | Fit | Context |
|---|---|---|---|---|
| general | 40% | 35% | 15% | 10% |
| coding | 55% | 20% | 15% | 10% |
| reasoning | 60% | 15% | 10% | 15% |
| chat | 40% | 40% | 15% | 5% |
| fast | 25% | 55% | 15% | 5% |
| quality | 65% | 10% | 15% | 10% |
All weights are centralized in src/models/scoring-config.js.
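
As a rough illustration of how such weights combine, the sketch below applies the Deterministic Selector weights as a plain weighted sum. Two assumptions are made here: each dimension is already scored on a 0-100 scale, and the final score is a simple linear combination. The real engine may apply additional penalties or bonuses, and weightedScore is a hypothetical name.

```javascript
// Weights copied from the Deterministic Selector table above.
const CATEGORY_WEIGHTS = {
  general: { quality: 0.45, speed: 0.35, fit: 0.15, context: 0.05 },
  coding: { quality: 0.55, speed: 0.2, fit: 0.15, context: 0.1 },
  reasoning: { quality: 0.6, speed: 0.1, fit: 0.2, context: 0.1 },
  multimodal: { quality: 0.5, speed: 0.15, fit: 0.2, context: 0.15 },
};

// Assumed: a linear weighted sum of four 0-100 dimension scores, rounded.
function weightedScore(category, scores) {
  const weights = CATEGORY_WEIGHTS[category];
  if (!weights) throw new Error(`Unknown category: ${category}`);
  const total = Object.keys(weights).reduce(
    (sum, dim) => sum + weights[dim] * scores[dim],
    0
  );
  return Math.round(total);
}

const sample = { quality: 80, speed: 70, fit: 80, context: 60 };
console.log(weightedScore("coding", sample));
console.log(weightedScore("general", sample));
```

The same dimension scores produce different totals per category, which is why one model can rank differently for coding and for general use.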
Memory requirements are calculated using calibrated bytes-per-parameter values:
| Quantization | Bytes/Param | 7B Model | 14B Model | 32B Model |
|---|---|---|---|---|
| Q8_0 | 1.05 | ~8 GB | ~16 GB | ~35 GB |
| Q4_K_M | 0.58 | ~5 GB | ~9 GB | ~20 GB |
| Q3_K | 0.48 | ~4 GB | ~8 GB | ~17 GB |
The selector automatically picks the best quantization that fits your available memory.
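
The table can be approximated with a short sketch. The bytes-per-parameter values come from the table; everything else is an assumption: the flat 1 GB overhead (which roughly reproduces most of the table's figures), the reading of "best" as "highest precision that fits", and the function names.

```javascript
// Bytes-per-parameter values copied from the table above, highest precision first.
const QUANTS = [
  { name: "Q8_0", bytesPerParam: 1.05 },
  { name: "Q4_K_M", bytesPerParam: 0.58 },
  { name: "Q3_K", bytesPerParam: 0.48 },
];

// Estimate in GB: billions of parameters x bytes per parameter, plus an assumed
// flat overhead. The overhead value is a guess, not a documented constant.
function estimateMemoryGB(paramsB, bytesPerParam, overheadGB = 1) {
  return paramsB * bytesPerParam + overheadGB;
}

// Picks the first (highest-precision) quantization whose estimate fits the budget.
function pickQuantization(paramsB, availableGB) {
  for (const q of QUANTS) {
    const needGB = estimateMemoryGB(paramsB, q.bytesPerParam);
    if (needGB <= availableGB) return { quant: q.name, needGB };
  }
  return null; // nothing fits in the available memory
}

console.log(pickQuantization(14, 15));
console.log(pickQuantization(32, 10));
```

Under these assumptions a 14B model with a 15 GB budget resolves to Q4_K_M, because the Q8_0 estimate exceeds the budget.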
For MoE models, deterministic memory estimation supports explicit sparse metadata when present:
- total_params_b
- active_params_b
- expert_count
- experts_active_per_token

Normalized recommendation variants expose both snake_case and camelCase metadata aliases
(for example: total_params_b + totalParamsB) when available.
MoE parameter path selection is deterministic and uses this fallback order:
1. active_params_b (assumption source: moe_active_metadata)
2. total_params_b * (experts_active_per_token / expert_count) (assumption source: moe_derived_expert_ratio)
3. total_params_b (assumption source: moe_fallback_total_params)
4. paramsB fallback (assumption source: moe_fallback_model_params)

Dense models continue to use the dense parameter path (dense_params) unchanged.
When active_params_b (or a derived active-ratio path) is available, inference memory
uses the sparse-active parameter estimate even if artifact size metadata is present.
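
The MoE fallback order can be sketched as follows. The metadata field names and assumption-source labels are taken from this README; the function name resolveMoeParams, the return shape, and the example numbers are hypothetical.

```javascript
// Returns the parameter count (in billions) used for the memory estimate, plus
// the assumption-source label, following the documented MoE fallback order.
function resolveMoeParams(model) {
  const isPositive = (n) => typeof n === "number" && n > 0;
  if (isPositive(model.active_params_b)) {
    return { paramsB: model.active_params_b, source: "moe_active_metadata" };
  }
  if (
    isPositive(model.total_params_b) &&
    isPositive(model.expert_count) &&
    isPositive(model.experts_active_per_token)
  ) {
    const ratio = model.experts_active_per_token / model.expert_count;
    return { paramsB: model.total_params_b * ratio, source: "moe_derived_expert_ratio" };
  }
  if (isPositive(model.total_params_b)) {
    return { paramsB: model.total_params_b, source: "moe_fallback_total_params" };
  }
  return { paramsB: model.paramsB, source: "moe_fallback_model_params" };
}

// Made-up metadata: 48B total parameters, 2 of 8 experts active per token.
console.log(resolveMoeParams({ total_params_b: 48, expert_count: 8, experts_active_per_token: 2 }));
```

With the made-up metadata above, the derived-ratio path applies and the estimate uses a quarter of the total parameters.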
MoE speed estimates now include runtime-specific overhead assumptions (routing, communication, offload), instead of using a single fixed MoE boost.
- src/models/moe-assumptions.js
- src/models/deterministic-selector.js
- src/models/scoring-engine.js

Current runtime profiles:
| Runtime | Routing | Communication | Offload | Max Effective Gain |
|---|---|---|---|---|
| ollama | 18% | 13% | 8% | 2.35x |
| vllm | 12% | 8% | 4% | 2.65x |
| mlx | 16% | 10% | 5% | 2.45x |
| llama.cpp | 20% | 14% | 9% | 2.30x |
Recommendation outputs now expose these assumptions through runtime metadata and MoE speed diagnostics.
LLM Checker uses a deterministic pipeline so the same inputs produce the same ranked output, with explicit policy outcomes for governance workflows.
flowchart LR
subgraph Inputs
HW["Hardware detector<br/>CPU/GPU/RAM/backend"]
REG["Synced SQLite Ollama catalog<br/>(packaged seed + live sync)"]
LOCAL["Installed local models"]
FLAGS["CLI options<br/>use-case/runtime/limits/policy"]
end
subgraph Pipeline["Selection Pipeline"]
NORMALIZE["Normalize and deduplicate model pool"]
PROFILE["Hardware profile and memory budget"]
FILTER["Use-case/category filtering"]
QUANT["Quantization fit selection"]
SCORE["Deterministic 4D scoring<br/>Q/S/F/C"]
POLICY["Policy evaluation (optional)<br/>audit or enforce"]
RANK["Rank and explain candidates"]
end
subgraph Outputs
REC["check / recommend output"]
AUDIT["audit export<br/>JSON / CSV / SARIF"]
RUN["pull/run-ready commands"]
end
REG --> NORMALIZE
LOCAL --> NORMALIZE
HW --> PROFILE
FLAGS --> FILTER
FLAGS --> POLICY
NORMALIZE --> FILTER
PROFILE --> QUANT
FILTER --> QUANT
QUANT --> SCORE
SCORE --> POLICY
SCORE --> RANK
POLICY --> RANK
RANK --> REC
POLICY --> AUDIT
RANK --> RUN
Policy evaluation, when enabled, runs in audit or enforce mode and records explicit violation metadata.

Detect your hardware:
llm-checker hw-detect
Get recommendations for all categories:
llm-checker recommend
Full system analysis with compatible models:
llm-checker check
Find the best coding model:
llm-checker recommend --category coding
Search for small, fast models under 5GB:
llm-checker search "7b" --max-size 5 --use-case fast
Get high-quality reasoning models:
llm-checker smart-recommend --use-case reasoning
git clone https://github.com/Pavelevich/llm-checker.git
cd llm-checker
npm install
node bin/enhanced_cli.js hw-detect
src/
  models/
    deterministic-selector.js    # Primary selection algorithm
    scoring-config.js            # Centralized scoring weights
    scoring-engine.js            # Advanced scoring (smart-recommend)
    catalog.json                 # Curated fallback catalog (35+ models, only if dynamic pool unavailable)
  ai/
    multi-objective-selector.js  # Multi-objective optimization
    ai-check-selector.js         # LLM-based evaluation
  hardware/
    detector.js                  # Hardware detection
    unified-detector.js          # Cross-platform detection
  data/
    model-database.js            # SQLite storage and packaged seed loading
    seed/models.db               # npm-packaged Ollama catalog snapshot
    sync-manager.js              # Database sync from Ollama registry
bin/
  enhanced_cli.js                # CLI entry point
LLM Checker is licensed under NPDL-1.0 (No Paid Distribution License).
See LICENSE for full terms.
GitHub • Releases • npm • GitHub Packages • Issues • Discord