MCP server for the DeepGHS anime AI ecosystem. Dataset discovery, tag search, character dataset finder, and training pipeline code generation.
A Python MCP server for the DeepGHS anime AI ecosystem. Connect it to any MCP-compatible client (Claude Desktop, Cursor, etc.) to browse datasets, discover pre-built character training sets, look up tags across 18 platforms, and generate complete data pipeline scripts — all directly from your AI assistant.
- Danbooru uses hatsune_miku, Zerochan uses Hatsune Miku, Pixiv uses 初音ミク; this tool maps them all together
- Searches the deepghs and CyberHarem namespaces for existing character image collections
- Returns a cheesechaser command to pull what you need

Installation requires git.

git clone https://github.com/citronlegacy/deepghs-mcp.git
cd deepghs-mcp
chmod +x install.sh && ./install.sh
# or without chmod:
bash install.sh
pip install -r requirements.txt
HF_TOKEN is optional for public datasets but strongly recommended — it raises HuggingFace's API rate limit and is required for any gated or private repositories.
Get your token at huggingface.co/settings/tokens (read access is sufficient).
Without it, the server still works for all public DeepGHS datasets.
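As a rough illustration of the optional-token behaviour, here is a minimal sketch of how a client might turn HF_TOKEN into request headers. The helper name `hf_auth_headers` is hypothetical and is not taken from deepghs_mcp.py; the only thing assumed from outside this README is the usual `Authorization: Bearer` header convention used by the HuggingFace Hub.

```python
import os
from typing import Mapping, Optional


def hf_auth_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return HTTP headers for Hub requests (hypothetical helper).

    With no token set, an empty dict is returned so that public
    datasets can still be queried anonymously.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}
```

Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.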
python deepghs_mcp.py
# or via the venv created by install.sh:
.venv/bin/python deepghs_mcp.py
Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "deepghs": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["/absolute/path/to/deepghs_mcp.py"],
      "env": {
        "HF_TOKEN": "hf_your_token_here"
      }
    }
  }
}
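The entry above can also be generated rather than typed by hand. This is a sketch, assuming the repository was cloned into `repo_dir` and that install.sh created `.venv` inside it; the function name `build_config` is made up for illustration.

```python
import json
from pathlib import Path


def build_config(repo_dir: str, hf_token: str = "") -> dict:
    """Build the mcpServers entry for a given checkout (illustrative sketch)."""
    repo = Path(repo_dir).expanduser().resolve()
    entry = {
        "command": str(repo / ".venv" / "bin" / "python"),
        "args": [str(repo / "deepghs_mcp.py")],
    }
    if hf_token:  # the token is optional, so only add "env" when one is given
        entry["env"] = {"HF_TOKEN": hf_token}
    return {"mcpServers": {"deepghs": entry}}


if __name__ == "__main__":
    # Print the config for the current directory; paste it into
    # claude_desktop_config.json.
    print(json.dumps(build_config("."), indent=2))
```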
Replace /absolute/path/to/.venv/bin/python and /absolute/path/to/deepghs_mcp.py with the actual paths on your machine.

"What anime datasets does DeepGHS have on HuggingFace?"
The assistant calls deepghs_list_datasets and returns all datasets sorted by download count — Danbooru2024, Sankaku, Gelbooru WebP, BangumiBase, site_tags, and more — with links and update dates.
"What files are in deepghs/danbooru2024? How big is it?"
The assistant calls deepghs_get_repo_info and returns the full file tree — every .tar and .parquet file with individual and total sizes — so you know exactly what you're committing to before you download.
"Is there already a dataset for Rem from Re:Zero I can use for LoRA training?"
The assistant calls deepghs_find_character_dataset, searches both deepghs and CyberHarem namespaces, and returns any matches with download counts and a one-liner download command.
Example response:
## Character Dataset Search: Rem
Found 2 dataset(s):
### CyberHarem/rem_rezero
Downloads: 4.2K | Likes: 31 | Updated: 2024-08-12
Download with cheesechaser:
from cheesechaser.datapool import SimpleDataPool
pool = SimpleDataPool('CyberHarem/rem_rezero')
pool.batch_download_to_directory('./dataset', max_workers=4)
"What is the correct tag for 'Hatsune Miku' on Danbooru, Zerochan, and Pixiv?"
The assistant calls deepghs_search_tags and returns the format for every platform, plus a pre-filtered link to the site_tags dataset viewer and Parquet query code.
Example response:
Tag Lookup: Hatsune Miku
Danbooru: hatsune_miku
Gelbooru: hatsune_miku
Sankaku: character:hatsune_miku
Zerochan: Hatsune Miku
Pixiv: 初音ミク (Japanese preferred)
Yande.re: hatsune_miku
Wallhaven: hatsune miku
This directly solves the MultiBoru tag normalization problem.
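To make the format differences concrete, here is a deliberately naive sketch that derives a few of the spellings above from an English display name using string rules only. It is not how deepghs_search_tags works (the tool looks tags up in the site_tags dataset), it cannot produce translated tags such as the Pixiv one, and the function name is hypothetical.

```python
def naive_tag_formats(display_name: str) -> dict:
    """Derive per-platform tag spellings from an English display name.

    Illustrative only: real tags should be confirmed against site_tags.
    """
    words = display_name.split()
    snake = "_".join(word.lower() for word in words)
    return {
        "danbooru": snake,
        "gelbooru": snake,
        "sankaku": f"character:{snake}",
        "zerochan": " ".join(word.capitalize() for word in words),
        "yande.re": snake,
        "wallhaven": " ".join(word.lower() for word in words),
    }


print(naive_tag_formats("Hatsune Miku")["danbooru"])  # hatsune_miku
```

String rules like these break down as soon as a series qualifier or a non-English name is involved, which is why a lookup table is the more reliable approach.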
"Generate a waifuc script to build a LoRA dataset for Surtr from Arknights, from Danbooru and Pixiv, for SDXL, safe-only."
The assistant calls deepghs_generate_waifuc_script and returns a complete annotated Python script. Here is what that script does when run:
1. Crawls Danbooru (surtr_(arknights)) + Pixiv (スルト アークナイツ)
2. Converts all images to RGB with white background
3. Drops monochrome, sketch, manga panels, and 3D renders
4. Filters to safe rating only
5. Deduplicates near-identical images (差分 / variants)
6. Drops images with no detected face
7. Splits group images — each character becomes its own crop
8. CCIP identity filter — AI verifies every image actually shows Surtr
9. WD14 auto-tagging — writes .txt caption files automatically
10. Resizes and pads to 1024×1024 (SDXL standard)
11. Exports in kohya_ss-compatible folder structure
No manual curation. No manual tagging. Drop the output folder straight into your trainer.
"Give me a cheesechaser script to download post IDs 1234, 5678, 9012 from deepghs/danbooru2024."
The assistant calls deepghs_generate_cheesechaser_script and returns a complete Python script — with options for downloading by post ID list, downloading everything, or filtering from the Parquet index by tag.
"Does DeepGHS have a model for scoring image aesthetics?"
The assistant calls deepghs_list_models with search: "aesthetic" and returns matching models with pipeline task, download counts, and direct HuggingFace links.
"Is there a live demo for the DeepGHS face detection model?"
The assistant calls deepghs_list_spaces with search: "detection" and returns matching Spaces with direct links to the interactive demos.
| Tool | Description | Key Parameters |
|---|---|---|
| deepghs_list_datasets | Browse all DeepGHS datasets with search, sort, and pagination | search, sort, limit, offset |
| deepghs_list_models | Browse all DeepGHS models | search, sort, limit |
| deepghs_list_spaces | Browse all DeepGHS live demo Spaces | search, limit |
| deepghs_get_repo_info | Full file tree + metadata for any dataset/model/space | repo_id, repo_type |
| deepghs_search_tags | Cross-platform tag lookup across 18 platforms via site_tags | tag |
| deepghs_find_character_dataset | Find pre-built LoRA training datasets for a character | character_name |
| deepghs_generate_waifuc_script | Generate complete data collection + cleaning pipeline script | character_name, sources, model_format, content_rating |
| deepghs_generate_cheesechaser_script | Generate targeted dataset download script | repo_id, post_ids, output_dir |
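An MCP client invokes each of these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for deepghs_list_datasets; the helper name and the request id are illustrative, and in practice your MCP client constructs this message for you.

```python
import json


def make_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (illustrative helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = make_tool_call(
    "deepghs_list_datasets",
    {"search": "danbooru", "sort": "downloads", "limit": 5},
)
print(json.dumps(request, indent=2))
```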
deepghs_list_datasets

Lists all public datasets from DeepGHS, sortable and filterable by keyword.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | string | ❌ | — | Keyword filter, e.g. danbooru, character, face |
| sort | string | ❌ | downloads | downloads, likes, createdAt, lastModified |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| offset | integer | ❌ | 0 | Pagination offset |
| response_format | string | ❌ | markdown | markdown or json |
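The limit and offset parameters follow the usual page-slicing pattern. The sketch below shows that pattern on an in-memory list; the function names are hypothetical, and clamping an oversized limit to 100 (rather than rejecting it) is an assumption about how the server behaves.

```python
from typing import Iterator, List, Sequence

MAX_LIMIT = 100  # "max 100" per the parameter table


def page(items: Sequence[str], limit: int = 20, offset: int = 0) -> List[str]:
    """Return one page of results, clamping limit to 1..MAX_LIMIT (assumed)."""
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return list(items[offset:offset + limit])


def iter_pages(items: Sequence[str], limit: int = 20) -> Iterator[List[str]]:
    """Yield consecutive pages until the results are exhausted."""
    offset = 0
    while True:
        chunk = page(items, limit, offset)
        if not chunk:
            return
        yield chunk
        offset += len(chunk)
```

A caller asking for the second page of 20 results would pass limit=20 and offset=20.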
deepghs_list_models

Lists all public models: CCIP, WD Tagger Enhanced, aesthetic scorer, face/head/person detectors, anime classifier, furry detector, NSFW censor, style era classifier, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | string | ❌ | — | Keyword filter, e.g. ccip, tagger, aesthetic, face |
| sort | string | ❌ | downloads | downloads, likes, createdAt, lastModified |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| offset | integer | ❌ | 0 | Pagination offset |
| response_format | string | ❌ | markdown | markdown or json |
deepghs_list_spaces

Lists all public Spaces: live demos for face detection, head detection, CCIP character similarity, WD tagger, aesthetic scorer, reverse image search, Danbooru character lookup, and more.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| search | string | ❌ | — | Keyword filter, e.g. detection, tagger, search |
| limit | integer | ❌ | 20 | Results per page (max 100) |
| response_format | string | ❌ | markdown | markdown or json |
deepghs_get_repo_info

Get full metadata for any dataset, model, or space, including the complete file tree with individual file sizes. Essential before downloading a multi-TB dataset.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_id | string | ✅ | — | Full HF repo ID, e.g. deepghs/danbooru2024 |
| repo_type | string | ❌ | dataset | dataset, model, or space |
| response_format | string | ❌ | markdown | markdown or json |

Tip: Datasets containing .tar files automatically get a ready-to-copy cheesechaser snippet appended.
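If you request the json response format, totalling the archive sizes yourself is straightforward. The sketch below assumes the file tree has already been reduced to (path, size in bytes) pairs; that shape, the sample entries, and both function names are made up for illustration and do not describe the tool's actual output schema.

```python
from typing import Iterable, Tuple


def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


def total_size(
    files: Iterable[Tuple[str, int]],
    suffixes: Tuple[str, ...] = (".tar", ".parquet"),
) -> int:
    """Sum the sizes of files whose path ends with one of the suffixes."""
    return sum(size for path, size in files if path.endswith(suffixes))


# Made-up listing, purely to show the call shape.
sample = [("images/0000.tar", 2048), ("metadata.parquet", 1024), ("README.md", 10)]
print(format_size(total_size(sample)))  # 3.0 KiB
```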
deepghs_search_tags

Look up any tag across 18 platforms using the deepghs/site_tags dataset. Returns per-platform format guidance, a pre-filtered dataset viewer link, and Parquet query code.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| tag | string | ✅ | — | Tag in any format or language, e.g. hatsune_miku, Hatsune Miku, 初音ミク |
| response_format | string | ❌ | markdown | markdown or json |

LLM Tip: Call this before deepghs_generate_waifuc_script to confirm the correct Danbooru/Pixiv tag format for a character.
deepghs_find_character_dataset

Searches deepghs and CyberHarem on HuggingFace for pre-built character datasets. CyberHarem datasets are built with the full automated pipeline: crawl → CCIP filter → WD14 tag → upload.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| character_name | string | ✅ | — | Character name, e.g. Rem, Hatsune Miku, surtr arknights |
| response_format | string | ❌ | markdown | markdown or json |
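The two-namespace search can be pictured as a token match over repo IDs. The sketch below is an assumption about the general idea, not the server's actual matching logic; the function names are hypothetical, and apart from CyberHarem/rem_rezero (taken from the example response earlier) the repo IDs in the usage example are invented.

```python
import re
from typing import Iterable, List

NAMESPACES = ("deepghs", "CyberHarem")  # the two namespaces the tool searches


def tokens(text: str) -> List[str]:
    """Split a name into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_repos(character_name: str, repo_ids: Iterable[str]) -> List[str]:
    """Keep repos in the known namespaces whose name contains every query token."""
    wanted = tokens(character_name)
    hits = []
    for repo_id in repo_ids:
        namespace, _, name = repo_id.partition("/")
        if namespace not in NAMESPACES:
            continue
        have = tokens(name)
        if all(token in have for token in wanted):
            hits.append(repo_id)
    return hits


repos = ["CyberHarem/rem_rezero", "CyberHarem/ram_rezero", "someone/rem_rezero"]
print(match_repos("Rem", repos))  # ['CyberHarem/rem_rezero']
```

Note that a query like "rem re:zero" tokenizes to rem, re, zero and would miss rem_rezero under this scheme, which is one reason trying several spellings of a name is worthwhile.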
deepghs_generate_waifuc_script

Generates a complete, annotated Python pipeline script using waifuc.

Pipeline actions included (in order):

| Action | What it does | Why it matters for LoRA |
|---|---|---|
| ModeConvertAction | Convert to RGB, white background | Standardizes input format |
| NoMonochromeAction | Drop greyscale/sketch images | Prevents style contamination |
| ClassFilterAction | Keep illustration/anime only | Drops manga panels and 3D |
| RatingFilterAction | Filter by content rating | Keep dataset SFW if needed |
| FilterSimilarAction | Deduplicate similar images | Prevents overfitting to variants |
| FaceCountAction | Require exactly 1 face | Removes group shots and objects |
| PersonSplitAction | Crop each character from group images | Maximizes usable data |
| CCIPAction | AI identity verification | Removes wrong characters (the most important step) |
| TaggingAction | WD14 auto-tagging | Generates .txt captions automatically |
| AlignMinSizeAction | Resize to minimum resolution | Ensures quality floor |
| PaddingAlignAction | Pad to square | Standard training resolution |
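The ordering in the table can be written down as a small planning function. This sketch only lists action names and plain-language notes; it does not import waifuc or claim anything about waifuc constructor arguments. The function name is hypothetical, and skipping the rating step when content_rating is "all" is an assumption about what the generator does.

```python
from typing import List, Tuple

# Target resolutions as listed for the model_format parameter.
RESOLUTION = {"sd1.5": 512, "sdxl": 1024, "flux": 1024}
RATINGS = ("safe", "safe_r15", "all")


def plan_actions(
    model_format: str = "sd1.5", content_rating: str = "safe"
) -> List[Tuple[str, str]]:
    """Return (action name, note) pairs in the order given by the table."""
    if model_format not in RESOLUTION:
        raise ValueError(f"unknown model_format: {model_format}")
    if content_rating not in RATINGS:
        raise ValueError(f"unknown content_rating: {content_rating}")
    size = RESOLUTION[model_format]
    plan = [
        ("ModeConvertAction", "convert to RGB on a white background"),
        ("NoMonochromeAction", "drop greyscale/sketch images"),
        ("ClassFilterAction", "keep illustration/anime only"),
    ]
    # Assumption: with content_rating="all" there is nothing to filter,
    # so the rating step is left out. The real generator may differ.
    if content_rating != "all":
        plan.append(("RatingFilterAction", f"keep {content_rating}"))
    plan += [
        ("FilterSimilarAction", "deduplicate similar images"),
        ("FaceCountAction", "require exactly 1 face"),
        ("PersonSplitAction", "crop each character from group images"),
        ("CCIPAction", "verify character identity"),
        ("TaggingAction", "write .txt captions"),
        ("AlignMinSizeAction", "resize to minimum resolution"),
        ("PaddingAlignAction", f"pad to {size}x{size}"),
    ]
    return plan


for name, note in plan_actions("sdxl", "safe"):
    print(f"{name}: {note}")
```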
Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| character_name | string | ✅ | — | Display name, e.g. Rem, Surtr |
| danbooru_tag | string | ❌ | auto-guessed | Danbooru tag, e.g. rem_(re:zero) |
| pixiv_query | string | ❌ | — | Pixiv search query, Japanese preferred. Required if pixiv in sources |
| sources | list | ❌ | ["danbooru"] | danbooru, pixiv, gelbooru, zerochan, sankaku, auto |
| model_format | string | ❌ | sd1.5 | sd1.5 (512px), sdxl (1024px), or flux (1024px) |
| content_rating | string | ❌ | safe | safe, safe_r15, or all |
| output_dir | string | ❌ | ./dataset_output | Output directory |
| max_images | integer | ❌ | no limit | Cap total images collected |
| pixiv_token | string | ❌ | — | Pixiv refresh token (required for Pixiv source) |
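The cross-parameter rules in this table (Pixiv needs both a query and a token) are easy to check before calling the tool. The following is a minimal sketch with a hypothetical function name; the server's own validation and error messages may look different.

```python
from typing import List, Optional, Sequence

VALID_SOURCES = {"danbooru", "pixiv", "gelbooru", "zerochan", "sankaku", "auto"}


def check_waifuc_args(
    sources: Sequence[str],
    pixiv_query: Optional[str] = None,
    pixiv_token: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []
    unknown = sorted(set(sources) - VALID_SOURCES)
    if unknown:
        problems.append(f"unknown sources: {', '.join(unknown)}")
    if "pixiv" in sources:
        if not pixiv_query:
            problems.append("pixiv_query is required when pixiv is in sources")
        if not pixiv_token:
            problems.append("pixiv_token is required for the Pixiv source")
    return problems


print(check_waifuc_args(["danbooru", "pixiv"]))
```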
Source tag formats:

| Source | Tag Format | Notes |
|---|---|---|
| danbooru | rem_(re:zero) | snake_case with series in parens |
| gelbooru | rem_(re:zero) | same as Danbooru |
| pixiv | レム / rem re:zero | Japanese preferred for better results |
| zerochan | Rem | Title Case, strict mode enabled |
| sankaku | rem_(re:zero) | snake_case |
| auto | character name | uses gchar database, best for game characters |
deepghs_generate_cheesechaser_script

Generates a Python download script using cheesechaser to pull specific images from indexed tar datasets without downloading entire archives.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_id | string | ✅ | — | HF dataset, e.g. deepghs/danbooru2024 |
| output_dir | string | ❌ | ./downloads | Local download directory |
| post_ids | list[int] | ❌ | — | Specific post IDs. If omitted, downloads all (can be very large) |
| max_workers | integer | ❌ | 4 | Parallel download threads (1–16) |
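The main decision these parameters encode is whether a download is targeted or full. The sketch below makes that explicit; it does not call cheesechaser, the function name and the returned dictionary shape are hypothetical, and rejecting an out-of-range max_workers (instead of clamping it) is an assumption.

```python
from typing import List, Optional


def plan_download(
    repo_id: str,
    post_ids: Optional[List[int]] = None,
    output_dir: str = "./downloads",
    max_workers: int = 4,
) -> dict:
    """Validate arguments and decide between a targeted and a full download."""
    if "/" not in repo_id:
        raise ValueError("repo_id must look like 'namespace/name'")
    if not 1 <= max_workers <= 16:
        raise ValueError("max_workers must be between 1 and 16")
    ids = sorted(set(post_ids or []))  # drop duplicates, keep a stable order
    return {
        "repo_id": repo_id,
        "output_dir": output_dir,
        "max_workers": max_workers,
        # Omitting post_ids means "download everything", which can be very large.
        "mode": "by_post_id" if ids else "all",
        "post_ids": ids,
    }


print(plan_download("deepghs/danbooru2024", [9012, 1234, 5678, 1234]))
```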
| Repo ID | Description | Use Case |
|---|---|---|
| deepghs/danbooru2024 | Full Danbooru archive, 8M+ images | Bulk downloads, data mining |
| deepghs/danbooru2024-webp-4Mpixel | Compressed WebP version | Faster downloads |
| deepghs/sankaku_full | Full Sankaku Channel dataset | Alternative tag ecosystem |
| deepghs/gelbooru-webp-4Mpixel | Gelbooru compressed | Western fanart coverage |
| deepghs/site_tags | 2.5M+ tags, 18 platforms | Tag normalization |
| deepghs/anime_face_detection | YOLO face detection labels | Train detection models |
| deepghs/bangumibase | Character frames from anime | Character dataset bootstrapping |
1. deepghs_find_character_dataset → check if dataset exists
2. deepghs_get_repo_info → inspect file sizes
3. deepghs_generate_cheesechaser_script → get download command
4. Run the script → train
1. deepghs_search_tags → find the correct Danbooru tag
2. deepghs_generate_waifuc_script → generate full pipeline script
3. Run the script → crawl, filter, tag, crop
4. Review output → remove any remaining noise
5. Train with kohya_ss
1. deepghs_get_repo_info → inspect dataset Parquet structure
2. deepghs_search_tags → confirm tag names
3. deepghs_generate_cheesechaser_script → generate Parquet-filter script
4. Run the script → downloads only matching images
- Call deepghs_find_character_dataset before generating a waifuc script. CyberHarem has hundreds of ready-to-use LoRA datasets.
- Danbooru tags use character_(series) with underscores: rem_(re:zero), not Rem (Re:Zero). Use deepghs_search_tags to confirm.
- Datasets such as danbooru2024 are multi-TB. Always check deepghs_get_repo_info before recommending a full download. Use cheesechaser with post IDs for targeted access.
- The Pixiv source needs a pixiv_token. If the user hasn't set one up, suggest Danbooru + Gelbooru instead.
- model_format sets the resolution used by AlignMinSizeAction and PaddingAlignAction in generated scripts.
- Without HF_TOKEN, the Hub API may throttle requests on heavy usage.
- Generated scripts need waifuc installed separately (not in this server's deps). First run downloads ~500MB of CCIP + WD14 model weights.

Server won't start:
- Check your Python version: python --version
- Re-run the installer: bash install.sh

Rate limit / 429 errors:
- Set HF_TOKEN in your MCP env config

403 / Forbidden on a dataset:
- Make sure HF_TOKEN is from an account that has been granted access

Character dataset not found:
- Try alternative spellings: "Rem", "rem_(re:zero)", "rem re:zero"
- If nothing turns up, build one with deepghs_generate_waifuc_script

waifuc script fails:
- Install waifuc: pip install git+https://github.com/deepghs/waifuc.git
- For GPU support: pip install "waifuc[gpu]" (much faster CCIP + tagging)

Pull requests are welcome! If a tool is returning incorrect data, a script template is outdated, or a new DeepGHS dataset or model should be highlighted, please open an issue or PR.
MIT License — see LICENSE for details.