HF Dataset

Author: cfahlgren1

Description

An MCP server for the Hugging Face Dataset Viewer API that enables searching, fetching, and filtering datasets on the Hugging Face Hub. It allows users to explore schemas, perform full-text searches, and analyze dataset statistics through natural language.

README

MCP server for the Hugging Face Dataset Viewer API. Search datasets, fetch rows, filter data, and more.

Installation

npx @cfahlgren1/hf-dataset-mcp

Configuration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "hf-datasets": {
      "command": "npx",
      "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
      "env": {
        "HF_TOKEN": "hf_..."
      }
    }
  }
}
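If the config file already lists other servers, the entry above has to be merged in rather than pasted over them. A minimal Python sketch of that merge, assuming the file layout shown above; `add_server` is a hypothetical helper for illustration and is not part of the package:

```python
import json


def add_server(config: dict, name: str, token: str) -> dict:
    """Return a copy of `config` with the hf-dataset-mcp entry added under `name`."""
    # Round-trip through JSON as a cheap deep copy of JSON-compatible data.
    merged = json.loads(json.dumps(config))
    servers = merged.setdefault("mcpServers", {})
    servers[name] = {
        "command": "npx",
        "args": ["-y", "@cfahlgren1/hf-dataset-mcp"],
        "env": {"HF_TOKEN": token},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_server(existing, "hf-datasets", "hf_...")
print(json.dumps(updated, indent=2))
```

The helper leaves the input untouched, so the result can be inspected before it is written back to disk.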

Environment Variables

Variable	Description
`HF_TOKEN`	Hugging Face API token (required for private/gated datasets)
`HF_DATASETS_SERVER`	Custom Dataset Viewer API URL (default: `https://datasets-server.huggingface.co`)
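To illustrate how the two variables interact, here is a rough Python sketch of the lookup a client could perform. It is an assumption about typical behavior, not the server's actual code; `resolve_settings` is a hypothetical name, and sending the token as a Bearer header follows the usual Hugging Face API convention:

```python
import os

DEFAULT_SERVER = "https://datasets-server.huggingface.co"


def resolve_settings(env: dict) -> tuple[str, dict]:
    """Return (base_url, headers) derived from a mapping of environment variables."""
    # Fall back to the documented default when the override is unset or empty.
    base_url = env.get("HF_DATASETS_SERVER") or DEFAULT_SERVER
    headers = {}
    token = env.get("HF_TOKEN")
    if token:
        # Only attach credentials when a token is configured.
        headers["Authorization"] = f"Bearer {token}"
    return base_url.rstrip("/"), headers


base_url, headers = resolve_settings(dict(os.environ))
print(base_url, "(token set)" if headers else "(no token)")
```

Public datasets need no token under this sketch; only private or gated datasets depend on `HF_TOKEN`.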

Tools

`search_datasets`

Find datasets on the Hugging Face Hub by name, tag, or author.

search_datasets(search?: string, author?: string, filter?: string[], sort?: string, limit?: number)

`validate_dataset`

Check if a dataset is accessible and which viewer features are available.

validate_dataset(dataset: string)

`list_splits`

Get all available configurations and splits for a dataset.

list_splits(dataset: string)

`get_dataset_info`

Get the schema, metadata, and row counts for a dataset configuration.

get_dataset_info(dataset: string, config: string)

`get_rows`

Fetch a slice of rows from a dataset split.

get_rows(dataset: string, config: string, split: string, offset?: number, length?: number)
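Reading a whole split means calling `get_rows` repeatedly with a moving `offset`. The Python sketch below computes those windows. The default page size of 100 is an assumption based on the upstream Dataset Viewer limit, and this server may enforce a different cap; `page_windows` is a hypothetical helper:

```python
def page_windows(total_rows: int, page_size: int = 100):
    """Yield (offset, length) pairs that together cover rows 0..total_rows-1."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        # The last window is shortened so it never runs past the end of the split.
        yield offset, min(page_size, total_rows - offset)


for offset, length in page_windows(250):
    print(f"offset={offset} length={length}")
```

The total row count can be taken from `get_dataset_size` or `get_dataset_info` before paging.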

`search_dataset`

Full-text search within a dataset split using BM25 ranking.

search_dataset(dataset: string, config: string, split: string, query: string, offset?: number, length?: number)
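For readers unfamiliar with BM25, the following Python sketch scores a few strings with the textbook Okapi BM25 formula. It only illustrates the ranking idea: the tokenizer (lowercase, split on whitespace) and the parameters `k1` and `b` are assumptions, and the server's own index may behave differently.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with textbook Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["an amazing movie", "a dull movie", "nothing relevant here"]
scores = bm25_scores("amazing movie", docs)
print(scores)
```

A row matching both query terms outranks a row matching one, and a row matching none scores zero.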

`filter_rows`

Filter dataset rows using SQL-like WHERE conditions.

filter_rows(dataset: string, config: string, split: string, where: string, orderby?: string, offset?: number, length?: number)

WHERE syntax: put column names in double quotes and string values in single quotes. Supported operators: =, <>, >, <, >=, <=, AND, OR, NOT. The example below also uses LIKE.

Example: "label"=1 AND "text" LIKE '%hello%'
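Building the `where` string by hand is error-prone when values contain quotes. A minimal Python sketch of the quoting rules above, with hypothetical helper names; doubling an embedded quote is standard SQL escaping, and whether the server accepts it is an assumption:

```python
COMPARISONS = {"=", "<>", ">", "<", ">=", "<="}


def quote_column(name: str) -> str:
    """Wrap a column name in double quotes, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value) -> str:
    """Render a number or string as a literal for the WHERE string."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"unsupported literal: {value!r}")
    if isinstance(value, str):
        # Strings go in single quotes; embedded single quotes are doubled.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def condition(column: str, op: str, value) -> str:
    """Build one comparison such as "label"=1."""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{quote_column(column)}{op}{quote_value(value)}"


where = " AND ".join([condition("label", "=", 1), condition("text", "<>", "it's fine")])
print(where)  # "label"=1 AND "text"<>'it''s fine'
```

The resulting string can be passed as the `where` argument of `filter_rows`.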

`get_dataset_size`

Get row counts and byte sizes for all configs and splits.

get_dataset_size(dataset: string)

`list_parquet_files`

Get URLs for the dataset's Parquet files for direct download or processing.

list_parquet_files(dataset: string)

`get_statistics`

Get descriptive statistics for each column in a dataset split.

get_statistics(dataset: string, config: string, split: string)

Examples

Find text classification datasets

search_datasets(filter: ["task_categories:text-classification"], sort: "downloads", limit: 10)

Get IMDB dataset info

list_splits(dataset: "stanfordnlp/imdb")
get_dataset_info(dataset: "stanfordnlp/imdb", config: "plain_text")

Fetch rows from a dataset

get_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", offset: 0, length: 10)

Search for specific content

search_dataset(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", query: "amazing movie")

Filter rows

filter_rows(dataset: "stanfordnlp/imdb", config: "plain_text", split: "train", where: "\"label\"=1", length: 10)
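The calls above are shorthand. On the wire, an MCP client wraps each one in a JSON-RPC 2.0 `tools/call` request. The Python sketch below builds such a request for the filter example; `tool_call` is a hypothetical helper, and transport details (stdio framing, the initialization handshake) are left out:

```python
import json
from itertools import count

_ids = count(1)  # simple source of unique request ids


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` JSON-RPC request for the given tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call(
    "filter_rows",
    dataset="stanfordnlp/imdb",
    config="plain_text",
    split="train",
    where='"label"=1',
    length=10,
)
print(json.dumps(request, indent=2))
```

In practice a client such as Claude Desktop generates these requests itself, so the sketch is mainly useful for debugging or scripting.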

License

MIT

Source: github.com/cfahlgren1/hf-dataset-mcp

How to install

Run in a terminal:

claude mcp add hf-dataset-mcp -- npx -y @cfahlgren1/hf-dataset-mcp

