Enables scraping Zhihu webpages, getting hot questions, and publishing answers using Puppeteer with QR code login authentication.
This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to markdown format using Puppeteer, Readability, and Turndown. It features a simple, rule-based interaction mechanism to handle common elements like cookie banners.
Now easily runnable via npx!
The recommended way to use this server is via npx, which ensures you're running the latest version without needing to clone or manually install.
Prerequisites: Ensure you have Node.js and npm installed.
Environment Setup (Optional):
You can configure the server using a .env file or shell environment variables.
Example .env file or shell exports:
# Optional (defaults shown)
# TRANSPORT_TYPE=stdio # Options: stdio, sse, http
# PORT=3001 # Only used in sse/http modes
# DISABLE_HEADLESS=true # Uncomment to see the browser in action
Run the Server: Open your terminal and run:
npx -y zhihu-mcp-server
- The -y flag automatically confirms any prompts from npx.
- By default the server starts in stdio mode. Set TRANSPORT_TYPE=sse or TRANSPORT_TYPE=http for HTTP server modes.
For tools that require you to be logged in (like publish-answer), this server uses a cookie-based authentication flow. You no longer need to provide a COOKIE environment variable.
The process is as follows:
1. Call the login-with-qrcode tool. This will return a QR code.
2. Scan the QR code to log in. The session cookies are then saved locally (qrcodes/cookies.json).
3. Subsequent calls to scrape-webpage, get-hot-question, and publish-answer will automatically use these saved cookies to authenticate your session.
This means you only need to log in once, and your session will be reused until the cookies expire.
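The "reused until the cookies expire" behaviour can be illustrated with a small sketch. The cookie shape below is an assumption modeled on the fields Puppeteer reports for a cookie (expires in Unix seconds, -1 for a session cookie); the actual layout of qrcodes/cookies.json and the helper names unexpiredCookies and needsLogin are hypothetical, not taken from the project.

```typescript
// Hypothetical cookie shape (assumption): mirrors the name/value/expires
// fields Puppeteer reports; the real cookies.json layout may differ.
interface SavedCookie {
  name: string;
  value: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

// Keep only the cookies that have not yet expired at `nowMs`
// (milliseconds since the Unix epoch).
function unexpiredCookies(cookies: SavedCookie[], nowMs: number): SavedCookie[] {
  return cookies.filter((c) => c.expires === -1 || c.expires * 1000 > nowMs);
}

// Assumption: a fresh QR code login is needed once no saved cookie is valid.
function needsLogin(cookies: SavedCookie[], nowMs: number): boolean {
  return unexpiredCookies(cookies, nowMs).length === 0;
}
```

A caller would pass the parsed contents of the saved cookie file together with Date.now(), and call login-with-qrcode again only when needsLogin returns true.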
This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator. Here's an example configuration snippet:
{
"mcpServers": {
"web-scraper": {
"command": "npx",
"args": ["-y", "zhihu-mcp-server"],
"env": {
// Optional:
// "TRANSPORT_TYPE": "stdio", // or "sse" or "http"
// "DISABLE_HEADLESS": "true" // To see the browser during operations
}
}
// ... other MCP servers
}
}
When configured this way, the MCP orchestrator will manage the lifecycle of the zhihu-mcp-server process.
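As a rough illustration of how the values in the env block might be interpreted, here is a minimal sketch that resolves TRANSPORT_TYPE, PORT, and DISABLE_HEADLESS with the documented defaults (stdio, 3001, headless). The function name resolveConfig and the ServerConfig shape are hypothetical; the server's real parsing and error handling may differ.

```typescript
type Transport = "stdio" | "sse" | "http";

// Hypothetical config shape (assumption), not the server's actual type.
interface ServerConfig {
  transport: Transport;
  port: number;
  headless: boolean;
}

const TRANSPORTS: readonly Transport[] = ["stdio", "sse", "http"];

// Resolve settings from an environment-like map, applying documented defaults.
function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  const rawTransport = env["TRANSPORT_TYPE"] ?? "stdio";
  const transport = TRANSPORTS.find((t) => t === rawTransport);
  if (transport === undefined) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${rawTransport}`);
  }

  const port = Number(env["PORT"] ?? "3001");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  // The browser stays headless unless DISABLE_HEADLESS is exactly "true".
  const headless = env["DISABLE_HEADLESS"] !== "true";
  return { transport, port, headless };
}
```

In a Node.js process this would typically be called with process.env; rejecting unknown transport names early is a design choice of this sketch, not documented behaviour.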
Regardless of how you run the server (NPX or local development), it uses the following environment variables:
- TRANSPORT_TYPE: (Optional) The transport protocol to use.
  - Options: stdio (default), sse, http
  - stdio: Direct process communication (recommended for most use cases)
  - sse: Server-Sent Events over HTTP (legacy mode)
  - http: Streamable HTTP transport with session management
- PORT: (Optional) The port for the HTTP server in SSE or HTTP mode.
  - Default: 3001.
- DISABLE_HEADLESS: (Optional) Set to true to run the browser in visible mode.
  - Default: false (browser runs in headless mode).
The server supports three communication modes:
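- stdio (default): Direct process communication.
- SSE: Set TRANSPORT_TYPE=sse in your environment. The server listens on PORT (default: 3001). Endpoint: http://localhost:3001/sse
- HTTP: Set TRANSPORT_TYPE=http in your environment. The server listens on PORT (default: 3001). Endpoint: http://localhost:3001/mcp
The server provides the following tools: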
scrape-webpage
Scrapes a webpage and returns its content as markdown.
Tool Parameters:
- url (string, required): The URL of the webpage to scrape.
- autoInteract (boolean, optional, default: true): Whether to automatically handle interactive elements.
get-hot-question
Gets a hot question from the specified URL.
Tool Parameters:
- type (string, optional, default: day): The type of hot question list to get. Can be hour, day, or week.
publish-answer
Publishes an answer to a question on the specified URL.
Tool Parameters:
- url (string, required): The URL of the question to answer.
- answer (string, required): The answer to publish.
login-with-qrcode
Gets a login QR code from the specified URL.
Tool Parameters:
- qrSelector (string, optional): The CSS selector for the QR code element. Defaults to .Qrcode-qrcode.
- switchQrSelector (string, optional): The CSS selector for the button to switch to QR code login.
Response Format:
The tool returns its result in a structured format:
- content: An array containing a single text object with the raw markdown of the scraped webpage.
- metadata: Contains additional information:
  - message: Status message.
  - success: Boolean indicating success.
  - contentSize: Size of the content in characters (on success).
Example Success Response:
{
"content": [
{
"type": "text",
"text": "# Page Title\n\nThis is the content..."
}
],
"metadata": {
"message": "Scraping successful",
"success": true,
"contentSize": 8734
}
}
Example Error Response:
{
"content": [
{
"type": "text",
"text": ""
}
],
"metadata": {
"message": "Error scraping webpage: Failed to load the URL",
"success": false
}
}
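A client consuming these responses might branch on metadata.success before touching the content. The sketch below assumes only the fields shown in the two examples above; the ToolResult type and extractMarkdown helper are hypothetical names introduced for illustration.

```typescript
// Shape inferred from the example responses above (assumption: no other
// fields are required by a consumer).
interface ToolResult {
  content: { type: string; text: string }[];
  metadata: { message: string; success: boolean; contentSize?: number };
}

// Return the scraped markdown, or throw with the server's status message
// when the tool reports a failure.
function extractMarkdown(result: ToolResult): string {
  if (!result.metadata.success) {
    throw new Error(result.metadata.message);
  }
  const first = result.content[0];
  return first ? first.text : "";
}
```

Throwing on failure keeps the error message from metadata.message visible to the caller instead of silently returning the empty text field.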
The system uses a simple rule-based approach to handle common website interruptions. It searches for buttons containing keywords like "Accept", "Agree", or "Continue" and clicks them to dismiss pop-ups like cookie banners.
After interactions, Mozilla's Readability extracts the main content, which is then sanitized and converted to Markdown using Turndown with custom rules for code blocks and tables.
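The keyword rule described above can be sketched as a pure function over a button's visible label. This is only an illustration: the keyword list uses the three examples named in the text, the helper name isDismissButton is hypothetical, and the actual logic in src/ai/page-interactions.ts may match differently.

```typescript
// Keywords taken from the examples in the text; the real list may be longer.
const DISMISS_KEYWORDS: readonly string[] = ["accept", "agree", "continue"];

// Decide whether a button label looks like a pop-up dismissal button,
// using a case-insensitive substring match.
function isDismissButton(label: string): boolean {
  const normalized = label.trim().toLowerCase();
  return DISMISS_KEYWORDS.some((keyword) => normalized.includes(keyword));
}
```

Substring matching is deliberately loose, so a label such as "Accept all cookies" matches, but so would any unrelated label that happens to contain one of the keywords.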
This project includes a Dockerfile to build and run the server in a containerized environment.
From the project root directory, run:
docker build -t zhihu-mcp-server:latest .
To run the server inside a Docker container, use the following command. You can pass environment variables using the -e flag.
To get the login QR code and persist the session, you need to mount a volume to the container. This ensures the qrcodes/cookies.json file is saved on your host machine.
# Temporary debugging: run interactively
mkdir -p ./qrcodes && sudo chown 999:999 ./qrcodes && \
docker run -it --rm \
--user 999:999 \
-e TRANSPORT_TYPE=http \
-e PORT=3001 \
-v $(pwd)/qrcodes:/home/pptruser/qrcodes \
-p 3001:3001 \
zhihu-mcp-server:latest
# Run detached in the background
mkdir -p ./qrcodes && sudo chown 999:999 ./qrcodes && \
docker run -d \
--user 999:999 \
-e TRANSPORT_TYPE=http \
-e PORT=3001 \
-v $(pwd)/qrcodes:/home/pptruser/qrcodes \
-p 3001:3001 \
zhihu-mcp-server:latest
When running the server in a Docker container, you can configure it with the following environment variables:
- TRANSPORT_TYPE: (Optional) The transport protocol to use.
  - Options: stdio (default), sse, http.
  - Example: -e TRANSPORT_TYPE=http
- PORT: (Optional) The port for the HTTP server in sse or http mode. You must also map this port using the -p flag in the docker run command.
  - Default: 3001.
  - Example: -e PORT=8080 -p 8080:8080
- DISABLE_HEADLESS: (Optional) Set to true to run the browser in visible mode. Note: This is primarily for debugging and may require additional X11 forwarding configuration to work correctly with Docker.
  - Default: false (browser runs in headless mode).
  - Example: -e DISABLE_HEADLESS=true
If you wish to contribute, modify the server, or run a local development version:
git clone https://github.com/morrain/zhihuMcpServer.git
cd zhihuMcpServer
npm install
npm run build
npm start
Or, for automatic rebuilding on changes:
npm run dev
You can modify the behavior of the scraper by editing:
- src/ai/page-interactions.ts: Add new keywords or logic for handling different types of pop-ups.
- src/scrapers/webpage-scraper.ts (visitWebPage function): Change Puppeteer options.
- src/utils/markdown-formatters.ts: Adjust Turndown rules for Markdown conversion.
Key dependencies include:
- @modelcontextprotocol/sdk
- puppeteer, puppeteer-extra
- @mozilla/readability, jsdom
- turndown, sanitize-html
- express (for SSE/HTTP modes)
- zod