AI evaluation toolkit that measures inter-rater agreement (Fleiss' κ, Kendall's W) across multiple LLM providers. Evaluate prompt reliability, detect contested outputs, and track consensus trends over time.
One command. Find out if your AI agrees with itself.
ConKurrence is a statistically validated consensus measurement toolkit for AI evaluation pipelines. It uses multiple AI models as independent raters, measures inter-rater reliability with Fleiss' kappa and bootstrap confidence intervals, and routes contested items to human experts.
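For reference, Fleiss' kappa compares the agreement actually observed among a fixed number of raters with the agreement expected by chance: it is 1 for perfect agreement and roughly 0 when raters agree no more often than chance would predict. Below is a minimal illustrative sketch of the statistic itself (a textbook formulation, not ConKurrence's internal implementation):

```ts
/**
 * Fleiss' kappa for N items, each rated by n raters into k categories.
 * counts[i][j] = number of raters that assigned item i to category j.
 * Illustrative sketch only; not ConKurrence's own code.
 */
function fleissKappa(counts: number[][]): number {
  const N = counts.length;                          // items
  const n = counts[0].reduce((a, b) => a + b, 0);   // raters per item (assumed constant)
  const k = counts[0].length;                       // categories

  let pBar = 0;                       // mean per-item agreement
  const pj = new Array(k).fill(0);    // per-category proportions
  for (const row of counts) {
    const sumSq = row.reduce((acc, nij) => acc + nij * nij, 0);
    pBar += (sumSq - n) / (n * (n - 1));
    row.forEach((nij, j) => { pj[j] += nij / (N * n); });
  }
  pBar /= N;

  const pe = pj.reduce((acc, p) => acc + p * p, 0); // chance agreement
  return (pBar - pe) / (1 - pe);
}

// Example: 4 items, 3 raters, 2 categories ("pass" / "fail")
console.log(fleissKappa([
  [3, 0],
  [2, 1],
  [0, 3],
  [1, 2],
])); // ≈ 0.33 (fair agreement; low values are what flag items as contested)
```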
```bash
npm install -g conkurrence
```
Use ConKurrence as an MCP server in Claude Desktop or any MCP-compatible client:
```bash
npx conkurrence mcp
```
Add the following to your claude_desktop_config.json and restart Claude Desktop:
```json
{
  "mcpServers": {
    "conkurrence": {
      "command": "npx",
      "args": ["-y", "conkurrence", "mcp"]
    }
  }
}
```
Or install as a Claude Code plugin:

```
/plugin marketplace add AlligatorC0der/conkurrence
```
| Tool | Description |
|---|---|
| `conkurrence_run` | Execute an evaluation across multiple AI raters |
| `conkurrence_report` | Generate a detailed markdown report |
| `conkurrence_compare` | Side-by-side comparison of two runs |
| `conkurrence_trend` | Track agreement over multiple runs |
| `conkurrence_suggest` | AI-powered schema suggestion from your data |
| `conkurrence_validate_schema` | Validate a schema before running |
| `conkurrence_estimate` | Estimate cost and token usage |
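Outside of Claude Desktop, any MCP-compatible client can drive the same tools programmatically over stdio. A sketch using the standard `@modelcontextprotocol/sdk` client (the client name, version, and any tool arguments below are placeholder assumptions; check each tool's input schema before calling it):

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the ConKurrence MCP server as a child process speaking stdio.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "conkurrence", "mcp"],
});

const client = new Client(
  { name: "example-client", version: "0.1.0" }, // placeholder client identity
  { capabilities: {} }
);
await client.connect(transport);

// Discover the tools listed in the table above.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Invoke a tool. The arguments object is hypothetical; pass whatever the
// tool's published input schema requires.
// const result = await client.callTool({ name: "conkurrence_estimate", arguments: { /* ... */ } });

await client.close();
```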
BUSL-1.1 — Business Source License 1.1