Semantix-Verify is an MCP server for semantic validation of AI/LLM outputs. It exposes a single tool, `verify_text_intent(text, intent_description, threshold)`, which uses a local quantized NLI cross-encoder (INT8 ONNX) to return a 0.0–1.0 probability that the text satisfies the given intent, and, when it doesn't, a structured correction prompt for agent retry loops.
Provable semantic validation for LLM outputs. Local. Deterministic. Auditable.
Validate every LLM output against an explicit intent. Get back a score, a verdict, and a tamper-evident receipt. Locally. In ~15-50 milliseconds. Without an API key.
```shell
pip install semantix-ai
```

```python
from semantix import Intent, validate_intent

class ResolutionPolite(Intent):
    """The response must acknowledge the customer's issue and propose a concrete next step, in a polite tone."""

@validate_intent(ResolutionPolite, audit=True)
def handle_complaint(message: str) -> str:
    return call_my_llm(message)

reply = handle_complaint(incoming)
# Returns the validated reply — or raises SemanticIntentError.
# The audit engine has already written a hash-chained receipt to disk.
```
LLM applications quietly skip the step where you prove the output was fit for purpose. The common fix, calling a bigger LLM as a judge, has three problems: it adds latency and API cost, it is non-deterministic, and it leaves no verifiable audit trail.
semantix replaces that reflex with a local, deterministic validator and a tamper-evident log. Every validation produces a signed JSON-LD certificate hash-chained to the previous one. Modify any entry and every subsequent hash breaks. The regulator doesn't need to trust your database — the math proves the chain is intact.
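The tamper-evidence property is ordinary hash chaining. A minimal pure-Python sketch of the idea (illustrative only, not semantix's actual internals; field names are made up here):

```python
import hashlib
import json

def _entry_hash(entry: dict) -> str:
    # Canonical JSON (sorted keys) so the hash is deterministic across runs.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_receipt(chain: list, payload: dict) -> None:
    # Each receipt embeds the hash of the previous one (zero hash for the first).
    prev = chain[-1]["hash"] if chain else "0" * 64
    entry = {"payload": payload, "prev_hash": prev}
    entry["hash"] = _entry_hash({"payload": payload, "prev_hash": prev})
    chain.append(entry)

def verify_chain(chain: list) -> bool:
    # Recompute every hash; any edited entry breaks itself and all successors.
    prev = "0" * 64
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if _entry_hash({"payload": entry["payload"],
                        "prev_hash": entry["prev_hash"]}) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Appending receipts and then editing any earlier payload makes `verify_chain` return `False`, which is exactly the property the auditor relies on.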
```python
from semantix import Intent, validate_intent

class MedicalAdvice(Intent):
    """The text provides a medical diagnosis or treatment recommendation."""

@validate_intent(~MedicalAdvice)  # Must NOT give medical advice
def chatbot(msg: str) -> str:
    return call_my_llm(msg)
```
Compose with & (all must pass) and | (any must pass):
```python
SafeAndPolite = Polite & ~MedicalAdvice & ~LegalAdvice
```
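This style of composition is typically built on Python operator overloading. A self-contained sketch of the mechanism (hypothetical internals, using a plain score dictionary in place of a real judge):

```python
class Intent:
    """Leaf intent: satisfied when a judge score clears its threshold."""
    def __init__(self, name: str, threshold: float = 0.7):
        self.name, self.threshold = name, threshold

    def passes(self, scores: dict) -> bool:
        return scores.get(self.name, 0.0) >= self.threshold

    # Operator overloads turn &, |, ~ into composite intents.
    def __and__(self, other):
        return _Composite(lambda s: self.passes(s) and other.passes(s))

    def __or__(self, other):
        return _Composite(lambda s: self.passes(s) or other.passes(s))

    def __invert__(self):
        return _Composite(lambda s: not self.passes(s))

class _Composite(Intent):
    def __init__(self, fn):
        self._fn = fn

    def passes(self, scores: dict) -> bool:
        return self._fn(scores)
```

Because `_Composite` inherits the operators, composites compose further: `Polite & ~MedicalAdvice & ~LegalAdvice` works without special-casing.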
```python
from semantix.audit.engine import AuditEngine

engine = AuditEngine()
# Every @validate_intent call with audit=True writes a hash-chained certificate.
engine.verify_chain()  # True if no tampering
```
Each certificate records the hash of the validated text, the intent, the judge identity and configuration, the verdict, the timestamp, and the hash of the previous certificate. Compatible with JSON-LD tooling and standard audit pipelines.
On failure, semantix injects structured feedback so the LLM knows what went wrong:
```python
from typing import Optional

@validate_intent(ResolutionPolite, retries=2)
def reply(msg: str, semantix_feedback: Optional[str] = None) -> str:
    prompt = f"Reply to: {msg}"
    if semantix_feedback:
        prompt += f"\n\n{semantix_feedback}"
    return call_llm(prompt)
```
First call: semantix_feedback is None. On retry: it receives a Markdown report with the score, reason, and rejected output. Measured reliability improves from 21% to 70% across three intent categories.
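The retry loop itself is simple to reason about. A framework-free sketch (stub judge and generator; the report wording is illustrative, not semantix's actual format):

```python
def validate_with_retries(generate, judge, threshold=0.7, retries=2):
    """Call generate(feedback); on a low score, retry with a feedback report."""
    feedback = None
    for _ in range(retries + 1):
        text = generate(feedback)
        score = judge(text)
        if score >= threshold:
            return text
        # Structured feedback for the next attempt: score plus rejected output.
        feedback = (f"Previous output scored {score:.2f} (below {threshold}).\n"
                    f"Rejected output: {text!r}\n"
                    f"Please address the intent directly.")
    raise ValueError(f"Validation failed after {retries + 1} attempts")
```

The first call sees `feedback=None`; each retry sees a report describing why the last attempt was rejected, which is what lets the model self-correct.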
```python
from semantix import ForensicJudge, QuantizedNLIJudge

judge = ForensicJudge(QuantizedNLIJudge())
# Verdict.reason: "Suspect tokens: [indemnify, forfeit, waive]"
```
```python
from semantix.testing import assert_semantic

def test_chatbot_is_polite():
    response = my_chatbot("handle angry customer")
    assert_semantic(response, "polite and professional")
```
On failure:
```
AssertionError: Semantic check failed (score=0.12)
Intent: polite and professional
Output: "You're an idiot for asking that."
Reason: Text contains aggressive language
```
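An assertion helper of this shape is easy to picture. A hypothetical sketch (the function and its report format are made up here; semantix's real `assert_semantic` runs an NLI judge instead of the stub `score_fn`):

```python
def assert_semantic_sketch(text, intent, score_fn, threshold=0.7):
    """Raise an AssertionError with a structured report when the score is low."""
    score = score_fn(text, intent)
    if score < threshold:
        raise AssertionError(
            f"Semantic check failed (score={score:.2f})\n"
            f"Intent: {intent}\n"
            f"Output: {text!r}"
        )
```

Raising `AssertionError` (rather than a custom exception) is what lets pytest render the report in its normal failure output.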
First-class pytest plugin with fixtures, markers, and CI reporting: pytest-semantix.
Drop into your existing stack — retries are handled natively by each framework.
```python
import dspy
from semantix.integrations.dspy import semantic_reward

qa = dspy.ChainOfThought("question -> answer")
refined = dspy.Refine(module=qa, N=3, reward_fn=semantic_reward(Polite))
```
semantic_reward / semantic_metric also plug into dspy.BestOfN, dspy.Evaluate, and MIPROv2 — local, no API calls, ~15 ms per eval. See benchmarks/ for reproducible comparisons against LLM-judge reward functions.
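The shape of such a reward function can be sketched without DSPy installed (hypothetical names; the real `semantic_reward` wraps a semantix judge, stubbed here as `score_fn`):

```python
def semantic_reward_sketch(intent_description, score_fn):
    """Build a reward callable mapping a candidate string to a 0.0-1.0 float.

    Returning the raw judge score, not a pass/fail bool, preserves the
    ranking between candidates, which is what best-of-N selection needs.
    """
    def reward(candidate: str) -> float:
        return score_fn(candidate, intent_description)
    return reward

def best_of_n(candidates, reward):
    # What BestOfN-style optimizers do: keep the highest-reward candidate.
    return max(candidates, key=reward)
```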
```python
from langchain_core.output_parsers import StrOutputParser
from semantix.integrations.langchain import SemanticValidator

validator = SemanticValidator(Polite)
chain = prompt | llm | StrOutputParser() | validator
```
```python
from pydantic_ai import Agent
from semantix.integrations.pydantic_ai import semantix_validator

agent = Agent("openai:gpt-4o", output_type=str)
agent.output_validator(semantix_validator(Polite))
```
```python
from guardrails import Guard
from semantix.integrations.guardrails import SemanticIntent

guard = Guard().use(SemanticIntent("must be polite and professional"))
```
```python
from semantix.integrations.instructor import SemanticStr
from pydantic import BaseModel

class Response(BaseModel):
    reply: SemanticStr["must be polite and professional", 0.85]
```
```shell
pip install "semantix-ai[mcp,nli]"
mcp run semantix/mcp/server.py
```
Any MCP-capable agent (Claude Desktop, Cursor, etc.) can validate intents as a tool.
```yaml
- uses: labrat-akhona/semantic-test-action@v1
  with:
    test-path: tests/
```
Posts a semantic test report as a PR comment.
Install extras: pip install "semantix-ai[dspy]", "[langchain]", "[pydantic-ai]", "[guardrails]", "[instructor]", "[mcp]", "[all]".
Choose the speed / accuracy / reasoning trade-off:
```python
from semantix import NLIJudge, EmbeddingJudge, LLMJudge, CachingJudge

@validate_intent(judge=NLIJudge())                       # local, ~15 ms, deterministic
@validate_intent(judge=EmbeddingJudge())                 # local, ~5 ms, similarity-based
@validate_intent(judge=LLMJudge(model="gpt-4o-mini"))    # reasoning, ~500 ms, API
@validate_intent(judge=CachingJudge(NLIJudge(), maxsize=256))  # LRU-wrapped
```
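A caching wrapper like this is sound because the local judges are deterministic: the same (text, intent) pair always scores the same. A hypothetical sketch of the pattern using `functools.lru_cache` (not CachingJudge's actual implementation):

```python
from functools import lru_cache

class CachingJudgeSketch:
    """Memoize (text, intent) -> score for a deterministic inner judge."""

    def __init__(self, judge_fn, maxsize=256):
        # lru_cache evicts the least recently used pair once maxsize is hit.
        self._cached = lru_cache(maxsize=maxsize)(judge_fn)

    def score(self, text: str, intent: str) -> float:
        return self._cached(text, intent)

    def cache_info(self):
        return self._cached.cache_info()
```

Repeated validations of identical text (common in retry loops and test suites) then skip the model entirely.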
Quantized mode (INT8 ONNX, ~25 MB, no PyTorch):
```shell
pip install "semantix-ai[turbo]"
```
See Where semantix fits for a comparison against TruLens, DeepEval, Vectara HHEM, Guardrails, RAGAS, and NeMo.
```shell
pip install semantix-ai            # Core (default NLI judge)
pip install "semantix-ai[turbo]"   # Quantized ONNX (smallest footprint)
pip install "semantix-ai[openai]"  # LLM judge (GPT-4o-mini)
pip install "semantix-ai[all]"     # Everything
```
The package name on PyPI is `semantix-ai`; the import is `from semantix import ...`.
See CONTRIBUTING.md for dev setup, testing, and submission guidelines.
MIT — see LICENSE.
Built by Akhona Eland in South Africa