Semantix-Verify is an MCP server for semantic validation of AI/LLM outputs. It exposes a single tool, `verify_text_intent(text, intent_description, threshold)`, which uses a local quantized NLI cross-encoder (INT8 ONNX) to return a 0.0–1.0 probability that the text satisfies the given intent, and, when it doesn't, a structured correction prompt for agent retry loops.
Provable semantic validation for LLM outputs. Local. Deterministic. Auditable.
Validate every LLM output against an explicit intent. Get back a score, a verdict, and a tamper-evident receipt. Locally. In ~15-50 milliseconds. Without an API key.
```shell
pip install semantix-ai
```

```python
from semantix import Intent, validate_intent

class ResolutionPolite(Intent):
    """The response must acknowledge the customer's issue and propose a concrete next step, in a polite tone."""

@validate_intent(ResolutionPolite, audit=True)
def handle_complaint(message: str) -> str:
    return call_my_llm(message)

reply = handle_complaint(incoming)
# Returns the validated reply — or raises SemanticIntentError.
# The audit engine has already written a hash-chained receipt to disk.
```
LLM applications quietly skip the step where you prove the output was fit for purpose. The common fix, calling a bigger LLM as a judge, has three problems: it adds latency and API cost, it is non-deterministic, and it leaves no verifiable audit trail.
semantix replaces that reflex with a local, deterministic validator and a tamper-evident log. Every validation produces a signed JSON-LD certificate hash-chained to the previous one. Modify any entry and every subsequent hash breaks. The regulator doesn't need to trust your database — the math proves the chain is intact.
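The tamper-evidence property is ordinary hash chaining. A minimal pure-Python sketch of the idea (illustrative only, not semantix's actual internals; field names are made up here):

```python
import hashlib
import json

def _entry_hash(entry: dict) -> str:
    # Canonical JSON (sorted keys) so the hash is deterministic across runs.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_receipt(chain: list, payload: dict) -> None:
    # Each receipt embeds the hash of the previous one (zero hash for the first).
    prev = chain[-1]["hash"] if chain else "0" * 64
    entry = {"payload": payload, "prev_hash": prev}
    entry["hash"] = _entry_hash({"payload": payload, "prev_hash": prev})
    chain.append(entry)

def verify_chain(chain: list) -> bool:
    # Recompute every hash; any edited entry breaks itself and all successors.
    prev = "0" * 64
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if _entry_hash({"payload": entry["payload"],
                        "prev_hash": entry["prev_hash"]}) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Appending receipts and then editing any earlier payload makes `verify_chain` return `False`, which is exactly the property the auditor relies on.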
```python
from semantix import Intent, validate_intent

class MedicalAdvice(Intent):
    """The text provides a medical diagnosis or treatment recommendation."""

@validate_intent(~MedicalAdvice)  # Must NOT give medical advice
def chatbot(msg: str) -> str:
    return call_my_llm(msg)
```
Compose with & (all must pass) and | (any must pass):
```python
SafeAndPolite = Polite & ~MedicalAdvice & ~LegalAdvice
```
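This style of composition is typically built on Python operator overloading. A self-contained sketch of the mechanism (hypothetical internals, using a plain score dictionary in place of a real judge):

```python
class Intent:
    """Leaf intent: satisfied when a judge score clears its threshold."""
    def __init__(self, name: str, threshold: float = 0.7):
        self.name, self.threshold = name, threshold

    def passes(self, scores: dict) -> bool:
        return scores.get(self.name, 0.0) >= self.threshold

    # Operator overloads turn &, |, ~ into composite intents.
    def __and__(self, other):
        return _Composite(lambda s: self.passes(s) and other.passes(s))

    def __or__(self, other):
        return _Composite(lambda s: self.passes(s) or other.passes(s))

    def __invert__(self):
        return _Composite(lambda s: not self.passes(s))

class _Composite(Intent):
    def __init__(self, fn):
        self._fn = fn

    def passes(self, scores: dict) -> bool:
        return self._fn(scores)
```

Because `_Composite` inherits the operators, composites compose further: `Polite & ~MedicalAdvice & ~LegalAdvice` works without special-casing.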
```python
from semantix.audit.engine import AuditEngine

engine = AuditEngine()
# Every @validate_intent call with audit=True writes a hash-chained certificate.
engine.verify_chain()  # True if no tampering
```
Each certificate records the hash of the validated text, the intent, the judge identity and configuration, the verdict, the timestamp, and the hash of the previous certificate. Compatible with JSON-LD tooling and standard audit pipelines.
On failure, semantix injects structured feedback so the LLM knows what went wrong:
```python
from typing import Optional

@validate_intent(ResolutionPolite, retries=2)
def reply(msg: str, semantix_feedback: Optional[str] = None) -> str:
    prompt = f"Reply to: {msg}"
    if semantix_feedback:
        prompt += f"\n\n{semantix_feedback}"
    return call_llm(prompt)
```
First call: semantix_feedback is None. On retry: it receives a Markdown report with the score, reason, and rejected output. Measured reliability improves from 21% to 70% across three intent categories.
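The retry loop itself is simple to reason about. A framework-free sketch (stub judge and generator; the report wording is illustrative, not semantix's actual format):

```python
def validate_with_retries(generate, judge, threshold=0.7, retries=2):
    """Call generate(feedback); on a low score, retry with a feedback report."""
    feedback = None
    for _ in range(retries + 1):
        text = generate(feedback)
        score = judge(text)
        if score >= threshold:
            return text
        # Structured feedback for the next attempt: score plus rejected output.
        feedback = (f"Previous output scored {score:.2f} (below {threshold}).\n"
                    f"Rejected output: {text!r}\n"
                    f"Please address the intent directly.")
    raise ValueError(f"Validation failed after {retries + 1} attempts")
```

The first call sees `feedback=None`; each retry sees a report describing why the last attempt was rejected, which is what lets the model self-correct.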
```python
from semantix import ForensicJudge, QuantizedNLIJudge

judge = ForensicJudge(QuantizedNLIJudge())
# Verdict.reason: "Suspect tokens: [indemnify, forfeit, waive]"
```
```python
from semantix.testing import assert_semantic

def test_chatbot_is_polite():
    response = my_chatbot("handle angry customer")
    assert_semantic(response, "polite and professional")
```
On failure:
```
AssertionError: Semantic check failed (score=0.12)
Intent: polite and professional
Output: "You're an idiot for asking that."
Reason: Text contains aggressive language
```
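An assertion helper of this shape is easy to picture. A hypothetical sketch (the function and its report format are made up here; semantix's real `assert_semantic` runs an NLI judge instead of the stub `score_fn`):

```python
def assert_semantic_sketch(text, intent, score_fn, threshold=0.7):
    """Raise an AssertionError with a structured report when the score is low."""
    score = score_fn(text, intent)
    if score < threshold:
        raise AssertionError(
            f"Semantic check failed (score={score:.2f})\n"
            f"Intent: {intent}\n"
            f"Output: {text!r}"
        )
```

Raising `AssertionError` (rather than a custom exception) is what lets pytest render the report in its normal failure output.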
First-class pytest plugin with fixtures, markers, and CI reporting: pytest-semantix.
Drop into your existing stack — retries are handled natively by each framework.
```python
import dspy
from semantix.integrations.dspy import semantic_reward

qa = dspy.ChainOfThought("question -> answer")
refined = dspy.Refine(module=qa, N=3, reward_fn=semantic_reward(Polite))
```
semantic_reward / semantic_metric also plug into dspy.BestOfN, dspy.Evaluate, and MIPROv2 — local, no API calls, ~15 ms per eval. See benchmarks/ for reproducible comparisons against LLM-judge reward functions.
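The shape of such a reward function can be sketched without DSPy installed (hypothetical names; the real `semantic_reward` wraps a semantix judge, stubbed here as `score_fn`):

```python
def semantic_reward_sketch(intent_description, score_fn):
    """Build a reward callable mapping a candidate string to a 0.0-1.0 float.

    Returning the raw judge score, not a pass/fail bool, preserves the
    ranking between candidates, which is what best-of-N selection needs.
    """
    def reward(candidate: str) -> float:
        return score_fn(candidate, intent_description)
    return reward

def best_of_n(candidates, reward):
    # What BestOfN-style optimizers do: keep the highest-reward candidate.
    return max(candidates, key=reward)
```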
```python
from langchain_core.output_parsers import StrOutputParser
from semantix.integrations.langchain import SemanticValidator

validator = SemanticValidator(Polite)
chain = prompt | llm | StrOutputParser() | validator
```
```python
from pydantic_ai import Agent
from semantix.integrations.pydantic_ai import semantix_validator

agent = Agent("openai:gpt-4o", output_type=str)
agent.output_validator(semantix_validator(Polite))
```
```python
from guardrails import Guard
from semantix.integrations.guardrails import SemanticIntent

guard = Guard().use(SemanticIntent("must be polite and professional"))
```
```python
from semantix.integrations.instructor import SemanticStr
from pydantic import BaseModel

class Response(BaseModel):
    reply: SemanticStr["must be polite and professional", 0.85]
```
```shell
pip install "semantix-ai[mcp,nli]"
mcp run semantix/mcp/server.py
```
Any MCP-capable agent (Claude Desktop, Cursor, etc.) can validate intents as a tool.
```yaml
- uses: labrat-akhona/semantic-test-action@v1
  with:
    test-path: tests/
```
Posts a semantic test report as a PR comment.
Install extras: pip install "semantix-ai[dspy]", "[langchain]", "[pydantic-ai]", "[guardrails]", "[instructor]", "[mcp]", "[all]".
Choose the speed / accuracy / reasoning trade-off:
```python
from semantix import NLIJudge, EmbeddingJudge, LLMJudge, CachingJudge

@validate_intent(judge=NLIJudge())                       # local, ~15 ms, deterministic
@validate_intent(judge=EmbeddingJudge())                 # local, ~5 ms, similarity-based
@validate_intent(judge=LLMJudge(model="gpt-4o-mini"))    # reasoning, ~500 ms, API
@validate_intent(judge=CachingJudge(NLIJudge(), maxsize=256))  # LRU-wrapped
```
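A caching wrapper like this is sound because the local judges are deterministic: the same (text, intent) pair always scores the same. A hypothetical sketch of the pattern using `functools.lru_cache` (not CachingJudge's actual implementation):

```python
from functools import lru_cache

class CachingJudgeSketch:
    """Memoize (text, intent) -> score for a deterministic inner judge."""

    def __init__(self, judge_fn, maxsize=256):
        # lru_cache evicts the least recently used pair once maxsize is hit.
        self._cached = lru_cache(maxsize=maxsize)(judge_fn)

    def score(self, text: str, intent: str) -> float:
        return self._cached(text, intent)

    def cache_info(self):
        return self._cached.cache_info()
```

Repeated validations of identical text (common in retry loops and test suites) then skip the model entirely.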
Quantized mode (INT8 ONNX, ~25 MB, no PyTorch):
```shell
pip install "semantix-ai[turbo]"
```
See Where semantix fits for a comparison against TruLens, DeepEval, Vectara HHEM, Guardrails, RAGAS, and NeMo.
```shell
pip install semantix-ai            # Core (default NLI judge)
pip install "semantix-ai[turbo]"   # Quantized ONNX (smallest footprint)
pip install "semantix-ai[openai]"  # LLM judge (GPT-4o-mini)
pip install "semantix-ai[all]"     # Everything
```
The package name on PyPI is `semantix-ai`; the import is `from semantix import ...`.
See CONTRIBUTING.md for dev setup, testing, and submission guidelines.
MIT — see LICENSE.
Built by Akhona Eland in South Africa