An MCP server that provides queryable access to Anti-Money Laundering (AML) red flag knowledge from regulatory documents. It enables compliance officers to ask natural-language questions and receive relevant, sourced red flags from a local vector database.
MCP server exposing AML red flag knowledge as queryable tools. Compliance officers ask natural-language questions; the server returns relevant, sourced red flags from a local vector database.
Three distinct workflows:
scripts/extract.py takes a regulatory document (PDF file or URL), sends its text to an OpenAI model, and writes a structured YAML file into data/source/. Each extracted entry includes a source_url linking back to the original document.
uv sync --extra dev
export OPENAI_API_KEY=sk-...
Step-by-step:
1. Add the document's public URL to red_flag_sources/pdflinks.txt, one URL per line, in serial order. The first URL maps to key 001, the second to 002, and so on.
2. Save the PDF in red_flag_sources/pdf/ named NNN_short_descriptive_name.pdf, where NNN matches the position of its URL in pdflinks.txt.
3. Regenerate the sources registry: uv run python scripts/build_sources_registry.py
4. Run extraction: uv run python scripts/extract.py --parallel

Key constraint: the NNN_ prefix in the filename must match the position of the PDF's URL in pdflinks.txt. The first URL = 001_*.pdf, the second = 002_*.pdf, etc. This is how the extractor links each PDF to its public source URL.
PDFs are stored in red_flag_sources/pdf/ and must be named with a zero-padded serial prefix:
red_flag_sources/pdf/
001_fincen_alert_russian_sanctions_evasion.pdf
002_ffiec_bsa_aml_examination_manual.pdf
003_fatf_guidance_virtual_assets.pdf
Each serial number maps to a public URL for the source document. Maintain this mapping in red_flag_sources/pdflinks.txt — one URL per line, in serial order:
# FinCEN Russian Sanctions Evasion Alert
https://fincen.gov/sites/default/files/2022-06/Alert%20FIN-2022-Alert001_508C.pdf
# FFIEC BSA/AML Examination Manual
https://bsaaml.ffiec.gov/manual
# FATF Guidance on Virtual Assets
https://www.fatf-gafi.org/...
Blank lines and lines starting with # are ignored. After editing pdflinks.txt, regenerate sources.yaml:
uv run python scripts/build_sources_registry.py
Then run batch extraction:
uv run python scripts/extract.py --parallel
Only new (unprocessed) PDFs are extracted — previously processed sources are skipped automatically.
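The serial mapping between pdflinks.txt and the PDF filenames can be sketched in a few lines of Python. This is an illustration only, not the code in build_sources_registry.py: the function names parse_pdflinks and unmatched_pdfs are hypothetical, and the sketch assumes that blank and comment lines do not consume a serial number.

```python
from pathlib import PurePosixPath


def parse_pdflinks(text: str) -> dict[str, str]:
    """Map zero-padded serial keys ("001", "002", ...) to URLs.

    Assumption: blank lines and lines starting with '#' are skipped and
    do not advance the serial counter.
    """
    urls = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return {f"{i:03d}": url for i, url in enumerate(urls, start=1)}


def unmatched_pdfs(filenames: list[str], links: dict[str, str]) -> list[str]:
    """Return PDF filenames whose NNN_ prefix has no entry in the mapping."""
    bad = []
    for name in filenames:
        prefix, sep, _rest = PurePosixPath(name).name.partition("_")
        if not sep or prefix not in links:
            bad.append(name)
    return bad
```

Running a check like unmatched_pdfs over the contents of red_flag_sources/pdf/ before extraction would surface any PDF that cannot be linked to a source URL.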
# Sequential batch
uv run python scripts/extract.py
# Parallel batch (4 workers by default)
uv run python scripts/extract.py --parallel
# Parallel batch with custom worker count
uv run python scripts/extract.py --parallel 8
# Force re-extract everything
uv run python scripts/extract.py --force --parallel
# Process only PDFs in a serial range (e.g. 001 through 005)
uv run python scripts/extract.py --range 001-005
# Range + parallel
uv run python scripts/extract.py --range 001-005 --parallel
# Force re-extract a range
uv run python scripts/extract.py --force --range 001-005 --parallel
Note: --range applies only to numbered PDFs. Web URLs in Weblinks.md are excluded when a range is active.
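To make the range semantics concrete, here is a minimal sketch of how a value such as 001-005 could be parsed and applied to filenames. The helper names parse_range and in_range are hypothetical, and the sketch assumes both bounds are inclusive, as "001 through 005" suggests; the actual logic in scripts/extract.py may differ.

```python
import re


def parse_range(spec: str) -> tuple[int, int]:
    """Parse a spec like "001-005" into inclusive integer bounds."""
    match = re.fullmatch(r"(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"expected NNN-NNN, got {spec!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range start {start} exceeds end {end}")
    return start, end


def in_range(filename: str, bounds: tuple[int, int]) -> bool:
    """True if the filename carries an NNN_ prefix inside the bounds.

    Names without a numeric prefix (for example web URLs) never match,
    mirroring the note that web sources are excluded under --range.
    """
    match = re.match(r"(\d{3})_", filename)
    if match is None:
        return False
    return bounds[0] <= int(match.group(1)) <= bounds[1]
```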
# Extract from a local PDF
uv run python scripts/extract.py red_flag_sources/pdf/001_fincen_alert.pdf
# Extract from a URL
uv run python scripts/extract.py https://example.com/regulatory-guidance
# Re-extract a source that was already processed
uv run python scripts/extract.py --force red_flag_sources/pdf/001_fincen_alert.pdf
For single-source PDFs, add the URL to pdflinks.txt and run build_sources_registry.py first so the extractor can populate source_url in the output.
For each source, the extractor:

- Sends the document text to gpt-4o-mini (override with OPENAI_EXTRACTION_MODEL) to extract every distinct AML red flag indicator as structured JSON
- Validates each entry against the RedFlagSource schema; invalid entries are skipped with a warning
- Writes the results to data/source/<slug>.yaml, one entry per red flag
- Records the source in data/source/.extracted_sources.yaml to prevent re-processing

Each entry in the YAML file has the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier, e.g. 001-fincen-alert-01 |
| description | string | yes | Standalone description of the red flag indicator |
| source_url | string | no | Public URL of the source document |
| product_types | list[string] | no | Financial products this applies to (e.g. depository, crypto, msb) |
| industry_types | list[string] | no | Customer industries or sectors this applies to (e.g. oil_and_gas, government_benefits) |
| customer_profiles | list[string] | no | Customer archetypes this applies to (e.g. small_business, charity_or_nonprofit) |
| geographic_footprints | list[string] | no | Relevant geographies or corridors (e.g. southwest_border, mexico) |
| regulatory_source | string | no | Source document name or authority (e.g. FinCEN Alert FIN-2022-Alert001) |
| risk_level | string | no | high, medium, or low |
| category | string | no | AML typology (e.g. structuring, sanctions_evasion, shell_company) |
| simulation_type | string | no | Optional simulation complexity code (e.g. 1A, 2B) |
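The field table can be read as a small data model. The sketch below expresses it with a standard-library dataclass; it is not the project's RedFlagSource schema, and the class name RedFlagEntry and its validation rules are assumptions derived only from the table (two required fields, and risk_level limited to high, medium, or low).

```python
from dataclasses import dataclass, field
from typing import Optional

RISK_LEVELS = {"high", "medium", "low"}


@dataclass
class RedFlagEntry:
    """Hypothetical stand-in for one entry in a data/source/*.yaml file."""

    id: str
    description: str
    source_url: Optional[str] = None
    product_types: list[str] = field(default_factory=list)
    industry_types: list[str] = field(default_factory=list)
    customer_profiles: list[str] = field(default_factory=list)
    geographic_footprints: list[str] = field(default_factory=list)
    regulatory_source: Optional[str] = None
    risk_level: Optional[str] = None
    category: Optional[str] = None
    simulation_type: Optional[str] = None

    def __post_init__(self) -> None:
        # Only id and description are marked as required in the table.
        if not self.id or not self.description:
            raise ValueError("id and description are required")
        if self.risk_level is not None and self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")
```

An entry that fails these checks would correspond to the "invalid entries are skipped with a warning" behaviour described for extraction.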
data/source/.extracted_sources.yaml tracks every processed source by its canonical path or URL. Sources already in the manifest are skipped in both batch and single-source mode. Use --force to re-extract a source regardless.
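The skip-unless-forced behaviour amounts to a set-membership check. The following sketch illustrates it with hypothetical helper names (should_extract, plan_batch); it treats the manifest as an in-memory set of strings and leaves out how paths are canonicalized or how the YAML manifest is read and written.

```python
def should_extract(source: str, manifest: set[str], force: bool = False) -> bool:
    """Decide whether a source needs processing.

    `source` is assumed to be already canonicalized (path or URL).
    """
    return force or source not in manifest


def plan_batch(sources: list[str], manifest: set[str], force: bool = False) -> list[str]:
    """Return the sources a batch run would process, in input order."""
    return [s for s in sources if should_extract(s, manifest, force)]
```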
After extraction, embed the YAML files and load them into the vector database:
uv run python scripts/ingest.py
For the initial local corpus, ingest only the three target files:
uv run python scripts/ingest.py \
data/source/001_federal_child_nutrition_fraud.yaml \
data/source/002_oil_smuggling_cartels.yaml \
data/source/003_bulk_cash_smuggling_repatriation.yaml
This generates embeddings with nomic-embed-text-v1.5 and upserts records into LanceDB at data/vectors/. Run ingestion before connecting the MCP server to a desktop client; the embedding model downloads on first use and is better cached during ingestion than during server startup.
OPENAI_API_KEY is optional for ingestion. When it is set, ingestion can auto-tag missing metadata into the derived LanceDB records. When it is not set, ingestion preserves available YAML metadata and leaves missing rich consultation fields empty. Source YAML files are not rewritten by ingestion.
# Start server (stdio mode, for Claude Desktop / Claude Code)
uv run python -m redflag_mcp
# Start in MCP inspector
uv run mcp dev src/redflag_mcp/server.py
# Start as HTTP server (for OpenAI agents or other HTTP clients)
MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run python -m redflag_mcp
The server exposes three tools: search_red_flags, get_red_flag, and list_filters. It is fully offline after ingestion — no API keys required at query time.
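The transport-related environment variables shown above could be read along the following lines. This is a sketch, not the server's actual startup code: TransportSettings and load_transport_settings are hypothetical names, stdio is taken as the default because the plain command starts in stdio mode, and the default host and port values are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TransportSettings:
    transport: str
    host: str
    port: int


def load_transport_settings(env: Mapping[str, str] = os.environ) -> TransportSettings:
    """Read MCP_TRANSPORT, MCP_HOST, and MCP_PORT from the environment."""
    transport = env.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in {"stdio", "http"}:
        raise ValueError(f"unsupported MCP_TRANSPORT: {transport!r}")
    # Assumed defaults; the real server may choose different values.
    host = env.get("MCP_HOST", "127.0.0.1")
    port = int(env.get("MCP_PORT", "8000"))
    return TransportSettings(transport=transport, host=host, port=port)
```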
For local Codex threads, prefer stdio so Codex starts the MCP server automatically:
codex mcp add redflag-mcp -- zsh -lc 'cd /Users/learningmachine/Documents/Python-dev/redflag-mcp && HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 uv run python -m redflag_mcp'
Verify the registration:
codex mcp list
codex mcp get redflag-mcp
Then start a new Codex thread and ask for the server by name, for example:
Use the redflag-mcp MCP server. List the available AML red flag filters.
If you already have the HTTP server running, you can register that instead:
codex mcp add redflag-mcp-http --url http://127.0.0.1:8000/mcp
After ingesting the three target files, verify the tools with:
list_filters
search_red_flags(query="federal child nutrition program sponsor receives reimbursements inconsistent with its profile", product_types=["depository"])
search_red_flags(query="southwest border oil company wires for waste oil or hazardous materials")
search_red_flags(query="bulk cash moved by armored car service to Mexico")
get_red_flag(red_flag_id="001_federal_child_nutrition_fraud-01")
For a vague query such as "what should I look for in business accounts?", the calling agent should first ask a brief consultation question covering product/channel, industry, customer profile, geography, and transaction channel or volume. For a specific query, it should search directly.
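The product_types argument in the first search call hints at metadata filtering alongside the vector search. One plausible reading is sketched below; the function matches_filters is hypothetical, and the semantics (any overlap within a field, all supplied fields must match, empty filters ignored) are assumptions rather than documented behaviour of search_red_flags.

```python
def matches_filters(record: dict, **filters: list[str]) -> bool:
    """Check a red flag record against list-valued metadata filters.

    Assumed semantics: a field matches when the record shares at least
    one value with the requested list; every non-empty filter must match.
    """
    for field_name, wanted in filters.items():
        if not wanted:
            continue  # an empty or missing filter places no constraint
        if not set(record.get(field_name, [])) & set(wanted):
            return False
    return True
```

For example, a record tagged with product_types of depository and msb would pass a filter of product_types=["depository"], while a record with no product_types at all would not.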
uv sync --extra dev # Install dependencies
uv run pytest tests/ # Run tests
uv run ruff check src/ # Lint
uv run mypy src/ # Type check
Add this to claude_desktop_config.json and restart Claude Desktop.
{
"mcpServers": {
"redflag-mcp": {
"command": "npx",
"args": []
}
}
}