Hardware-accelerated codebase mapping that indexes Git repositories into Postgres/pgvector and serves code search, relationships, and static analysis results via a stdio MCP server.
project-code-intelligence indexes a Git repository into Postgres/pgvector and
serves the result through a small stdio MCP server.
The goal is higher-quality agent results: reuse a local code index instead of re-reading the same repository over and over, reducing token and embedding cost while making codebase navigation faster.
It can store indexed file records, code relationships, embeddings, and static-analysis (SARIF) findings.
The package is generic by default. Project-specific behavior belongs in code profiles, with example.py as the public example.
Install the CLI once, then pass the repository path you want to index.
cd /path/to/project-code-intelligence
uv tool install --editable .
pci-doctor --skip-db --embedding skip
The first pci-doctor run prints startup commands that fit the current
machine. For a fully local setup, start pgvector plus one embedding service.
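For example, a fully local CPU setup could use the Compose services documented later in this README:
docker compose up -d pgvector
docker compose --profile cpu up -d --build fastembed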
If you already have an external Postgres/pgvector database, skip pgvector and
start only the embedding service you want to use. Then verify the chosen
services:
pci-doctor --embedding required
Text-only indexing is available as a fallback for bootstrap, debugging, or
privacy-sensitive environments. In that case, choose the Postgres-only command
and verify with pci-doctor --embedding skip.
Then index a Git repository. Use . when you mean the current directory:
cd /path/to/repo-to-index
pci-index . --dry-run
pci-index .
pci-mcp-smoke
You can also index one or more repositories without changing directories:
pci-index /path/to/repo-to-index
pci-index /path/to/repo-a /path/to/repo-b
For advanced ingest options, put them after --:
pci-index /path/to/repo-to-index -- --limit-files 100
If indexing is interrupted, rerun the same command. pci-index . reuses the
same snapshot when the Git tree is unchanged, keeps compatible existing
embeddings, and fills in records that are still missing embeddings. In normal
incremental mode it only reparses changed files.
For that fallback text-only mode, run pci-index . --no-embed.
To wipe and rebuild the code-intelligence tables in the configured database:
pci-index --reset-code-intel
This drops and recreates this project's project_code_intel_* tables. It does
not drop the database or unrelated tables. The command prints the resolved
database target, asks for confirmation before deleting anything, and exits
without scanning. Run pci-index . afterwards to rebuild the index. For
non-interactive automation, add --i-know-this-deletes-code-intel-db.
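A non-interactive sketch that chains the documented reset and rebuild:
pci-index --reset-code-intel --i-know-this-deletes-code-intel-db
pci-index .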
In a brand-new local repository, make an initial commit before scanning so the
indexer has a Git HEAD snapshot.
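A minimal sketch for a fresh repository (the commit message is illustrative):
cd /path/to/new-repo
git init
git add -A
git commit -m "initial commit"
pci-index .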
For development:
cd /path/to/project-code-intelligence
uv sync --extra dev
export PATH="$PWD/.venv/bin:$PATH"
For normal CLI use, install it once as a uv tool. This command can be run from
any directory; it installs console scripts into your user tool path, usually
~/.local/bin.
uv tool install --editable /path/to/project-code-intelligence
After that, run pci-index . from any repository you want to scan:
cd /path/to/repo-to-index
pci-index .
Make sure the tool path is on PATH:
export PATH="$HOME/.local/bin:$PATH"
For use inside a specific repository virtualenv instead:
cd /path/to/repo-to-index
uv venv
. .venv/bin/activate
uv pip install -e /path/to/project-code-intelligence
Without uv, create and activate a virtualenv first:
python -m venv .venv
. .venv/bin/activate
python -m pip install -e /path/to/project-code-intelligence
The installed console scripts are pci-index, pci-doctor, pci-mcp, pci-mcp-smoke, pci-embedding-bench, pci-fastembed-server, and pci-llama-embed.
Point Codex, Claude Desktop, or another MCP client at pci-mcp:
{
"mcpServers": {
"project-code-intelligence": {
"command": "/path/to/project-code-intelligence/pci-mcp"
}
}
}
The default database settings match the local Docker Compose database. For a different Postgres/pgvector instance, prefer one database URL:
export PROJECT_CODE_INTELLIGENCE_DATABASE_URL='postgresql://user:password@host:5432/database?sslmode=prefer'
Percent-encode special characters in the username or password.
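For example, an @ in the password becomes %40 (the credentials shown are illustrative):
export PROJECT_CODE_INTELLIGENCE_DATABASE_URL='postgresql://user:p%40ssw0rd@host:5432/database?sslmode=prefer'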
The older split PGVECTOR_* variables remain supported, mainly for Docker Compose and backward compatibility.
The MCP server is read-only by default and applies per-request database safety
limits. Expensive queries are bounded by PROJECT_CODE_INTELLIGENCE_MCP_STATEMENT_TIMEOUT_MS,
lock waits by PROJECT_CODE_INTELLIGENCE_MCP_LOCK_TIMEOUT_MS, and oversized
requests by PROJECT_CODE_INTELLIGENCE_MCP_MAX_REQUEST_BYTES. get_code_intel_record
returns concise metadata by default; pass include_content: true when an agent
needs the indexed text, capped by PROJECT_CODE_INTELLIGENCE_MCP_MAX_RECORD_CONTENT_CHARS.
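As an illustrative sketch, the limits can be tuned in the environment before launching pci-mcp (the values below are examples, not the shipped defaults):
export PROJECT_CODE_INTELLIGENCE_MCP_STATEMENT_TIMEOUT_MS=10000
export PROJECT_CODE_INTELLIGENCE_MCP_LOCK_TIMEOUT_MS=2000
export PROJECT_CODE_INTELLIGENCE_MCP_MAX_REQUEST_BYTES=1048576
export PROJECT_CODE_INTELLIGENCE_MCP_MAX_RECORD_CONTENT_CHARS=20000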
For agent-heavy workflows, copy docs/examples/AGENTS.md into the repository being indexed so coding assistants know when to use the MCP index.
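For example (the destination filename is a convention; adjust as needed):
cp /path/to/project-code-intelligence/docs/examples/AGENTS.md /path/to/repo-to-index/AGENTS.md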
Embeddings are the expected path for normal use. They are what make the MCP index useful for semantic search instead of only exact text lookup.
Common paths are CPU FastEmbed, AMD Ryzen AI NPU, AMD GPU, NVIDIA GPU, and
remote OpenAI-compatible providers. pci-doctor prints the exact service
commands that are available on the current machine.
Local CPU, NPU, and GPU embedding services all publish the same host endpoint by
default: http://127.0.0.1:18081/v1/embeddings. Run only one local embedding
service at a time. Runtime-specific models have profile defaults; set model
environment variables only when overriding those defaults.
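To sanity-check whichever service is running, post a request to the documented endpoint. This is a sketch assuming an OpenAI-compatible embeddings API; replace the placeholder model name with the default your running profile serves:
curl -s http://127.0.0.1:18081/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "placeholder-model", "input": "def hello(): pass"}'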
Run pci-doctor to see which paths are available on the current machine:
pci-doctor --embedding required
pci-index itself does not download models. The Docker Compose embedding
profiles may download models into Docker volumes or ignored local paths.
Remote embedding endpoints receive source-derived text. For private code, use a
local endpoint or a provider you trust, and set
PROJECT_CODE_INTELLIGENCE_ALLOW_REMOTE_EMBEDDING=1 only intentionally.
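If you do opt in, set the flag explicitly in the environment that runs the indexer:
export PROJECT_CODE_INTELLIGENCE_ALLOW_REMOTE_EMBEDDING=1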
Profiles are runtime choices, not project modes. The local database is isolated from the embedding services so users with an external Postgres/pgvector database can start embeddings without also starting a local database.
| Profile or service | Use when |
|---|---|
| pgvector (db) | Local Postgres/pgvector database. Skip this when using an external database. |
| cpu (fastembed) | Portable local semantic-search demo with FastEmbed. |
| npu (lemonade-npu) | Experimental AMD Ryzen AI/XDNA NPU embeddings. |
| amdgpu (llama-rocm) | Experimental AMD ROCm llama.cpp embeddings. |
| nvidia (llama-cuda) | Experimental NVIDIA CUDA llama.cpp embeddings. |
List the profiles with:
docker compose config --profiles
For a local database, start:
docker compose up -d pgvector
For embeddings only, start the specific service:
docker compose --profile cpu up -d --build fastembed
docker compose --profile npu up -d lemonade-npu
docker compose --profile amdgpu up -d --build llama-rocm
docker compose --profile nvidia up -d --build llama-cuda
Most users should start with cpu, then let pci-doctor suggest hardware-specific
commands if local acceleration is available.
Use the exact service commands suggested by pci-doctor. Start pgvector only
when you want the local database; omit it when
PROJECT_CODE_INTELLIGENCE_DATABASE_URL points at an external database. Use
stop when you want to pause containers but keep them around:
docker compose stop
Use down for normal cleanup. This removes containers and the Compose network
while keeping the local database and downloaded model caches:
docker compose down
Use down -v only when you intentionally want a fresh database and fresh
Docker-managed model caches:
docker compose down -v
That deletes the named volumes for Postgres, FastEmbed, Lemonade, and ROCm
runtime caches. It does not delete the bind-mounted ./models directory used by
the GPU profiles.
On Apple Silicon, Docker Compose is still useful for Postgres/pgvector. Local Apple GPU embeddings should run on the macOS host, not inside Docker.
The server exposes tools for code search, code relationships, and static-analysis results.
The MCP server runs over stdio. Docker Compose is used for local dependencies, not for wrapping the MCP process.
The generic profile covers common source, docs, build files, config files, and SARIF input. A project can add its own profile for domain-specific file roles, metadata, records, or security context.
Private profiles do not need to be registered in this package. Put them on
PYTHONPATH and select them with a fully qualified profile path:
PROJECT_CODE_INTELLIGENCE_PROFILE=my_project.code_profile:MyProjectProfile pci-index .
Profiles are ordinary Python code, so load them only from trusted local modules.
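For example, assuming the private profile lives in a local directory (the path is illustrative):
export PYTHONPATH="/path/to/private-profiles:$PYTHONPATH"
PROJECT_CODE_INTELLIGENCE_PROFILE=my_project.code_profile:MyProjectProfile pci-index .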
Run the local quality gate:
make check
Run the integration smoke. This starts the local Compose pgvector service if
needed:
make integration-smoke
Useful docs include docs/examples/AGENTS.md.
Do not publish database dumps, restore artifacts, SARIF output, embedding caches, model files, vector indexes, local MCP configs, or generated data from private repositories. These can contain source snippets, internal paths, symbols, findings, metadata, and embeddings derived from source text.
MIT. See LICENSE.