An MCP server that enables automated analysis and conversion of SSIS packages into Azure Data Factory JSON artifacts directly through GitHub Copilot. It streamlines ETL modernization by detecting complex patterns, translating script tasks, and generating production-ready pipelines following Microsoft best practices.
An MCP (Model Context Protocol) server that reads SSIS packages (.dtsx) and converts them to Azure Data Factory (ADF) JSON artifacts, exposed as tools directly inside GitHub Copilot.
All generated artifacts follow Microsoft-recommended patterns from learn.microsoft.com.
```
.dtsx file(s) ───┐
                 │      ┌────────────────────────┐
SQL Agent jobs ──┤      │ Optional configs:      │
                 ├─────▶│ • ESI tables JSON      │
Config files ────┘      │ • Schema remap JSON    │
                        │ • Shared artifacts dir │
                        └──────────┬─────────────┘
                                   ▼
                 ┌─────────────────────────────┐
                 │ ssis-adf-agent              │ ← MCP stdio server
                 │                             │
                 │ scan → analyze → convert    │
                 │ → validate → deploy         │
                 │                             │
                 │ Detects:                    │
                 │ • Cross-DB / linked server  │
                 │ • Delta / MERGE patterns    │
                 │ • CDM-layer logic           │
                 │ • ESI reuse candidates      │
                 └──────────┬──────────────────┘
                            ▼
                  ADF JSON artifacts
          (pipeline / linkedService / dataset /
              dataflow / trigger / stubs)
                            ▼
                  Azure Data Factory
```
| Requirement | Version / Notes |
|---|---|
| Python | 3.11 or later |
| pip | Included with Python; or use uv / hatch |
| Git | Required if scanning packages from a Git repository |
| ODBC Driver for SQL Server | 17 or later — required only when scanning packages from SQL Server (msdb) |
| Azure CLI | az — required for az login when deploying from a developer machine |
| VS Code | Latest stable |
| GitHub Copilot extension | With agent / MCP support enabled |
Clone the repository and install in editable mode (recommended for development):
git clone https://github.com/chsimons_microsoft/ssis_adf_agent.git
cd ssis_adf_agent
pip install -e .
To also install development tools (pytest, ruff, mypy):
pip install -e ".[dev]"
To enable automatic C# → Python translation of Script Tasks via Azure OpenAI:
pip install -e ".[llm]"
Verify the installation:
ssis-adf-agent --help
Note: Once the package is published to PyPI, you can install it with pip install ssis-adf-agent without cloning the repository.
Add the server to your VS Code settings.json so GitHub Copilot can discover it as a set of agent tools.
Open the Command Palette (Ctrl+Shift+P) → Preferences: Open User Settings (JSON) and add:

{
"github.copilot.chat.experimental.mcpServers": {
"ssis-adf-agent": {
"type": "stdio",
"command": "ssis-adf-agent",
"args": []
}
}
}
If you installed into a virtual environment, replace "command": "ssis-adf-agent" with the full path to the script, e.g. "C:\\path\\to\\.venv\\Scripts\\ssis-adf-agent.exe" (Windows) or "/path/to/.venv/bin/ssis-adf-agent" (macOS/Linux).
Reload VS Code (Ctrl+Shift+P → Developer: Reload Window). Copilot will then discover five tools: scan_ssis_packages, analyze_ssis_package, convert_ssis_package, validate_adf_artifacts, and deploy_to_adf.

The samples/ directory is intended as a convenient drop zone for .dtsx files you want to experiment with locally.
Copy one or more .dtsx files into samples/:
samples/
MyETLPackage.dtsx
LoadDimCustomer.dtsx
When using any tool that requires a package_path or path_or_connection, supply the absolute path to the file or directory. For example:
C:\Users\you\ssis_adf_agent\samples\MyETLPackage.dtsx (Windows) or /home/you/ssis_adf_agent/samples/MyETLPackage.dtsx (macOS/Linux). For output, create a directory alongside samples/ (e.g. adf_output/) to keep generated artifacts separate from source packages.
The samples/ directory is .gitignore-friendly — add your test packages there without worrying about committing proprietary SSIS files.
All five tools are invoked from GitHub Copilot Chat in Agent mode. Type your request in natural language and Copilot will call the appropriate tool(s). The sections below show what each tool does and the key parameters it accepts.
Tool: scan_ssis_packages
Discovers all .dtsx files from a local directory, a Git repository, or SQL Server (msdb).
Example prompts:
Scan C:\Projects\LegacyETL for all SSIS packages.
Scan the git repo at https://github.com/myorg/etl-packages for SSIS packages on the release branch.
List all SSIS packages stored in SQL Server at SERVER=MYSERVER;DATABASE=msdb.
Key parameters:
| Parameter | Required | Description |
|---|---|---|
| `source_type` | Yes | `local`, `git`, or `sql` |
| `path_or_connection` | Yes | Local directory path, Git repo URL, or SQL connection string |
| `recursive` | No | Search subdirectories (default: `true`) |
| `git_branch` | No | Branch to check out when `source_type` is `git` (default: `main`) |
Tool: analyze_ssis_package
Produces a complexity score, gap analysis, component inventory, cross-database/linked server detection, CDM pattern flags, and optional ESI reuse candidates for a single package. Run this before converting to understand the scope of manual work required.
Example prompts:
Analyze the SSIS package at C:\Projects\LegacyETL\LoadFactSales.dtsx and tell me how complex it is.
Analyze C:\Projects\LegacyETL\LoadFactSales.dtsx with ESI tables config at C:\config\esi_tables.json.
Key parameters:
| Parameter | Required | Description |
|---|---|---|
| `package_path` | Yes | Absolute path to the .dtsx file |
| `esi_tables_path` | No | Path to a JSON file mapping source systems to ESI-available tables (see ESI Reuse Detection) |
Complexity score guide:
| Score | Label | Typical Effort |
|---|---|---|
| 0–25 | Low | < 1 day |
| 26–50 | Medium | 1–3 days |
| 51–75 | High | 3–5 days |
| 76–100 | Very High | 1+ weeks |
Score drivers: Script Tasks (+20 each), Data Flow Tasks (+8 each), ForEach/ForLoop containers (+5 each), unknown task types (+10 each), linked server references (+8 each), cross-database references (+3 each).
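The score drivers above amount to a weighted sum. The sketch below is illustrative only — the weight table and function names (`WEIGHTS`, `complexity_score`, `complexity_label`) are assumptions derived from the listed drivers, not the analyzer's actual code:

```python
# Illustrative sketch of the complexity scoring described above.
# Weights come from the score-drivers list; the real analyzer may
# count components differently.
WEIGHTS = {
    "script_task": 20,
    "data_flow_task": 8,
    "loop_container": 5,       # ForEach / ForLoop containers
    "unknown_task": 10,
    "linked_server_ref": 8,
    "cross_db_ref": 3,
}

def complexity_score(counts: dict[str, int]) -> int:
    """Weighted sum of detected components, capped at 100."""
    raw = sum(WEIGHTS[k] * counts.get(k, 0) for k in WEIGHTS)
    return min(raw, 100)

def complexity_label(score: int) -> str:
    """Map a 0-100 score to the bands in the table above."""
    if score <= 25:
        return "Low"
    if score <= 50:
        return "Medium"
    if score <= 75:
        return "High"
    return "Very High"

score = complexity_score({"script_task": 2, "data_flow_task": 3})
print(score, complexity_label(score))  # 64 High
```

So a package with two Script Tasks and three Data Flow Tasks already lands in the "High" band, which matches the guidance to analyze before converting.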
Key output: a gap analysis listing items with severity manual_required / warning / info.

Tool: convert_ssis_package
Converts a single .dtsx file to a complete set of ADF JSON artifacts.
Example prompt:
Convert C:\Projects\LegacyETL\LoadFactSales.dtsx to ADF artifacts and write them to C:\adf_output\LoadFactSales.
Key parameters:
| Parameter | Required | Description |
|---|---|---|
| `package_path` | Yes | Absolute path to the .dtsx file |
| `output_dir` | Yes | Directory to write artifacts into |
| `generate_trigger` | No | Emit a ScheduleTrigger template (default: `true`) |
| `llm_translate` | No | Call Azure OpenAI to translate C# Script Tasks to Python (default: `false`) |
| `on_prem_ir_name` | No | Integration Runtime name for on-prem connections (default: `SelfHostedIR`) |
| `auth_type` | No | Default auth for Azure SQL linked services: `SystemAssignedManagedIdentity` (default), `SQL`, or `ServicePrincipal` |
| `use_key_vault` | No | Use Azure Key Vault secret references for passwords (default: `false`) |
| `kv_ls_name` | No | Name for the Key Vault linked service (default: `LS_KeyVault`) |
| `kv_url` | No | Azure Key Vault base URL (default: `https://TODO.vault.azure.net/`) |
| `esi_tables_path` | No | Path to ESI tables config JSON for reuse detection |
| `schema_remap_path` | No | Path to schema remap JSON for database consolidation |
| `shared_artifacts_dir` | No | Shared directory for cross-package linked service/dataset deduplication |
| `pipeline_prefix` | No | Prefix for pipeline names (default: `PL_`) |
Sub-folders are created automatically inside output_dir. See Generated Artifact Structure.
Tool: validate_adf_artifacts
Checks the generated JSON files for structural correctness (required fields, valid activity references) before touching Azure. Always validate before deploying.
Example prompt:
Validate the ADF artifacts in C:\adf_output\LoadFactSales.
Key parameter:
| Parameter | Required | Description |
|---|---|---|
| `artifacts_dir` | Yes | Directory containing the generated ADF JSON files |
Fix any reported issues in the JSON files, then validate again before proceeding to deployment.
Tool: deploy_to_adf
Deploys the validated artifacts to an existing Azure Data Factory instance. Deployment order is enforced automatically: linked services → datasets → data flows → pipelines → triggers.
Important: Always run a dry run first to confirm what will be deployed without making any Azure API calls.
Example prompt (dry run):
Do a dry run deployment of C:\adf_output\LoadFactSales to my ADF instance named my-adf in resource group rg-data-prod, subscription 00000000-0000-0000-0000-000000000000.
Example prompt (live deployment):
Deploy C:\adf_output\LoadFactSales to ADF instance my-adf in resource group rg-data-prod, subscription 00000000-0000-0000-0000-000000000000.
Key parameters:
| Parameter | Required | Description |
|---|---|---|
| `artifacts_dir` | Yes | Directory containing generated ADF JSON artifacts |
| `subscription_id` | Yes | Azure subscription GUID |
| `resource_group` | Yes | Azure resource group name |
| `factory_name` | Yes | Azure Data Factory instance name |
| `dry_run` | No | `true` to log only without calling Azure APIs (default: `false`) |
Triggers are always deployed in Stopped state. Activate them manually in the ADF Studio after validating pipeline runs.
These features were designed for large-scale enterprise SSIS migrations where packages share connections, target consolidated databases, or operate alongside existing data platforms (ESI, CDM layers).
On-prem connections are automatically detected (heuristics: localhost, IP addresses, non-.database.windows.net server names). These connections generate SqlServer linked services with a connectVia reference to a Self-Hosted Integration Runtime. Use on_prem_ir_name to override the default name SelfHostedIR.
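A rough sketch of such a heuristic (illustrative only — `looks_on_prem` is a hypothetical helper; the converter's actual detection logic may differ):

```python
import re

def looks_on_prem(server: str) -> bool:
    """Heuristic sketch: treat localhost, raw IPv4 literals, and any host
    that is not an Azure SQL endpoint (*.database.windows.net) as on-prem."""
    # Drop a ",port" suffix and a "\INSTANCE" suffix before classifying.
    host = server.split(",")[0].split("\\")[0].strip().lower()
    if host in ("localhost", "(local)", "."):
        return True
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):  # IPv4 literal
        return True
    return not host.endswith(".database.windows.net")

print(looks_on_prem("myserver.database.windows.net"))  # False
print(looks_on_prem("10.0.0.5"))                       # True
print(looks_on_prem("SQLPROD01"))                      # True
```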
When use_key_vault=true, linked services reference Azure Key Vault secrets instead of embedding credentials:
{
"password": {
"type": "AzureKeyVaultSecret",
"store": { "referenceName": "LS_KeyVault", "type": "LinkedServiceReference" },
"secretName": "conn-MyDatabase-password"
}
}
A Key Vault linked service (LS_KeyVault) is auto-generated. Customize the name with kv_ls_name and the vault URL with kv_url.
When migrating multiple SSIS packages that share connection managers, pass shared_artifacts_dir to avoid duplicate linked services and datasets:
Convert LoadDimCustomer.dtsx with shared_artifacts_dir=C:\output\shared
Convert LoadFactSales.dtsx with shared_artifacts_dir=C:\output\shared
The generator writes each linked service / dataset only once. Subsequent packages that reference the same connection reuse the existing file.
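The write-once behavior can be sketched as follows (`write_once` is a hypothetical helper illustrating the idea, not the tool's actual code):

```python
import json
from pathlib import Path

def write_once(shared_dir: str, name: str, artifact: dict) -> Path:
    """Write a linked service / dataset JSON only if it does not already
    exist in the shared directory, so later packages reuse the first file."""
    path = Path(shared_dir) / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(json.dumps(artifact, indent=2))
    return path
```

With this scheme, converting LoadDimCustomer.dtsx first creates the shared linked service file; converting LoadFactSales.dtsx afterwards finds the file already present and leaves it untouched.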
When consolidating multiple on-prem databases into a single Azure SQL database, provide a schema remap config:
{
"StagingDB": "staging",
"ReportingDB": "reporting",
"DWDB": "dw"
}
Keys are original database names; values are target schemas. Pass the file path via schema_remap_path. The converter replaces cross-database references in SQL text and qualified table names in datasets.
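As an illustration of the idea — not the converter's actual rewriting code, which would need a proper SQL parser to handle bracketed names, quoting, and aliases — a naive regex-based sketch:

```python
import re

# Same shape as the schema remap config above: original DB -> target schema.
REMAP = {"StagingDB": "staging", "ReportingDB": "reporting", "DWDB": "dw"}

def remap_sql(sql: str, remap: dict[str, str]) -> str:
    """Rewrite three-part names like StagingDB.dbo.Orders to the target
    schema (staging.Orders) when consolidating into one Azure SQL database."""
    for db, schema in remap.items():
        # DB.<schema>.<table> -> <target_schema>.<table>
        sql = re.sub(rf"\b{re.escape(db)}\.(\w+)\.(\w+)", rf"{schema}.\2", sql)
    return sql

print(remap_sql("SELECT * FROM StagingDB.dbo.Orders", REMAP))
# SELECT * FROM staging.Orders
```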
If your organization maintains an ESI (Enterprise Source Integration) layer, you can provide a JSON config mapping source systems to tables already available in the ESI Azure SQL layer:
{
"source_system": "SAP",
"esi_database": "ESI_SAP",
"tables": ["VBAK", "VBAP", "MARA", "KNA1"]
}
Pass this file via esi_tables_path (available on both analyze and convert tools). The analyzer produces INFO-level gap items identifying data flow sources that could read from ESI instead of pulling from the on-prem source via SHIR.
The analyzer automatically detects Common Data Model (CDM) layer patterns:
Detected patterns produce INFO-level gap items with [CDM REVIEW] recommendations and cdm-review-required pipeline annotations. These are informational — they help teams decide whether to migrate the logic as-is or replace it with existing CDM entities.
When the SSIS package source is SQL Server (the sql source type in scan_ssis_packages), the tool reads SQL Agent job schedules from msdb. The converted trigger uses the mapped ADF recurrence:
| SQL Agent `freq_type` | ADF Recurrence |
|---|---|
| 4 (Daily) | Day with interval from `freq_interval` |
| 8 (Weekly) | Week with weekday schedule from bitmask |
| 16 (Monthly, day-of-month) | Month with day schedule |
| 32 (Monthly, relative) | Month — flag for manual review |
If no SQL Agent schedule is available, the trigger falls back to a placeholder daily-at-midnight schedule.
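The mapping table can be sketched in Python (illustrative only — `map_sql_agent_schedule` is a hypothetical function; the weekly `freq_interval` is interpreted as the standard msdb weekday bitmask, 1 = Sunday … 64 = Saturday):

```python
def map_sql_agent_schedule(freq_type: int, freq_interval: int = 1) -> dict:
    """Sketch of mapping msdb freq_type to an ADF ScheduleTrigger recurrence,
    following the table above. Monthly-relative (32) is flagged for review."""
    if freq_type == 4:   # Daily
        return {"frequency": "Day", "interval": freq_interval}
    if freq_type == 8:   # Weekly: freq_interval is a weekday bitmask
        days = ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"]
        chosen = [d for i, d in enumerate(days) if freq_interval & (1 << i)]
        return {"frequency": "Week", "interval": 1,
                "schedule": {"weekDays": chosen}}
    if freq_type == 16:  # Monthly, day-of-month
        return {"frequency": "Month", "interval": 1,
                "schedule": {"monthDays": [freq_interval]}}
    if freq_type == 32:  # Monthly, relative: needs manual review
        return {"frequency": "Month", "interval": 1, "manual_review": True}
    # No usable schedule: placeholder daily recurrence
    return {"frequency": "Day", "interval": 1}

print(map_sql_agent_schedule(8, 62)["schedule"]["weekDays"])
# ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
```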
SSIS Script Tasks contain C# (or VB.NET) code that cannot be converted by static rules alone. By default the converter generates a Python Azure Function stub with TODO comments and the original source embedded as comments. When llm_translate=true is passed to convert_ssis_package, the agent extracts the embedded C# source from the DTSX binary blob and calls Azure OpenAI to produce a working Python implementation body.
How it works:

- The extractor locates DTS:ObjectData/ScriptProject/BinaryData, unzips it, and reads all .cs / .vb source files (excluding AssemblyInfo and designer files).
- CSharpToPythonTranslator sends the source to Azure OpenAI Chat Completions with a structured prompt that preserves business logic and replaces unsupported patterns (SQL calls, file I/O, SMTP) with # TODO comments pointing to Azure equivalents.
- stubs/<FunctionName>/__init__.py contains the translated Python body. The original C# is preserved as line comments below the implementation for reference.
- If translation fails, the converter falls back to a plain TODO stub without raising an error; a warning comment is embedded in the stub file.

Required environment variables:

| Variable | Description | Default |
|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Your Azure OpenAI resource URL, e.g. `https://my-resource.openai.azure.com/` | required |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key | required |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment name | `gpt-4o` |
The openai SDK is an optional dependency — install it alongside the package:
pip install "ssis-adf-agent[llm]"
Convert C:\Projects\LegacyETL\LoadFactSales.dtsx to C:\adf_output\LoadFactSales and translate all Script Tasks to Python using Azure OpenAI.
Note: Translated code should always be reviewed before deploying to production. The LLM preserves control flow and business logic but replaces infrastructure calls (SQL, file I/O, SMTP) with # TODO placeholders that require manual completion.
Three reusable prompt files are included in .vscode/ and can be invoked directly from Copilot Chat to run the full workflow with guided inputs.
| Prompt File | Mode | Description |
|---|---|---|
| `analyze_packages.prompt.md` | Agent | Scan a source, then analyze every package found and produce a prioritized conversion report |
| `convert_package.prompt.md` | Agent | Analyze, convert, and validate a single package; produces a Markdown summary with a manual-steps checklist |
| `deploy_adf.prompt.md` | Agent | Validate artifacts and deploy to ADF with optional dry-run |
To invoke from Copilot Chat:

1. Open Copilot Chat (Ctrl+Alt+I).
2. Type / and select the prompt file from the list, or type the prompt name.

The deploy_to_adf tool uses DefaultAzureCredential, which tries the following in order:

| Priority | Method | When to use |
|---|---|---|
| 1 | Environment variables | CI/CD pipelines (service principal) |
| 2 | Workload Identity | Azure-hosted compute (AKS, etc.) |
| 3 | Azure CLI (`az login`) | Local developer machines |
| 4 | Azure PowerShell | Local developer machines |
For local development, the simplest approach is:
az login
For CI/CD pipelines, set these environment variables for a service principal:
| Variable | Description |
|---|---|
| `AZURE_CLIENT_ID` | Service principal application (client) ID |
| `AZURE_CLIENT_SECRET` | Service principal secret |
| `AZURE_TENANT_ID` | Azure Active Directory tenant ID |
The service principal must have the Data Factory Contributor role on the target ADF instance.
Set the following environment variables before calling convert_ssis_package with llm_translate=true:
# Windows (PowerShell)
$env:AZURE_OPENAI_ENDPOINT = "https://my-resource.openai.azure.com/"
$env:AZURE_OPENAI_API_KEY = "<your-key>"
$env:AZURE_OPENAI_DEPLOYMENT = "gpt-4o" # optional, defaults to gpt-4o
# macOS / Linux
export AZURE_OPENAI_ENDPOINT="https://my-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<your-key>"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
| SSIS Component | ADF Equivalent | Notes |
|---|---|---|
| Execute SQL Task | Stored Procedure / Script / Lookup Activity | Depends on ResultSetType and SQL pattern |
| Data Flow Task (simple) | Copy Activity | Single source → single destination. Sink pattern varies: insert (full load), upsert with keys (delta/merge). Retry policy: 2 retries, 60s interval. |
| Data Flow Task (complex) | Execute Data Flow Activity (Mapping Data Flow) | Multiple sources, transformations, or fan-out. READ_UNCOMMITTED isolation, errorHandlingOption: stopOnFirstError. |
| Execute Package Task | Execute Pipeline Activity | Child pipeline must also be converted |
| Script Task (C# / VB) | Azure Function Activity | Stub generated at stubs/<FunctionName>/__init__.py. When llm_translate=true, C# source is extracted from the DTSX and translated to Python via Azure OpenAI; otherwise a TODO stub is generated. |
| ForEach Loop Container | ForEach Activity | Expression varies by enumerator type |
| For Loop Container | SetVariable (init) + Until Activity + SetVariable (increment) | Condition logic is inverted |
| Sequence Container | Flattened into parent with dependsOn chaining | No ADF equivalent |
| File System Task | Copy Activity (Azure paths) or Web Activity → Azure Function | Local paths need Azure-path substitution |
| Execute Process Task | Web Activity → Azure Function | Manual: wrap process call in a Function |
| FTP Task | Copy Activity via FTP connector | Requires FTP linked service |
| Send Mail Task | Logic App / Web Activity | No native ADF equivalent |
| Event Handler (OnError) | Pipeline failure path / error handling | Converted to sub-pipeline reference |
| Event Handler (OnPostExecute) | Succeeded dependency path | Converted to sub-pipeline reference |
| Connection Manager (Azure SQL) | Linked Service (AzureSqlDatabase) | Microsoft Recommended version: server/database/authenticationType. Default auth: SystemAssignedManagedIdentity. |
| Connection Manager (on-prem SQL) | Linked Service (SqlServer) | Auto-detected. Uses Self-Hosted IR with pooling: false. |
| SQL Agent Job Schedule | Schedule Trigger | Mapped from msdb freq_type/freq_interval. Falls back to placeholder if unavailable. |
convert_ssis_package writes the following directory structure under output_dir:
<output_dir>/
pipeline/
PL_<PackageName>.json ← Main ADF pipeline (prefix configurable)
linkedService/
LS_<ConnectionName>.json ← Microsoft Recommended version format
LS_KeyVault.json ← Auto-generated when use_key_vault=true
dataset/
DS_<DatasetName>.json ← Uses schema+table (not deprecated tableName)
dataflow/
DF_<DataFlowName>.json ← Mapping Data Flow with READ_UNCOMMITTED + error handling
trigger/
TR_<PackageName>.json ← ScheduleTrigger (Stopped state); accurate if SQL Agent schedule provided
stubs/
<FunctionName>/
__init__.py ← Python stub with TODO blocks
function.json ← Azure Function binding definition
Linked services use the Microsoft Recommended version format with discrete properties instead of the legacy connectionString format:
{
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "myserver.database.windows.net",
"database": "mydb",
"encrypt": "mandatory",
"trustServerCertificate": false,
"authenticationType": "SystemAssignedManagedIdentity"
}
}
For on-prem connections, the SqlServer connector type with Self-Hosted IR is used automatically:
{
"type": "SqlServer",
"typeProperties": {
"server": "on-prem-server",
"database": "mydb",
"authenticationType": "Windows",
"pooling": false
},
"connectVia": { "referenceName": "SelfHostedIR", "type": "IntegrationRuntimeReference" }
}
Datasets use separate schema and table properties per Microsoft's recommendation:
{
"type": "AzureSqlTable",
"typeProperties": {
"schema": "dbo",
"table": "MyTable"
}
}
Generated pipelines include automatic annotations based on detected patterns:
- ssis-adf-agent — identifies the source tool
- source-package:<name> — original SSIS package name
- ingestion-pattern:delta or ingestion-pattern:merge — when delta/merge patterns are detected
- has-cross-db-references — when cross-database or linked server references are found
- cdm-review-required — when CDM-layer patterns are detected
- esi-reuse-candidate — when ESI reuse opportunities are found

After running convert_ssis_package, review the following checklist before deploying:
- Packages saved with the EncryptAllWithPassword protection level may have missing passwords. When use_key_vault=true, linked services reference Key Vault secrets — verify the secret names exist and are populated. Otherwise fill in plaintext credentials.
- stubs/<FunctionName>/__init__.py contains TODO comments. If llm_translate=true was used, the stub contains LLM-translated Python. Deploy the Function to Azure Functions before running the pipeline.
- Replace local file paths with abfss:// or https:// URLs.
- Address all manual_required severity items. Replace linked server four-part names with Azure SQL elastic queries, external tables, or separate linked services. Remap three-part names if consolidating databases.
- For pipelines carrying the cdm-review-required annotation, coordinate with the CDM team to decide whether the transformation logic should migrate as-is or be replaced by existing CDM-layer entities.
- For pipelines carrying the esi-reuse-candidate annotation, review whether reading from the ESI Azure SQL layer is preferable to re-staging from the on-prem source via SHIR.
- Copy Activities with writeBehavior: "upsert" include detected key columns. Verify these match the target table's unique key. Replace TODO_KEY_COLUMN placeholders where keys could not be auto-detected.
- Run validate_adf_artifacts again after all manual edits.

Install development dependencies:
pip install -e ".[dev]"
Run tests:
pytest
Lint:
ruff check .
Type-check:
mypy ssis_adf_agent/
The project targets Python 3.11+, uses ruff with line-length = 100, and enforces mypy --strict.
This project is licensed under the MIT License.
MIT License
Copyright (c) 2026 chsimons_microsoft
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.