The Problem

The 2am page you don't want.

It's 2:14 AM. PagerDuty jolts you awake. test_payment_webhook is flaky again on the staging pipeline. You know this failure — you saw it last Tuesday.

You stumble to your laptop. Find the run in GitHub Actions. Scroll through 8,000 lines of log output looking for the stack trace. It's a connection timeout to the payment sandbox, like always. You restart the failed job, watch it go green, close the incident, and go back to bed at 3:10 AM.

The next morning, Slack is on fire. Three other pipelines hit the same thing overnight. Two teams are blocked on their deploys. Nobody caught the pattern because nobody was awake when each individual failure hit.

At scale, triage is the bottleneck, not expertise. ops-pilot removes humans from the triage loop and keeps them firmly at the review gate.

Architecture

Four agents. One pipeline.

Each agent has a single responsibility. Typed contracts at every boundary. The whole pipeline runs in under 10 seconds per incident.

01

Watcher

Detects the failure

Polls your CI provider (GitHub Actions, GitLab CI, Jenkins) for failed runs. Builds a structured failure model from the log tail and fails fast on transient errors.
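A rough sketch of what a structured failure model could look like, assuming the Watcher has already fetched the raw log as a string. `FailureModel`, `build_failure`, the field names, and the list of transient-error markers are all hypothetical illustrations, not ops-pilot's actual types.

```python
from dataclasses import dataclass

# Hypothetical markers for transient errors; the real list is not given in the text.
TRANSIENT_MARKERS = ("connection timeout", "connection reset", "rate limit")


@dataclass(frozen=True)
class FailureModel:
    pipeline: str
    job: str
    log_tail: list[str]
    transient: bool


def build_failure(pipeline: str, job: str, log: str, tail_lines: int = 50) -> FailureModel:
    """Keep only the last `tail_lines` lines of the log and flag likely transient errors."""
    tail = log.splitlines()[-tail_lines:]
    lowered = "\n".join(tail).lower()
    transient = any(marker in lowered for marker in TRANSIENT_MARKERS)
    return FailureModel(pipeline=pipeline, job=job, log_tail=tail, transient=transient)
```

Working from the tail rather than the full log is what keeps an 8,000-line run manageable for the later agents.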

02

Diagnoser

Claude triages the logs

Sends logs + diff to Claude and extracts root cause, severity, and fix confidence into a typed Pydantic object — not a text blob. Context is pre-curated to stay under token budget.
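To illustrate the "typed object, not a text blob" idea, here is a minimal sketch that parses a JSON reply into a validated structure. ops-pilot uses Pydantic for this; the sketch swaps in standard-library dataclasses to stay dependency-free. The `Diagnosis` and `Severity` names, the field names, and the assumption that the model replies in JSON are all hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Diagnosis:
    root_cause: str
    severity: Severity
    fix_confidence: float  # assumed to be in the range 0.0 to 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.fix_confidence <= 1.0:
            raise ValueError("fix_confidence must be between 0 and 1")


def parse_diagnosis(raw: str) -> Diagnosis:
    """Turn the model's JSON reply into a typed object, rejecting malformed output."""
    data = json.loads(raw)
    return Diagnosis(
        root_cause=str(data["root_cause"]),
        severity=Severity(data["severity"]),  # raises ValueError on unknown values
        fix_confidence=float(data["fix_confidence"]),
    )
```

Downstream agents can then branch on `diagnosis.severity` instead of re-parsing free text.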

03

Fixer

Opens a draft PR

Asks Claude which file to edit, fetches it, generates a minimal patch, opens a draft PR against your branch. Humans review every merge. Agents never push to main.
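For a sense of what "opens a draft PR" means at the API level, the sketch below builds (without sending) a request to GitHub's REST endpoint for creating pull requests, with `draft` set to true. The function name and its parameters are hypothetical; how the Fixer actually talks to GitHub is not described here.

```python
import json
import urllib.request


def draft_pr_request(owner: str, repo: str, token: str, head: str, base: str,
                     title: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub REST request that opens a draft pull request."""
    payload = {"title": title, "head": head, "base": base, "body": body, "draft": True}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the PR is a draft, it cannot be merged until a human marks it ready for review.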

04

Notifier

Posts to Slack

Writes a concise Slack message with the fix summary, confidence score, and PR link. Routes by severity to the right channel. Zero false-positive spam.
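Severity-based routing could be sketched roughly as follows. The channel names, the mapping, and `build_slack_message` are hypothetical; the only Slack-specific details assumed are that an incoming webhook accepts a JSON body with a `text` field and that `<url|label>` renders as a link.

```python
# Hypothetical severity-to-channel mapping; real routing rules would come from configuration.
CHANNEL_BY_SEVERITY = {"high": "#incidents", "medium": "#ci-alerts", "low": "#ci-noise"}
DEFAULT_CHANNEL = "#ci-alerts"


def build_slack_message(severity: str, summary: str, confidence: float,
                        pr_url: str) -> tuple[str, dict]:
    """Return the target channel and the webhook payload for one incident."""
    channel = CHANNEL_BY_SEVERITY.get(severity, DEFAULT_CHANNEL)
    text = (
        f"[{severity.upper()}] {summary} "
        f"(confidence {confidence:.0%}) <{pr_url}|Review draft PR>"
    )
    return channel, {"text": text}
```

In this sketch the channel is returned separately so the caller can pick the matching webhook URL, since each incoming webhook is typically bound to one channel.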

See It Run

Live demo.

The demo replays three pre-recorded scenarios, so there is zero API cost and no sign-up. Click through to watch the agents fix a Node.js test failure, a missing Python dependency, and a bad Dockerfile.

ops-pilot demo — AI agents triaging a CI failure and opening a fix PR

Launch live demo ↗

Getting Started

Three commands.

Clone, configure, docker compose up. Full setup below.

Clone the repo
```
git clone https://github.com/adnanafik/ops-pilot
cd ops-pilot
```

Configure your secrets

Copy .env.example to .env and fill in your Anthropic API key, a GitHub token with repo scope, and your Slack webhook URL.
```
cp .env.example .env
# edit .env:
# ANTHROPIC_API_KEY=sk-ant-...
# GITHUB_TOKEN=ghp_...
# SLACK_WEBHOOK_URL=https://hooks.slack.com/...
```

Run it
```
docker compose up
```
That's it. The Watcher starts polling within seconds. Trigger a failure on any watched pipeline to see the full loop.

FAQ

Common questions.

Who is this for?

Platform, DevOps, and SRE teams running 10+ CI pipelines where manual triage is already a bandwidth bottleneck. If you have one pipeline that fails once a week, you don't need ops-pilot. If you have hundreds and a 2am rotation, you very much do.

How does it compare to GitHub Copilot or Aider?

Those are interactive — a human prompts them, reviews each suggestion, applies changes. ops-pilot is autonomous — it triggers on CI events without human action and only involves you at the PR review gate. Different layer of the problem. You can absolutely use both.

What LLM does it use? Can I swap it?

Claude (Anthropic) via their API. You bring your own ANTHROPIC_API_KEY. The Diagnoser and Fixer sit behind an LLMProvider abstract base class, so swapping to OpenAI, Bedrock, or a local model is a matter of writing an adapter — not rewriting the agents.
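The adapter idea might look something like this. Only the class name `LLMProvider` comes from the text above; the `complete` method, its signature, and the `CannedProvider` stand-in are assumptions made for illustration, and the real interface may differ.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Hypothetical shape of the provider interface; real method names may differ."""

    @abstractmethod
    def complete(self, system: str, prompt: str) -> str:
        """Return the model's text reply for a system prompt and a user prompt."""


class CannedProvider(LLMProvider):
    """Stand-in adapter that returns a fixed reply, useful for tests or offline demos."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, system: str, prompt: str) -> str:
        return self.reply


def diagnose(provider: LLMProvider, logs: str) -> str:
    """Agent code depends only on the interface, never on a specific vendor SDK."""
    return provider.complete(system="You triage CI failures.", prompt=logs)
```

An OpenAI, Bedrock, or local-model adapter would subclass `LLMProvider` the same way and wrap that vendor's client inside `complete`.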

Is my data sent anywhere?

Only to Anthropic, for the LLM calls that power triage and fix generation. No analytics, no telemetry back to me, no third parties. ops-pilot is fully self-hosted — your CI logs, source code, and secrets never leave your infrastructure except for the LLM round-trip. Anthropic's data policy applies to that traffic.

Does it require humans to review merges?

Yes, and by design. Every fix lands as a draft PR that a human has to approve and merge. Agents will never push to main directly. This isn't timidity — it's the correct architecture for 2026. The day agents are reliable enough to merge their own fixes, it won't be any one tool that proves it; it'll be years of production track record across the whole ecosystem.

Can I use it with my custom CI system?

Yes. ops-pilot has a CIProvider abstract base class with seven interface methods (list runs, get logs, fetch diff, etc.). GitHub Actions, GitLab CI, and Jenkins ship as adapters out of the box. Implement the same interface for your system and the agents just work — no changes to the orchestrator.
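As a sketch of what implementing the interface involves, the example below covers only the three methods named above. The method names, signatures, and the `InMemoryCI` toy adapter are assumptions; the remaining four methods are not described here, so they are left out rather than guessed.

```python
from abc import ABC, abstractmethod


class CIProvider(ABC):
    """Only the three methods named in the text; signatures are hypothetical."""

    @abstractmethod
    def list_runs(self, status: str) -> list[str]: ...

    @abstractmethod
    def get_logs(self, run_id: str) -> str: ...

    @abstractmethod
    def fetch_diff(self, run_id: str) -> str: ...


class InMemoryCI(CIProvider):
    """Toy adapter backed by a dict, standing in for a custom CI system."""

    def __init__(self, runs: dict[str, dict[str, str]]) -> None:
        self._runs = runs

    def list_runs(self, status: str) -> list[str]:
        return [rid for rid, run in self._runs.items() if run["status"] == status]

    def get_logs(self, run_id: str) -> str:
        return self._runs[run_id]["logs"]

    def fetch_diff(self, run_id: str) -> str:
        return self._runs[run_id]["diff"]
```

A real adapter would replace the dict lookups with calls to your CI system's API and would also need to implement the rest of the interface.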