The 2am page you don't want.
It's 2:14 AM. PagerDuty jolts you awake. test_payment_webhook is flaky again on the staging pipeline. You know this failure: you saw it last Tuesday.
You stumble to your laptop. Find the run in GitHub Actions. Scroll through 8,000 lines of log output looking for the stack trace. It's a connection timeout to the payment sandbox, like always. You restart the failed job, watch it go green, close the incident, and go back to bed at 3:10 AM.
The next morning, Slack is on fire. Three other pipelines hit the same thing overnight. Two teams are blocked on their deploys. Nobody caught the pattern because nobody was awake when each individual failure hit.
At scale, the bottleneck is triage, not expertise. ops-pilot removes humans from the triage loop and keeps them firmly at the review gate.
Four agents. One pipeline.
Each agent has a single responsibility. Typed contracts at every boundary. The whole pipeline runs in under 10 seconds per incident.
Watcher
Detects the failure
Polls your CI provider (GitHub Actions, GitLab CI, Jenkins) for failed runs. Builds a structured failure model from the log tail and fails fast on transient errors.
Diagnoser
Claude triages the logs
Sends logs + diff to Claude and extracts root cause, severity, and fix confidence into a typed Pydantic object, not a text blob. Context is pre-curated to stay under the token budget.
Fixer
Opens a draft PR
Asks Claude which file to edit, fetches it, generates a minimal patch, and opens a draft PR against your branch. Humans review every merge. Agents never push to main.
Notifier
Posts to Slack
Writes a concise Slack message with the fix summary, confidence score, and PR link. Routes by severity to the right channel. Zero false-positive spam.
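To make "typed contracts at every boundary" concrete, here is a minimal sketch of what two of those handoffs could look like. It is an illustration under assumptions, not ops-pilot's actual schema: it uses stdlib dataclasses as a stand-in for Pydantic so it runs with no dependencies, and the field names, severity levels, and channel names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    # Hypothetical levels; the real project may define different ones.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Failure:
    """What a Watcher might hand to the Diagnoser (hypothetical fields)."""
    pipeline: str
    job: str
    log_tail: str


@dataclass(frozen=True)
class Diagnosis:
    """Root cause, severity, and fix confidence as a typed object."""
    root_cause: str
    severity: Severity
    fix_confidence: float  # expected range: 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject malformed values at the boundary instead of downstream.
        if not 0.0 <= self.fix_confidence <= 1.0:
            raise ValueError("fix_confidence must be between 0.0 and 1.0")


# Hypothetical severity-to-channel table for the Notifier.
CHANNEL_BY_SEVERITY = {
    Severity.LOW: "#ci-noise",
    Severity.MEDIUM: "#ci-triage",
    Severity.HIGH: "#incidents",
}


def route(diagnosis: Diagnosis) -> str:
    """Pick the Slack channel for a diagnosis based on its severity."""
    return CHANNEL_BY_SEVERITY[diagnosis.severity]


def format_message(failure: Failure, diagnosis: Diagnosis, pr_url: str) -> str:
    """Build a one-line summary with root cause, confidence, and PR link."""
    return (
        f"{failure.pipeline}/{failure.job}: {diagnosis.root_cause} "
        f"(confidence {diagnosis.fix_confidence:.0%}) {pr_url}"
    )
```

Because each agent only sees a validated object, a bad confidence value fails at construction time rather than surfacing as a confusing Slack message later.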
Live demo.
The demo replays three pre-recorded scenarios. Zero API cost, no sign-up. Click through to watch the agents fix a Node.js test failure, a missing Python dependency, and a bad Dockerfile in real time.
Three commands.
Clone, configure, docker compose up. Full setup below.
- Clone the repo
    git clone https://github.com/adnanafik/ops-pilot
    cd ops-pilot
- Configure your secrets
  Copy .env.example to .env and fill in your Anthropic API key, a GitHub token with repo scope, and your Slack webhook URL.
    cp .env.example .env
    # edit .env:
    # ANTHROPIC_API_KEY=sk-ant-...
    # GITHUB_TOKEN=ghp_...
    # SLACK_WEBHOOK_URL=https://hooks.slack.com/...
- Run it
    docker compose up
  That's it. The Watcher starts polling within seconds. Trigger a failure on any watched pipeline to see the full loop.
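If the stack starts but nothing happens, a missing secret is a likely cause. The following is a small, optional pre-flight check, not part of ops-pilot itself: it only looks for the three variable names listed above, and its parser is deliberately simple (plain KEY=value lines, no quoting or `export` handling). The function names are hypothetical.

```python
from pathlib import Path

# The three variables named in the setup steps.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "GITHUB_TOKEN", "SLACK_WEBHOOK_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing: " + ", ".join(missing) if missing else "all required keys set")
```

Run it from the repo root before docker compose up; it reads .env and never prints secret values.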
Common questions.
Who is this for?
Platform, DevOps, and SRE teams running 10+ CI pipelines where manual triage is already a bandwidth bottleneck. If you have one pipeline that fails once a week, you don't need ops-pilot. If you have hundreds and a 2am rotation, you very much do.
How does it compare to GitHub Copilot or Aider?
Those are interactive — a human prompts them, reviews each suggestion, applies changes. ops-pilot is autonomous — it triggers on CI events without human action and only involves you at the PR review gate. Different layer of the problem. You can absolutely use both.
What LLM does it use? Can I swap it?
Claude (Anthropic) via their API. You bring your own ANTHROPIC_API_KEY. The Diagnoser and Fixer sit behind an LLMProvider abstract base class, so swapping to OpenAI, Bedrock, or a local model is a matter of writing an adapter, not rewriting the agents.
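As a rough sketch of what "writing an adapter" means in practice, assume LLMProvider exposes a single text-in, text-out method. The method name `complete`, the `CannedProvider` class, and the `diagnose` helper are hypothetical; the real interface may look different.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Stand-in for the abstract base class; the method name is assumed."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt to the model and return its text reply."""


class CannedProvider(LLMProvider):
    """Toy adapter that returns a fixed reply, useful for offline tests."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def diagnose(provider: LLMProvider, log_tail: str) -> str:
    """Agent-side code depends only on the interface, not on a vendor SDK."""
    return provider.complete(f"Find the root cause:\n{log_tail}").strip()
```

An adapter for another vendor would subclass LLMProvider the same way and call that vendor's SDK inside `complete`; `diagnose` would not change.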
Is my data sent anywhere?
Only to Anthropic, for the LLM calls that power triage and fix generation. No analytics, no telemetry back to me, no third parties. ops-pilot is fully self-hosted — your CI logs, source code, and secrets never leave your infrastructure except for the LLM round-trip. Anthropic's data policy applies to that traffic.
Does it require humans to review merges?
Yes, and by design. Every fix lands as a draft PR that a human has to approve and merge. Agents will never push to main directly. This isn't timidity; it's the correct architecture for 2026. The day agents are reliable enough to merge their own fixes, it won't be any one tool that proves it; it'll be years of production track record across the whole ecosystem.
Can I use it with my custom CI system?
Yes. ops-pilot has a CIProvider abstract base class with seven interface methods (list runs, get logs, fetch diff, etc.). GitHub Actions, GitLab CI, and Jenkins ship as adapters out of the box. Implement the same interface for your system and the agents just work, with no changes to the orchestrator.
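A custom adapter might look roughly like the sketch below. It covers only the three operations named above; the other four methods are not listed here, so they are left out rather than guessed. The method names, signatures, and the `InMemoryCI` class are assumptions for illustration, not the project's actual interface.

```python
from abc import ABC, abstractmethod


class CIProvider(ABC):
    """Partial stand-in: three of the seven methods, with assumed signatures."""

    @abstractmethod
    def list_runs(self) -> list[str]:
        """Return IDs of failed runs."""

    @abstractmethod
    def get_logs(self, run_id: str) -> str:
        """Return the log text for one run."""

    @abstractmethod
    def fetch_diff(self, run_id: str) -> str:
        """Return the diff associated with one run."""


class InMemoryCI(CIProvider):
    """Toy adapter backed by a dict of run_id -> (logs, diff)."""

    def __init__(self, runs: dict[str, tuple[str, str]]) -> None:
        self._runs = runs

    def list_runs(self) -> list[str]:
        return sorted(self._runs)

    def get_logs(self, run_id: str) -> str:
        return self._runs[run_id][0]

    def fetch_diff(self, run_id: str) -> str:
        return self._runs[run_id][1]


def collect_context(ci: CIProvider) -> list[tuple[str, str, str]]:
    """Orchestrator-side code written purely against the interface."""
    return [(rid, ci.get_logs(rid), ci.fetch_diff(rid)) for rid in ci.list_runs()]
```

A real adapter would replace the dict lookups with calls to your CI system's API, while code like `collect_context` stays untouched.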
Like it? Star the repo.
Open source, MIT licensed, built with Claude. Contributions welcome.