An autonomous agent that ships validated Terraform PRs in under 10 minutes — 3-5× faster than ChatGPT and Claude.
I built an AI agent that takes a Terraform task and delivers a validated GitHub PR — fully autonomous, end-to-end, in under 10 minutes.
I benchmarked it against ChatGPT, Claude, and other AI tools. My agent was 3-5× faster — because it removes the human from the execution loop entirely.
Here's the architecture and what made it fast.
The core insight
Every time you ask ChatGPT to fix something, you become the bottleneck. My agent eliminates that. Human in at the start. Human in at the review. Autonomous everything in between.
The architecture
Python, Kubernetes, AWS Bedrock / Claude.
- Engineers submit tasks via a UI; a Jira integration is the planned next step.
- Each task gets a unique ID and is pushed to a NATS JetStream queue — lightweight pub/sub, extensible to multiple agents, each with its own queue.
- Stateful containers on Kubernetes with PVC mapped to EFS — state survives restarts, cloud-agnostic design (AWS/GCP/Azure).
- Real-time status via NATS/SSE — no black box, users see every step live.
- 11 specialized skills orchestrated end-to-end: repo discovery → clarifying questions → code generation → tflint → terraform validate → GitHub PR.
- Configurable timeouts (default 7 days): if a user never answers a clarifying question, the agent cleans up its state and releases resources gracefully.
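To make the queueing and timeout bullets concrete, here is a minimal sketch of what a task envelope and per-agent subject naming could look like. All field names, constants, and helpers are illustrative assumptions, not the actual schema; in the real system this payload would be published to NATS JetStream via a client such as nats-py.

```python
import json
import time
import uuid

# Hypothetical sketch: wrap each submitted task with a unique ID and a
# cleanup deadline, and route it to a per-agent subject so every agent
# type gets its own queue. Names are assumptions for illustration.

AGENT_NAME = "terraform-agent"        # each agent type gets its own queue
DEFAULT_TIMEOUT_S = 7 * 24 * 3600     # configurable timeout, default 7 days


def build_task_envelope(description: str, repo: str,
                        timeout_s: int = DEFAULT_TIMEOUT_S) -> dict:
    """Wrap a submitted task with a unique ID and an expiry deadline."""
    now = time.time()
    return {
        "task_id": str(uuid.uuid4()),    # unique ID per task
        "agent": AGENT_NAME,
        "repo": repo,
        "description": description,
        "submitted_at": now,
        "expires_at": now + timeout_s,   # agent cleans up state past this
    }


def subject_for(envelope: dict) -> str:
    """Per-agent subject naming keeps each agent on its own queue."""
    return f"tasks.{envelope['agent']}.{envelope['task_id']}"


envelope = build_task_envelope("Add an S3 bucket module", repo="org/infra")
print(subject_for(envelope))
print(json.dumps(envelope)[:60], "...")
```

Because the expiry travels with the task, a consumer that picks the message up after a restart still knows when to abandon it — no separate timer state to rebuild.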
How we got under 10 minutes
- Smart repo comprehension — extract only structurally relevant context, not the entire codebase.
- Skill specialization — 11 tightly scoped prompts, each optimized for one job.
- Prompt caching — avoid re-sending large repeated context on every LLM call.
- Minimize LLM turns — every unnecessary call adds seconds; at 11 skills it compounds.
- Async execution — lint pre-loads while code generates, no sequential waiting.
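The async-execution point above can be sketched with plain asyncio: the lint toolchain warms up while the LLM generates code, so the two steps overlap instead of running back-to-back. Function names and delays are simulated stand-ins, not the real implementation.

```python
import asyncio

# Sketch of overlapping pipeline steps: total wall time approaches the
# max of the two delays, not their sum. Delays simulate real work.

async def generate_code() -> str:
    await asyncio.sleep(0.2)   # stands in for the LLM generation call
    return 'resource "aws_s3_bucket" "b" {}'

async def preload_linters() -> str:
    await asyncio.sleep(0.1)   # stands in for tflint / terraform init warm-up
    return "linters ready"

async def run_pipeline() -> tuple[str, str]:
    # Launch both steps concurrently and wait for both results.
    code, lint_status = await asyncio.gather(generate_code(), preload_linters())
    return code, lint_status

code, status = asyncio.run(run_pipeline())
print(status)
```

At 11 skills, shaving even one sequential wait per skill this way compounds into minutes saved per task.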
The result: near-deterministic IaC automation, 3-5× faster than AI chat tools, running at scale.
The bigger lesson: the next generation of AI tooling isn't smarter chat. It's specialized skills, async execution, persistent state, and humans at the review gate — not the keyboard.