How Docker's Agent Fleet Transforms Software Delivery with Autonomous AI Teams
Introduction
Shipping reliable code quickly is a constant challenge in software development. The Coding Agent Sandboxes (sbx) team at Docker addresses it with a fleet of AI agents that work as a virtual team, autonomously handling testing, bug triage, release notes, and even patching defects. The system is built on Claude Code skills and runs both on developers' local machines and in CI pipelines, which shortens iteration cycles and frees human engineers for higher-level work.

What Is the Fleet?
The Fleet is a collection of seven specialized AI agent roles, each with a defined persona, responsibilities, and tool permissions. These agents operate inside secure microVM-based sandboxes, which provide full isolation—complete with their own Docker daemon, network, and filesystem—without affecting the host system. The sandboxes are managed by the sbx CLI tool, which runs on macOS, Linux, and Windows.
Skill-Based Design, Not Script-Based
Instead of writing traditional test scripts or reporting tools, the team created Claude Code skills—markdown files that describe a role rather than a sequence of steps. A skill file says, "You are the build engineer; here is what you know and how you make decisions." This distinction is crucial: agents need judgment, not just instructions. When a test fails unexpectedly, a script halts; a role investigates.
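To make the role-versus-script distinction concrete, here is a minimal sketch of how a role description could be rendered into a markdown skill file. The `Role` fields, the section headings, and the sample text are all assumptions made for illustration; they are not the team's actual skill layout or the official Claude Code skill format.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Hypothetical description of an agent role (field names are illustrative)."""
    name: str
    persona: str
    knowledge: list = field(default_factory=list)
    decision_rules: list = field(default_factory=list)


def render_skill(role: Role) -> str:
    """Render the role as markdown written in the second person.

    Note what is absent: there is no ordered list of steps to execute,
    only context and decision guidance for the agent.
    """
    lines = [f"# {role.name}", "", f"You are the {role.persona}.", ""]
    lines.append("## What you know")
    lines += [f"- {item}" for item in role.knowledge]
    lines.append("")
    lines.append("## How you make decisions")
    lines += [f"- {rule}" for rule in role.decision_rules]
    return "\n".join(lines) + "\n"


# Placeholder content, invented for this example.
build_engineer = Role(
    name="build-engineer",
    persona="build engineer",
    knowledge=["The project produces one CLI binary per platform."],
    decision_rules=["If a build fails, investigate the cause before reporting."],
)

print(render_skill(build_engineer))
```

The rendered file reads as a briefing rather than a procedure, which is what lets the agent investigate an unexpected failure instead of halting at the first broken step.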
Local First, CI Second
The Fleet's design principle is simple: every skill runs on a developer's machine first. When building the /cli-tester skill (the fleet's exploratory tester), the team didn't start by writing a GitHub workflow. They invoked it locally and watched it build binaries, exercise CLI commands, and report issues. Only after refining the skill on a laptop did they wire it into a CI workflow.
This approach avoids the painful commit-push-wait-read-logs cycle. Local debugging takes seconds—you see the agent think, see where it gets confused, fix the skill file, and re-invoke. CI is just another runtime for the same skill. The /cli-tester that runs nightly on multiple platforms is the exact same skill invoked from a terminal. No separate "CI version," no translation layer.
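The "CI is just another runtime" idea can be sketched as follows. The command builder takes no runtime argument at all, so the invocation cannot diverge between a laptop and a CI runner. The `agent-runner` executable and its `--skill` flag are placeholders invented for this example, not a real CLI; the only real-world fact relied on is that GitHub Actions sets `CI=true` in its environment.

```python
def skill_command(skill: str) -> list:
    """Return the command used to invoke a skill.

    "agent-runner" and "--skill" are hypothetical placeholders.
    The function deliberately has no runtime parameter.
    """
    return ["agent-runner", "--skill", skill]


def detect_runtime(env: dict) -> str:
    """Classify the environment; GitHub Actions sets CI=true."""
    return "ci" if env.get("CI", "").lower() == "true" else "local"


local_env = {}
ci_env = {"CI": "true"}

# The runtime label changes, the command does not.
for env in (local_env, ci_env):
    print(detect_runtime(env), skill_command("/cli-tester"))
```

Keeping runtime detection separate from command construction means anything that differs in CI, such as where reports are uploaded, stays outside the skill itself.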
The Seven Agent Roles
Each role covers a specific part of the development lifecycle:
- Build Engineer: Compiles the codebase, manages dependencies, and ensures reproducible builds.
- Exploratory Tester: Runs the CLI tool with various commands and flags, looking for crashes or unexpected behavior.
- Integration Tester: Verifies cross-platform compatibility and upgrade paths between versions.
- Load Tester: Applies sustained heavy load to catch resource leaks and performance regressions.
- Issue Triage Agent: Scans the backlog, categorizes bugs, and assigns priority.
- Release Note Author: Summarizes changelogs and generates user-friendly release notes.
- Bug Fixer: Analyzes failing tests, proposes patches, and submits pull requests for review.
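Since each role carries its own responsibilities and tool permissions, the list above could be modeled as a small registry with an allow-list check. This is only a sketch: the role identifiers, the tool names (`shell`, `read_files`, `write_files`, `issue_tracker`, `open_pull_request`), and the assignment of tools to roles are all assumptions chosen for illustration, not the team's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    responsibility: str
    allowed_tools: frozenset  # hypothetical tool names, not from the source


def role(name: str, responsibility: str, *tools: str) -> AgentRole:
    """Small helper to keep the registry below readable."""
    return AgentRole(name, responsibility, frozenset(tools))


# Tool assignments are illustrative guesses, chosen only to show the idea.
FLEET = {
    r.name: r
    for r in (
        role("build-engineer", "Compile the codebase and keep builds reproducible",
             "shell", "read_files"),
        role("exploratory-tester", "Exercise CLI commands and flags, looking for crashes",
             "shell", "read_files"),
        role("integration-tester", "Verify cross-platform compatibility and upgrade paths",
             "shell", "read_files"),
        role("load-tester", "Apply sustained load to catch leaks and regressions",
             "shell", "read_files"),
        role("issue-triage", "Categorize and prioritize backlog issues",
             "issue_tracker"),
        role("release-note-author", "Summarize changes into release notes",
             "read_files", "issue_tracker"),
        role("bug-fixer", "Analyze failures and propose patches",
             "shell", "read_files", "write_files", "open_pull_request"),
    )
}


def is_allowed(role_name: str, tool: str) -> bool:
    """Return True only if the tool is on the role's allow-list."""
    return tool in FLEET[role_name].allowed_tools


print(len(FLEET))
print(is_allowed("bug-fixer", "open_pull_request"))
print(is_allowed("issue-triage", "shell"))
```

An explicit allow-list keeps a role such as the triage agent from touching the build environment, which complements the isolation the sandboxes already provide.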
Daily Workflow in Action
Every night, the Fleet runs across macOS, Linux, and Windows runners in CI. The agents autonomously perform regression tests, monitor memory usage, and report findings. The Issue Triage Agent processes new tickets, grouping duplicates and flagging critical bugs. The Release Note Author compiles a draft of what shipped, using data from merged pull requests.

If a test fails, the Bug Fixer agent steps in: it reviews the error logs, proposes a code change, and opens a pull request with the fix, all without human intervention up to that point. The team then reviews the proposed change, knowing that the agent has already done the heavy analysis.
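The hand-off between the Bug Fixer and the human reviewers can be pictured as a small state model in which an agent may propose a fix but only a human decision makes it mergeable. The class and function names here are hypothetical, and `propose_fix` is a stand-in that merely records the first log line; the real agent's analysis is not described in enough detail to reproduce.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"   # opened by the agent, awaiting review
    APPROVED = "approved"   # accepted by a human reviewer
    REJECTED = "rejected"   # declined by a human reviewer


@dataclass
class ProposedFix:
    failing_test: str
    summary: str
    status: Status = Status.PROPOSED


def propose_fix(failing_test: str, log_excerpt: str) -> ProposedFix:
    """Stand-in for the agent's analysis: it only records the first log line."""
    stripped = log_excerpt.strip()
    first_line = stripped.splitlines()[0] if stripped else "no log output"
    return ProposedFix(failing_test, f"Investigate: {first_line}")


def review(fix: ProposedFix, approved: bool) -> ProposedFix:
    """Record the human decision on the proposed fix."""
    fix.status = Status.APPROVED if approved else Status.REJECTED
    return fix


def can_merge(fix: ProposedFix) -> bool:
    """A fix is mergeable only after explicit human approval."""
    return fix.status is Status.APPROVED


fix = propose_fix("test_upgrade_path", "AssertionError: version mismatch\n  at line 12")
print(fix.summary, can_merge(fix))
review(fix, approved=True)
print(fix.status.value, can_merge(fix))
```

Modeling approval as a separate state makes the division of labor explicit: the agent's autonomy ends at the pull request, and the merge decision stays with the team.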
Scalable and Repeatable
Because the agents use the same skill files locally and in CI, scaling is straightforward. New roles can be added by writing a new skill markdown file, testing it on a developer's machine, and then integrating it into the CI pipeline. The microVM sandboxes ensure that each agent runs in a clean, isolated environment, eliminating flaky tests caused by state pollution.
The Fleet has already reduced the time to detect and fix bugs, improved release quality, and given the human team more time to focus on architecture and innovation.
Conclusion
Docker's Agent Fleet demonstrates a new paradigm for software delivery: autonomous AI teams that operate alongside human developers. By combining skill-based agents with secure sandboxing and a local-first approach, the sbx team ships faster and with fewer regressions. The same principles—role over script, local iteration before CI, and reusable skills—can be applied to any project looking to accelerate its development cycle.