“A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning.” That’s Jim Fan, director of AI at Nvidia, describing what happened when his team gave AI coding agents a lab full of robotic arms, some compute resources, and — in his words — a “generous token budget.” The agents didn’t just follow instructions. They wrote their own training programs, tested them on real robots, debugged failures, and iterated until the robots could cut zip ties and insert GPUs into motherboard sockets with 99% accuracy. All while the researchers slept.
What actually happened
Nvidia’s GEAR lab, working with Carnegie Mellon and UC Berkeley, built a framework called ENPIRE — an agentic harness that wraps around AI models and gives them access to robotic hardware, memory, feedback loops, and the ability to write and test code autonomously. It’s not a chatbot that describes how to train robots. It’s a system that lets AI agents actually train them.
The framework has four modules: one that handles automatic resets and verification, one that refines the policies governing robotic behavior, one that evaluates those policies across multiple robots working in parallel, and one that analyzes failures — reading logs, ingesting research papers, and improving the training code itself.
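To make the four-module loop concrete, here is a toy sketch in Python. It is not ENPIRE's code or API: every function name, the single-number "policy", the hidden TARGET value, and the per-robot offsets are invented for illustration. It only shows how reset and verification, parallel evaluation, failure analysis, and policy refinement might feed into one another.

```python
# Toy illustration only: all names and numbers here are hypothetical,
# not taken from ENPIRE.
TARGET = 0.7          # hidden "ideal" policy value in this toy world
TOLERANCE = 0.05      # a trial succeeds if the error is within this band


def reset_and_verify(robot_id):
    """Module 1 stand-in: pretend to reset the scene and confirm it is ready."""
    return True


def evaluate(policy, num_robots):
    """Module 3 stand-in: run one trial per simulated robot and return logs."""
    logs = []
    for robot_id in range(num_robots):
        if not reset_and_verify(robot_id):
            continue
        # Each simulated robot has a small fixed calibration offset.
        offset = (robot_id - (num_robots - 1) / 2) * 0.01
        error = policy - TARGET + offset
        logs.append({"robot": robot_id, "error": error,
                     "success": abs(error) < TOLERANCE})
    return logs


def analyze_failures(logs):
    """Module 4 stand-in: summarize failed trials as a mean signed error."""
    failures = [entry["error"] for entry in logs if not entry["success"]]
    return sum(failures) / len(failures) if failures else 0.0


def refine(policy, mean_failure_error):
    """Module 2 stand-in: nudge the policy against the observed error."""
    return policy - 0.5 * mean_failure_error


def run_loop(initial_policy=0.0, max_rounds=20, target_rate=0.99):
    policy = initial_policy
    history = []
    for _ in range(max_rounds):
        logs = evaluate(policy, num_robots=4)
        rate = sum(entry["success"] for entry in logs) / len(logs)
        history.append(rate)
        if rate >= target_rate:
            break
        policy = refine(policy, analyze_failures(logs))
    return policy, history


FINAL_POLICY, HISTORY = run_loop()
print(FINAL_POLICY, HISTORY)
```

In the real system the "refine" and "analyze" steps are coding agents rewriting training code, not a one-line update rule. The point of the sketch is just the shape of the loop: evaluate, read the failures, change something, try again.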
They tested it with three different AI coding agents: OpenAI’s Codex (GPT-5.5), Anthropic’s Claude Code (Opus 4.7), and Moonshot AI’s Kimi Code (Kimi K2.6). Each agent team independently developed different algorithmic approaches, tested them on real hardware, and kept whatever worked. The results: 99% success rate on manipulation tasks including the standard Push-T benchmark, pin organization, zip tie cutting, and — most impressively — GPU insertion into motherboard sockets.
The pin insertion result is particularly striking. The AI agents reached nearly 100% success faster than a “frontier human-in-the-loop method” developed by the same research team. The agents didn’t just match human-designed training; they beat it.
Why team size matters
One of the most interesting findings was about scale. Teams of eight AI coding agents achieved 99% success on the Push-T task in two hours. A four-agent team took three hours. A single agent took nearly five hours. More agents meant more parallel experimentation, faster iteration, and quicker convergence on effective strategies.
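A quick back-of-the-envelope calculation shows what those timings imply. The snippet below uses the three figures quoted above and treats "nearly five hours" as 5.0, which is my rounding assumption. The derived columns (speedup, per-agent efficiency, agent-hours) are my own arithmetic, not numbers from the research.

```python
# Wall-clock hours to reach 99% on Push-T, as quoted in the article.
# Assumption: "nearly five hours" is rounded up to 5.0.
HOURS_TO_99 = {1: 5.0, 4: 3.0, 8: 2.0}


def scaling_table(hours_by_team_size):
    """Derive speedup and cost figures relative to the single-agent run."""
    baseline = hours_by_team_size[1]
    rows = []
    for agents, hours in sorted(hours_by_team_size.items()):
        speedup = baseline / hours
        rows.append({
            "agents": agents,
            "hours": hours,
            "speedup": speedup,                       # how much faster than one agent
            "per_agent_efficiency": speedup / agents,  # 1.0 would be perfect scaling
            "agent_hours": agents * hours,             # total agent time consumed
        })
    return rows


ROWS = scaling_table(HOURS_TO_99)
for row in ROWS:
    print(row)
```

Under that rounding, eight agents finish about 2.5 times faster than one, but consume roughly 16 agent-hours against 5. Faster wall-clock time is bought with a lot of extra agent time.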
But there’s a tradeoff. Larger teams spent more time summarizing each other’s ideas and less time using the actual robots. The robots sat idle while agents were busy reading logs, writing code, debugging, or waiting for the language model to respond. This mirrors what happens in human teams — more people means more coordination overhead. The difference is that AI agents don’t get frustrated by the meeting culture. They just… keep going.
The sweet spot, according to the research, depends on the task complexity and available compute. For straightforward tasks, a single agent works fine. For complex manipulation tasks that benefit from diverse algorithmic approaches, larger teams converge faster — even with the coordination cost.
What this means for non-roboticists
If you’re not building robots, you might think this doesn’t apply to you. But the underlying pattern — AI agents autonomously writing code, testing it, analyzing failures, and iterating — is exactly what’s happening in the tools you already use.
The ENPIRE framework is essentially what I described when writing about hiring AI agents — but applied to physical hardware instead of email triage. The agents have a job description (train the robot), access to tools (robotic arms, compute, code), and the ability to take actions autonomously. They fail, learn, and improve without human intervention.
Nvidia plans to open-source ENPIRE so anyone can host their own “self-running robot lab.” Jim Fan joked that the goal is for the team to “take a holiday and Jensen wouldn’t even notice.” But the serious implication is that autonomous AI agent teams are no longer theoretical — they’re running in production at one of the world’s most advanced AI labs.
The limits nobody’s highlighting
The research also revealed important limitations that the headlines skip.
Agents waste hardware time. While agents are reading logs, writing code, and debugging, the robots sit idle. In a lab with expensive hardware, this is costly. The researchers found that agent efficiency — the ratio of actual robot training time to total elapsed time — varied significantly based on team size and task complexity.
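That ratio is simple to compute if you log when the robot is actually running. Here is a minimal sketch, assuming you have a list of non-overlapping (start, end) intervals during which the robot was busy. The function name and the sample timestamps are made up for illustration and do not come from the research.

```python
def robot_utilization(busy_intervals, run_start, run_end):
    """Fraction of elapsed time the robot spent executing training.

    Assumes busy_intervals are non-overlapping (start, end) pairs in the
    same time unit as run_start and run_end.
    """
    total = run_end - run_start
    if total <= 0:
        raise ValueError("run_end must be after run_start")
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each interval to the run window so stray timestamps
        # cannot inflate the ratio.
        clipped_start = max(start, run_start)
        clipped_end = min(end, run_end)
        busy += max(0.0, clipped_end - clipped_start)
    return busy / total


# Made-up example: a 60-minute run with three bursts of robot activity.
EXAMPLE_BUSY = [(0, 10), (25, 40), (50, 55)]
EXAMPLE_RATIO = robot_utilization(EXAMPLE_BUSY, run_start=0, run_end=60)
print(EXAMPLE_RATIO)
```

In this invented example the robot is busy for 30 of 60 minutes, so half the run is spent waiting on agents rather than training.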
Coordination is expensive. Larger agent teams spent meaningful compute just summarizing each other’s work. This isn’t free — it burns tokens and time. The research didn’t find a perfect team size because the optimal number depends on the specific task, available compute, and how much parallelism the hardware supports.
The agents needed human-designed frameworks. ENPIRE didn’t emerge from AI agents designing themselves (yet). Human researchers built the harness, defined the task spaces, and set the evaluation criteria. The agents optimized within those constraints. This is the pattern across all AI tools — they’re powerful within defined boundaries, but the boundaries still need humans to define them.
The bottom line
AI coding agents training robots overnight isn’t science fiction. It’s happening at Nvidia right now, and the team plans to open-source the framework. For solo builders and small businesses, the immediate takeaway is that the agent paradigm is expanding from digital tasks to physical ones. The same principles that make AI agents effective for your business (autonomy, iteration, parallel experimentation) are now being applied to robotics at scale. Whether you’re building robots or not, understanding this pattern helps you see where AI tools are heading. For more on getting started with AI agents, check out /start-here/.