Here’s a sentence I didn’t expect to write: the next generation of AI agents might learn to navigate the real world by watching teenagers play Fortnite. A startup called General Intuition just raised $300 million — backed by Jeff Bezos and Eric Schmidt — to train AI agents using a dataset of 2 billion video game clips per year. Their valuation? Over $2 billion. And the reason this matters isn’t the money. It’s the approach: instead of building AI that processes text, they’re building AI that understands space, time, and movement. If that sounds abstract, stick with me — this is going to change how AI agents work.
What General Intuition actually does
General Intuition spun out of Medal, a platform where 10 million monthly users upload and share video game clips. That’s 2 billion first-person gameplay videos per year — every kill, every movement, every decision captured from the player’s perspective. Most people would see a highlight reel. General Intuition sees a training dataset for spatial intelligence.
The startup trains what researchers call “world models” — AI systems that understand how objects move through space, how actions cause consequences, and how to anticipate what happens next. Traditional AI models (like the ones powering ChatGPT) learn from text. World models learn from watching the physical world operate. And video games, it turns out, are an incredibly efficient proxy for the physical world — they contain physics, spatial reasoning, cause and effect, and real-time decision-making, all in a format that’s cheaper and safer to collect than real-world data.
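To make "world model" a little more concrete, here's a toy sketch in Python. It's my own illustration, not General Intuition's method: real world models are large neural networks trained on video, while this one just counts what it has seen. The class name `TabularWorldModel`, its `observe` and `predict` methods, and the sample data are all hypothetical. The core idea is the same, though: watch what happens after an action, then anticipate what happens next.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy world model: counts observed (state, action) -> next_state transitions."""

    def __init__(self):
        # Maps (state, action) to a Counter of the next states seen after it.
        self.counts = defaultdict(Counter)

    def observe(self, state, action, next_state):
        """Record one watched transition."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed next state, or None if never seen."""
        seen = self.counts.get((state, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# Hypothetical observations: positions on a 1-D track, actions "left" / "right".
model = TabularWorldModel()
observations = [(0, "right", 1), (1, "right", 2), (2, "left", 1), (1, "right", 2)]
for state, action, next_state in observations:
    model.observe(state, action, next_state)

prediction = model.predict(1, "right")  # a transition the model has watched
unseen = model.predict(5, "left")       # a transition it has never watched
print(prediction, unseen)
```

The model can only anticipate situations it has watched before, which is one way to see why the volume and variety of footage matters so much.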
The CEO, Pim de Witte, describes the approach as building agents that can “perceive, anticipate, and interact in real time.” Not chatbots that respond to text. Agents that move through environments, make decisions, and learn from consequences — the way a human learns to drive by watching traffic, not by reading a manual. It’s the same leap I described when writing about AI agents becoming employees — just applied to the physical world instead of the digital one.
Why video game data is different
You might wonder: why not just train AI on real-world video? Companies like Runway and World Labs are doing that. The problem is scale and cost. Real-world video is expensive to collect, hard to annotate, and full of privacy complications. Video game data is none of those things.
Gameplay footage is inherently structured. Every frame contains spatial information — where objects are, how they’re moving, what the player is looking at. Every action has a measurable outcome — did the player survive, score, complete the objective? The feedback loop is built in. An AI watching gameplay can learn not just what happened, but what worked.
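Here's a rough sketch of what "structured" could mean in practice. The `Frame` and `Clip` records, their fields, and the `actions_that_worked` helper are names I made up for illustration. I don't know how Medal or General Intuition actually store clips, so treat this as a sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One moment of gameplay (hypothetical fields)."""
    player_pos: Tuple[float, float]  # where the player is
    view_angle_deg: float            # what the player is looking at
    action: str                      # what the player did in this frame


@dataclass
class Clip:
    """A sequence of frames plus its measurable outcome."""
    frames: List[Frame]
    succeeded: bool  # did the player survive, score, or complete the objective?


def actions_that_worked(clips: List[Clip]) -> List[List[str]]:
    """Keep only the action sequences from clips that ended in success."""
    return [[f.action for f in clip.frames] for clip in clips if clip.succeeded]


# Two made-up clips: one ends in success, one does not.
clips = [
    Clip([Frame((0.0, 0.0), 90.0, "move_forward"), Frame((0.0, 1.0), 90.0, "jump")], True),
    Clip([Frame((0.0, 0.0), 180.0, "move_back")], False),
]

worked = actions_that_worked(clips)
print(worked)
```

The outcome label is what turns a highlight reel into training signal: the filter separates what happened from what worked without anyone annotating the footage by hand.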
This is fundamentally different from how most AI tools are trained. Language models learn patterns in text. World models learn patterns in physical reality (or a simulation of it). The difference matters because the agents that will actually be useful in the real world — robots, autonomous systems, AI employees — need to understand space and time, not just language.
OpenAI apparently recognized this: they reportedly tried to acquire Medal for its dataset. That deal didn’t happen, and they weren’t the only major AI lab to come knocking. The data advantage is real, and General Intuition has it.
What this means for solo builders
You might think this is irrelevant to your business. It’s not — and here’s why.
The AI agents you can deploy today (like Slackbot or the agents I described hiring) are language-based. They process text, make decisions in text, and take actions through APIs. They’re powerful, but they’re limited to the digital world. They can’t see. They can’t navigate physical space. They can’t understand what’s happening in a video.
World models change that. When General Intuition releases its product (expected late summer or early fall 2026), the first applications will likely be in gaming and robotics simulation. But the downstream effect is broader: AI agents that understand spatial reasoning can eventually monitor physical spaces, navigate warehouses, inspect infrastructure, or guide autonomous systems. The agent economy isn’t just digital.
For now, the practical takeaway is this: the AI tools you’re using today are the text-based layer. The spatial layer is coming. Companies that understand both — that can deploy language agents for digital work and spatial agents for physical work — will have a structural advantage. You don’t need to build world models. You need to know they’re coming and position your workflows to integrate them when they arrive.
The competitive landscape
General Intuition isn’t alone. The world model space is heating up fast:
- Runway started with AI video generation for filmmakers and is pivoting toward world models for real-world simulation
- Decart recently released a world model that can simulate hours of photorealistic driving
- World Labs (from Fei-Fei Li) launched Marble, its first commercial world model product
- Google’s Genie 3 began integrating Google Maps data to simulate real streets
What makes General Intuition unique isn’t the technology alone; it’s the data. While competitors are scraping the internet or building synthetic datasets, General Intuition has 2 billion new gameplay videos flowing in every year from Medal’s user base. That’s a data moat that’s extremely hard to replicate.
The company plans to use the new funding to scale compute and release a product by late summer or early fall 2026. If they deliver, it could be the first world model trained on interactive, first-person data at scale. That’s a differentiator that matters.
The bottom line
General Intuition’s $2B valuation isn’t just about gaming. It’s a bet that the next generation of AI agents needs to understand the physical world — and that video games are the fastest way to teach them. For solo builders and small businesses, the immediate lesson is that the agent economy is expanding beyond text and into space. Start thinking about what tasks in your business could benefit from an AI that can see and move, not just read and write. For more on where AI agents are heading, check out /start-here/.