Tech Entrepreneurs Bet Big on AI That Understands the Physical World

PROVIDENCE, R.I. — Computer scientist Louis Castricato spent eight years studying the AI technology that powers chatbots like ChatGPT and Claude before concluding that the field had largely run its course as a research discipline.

“We basically have passed the point of doing real fundamental LLM research,” Castricato said. “Now it’s just applications.”

He walked away from his studies at Brown University and launched a startup called Overworld — a name that reflects his new mission: building AI that can understand and navigate the physical world, not just process language.

Chatbot-based AI still represents an enormous business opportunity, with investors committing trillions of dollars to companies like Anthropic and OpenAI. But a rising number of AI entrepreneurs are setting their sights on what they consider the next major breakthrough: “world models,” systems designed to teach AI, and sometimes robots, how to function in real physical environments.

Among those leading this charge is Fei-Fei Li, widely known as the “Godmother of AI,” who describes the world model concept as “one of the most important and most overloaded terms in AI today.”

The core idea behind world model research is that true intelligence requires more than reading text. An AI system also needs to understand the environment around it.

“Where language models learn the statistical structure of text, world models learn the statistical structure of space and time: how light falls on a surface, how a garden looks from an angle no camera has captured, how objects respond to force and follow the laws of physics,” wrote Li, who founded the San Francisco startup World Labs, in a recently published essay.

AI pioneer Yann LeCun is another major voice in this space. He stepped down last year from his role as Meta’s chief AI scientist to launch Paris-based Advanced Machine Intelligence Labs.

“World model is quickly becoming a buzzword,” LeCun said on a recent episode of the “Unsupervised Learning” podcast, describing it as something that allows an AI agent “to predict the consequences of its own actions.”

Definitions of world models vary widely, often shaped by what a researcher or entrepreneur hopes to build — whether that’s a more capable robot or a more dynamic video game.

Current AI language models were trained on vast amounts of human-generated text and visual content, producing assistants that are transforming office work and creative industries. But some experts see fundamental limits in generative AI systems that work by predicting the next word or pixel.

Martial Hebert, dean of computer science at Carnegie Mellon University, points out that chatbots can’t pick up a coffee mug.

“There’s all the geometry of the world, the dynamic of how I move my hand, the physical interaction of the contact with the cup,” Hebert said. “This is much more complex than just predicting the next word in a sentence.”

For Hebert, who has spent more than four decades in robotics research, world models represent a faster and more affordable path to what the tech industry calls “physical AI.”

“Some people may have different definitions, but physical and embodied AI are kind of the evolution of what we used to call robotics,” he said. He compared the concept to the way the human nervous system operates — allowing the body to adapt instinctively without conscious thought.

“In your body and spinal cord you have a very general model of how to balance, how to walk around, and you can adapt to your knee hurting in the morning, so you now walk a little differently,” Hebert said. “You don’t need to think about that. You have a general model somewhere in your nervous system and brain that allows your body to adapt very quickly.”

Robots aren’t the only destination for this technology. Castricato founded Overworld last year, and his small Rhode Island-based startup is currently developing video game environments where scenes — like a creepy forest — shift and respond as a virtual character moves through and interacts with them.

“There’s no other world model where you can just walk through doors or where you can interact with a detailed environment like this,” he said. “We optimize for interaction above anything else.”

While practical applications aren’t as immediately obvious as AI coding tools, world model companies are drawing significant interest from investors. Venture capitalist Steve Jang, co-founder and managing partner at Kindred Ventures, is backing Overworld along with other world model startups, including Causal Labs, which is developing AI for weather forecasting, and Extropic, which is building specialized computer chips designed for world model applications.

“I think that the future is many different types of models with many different philosophies and architectures,” Jang said. “I don’t think that it’ll be one large, dense model to rule them all.”

In her recent essay, Li attempted to establish a framework for understanding the competing visions in this field. She noted the confusion that comes from using the same term to describe very different technologies.

“A video model that produces gorgeous but physically impossible flames, a language model improvising a playable game, and a physics engine that faithfully simulates combustion all go by the same name,” she wrote.

Li sorted world models into three categories: “renderers,” which focus on visual realism but aren’t reliable for teaching robots; “simulators,” which create training environments that accurately mirror physical reality; and “planners,” which try to determine what an AI agent or robot should do when placed in an unpredictable setting.

“A robot that can plan is a robot that can work, and the entire industry is racing to be the one that gets there first,” she wrote.