SAN FRANCISCO (Realist English). The world’s leading artificial intelligence groups are accelerating efforts to develop so-called “world models” — systems designed to understand and navigate physical environments — as questions mount over whether large language models (LLMs) are reaching the limits of their progress.
Google DeepMind, Meta and Nvidia are among the companies racing to build technology that learns from video and robotic data rather than text alone. Proponents argue that this approach could unlock advances in robotics, autonomous vehicles and industrial automation, pushing AI closer to human-level reasoning and planning.
The shift comes amid signs of slower performance gains from successive generations of LLMs such as OpenAI’s ChatGPT, Google’s Gemini and Elon Musk’s xAI models, despite billions of dollars invested. Nvidia’s Rev Lebaredian, vice-president of Omniverse and simulation technology, estimated the potential market for world models could reach “$100tn” if AI can successfully operate in the physical world, transforming industries from manufacturing to healthcare.
World models are trained on vast datasets of real and simulated environments and are seen as key to advancing robotics and so-called AI agents. But they require enormous computing power and remain an unsolved technical challenge.
Recent months have brought a wave of developments. Google DeepMind last month unveiled Genie 3, a model that generates video one frame at a time, conditioning each new frame on the user’s prior interactions rather than producing an entire clip in a single pass. Meta’s AI research lab FAIR, led by chief scientist Yann LeCun, is testing its V-JEPA model on robots, seeking to mimic the way children learn passively from observation. LeCun, one of AI’s pioneers, argues that LLMs alone will never master reasoning and planning.
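To make the distinction concrete, here is a minimal Python sketch of action-conditioned, frame-by-frame generation of the kind described above. The names used (WorldModel, predict_next_frame, the action strings) are hypothetical illustrations of the general technique, not DeepMind’s actual Genie 3 interface.

```python
# Conceptual sketch of action-conditioned, frame-by-frame generation.
# All names here (WorldModel, predict_next_frame, action strings) are
# hypothetical illustrations, not DeepMind's actual Genie 3 API.

from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Toy stand-in for an autoregressive world model."""
    # (frame, action) pairs observed so far; the model's "memory".
    history: list = field(default_factory=list)

    def predict_next_frame(self, action: str) -> str:
        # A real system would run a neural network over the full history;
        # here we just echo the context to show the data flow.
        frame = f"frame_{len(self.history)} after action '{action}'"
        self.history.append((frame, action))
        return frame


model = WorldModel()
for action in ["move_forward", "turn_left", "move_forward"]:
    # Each frame is produced one step at a time, conditioned on every
    # previous frame and action -- unlike a clip model that renders a
    # whole fixed-length video in a single pass.
    print(model.predict_next_frame(action))
```

The point of the loop is that the environment stays interactive: the next frame cannot be computed until the user’s latest action is known, which is what separates a world model from a conventional video generator.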
Despite LeCun’s scepticism, Meta chief executive Mark Zuckerberg has increased investment in both LLMs and alternative models, hiring Scale AI founder Alexandr Wang to lead the company’s AI division, with LeCun now reporting to him.
Start-ups are also moving quickly. World Labs, founded by Fei-Fei Li, is building systems to generate 3D environments for video games from a single image. Runway, which supplies video-generation tools to Hollywood studios, has launched a product that uses world models to create interactive gaming settings with personalised storylines and characters.
San Francisco-based Niantic, the company behind Pokémon Go, has mapped 10mn locations globally through gameplay interactions, data it now uses to construct large-scale spatial models. After selling the game to Scopely in June, Niantic continues to collect anonymised scans of landmarks from millions of active players.
Nvidia, meanwhile, is using its Omniverse platform to generate advanced simulations as it expands its push into robotics. Chief executive Jensen Huang has said the company’s next growth phase will come from “physical AI”, predicting that world models will reshape industries much as personal computers transformed knowledge work.
Experts caution that human-level intelligence may still be a decade away. But many in the field see the pursuit of world models as the most promising path toward the next era of AI — one that extends beyond text and code into the physical world itself.