
AI Research

MindJourney enables AI to explore simulated 3D worlds to improve spatial interpretation

A new research framework helps AI agents explore three-dimensional spaces they can’t directly observe. Called MindJourney, the approach addresses a key limitation in vision-language models (VLMs), which give AI agents their ability to interpret and describe visual scenes.

While VLMs are strong at identifying objects in static images, they struggle to interpret the interactive 3D world behind 2D images. This gap shows up in spatial questions like “If I sit on the couch that is on my right and face the chairs, will the kitchen be to my right or left?”—tasks that require an agent to interpret its position and movement through space. 

People overcome this challenge by mentally exploring a space, imagining moving through it and combining those mental snapshots to work out where objects are. MindJourney applies the same process to AI agents, letting them roam a virtual space before answering spatial questions. 

How MindJourney navigates 3D space

To perform this type of spatial navigation, MindJourney uses a world model—in this case, a video generation system trained on a large collection of videos captured from a single moving viewpoint, showing actions such as going forward and turning left or right, much like a 3D cinematographer. From this, it learns to predict how a new scene would appear from different perspectives.

At inference time, the model can generate photo-realistic images of a scene based on possible movements from the agent’s current position. It generates multiple possible views of a scene while the VLM acts as a filter, selecting the generated perspectives that are most likely to help answer the user’s question.

These are kept and expanded in the next iteration, while less promising paths are discarded. This process, shown in Figure 1, avoids the need to generate and evaluate thousands of possible movement sequences by focusing only on the most informative perspectives.
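
The code below is not from the MindJourney release; it is a minimal Python sketch of this generate-and-filter step under stated assumptions: a hypothetical world_model.imagine(view, action) call that renders the predicted image after a move, and a hypothetical vlm.score(image, question) call that rates how useful an imagined view is for answering the question.

```python
from dataclasses import dataclass

# Egocentric moves the world model is assumed to support (illustrative).
ACTIONS = ["move_forward", "turn_left", "turn_right"]

@dataclass
class Candidate:
    actions: list       # movement sequence imagined so far
    view: object        # generated image at the end of that sequence
    score: float = 0.0  # VLM's estimate of how informative the view is

def expand_and_filter(candidates, world_model, vlm, question, keep_top=3):
    """One imagination step: simulate every move from every kept view,
    let the VLM score the generated images, and keep only the best."""
    expanded = []
    for cand in candidates:
        for action in ACTIONS:
            imagined = world_model.imagine(cand.view, action)  # hypothetical API
            usefulness = vlm.score(imagined, question)         # hypothetical API
            expanded.append(Candidate(cand.actions + [action], imagined, usefulness))
    # Keep the most promising perspectives; discard the rest.
    expanded.sort(key=lambda c: c.score, reverse=True)
    return expanded[:keep_top]
```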

Figure 1. Given a spatial reasoning query, MindJourney searches through the imagined 3D space using a world model and improves the VLM’s spatial interpretation through generated observations when encountering new challenges.

To make its search through a simulated space both effective and efficient, MindJourney uses a spatial beam search—an algorithm that prioritizes the most promising paths. It works within a fixed number of steps, each representing a movement. By balancing breadth with depth, spatial beam search enables MindJourney to gather strong supporting evidence. This process is illustrated in Figure 2.
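
As a rough illustration of how such a search might be orchestrated (reusing the hypothetical Candidate and expand_and_filter helpers sketched above, plus an assumed vlm.answer(question, images) call), the loop below keeps a fixed beam width for a fixed number of movement steps and then answers from the best imagined views; the step count and beam width are illustrative, not MindJourney’s actual settings.

```python
def spatial_beam_search(initial_view, world_model, vlm, question,
                        num_steps=3, beam_width=3):
    """Explore imagined viewpoints for a fixed number of movement steps,
    keeping only the `beam_width` most informative paths at each step."""
    beam = [Candidate(actions=[], view=initial_view)]
    for _ in range(num_steps):
        beam = expand_and_filter(beam, world_model, vlm, question,
                                 keep_top=beam_width)
    # Answer from the original observation plus the best imagined views.
    evidence = [initial_view] + [c.view for c in beam]
    return vlm.answer(question, evidence)  # hypothetical API
```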

Figure 2. The MindJourney workflow starts with a spatial beam search for a set number of steps before answering the query. The world model interactively generates new observations, while a VLM interprets the generated images, guiding the search throughout the process.

By iterating through simulation, evaluation, and integration, MindJourney can reason about spatial relationships far beyond what any single 2D image can convey, all without the need for additional training. On the Spatial Aptitude Training (SAT) benchmark, it improved the accuracy of VLMs by 8% over their baseline performance.

Building smarter agents  

MindJourney showed strong performance on multiple 3D spatial-reasoning benchmarks, and even advanced VLMs improved when paired with its imagination loop. This suggests that the spatial patterns that world models learn from raw images, combined with the symbolic capabilities of VLMs, create a more complete spatial capability for agents. Together, they enable agents to infer what lies beyond the visible frame and interpret the physical world more accurately. 

It also demonstrates that pretrained VLMs and trainable world models can work together in 3D without retraining either one—pointing toward general-purpose agents capable of interpreting and acting in real-world environments. This opens the way to possible applications in autonomous robotics, smart home technologies, and accessibility tools for people with visual impairments. 

By converting systems that simply describe static images into active agents that continually evaluate where to look next, MindJourney connects computer vision with planning. Because exploration occurs entirely within the model’s latent space—its internal representation of the scene—robots would be able to test multiple viewpoints before determining their next move, potentially reducing wear, energy use, and collision risk. 

Looking ahead, we plan to extend the framework to use world models that not only predict new viewpoints but also forecast how the scene might change over time. We envision MindJourney working alongside VLMs that interpret those predictions and use them to plan what to do next. This enhancement could enable agents to interpret spatial relationships and physical dynamics more accurately, helping them operate effectively in changing environments.






AI Research

Who is Shawn Shen? The Cambridge alumnus and ex-Meta scientist offering $2M to poach AI researchers

Shawn Shen, co-founder and Chief Executive Officer of the artificial intelligence (AI) startup Memories.ai, has made headlines for offering compensation packages worth up to $2 million to attract researchers from top technology companies. In a recent interview with Business Insider, Shen explained that many scientists are leaving Meta, the parent company of Facebook, due to constant reorganisations and shifting priorities.

“Meta is constantly doing reorganizations. Your manager and your goals can change every few months. For some researchers, it can be really frustrating and feel like a waste of time,” Shen told Business Insider, adding that this is a key reason why researchers are seeking roles at startups. He also cited Meta Chief Executive Officer Mark Zuckerberg’s philosophy that “the biggest risk is not taking any risks” as a motivation for his own move into entrepreneurship.

With Memories.ai, a company developing AI capable of understanding and remembering visual data, Shen is aiming to build a niche team of elite researchers. His company has already recruited Chi-Hao Wu, a former Meta research scientist, as Chief AI Officer, and is in talks with other researchers from Meta’s Superintelligence Lab as well as Google DeepMind.

From full scholarships to Cambridge classrooms

Shen’s academic journey is rooted in engineering, supported consistently by merit-based scholarships. He studied at Dulwich College from 2013 to 2016 on a full scholarship, completing his A-Level qualifications.

He then pursued higher education at the University of Cambridge, where he was awarded full scholarships throughout. Shen earned a Bachelor of Arts (BA) in Engineering (2016–2019), followed by a Master of Engineering (MEng) at Trinity College (2019–2020). He later continued at Cambridge as a Meta PhD Fellow, completing his Doctor of Philosophy (PhD) in Engineering between 2020 and 2023.

Early career: Internships in finance and research

Alongside his academic pursuits, Shen gained early experience through internships and analyst roles in finance. He worked as a Quantitative Research Summer Analyst at Killik & Co in London (2017) and as an Investment Banking Summer Analyst at Morgan Stanley in Shanghai (2018).

Shen also interned as a Research Scientist at the Computational and Biological Learning Lab at the University of Cambridge (2019), building the foundations for his transition into advanced AI research.

From Meta’s Reality Labs to academia

After completing his PhD, Shen joined Meta (Reality Labs Research) in Redmond, Washington, as a Research Scientist (2022–2024). His time at Meta exposed him to cutting-edge work in generative AI, but also to the frustrations of frequent corporate restructuring. This experience eventually drove him toward building his own company.

In April 2024, Shen began his academic career as an Assistant Professor at the University of Bristol, before launching Memories.ai in October 2024.

Betting on talent with $2M offers

Explaining his company’s aggressive hiring packages, Shen told Business Insider: “It’s because of the talent war that was started by Mark Zuckerberg. I used to work at Meta, and I speak with my former colleagues often about this. When I heard about their compensation packages, I was shocked — it’s really in the tens of millions range. But it shows that in this age, AI researchers who make the best models and stand at the frontier of technology are really worth this amount of money.”

Shen noted that Memories.ai is looking to recruit three to five researchers in the next six months, followed by up to ten more within a year. The company is prioritising individuals willing to take a mix of equity and cash, with Shen emphasising that these recruits would be treated as founding members rather than employees.

By betting heavily on talent, Shen believes Memories.ai will be in a strong position to secure additional funding and establish itself in the competitive AI landscape. His bold $2 million offers may raise eyebrows, but they also underline a larger truth: in today’s technology race, the fiercest competition is not for customers or capital, it’s for talent.






AI Research

JUPITER: Europe’s First Exascale Supercomputer Powers AI and Climate Research | Ukraine news

The Jupiter supercomputer at the Jülich Research Centre, Germany, September 5, 2025.
Getty Images/INA FASSBENDER/AFP

As reported by the European Commission’s press service

On September 5, the supercomputer JUPITER was ceremonially inaugurated at the Jülich Research Center in Germany – the first system in Europe to surpass the exaflop performance threshold. It is capable of performing more than one quintillion operations per second, according to the European Commission’s press service.

According to the EU, JUPITER runs entirely on renewable energy sources and features advanced cooling and heat-removal systems. It also topped the Green500 global energy-efficiency ranking.

The supercomputer is located on a site covering more than 2,300 square meters and comprises about 50 modular containers. It is currently the fourth-fastest supercomputer in the world.

JUPITER is capable of running high-resolution climate and meteorological models with kilometer-scale resolution, which allows more accurate forecasts of extreme events – from heat waves to floods.

Role in the European AI ecosystem and industrial developments

In addition, the system will form the backbone of the future European AI factory JAIF, which will train large language models and other generative technologies.

The investment in JUPITER amounts to about 500 million euros – a joint project of the EU and Germany under the EuroHPC programme. This is part of a broader strategy to build a network of AI gigafactories that will provide industry and science with the capabilities to develop new models and technologies.

It is expected that the deployment of JUPITER will strengthen European research-industrial initiatives and enhance the EU’s competitiveness on the global stage in the field of artificial intelligence and scientific developments.


AI Research

PH kicks off 2025 Development Policy Research Month on AI in governance

THE Philippines cannot rely on new technology alone to thrive in the age of artificial intelligence. Strong governance policies must come first — this was the central call of the 2025 Development Policy Research Month (DPRM), which opened on Sept. 1 with a push for AI rules that reflect national realities.

“Policy research provides the guardrails that help governments adopt technology responsibly,” said PIDS president Dr. Philip Arnold Tuano. Without such guardrails, he warned, the benefits of AI may never outweigh the risks.

CONFERENCE HIGHLIGHT The 2025 Development Policy Research Month kicked off with a push for AI rules that reflect the country’s realities. PHOTO FROM PIDS

Established under Proclamation 247 (2002), DPRM highlights the role of policy research in shaping evidence-based strategies. This year’s theme, “Reimagining Governance in the Age of AI,” underscores that while AI offers tools for efficiency and transparency, policies must come first to address risks such as digital exclusion, bias, cybersecurity threats, and workforce displacement.

PIDS, as lead coordinator, works with an interagency steering committee that includes the BSP, CSC, DBM, DILG, legislative policy offices, PIA, PMS, and now the Department of Science and Technology, which joins for the first time, given its role in AI research and governance.


The highlight is the 11th Annual Public Policy Conference on Sept. 18 at New World Hotel Makati, featuring global experts. Activities nationwide will amplify the campaign, supported by the hashtag #AIforGoodGovernance.

Learn more at https://dprm.pids.gov.ph.



