AI Research
AI Research Review 25.07.03 – by Patrick McGuinness
Since the introduction of the o1 reasoning model, there have been significant advances in AI reasoning. DeepSeek shared the RL post-training process used to instill reasoning into DeepSeek-R1, and this year many papers have presented refined RL post-training algorithms for AI reasoning.
This week’s AI research review covers papers that expose limits to these methods and extend AI reasoning capabilities – extending reasoning to the visual domain using RL techniques, building structured reasoning from the ground up, instilling causal reasoning, and examining strategic reasoning capabilities:
- GLM-4.1V-9B-Thinking
- ASTRO: Teaching LLMs to Reason with Search
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
- Causal Reasoning in LLMs: Reality or Mirage?
Researchers from Zhipu AI and Tsinghua University have introduced GLM-4.1V-Thinking, a vision-language model (VLM) engineered for general-purpose multimodal reasoning. Addressing the challenge of achieving broad-spectrum reasoning capabilities in VLMs, the researchers present a reasoning-centric training framework for GLM-4.1V-Thinking in the paper “GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.”
The GLM-4.1V-Thinking architecture leverages a ViT Encoder (AIMv2-Huge) for visual processing, an MLP Projector for feature alignment, and the GLM4 LLM as the decoder. It handles native image and video resolutions and incorporates 3D-RoPE for enhanced spatial and temporal awareness.
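For a concrete picture, here is a minimal PyTorch-style sketch of this kind of encoder-projector-decoder VLM; the module names, dimensions, and interfaces are illustrative assumptions, not GLM-4.1V-Thinking’s actual implementation (native-resolution handling and 3D-RoPE are omitted).

```python
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    """Minimal sketch of a ViT-encoder + MLP-projector + LLM-decoder VLM.
    Dimensions and interfaces are illustrative, not GLM-4.1V-Thinking's config."""

    def __init__(self, vit_encoder, llm_decoder, vit_dim=1536, llm_dim=4096):
        super().__init__()
        self.vit = vit_encoder            # vision transformer returning patch features
        self.projector = nn.Sequential(   # aligns visual features with the LLM embedding space
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm_decoder            # autoregressive decoder (assumed HF-style interface)

    def forward(self, pixel_values, text_embeds):
        patch_feats = self.vit(pixel_values)          # (batch, num_patches, vit_dim)
        visual_tokens = self.projector(patch_feats)   # (batch, num_patches, llm_dim)
        # Prepend projected visual tokens to the text embeddings and decode.
        inputs = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```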
The training pipeline progresses through three stages: multimodal pre-training on a diverse, knowledge-intensive corpus; supervised fine-tuning on meticulously curated long Chain-of-Thought (CoT) data to set the reasoning style; and a critical RL phase that utilizes both Reinforcement Learning with Verifiable Rewards (RLVR) and Reinforcement Learning from Human Feedback (RLHF), underpinned by a robust, multi-domain reward system crucial for preventing training collapse.
They further improve RL results using a novel technique called Reinforcement Learning with Curriculum Sampling (RLCS), which dynamically adjusts sampling difficulty to match the model’s evolving competence and significantly boosts learning efficiency.
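The paper’s exact sampling rule isn’t reproduced here, but the curriculum-sampling idea can be sketched as follows: keep a rolling pass rate per training problem and preferentially sample items near the edge of the model’s current competence. The weighting scheme below is a hypothetical illustration, not the paper’s formula.

```python
import random

def rlcs_sample(pool, batch_size, target_rate=0.5, temperature=0.1):
    """Curriculum-sampling sketch: favor problems whose rolling pass rate sits
    near a target difficulty (neither trivially solved nor hopeless).
    `pool` maps problem_id -> recent pass rate in [0, 1]; the weighting is illustrative."""
    def weight(pass_rate):
        closeness = 1.0 - abs(pass_rate - target_rate) / max(target_rate, 1.0 - target_rate)
        return max(closeness, 1e-6) ** (1.0 / temperature)

    ids = list(pool)
    weights = [weight(pool[i]) for i in ids]
    return random.choices(ids, weights=weights, k=batch_size)

# After each RL step, update pool[problem_id] with the latest rollout pass rate
# so the effective curriculum shifts as the model's competence improves.
```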
Their use of RLCS and domain-specific reward system innovations in RL post-training substantially boosts the model’s performance, with gains of up to 7.3% on reasoning benchmarks. As a result, GLM-4.1V-9B-Thinking demonstrates state-of-the-art performance among models of comparable size:
In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities.
GLM-4.1V-9B-Thinking is open-source and available on HuggingFace.
Research from Meta AI published in ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context addresses the challenge of systematically teaching LLMs to internalize structured reasoning. They introduce the ASTRO (“Autoregressive Search-Taught Reasoner”) framework, which teaches an LLM to reason like a classical search algorithm, imbuing self-reflection, backtracking, and exploration through the reasoning training process.
The ASTRO framework instills robust reasoning capabilities into LLMs by inserting search processes into training inputs. An ASTRO-trained model generates the entire search trajectory, complete with its twists and turns, as a coherent stream of thought. The key innovation is to make the model internalize the entire search process—including exploration, self-reflection on intermediate steps, and backtracking from errors—within a single, continuous autoregressive generation.
Training a model with ASTRO proceeds in three key stages. First, ASTRO generates a synthetic dataset of search trajectories by applying Monte Carlo Tree Search (MCTS) to mathematical problem-solving. These search traces are then linearized into natural-language Chains of Thought (CoTs), a step that crucially injects the search’s explicit self-reflection and backtracking phrases into the training data.
This dataset subsequently informs a supervised fine-tuning (SFT) stage, bootstrapping models with a rich prior for autoregressive search. Finally, reinforcement learning (RL) with verifiable rewards further optimizes the model’s search and reasoning proficiencies.
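A rough sketch of how a search trace might be linearized into a single CoT string, with dead ends kept and explicit reflection and backtracking phrases inserted. The phrasing and data layout are invented for illustration; ASTRO’s actual templates and trace format differ.

```python
def linearize_search_trace(visited_steps):
    """Flatten a search/MCTS trace into one chain-of-thought string.
    `visited_steps` is a list of (step_text, on_solution_path) pairs in visit order;
    the reflection/backtracking phrases are illustrative placeholders."""
    chunks = []
    for step_text, on_solution_path in visited_steps:
        chunks.append(step_text)
        if not on_solution_path:
            # Keep the wrong turn in the trace, then reflect and backtrack explicitly.
            chunks.append("Hmm, this step doesn't hold up. Let me backtrack and try another approach.")
    return " ".join(chunks)

# The resulting strings become SFT targets, giving the model a prior
# for self-reflection and backtracking before RL sharpens it further.
```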
Applying ASTRO to the Llama 3 family of models yielded significant performance improvements on challenging mathematical reasoning benchmarks. Llama-3.1-70B-ASTRO-RL achieved absolute gains of 16% on MATH-500, 26.9% on AMC 2023, and 20% on AIME 2024, surpassing other advanced baselines. A critical finding is that search-based reasoning traces are essential: Models trained with explicit self-reflection and backtracking significantly outperformed those without.
This paper poses its key question in the title: Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess. Strategic reasoning involves “the ability to plan, anticipate adversary actions, and make decisions in multiagent environments.”
This paper rigorously investigates the capacity to train LLMs for strategic reasoning by applying RL to an LLM in the domain of chess. It surprisingly concludes that while RL with dense, expert-derived rewards improves tactical performance, LLMs consistently plateau far below human expert levels:
Our experiments show that our distillation-based dense rewards often outperform sparse binary rewards. However, surprisingly, all models plateau far below expert levels.
The experimental setup was designed to isolate the impact of RL on strategic reasoning. The methodology involved fine-tuning Qwen 2.5 and Llama 3.1 models with Group Relative Policy Optimization (GRPO) on a Lichess puzzle dataset. A novel aspect was employing a pre-trained chess expert network to provide dense, continuous reward signals based on move quality, effectively a knowledge distillation process.
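Conceptually, the dense reward replaces the binary “solved the puzzle” signal with a per-move score derived from the expert network. Below is a hypothetical shape of the two reward functions; the `expert_net.evaluate` and `board_state.legal_moves` interfaces are assumptions, not the paper’s code.

```python
def sparse_reward(predicted_move, solution_move):
    """Binary reward: credit only for the exact puzzle solution."""
    return 1.0 if predicted_move == solution_move else 0.0

def dense_reward(board_state, predicted_move, expert_net):
    """Distillation-style dense reward: score the chosen move by the expert
    network's evaluation, normalized against the best and worst legal moves.
    `expert_net.evaluate` and `board_state.legal_moves` are assumed interfaces."""
    scores = {m: expert_net.evaluate(board_state, m) for m in board_state.legal_moves()}
    if predicted_move not in scores:
        return 0.0  # illegal or unparsable move earns nothing
    best, worst = max(scores.values()), min(scores.values())
    if best == worst:
        return 1.0
    return (scores[predicted_move] - worst) / (best - worst)  # continuous signal in [0, 1]
```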
The performance of this approach was compared against training with sparse binary rewards. Key results are that distillation-based dense rewards substantially outperform sparse binary rewards, yet all models plateau at 25-30% puzzle accuracy, well below expert performance (60-80%). Even with additional supervised fine-tuning on expert reasoning traces, performance did not improve, as models struggled with basic chess rules and board state comprehension.
This leads to the paper’s critical insight that this failure stems from a deficit in the pretrained model’s internal world model:
“RL alone may not be able to fully overcome [the] deficit in the pretrained models’ internal understanding of chess.”
RL primarily amplifies existing capabilities in pre-trained LLMs rather than teaching new domain knowledge; RL cannot create strategic understanding that does not already exist in the foundation. Since RL cannot impart complex strategic understanding without contextual knowledge, it suggests that adequate domain-specific exposure during pre-training is essential for developing advanced strategic reasoning in complex new environments.
The research paper Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? critically assesses whether LLMs exhibit genuine human-like causal reasoning or merely leverage memorized knowledge. LLMs often appear to demonstrate causal reasoning, correctly identifying cause-and-effect relationships in text, but a critical open question is whether this is genuine reasoning, or a “mirage” created by retrieving causal patterns memorized from training data.
The authors propose a distinction between “level-1” (shallow, knowledge-retrieval based) and “level-2” (genuine, deduction-based, new knowledge generation) causal reasoning, arguing that current LLMs primarily operate at level-1.
They first empirically validate this hypothesis with a new causal Q&A benchmark called CausalProbe-2024. They find that:
The LLMs exhibit a significant performance drop on CausalProbe-2024 compared to earlier benchmarks, indicating the fact that they primarily engage in level-1 causal reasoning.
To bridge this gap in causal reasoning, the paper proposes G2-Reasoner, a framework inspired by human reasoning that integrates external general knowledge via Retrieval-Augmented Generation (RAG) and goal-oriented thinking via guiding prompts, steering the model toward a genuinely causal inference process.
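In outline, G2-Reasoner can be pictured as retrieval plus a goal-conditioned prompt. The sketch below uses assumed `retriever.search` and `llm.generate` interfaces and invented prompt wording; it illustrates the idea rather than the paper’s exact pipeline.

```python
def g2_reason(question, llm, retriever, k=4):
    """Sketch of RAG plus goal-oriented prompting for causal questions.
    `retriever.search` and `llm.generate` are assumed interfaces."""
    # 1. Retrieve general world knowledge relevant to the question.
    passages = retriever.search(question, top_k=k)
    context = "\n".join(f"- {p}" for p in passages)

    # 2. Goal-oriented prompt: state the reasoning goal explicitly so the model
    #    works toward a causal conclusion instead of pattern-matching an answer.
    prompt = (
        "Goal: determine the true cause-effect relationship and justify it step by step.\n"
        f"Background knowledge:\n{context}\n"
        f"Question: {question}\n"
        "Reason from the background knowledge to a causal conclusion, then answer."
    )
    return llm.generate(prompt)
```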
In evaluations, G2-Reasoner demonstrated that it “significantly enhances LLMs’ causal reasoning capability,” particularly in fresh and counterfactual contexts, outperforming vanilla, CoT, and RAG baselines. This suggests that while LLMs may not possess innate causal reasoning, their capabilities can be substantially enhanced by augmenting them with external knowledge and more structured reasoning frameworks.
While G2-Reasoner offers a promising initial step towards fostering more genuine, deductive causal reasoning, achieving full human-like level-2 capability remains a significant challenge requiring further exploration into broader knowledge integration and sophisticated reasoning mechanisms.
These research papers show that current AI reasoning models are limited in deeper forms of reasoning, such as strategic and causal reasoning. Currently, AI reasoning models learn to reason by getting trained via RL on reasoning traces, a form of “cognitive distillation” that trains the AI model on specific patterns of thinking. This trains models to follow known chains of thought, but it is not sufficient to build models that can think in more complex, deeper, and powerful ways.
We’ll need more breakthroughs to get to AGI-level reasoning. However, these results give possible directions on what those breakthroughs might include:
- Breadth: GLM-4.1V-Thinking improves its reasoning with cross-domain reasoning challenges covering a broad range of tasks in several modalities.
- Self-learning: ASTRO distills the algorithmic process of MCTS into a natural language format, effectively teaching the model how to bootstrap its own learning.
- Knowledge context support: The causal reasoning paper proposes a new framework, G2-Reasoner, which integrates external knowledge via RAG to augment the model’s core capabilities. This may overcome the barrier identified in the chess paper, which showed that RL training cannot overcome gaps in a foundation model’s domain knowledge.
AI Research
How Is AI Changing The Way Students Learn At Business School?
Artificial intelligence is the skill set that employers increasingly want from future hires. Find out how b-schools are equipping students to use AI
Business students are already seeing AI’s value. More than three-quarters of business schools have integrated AI into their curricula, from essay writing and personal tutoring to career guidance and soft-skill development.
BusinessBecause hears from current business students about how AI is reshaping the business school learning experience.
The benefits and drawbacks of using AI for essay writing
Many business school students are gaining firsthand experience of using AI to assist their academic work. At Rotterdam School of Management, Erasmus University in the Netherlands, students are required to use AI tools when submitting essays, alongside a log of their interactions.
“I was quite surprised when we were explicitly instructed to use AI for an assignment,” said Lara Harfner, who is studying International Business Administration (IBA) at RSM. “I liked the idea. But at the same time, I wondered what we would be graded on, since it was technically the AI generating the essay.”
Lara decided to approach this task as if she were writing the essay herself. She began by prompting the AI to brainstorm around the topic, research it using academic studies, and build an outline, before asking it to write a full draft.
However, during this process Lara encountered several problems. The AI-generated sources were either non-existent or inappropriate, and the tool had to be explicitly instructed on which concepts to focus on. It tended to be too broad, touching on many ideas without thoroughly analyzing any of them.
“In the end, I felt noticeably less connected to the content,” Lara says. “It didn’t feel like I was the actual author, which made me feel less responsible for the essay, even though it was still my name on the assignment.”
Despite the result sounding more polished, Lara thought she could have produced a better essay on her own with minimal AI support. What’s more, the grades she received on the AI-related assignments were below her usual average. “To me, that shows that AI is a great support tool, but it can’t produce high-quality academic work on its own.”
AI-concerned employers who took part in the Corporate Recruiters Survey echo this finding, stating that they would rather graduate management education (GME) graduates use AI as a strategic partner in learning and strategy than as a source for more and faster content.
How business students use AI as a personal tutor
Daniel Carvalho, a Global Online MBA student, also frequently uses AI in his academic assignments, something encouraged by his professors at Porto Business School (PBS).
However, Daniel treats AI as a personal tutor, asking it to explain complex topics in simple terms and deepen the explanation. On top of this, he uses it for brainstorming ideas, summarizing case studies, drafting presentations and exploring different points of view.
“My MBA experience has shown me how AI, when used thoughtfully, can significantly boost productivity and effectiveness,” he says.
Perhaps one of the most interesting ways Daniel uses AI is by turning course material into a personal podcast. “I convert text-based materials into audio using text-to-speech tools, and create podcast-style recaps to review content in a more conversational and engaging way. This allows me to listen to the materials on the go—in the car or at the gym.”
While studying his financial management course, Daniel even built a custom GPT using course materials. Much like a personal tutor, it would quiz him on the material, validate his understanding, and explain any questions he answered incorrectly. “This helped reinforce my knowledge so effectively that I was able to correctly answer all multiple-choice questions in the final exam,” he explains.
Similarly, at Villanova School of Business in the US, Master of Science in Business Analytics and AI (MSBAi) students are building personalized AI bots with distinct personalities. Students embed reference materials into the bot which then shape how the bot responds to questions.
“The focus of the program is to apply these analytics and AI skills to improve business results and career outcomes,” says Nathan Coates, MSBAi faculty director at the school. “Employers are increasingly looking for knowledge and skills for leveraging GenAI within business processes. Students in our program learn how AI systems work, what their limitations are, and what they can do better than existing solutions.”
The common limitations of using AI for academic work
Kristiina Esop, who is studying for a doctorate in Business Administration and Management at Estonian Business School, agrees that AI in education must always be used critically and with intention. She warns that students should be aware of AI’s limitations.
Kristiina currently uses AI tools to explore different scenarios, synthesize large volumes of information, and detect emerging debates—all of which are essential for her work both academically and professionally.
However, she cautions that AI tools are not 100% accurate. Kristiina once asked ChatGPT to map actors in circular economy governance, and it returned a neat, simplified diagram that ignored important aspects. “That felt like a red flag,” she says. “It reminded me that complexity can’t always be flattened into clean logic. If something feels too easy, too certain—that’s when it is probably time to ask better questions.”
To avoid this problem, Kristiina combines the tools with critical thinking and contextual reading, and connects the findings back to the core questions in her research. “I assess the relevance and depth of the sources carefully,” she says. “AI can widen the lens, but I still need to focus it myself.”
She believes such critical thinking when using AI is essential. “Knowing when to question AI-generated outputs, when to dig deeper, and when to disregard a suggestion entirely is what builds intellectual maturity and decision-making capacity,” she says.
This is also what Wharton management professor Ethan Mollick, author of Co-Intelligence: Living and Working with AI and co-director of the Generative AI Lab, believes. He says the best way to work with [generative AI] is to treat it like a person. “So you’re in this interesting trap,” he says. “Treat it like a person and you’re 90% of the way there. At the same time, you have to remember you are dealing with a software process.”
Hult International Business School, too, expects its students to use AI in a balanced way, encouraging them to think critically about when and how to use it. For example, Rafael Martínez Quiles, a Master’s in Business Analytics student at Hult, uses AI as a second set of eyes to review his thinking.
“I develop my logic from scratch, then use AI to catch potential issues or suggest improvements,” he explains. “This controlled, feedback-oriented approach strengthens both the final product and my own learning.”
At Hult, students engage with AI to solve complex, real-world challenges as part of the curriculum. “Practical business projects at Hult showed me that AI is only powerful when used with real understanding,” says Rafael. “It doesn’t replace creativity or business acumen, it supports it.”
As vice president of Hult’s AI Society, N-AIble, Rafael has seen this mindset in action. The society’s members explore AI ethically, using it to augment their work, not automate it. “These experiences have made me even more confident and excited about applying AI in the real world,” he says.
The AI learning tools students are using to improve understanding
In other business schools, AI is being used to offer faculty a second pair of hands. Nazarbayev University Graduate School of Business has recently introduced an ‘AI Jockey’. Appearing live on a second screen next to the lecturer’s slides, this AI tool acts as a second teacher, providing real-time clarifications, offering alternate examples, challenging assumptions, and deepening explanations.
“Students gain access to instant, tailored explanations that complement the lecture, enhancing understanding and engagement,” says Dr Tom Vinaimont, assistant professor of finance, Nazarbayev University Graduate School of Business, who uses the AI jockey in his teaching.
Rather than replacing the instructor, the AI enhances the learning experience by adding an interactive, AI-driven layer to traditional teaching, transforming learning into a more dynamic, responsive experience.
“The AI Jockey model encourages students to think critically about information, question the validity of AI outputs, and build essential AI literacy. It helps students not only keep pace with technological change but also prepares them to lead in an AI-integrated world by co-creating knowledge in real time,” says Dr Vinaimont.
How AI can be used to encourage critical thinking among students
So, if you’re looking to impress potential employers, learning to work with AI while a student is a good place to start. But simply using AI tools isn’t enough. You must think critically, solve problems creatively and be aware of AI’s limitations.
Most of all, you must be adaptable. GMAC’s new AI-powered tool, Advancery, helps you find graduate business programs tailored to your career goals, with AI-readiness in mind.
After all, working with AI is a skill in itself. And in 2025, it is a valuable one.
AI Research
The new frontier of medical malpractice
Although the beginnings of modern artificial intelligence (AI) can be traced as far back as 1956, modern generative AI, the most famous example of which is arguably ChatGPT, only began emerging in 2019. For better or worse, the steady rise of generative AI has increasingly impacted the medical field. At this time, AI has begun to advance in a way that creates potential liability…
AI Research
Pharmaceutical Innovation Rises as Global Funding Surges and AI Reshapes Clinical Research – geneonline.com