AI Research

Evaluating and enhancing probabilistic reasoning in language models

Published October 21, 2024

To understand the probabilistic reasoning capabilities of three state-of-the-art LLMs (from the Gemini and GPT model families), we defined three distinct tasks: estimating percentiles, drawing samples, and calculating probabilities. These tasks reflect key aspects of interpreting probability distributions: understanding where a sample falls within a distribution (percentiles), generating representative data (sampling), and assessing the likelihood of outcomes (probabilities). By testing these abilities, we aimed to assess how well LLMs can reason over both idealized and real-world distributions.
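To make the three tasks concrete, here is a minimal sketch that computes a reference answer for each one on a hypothetical normal distribution (mean 100, standard deviation 15), using Python's standard-library `statistics.NormalDist`. The distribution, the query values, and the function names are illustrative assumptions; this is not the evaluation code used in the paper.

```python
from statistics import NormalDist

# Hypothetical distribution used only for illustration.
dist = NormalDist(mu=100, sigma=15)


def percentile_of(value, d):
    """Percentile task: where `value` falls in d, on a 0-100 scale."""
    return 100 * d.cdf(value)


def draw_samples(d, n, seed=0):
    """Sampling task: n representative draws from d (seeded for reproducibility)."""
    return d.samples(n, seed=seed)


def prob_between(lo, hi, d):
    """Probability task: P(lo <= X <= hi) for X ~ d."""
    return d.cdf(hi) - d.cdf(lo)


print(round(percentile_of(115, dist), 1))     # 84.1 (one standard deviation above the mean)
print(round(prob_between(85, 115, dist), 3))  # 0.683
print(draw_samples(dist, 3))
```

A model's answer to each task could then be compared against these reference values; how the paper scores that comparison is described there, not here.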

Since no publicly available dataset existed for evaluating probabilistic reasoning in LLMs, we developed a new dataset combining real-world and idealized distributions. The real-world distributions were collected from three domains: health, finance, and climate. The health data were de-identified and sampled from 100,000 Fitbit users in the U.S. aged 18–65 who consented to their data being used for research; they included metrics such as step count, resting heart rate, sleep duration, and exercise minutes. Financial data were obtained from the U.S. Census Bureau’s American Community Survey, and climate data came from NOAA’s Global Historical Climatology Network. All datasets were manually curated and filtered, for example to remove erroneous data.

In addition, we programmatically generated idealized distributions using Python libraries, to complement the real-world data and better test the probabilistic reasoning capabilities of language models. Although we generated 12 idealized distributions, this blog post focuses on three: normal, log-normal, and power-law. See the paper for the full set of generated distributions.
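The post does not say which Python libraries were used. As a hedged sketch, the standard-library `random` module can generate the three highlighted families; here a Pareto distribution stands in for the power law, and the function name, seed, and shape parameters are all hypothetical choices.

```python
import random
import statistics


def generate_idealized(n, seed=42):
    """Draw n samples from each of three idealized distribution families."""
    rng = random.Random(seed)  # a seeded generator keeps the dataset reproducible
    return {
        "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
        "log_normal": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
        # Pareto with shape alpha: P(X > x) = x**(-alpha) for x >= 1, i.e. a power-law tail.
        "power_law": [rng.paretovariate(2.5) for _ in range(n)],
    }


samples = generate_idealized(10_000)
for name, values in samples.items():
    print(f"{name:>10}: median={statistics.median(values):.2f}, max={max(values):.2f}")
```

Printing the median next to the maximum makes the difference in tail weight between the three families easy to see at a glance.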

We evaluated the Gemini and GPT-family models on the three tasks using 12 idealized and 12 real-world distributions. To enhance probabilistic reasoning, we explored three strategies for providing more context to the LLMs:

Anchoring examples from within a distribution or its family: We provided anchoring examples from the same distribution or related distributions. For instance, when estimating percentiles for a normal distribution, we included examples from the same distribution with different value–percentile pairs, allowing the model to interpolate and make more accurate predictions.
Adding real-world context: We added real-world context by introducing domain-specific data, such as U.S. rental prices from the American Community Survey when estimating the percentile of monthly rent values. This enabled the model to reason using practical, real-world information.
Leveraging summary statistics to approximate a normal distribution: We used summary statistics and normal approximations to simplify complex distributions. For example, income data, which typically follow a power-law distribution, were approximated as normal to help the model make reasonably accurate predictions despite the complexity of the actual underlying distribution.
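The third strategy can be illustrated with a short sketch. Assuming only a mean and standard deviation are available, a percentile can be estimated from a normal approximation and compared with the empirical percentile of the underlying data. The Pareto-distributed "income" sample, its scale of 30,000, and the query value of 60,000 are hypothetical; the paper's actual prompts and data are not reproduced here.

```python
import random
import statistics
from statistics import NormalDist


def normal_approx_percentile(value, mean, stdev):
    """Percentile of `value` (0-100) assuming the data ~ Normal(mean, stdev)."""
    return 100 * NormalDist(mean, stdev).cdf(value)


def empirical_percentile(value, data):
    """Share of observations at or below `value`, on a 0-100 scale."""
    return 100 * sum(x <= value for x in data) / len(data)


# Hypothetical right-skewed "income" sample: Pareto (a power law) scaled by 30,000.
rng = random.Random(0)
incomes = [30_000 * rng.paretovariate(3.0) for _ in range(20_000)]

# Only these two summary statistics are passed to the approximation.
mean = statistics.fmean(incomes)
stdev = statistics.stdev(incomes)

query = 60_000
approx = normal_approx_percentile(query, mean, stdev)
exact = empirical_percentile(query, incomes)
print(f"normal approximation: {approx:.1f}th percentile")
print(f"empirical:            {exact:.1f}th percentile")
```

Because the sample is right-skewed, the two numbers will generally differ; the size of the gap gives a rough sense of how much the normal simplification costs for a particular distribution.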



(Policy Address 2025) HK earmarks HK$3B for AI research and talent recruitment – The Standard (HK)

Published September 17, 2025

The Standard 英文虎報




[2506.08171] Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models

Published September 17, 2025

Daniel Koh, Yannic Noller, Corina S. Pasareanu, Adrians Skapars, Youcheng Sun

[Submitted on 9 Jun 2025 (v1), last revised 16 Sep 2025 (this version, v2)]


Abstract: Large language models (LLMs) have demonstrated strong performance on coding tasks such as generation, completion and repair, but their ability to handle complex symbolic reasoning over code remains underexplored. We introduce the task of worst-case symbolic constraints analysis, which requires inferring the symbolic constraints that characterise worst-case program executions; these constraints can be solved to obtain inputs that expose performance bottlenecks or denial-of-service vulnerabilities in software systems. We show that even state-of-the-art LLMs (e.g., GPT-5) struggle when applied directly to this task. To address this challenge, we propose WARP, an innovative neurosymbolic approach that computes worst-case constraints on smaller concrete input sizes using existing program analysis tools, and then leverages LLMs to generalise these constraints to larger input sizes. Concretely, WARP comprises: (1) an incremental strategy for LLM-based worst-case reasoning, (2) a solver-aligned neurosymbolic framework that integrates reinforcement learning with SMT (Satisfiability Modulo Theories) solving, and (3) a curated dataset of symbolic constraints. Experimental results show that WARP consistently improves performance on worst-case constraint reasoning. Leveraging the curated constraint dataset, we use reinforcement learning to fine-tune a model, WARP-1.0-3B, which significantly outperforms size-matched and even larger baselines. These results demonstrate that incremental constraint reasoning enhances LLMs’ ability to handle symbolic reasoning and highlight the potential for deeper integration between neural learning and formal methods in rigorous program analysis.
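The generalise-from-small-inputs idea can be illustrated with a toy sketch. It is not WARP: instead of program analysis tools, it finds worst-case inputs of insertion sort by brute force for small sizes, extracts the ordering constraints those inputs share, and then checks a hand-written generalisation (the step WARP assigns to an LLM) at a larger size by random sampling rather than SMT solving. The program, its cost model (element shifts), and all function names are hypothetical.

```python
import random
from itertools import permutations


def insertion_sort_cost(xs):
    """Cost model: number of element shifts insertion sort performs on xs."""
    a = list(xs)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift a larger element one slot right
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts


def worst_case_inputs(n):
    """Exhaustively find all max-cost permutations of size n (small n only)."""
    perms = list(permutations(range(n)))
    costs = [insertion_sort_cost(p) for p in perms]
    top = max(costs)
    return [p for p, c in zip(perms, costs) if c == top]


def shared_adjacent_constraints(inputs):
    """For each i, the relation between a[i] and a[i+1] common to all inputs.

    Returns '>' or '<' where every worst-case input agrees, else None.
    """
    n = len(inputs[0])
    shared = []
    for i in range(n - 1):
        rels = {">" if p[i] > p[i + 1] else "<" for p in inputs}
        shared.append(rels.pop() if len(rels) == 1 else None)
    return shared


# Step 1: concrete worst-case constraints for small input sizes.
for size in range(2, 7):
    print(size, shared_adjacent_constraints(worst_case_inputs(size)))


# Step 2: hand-written candidate generalisation (the role WARP gives an LLM):
# a[i] > a[i+1] for every i, at any input size.
def satisfies_generalised(xs):
    return all(xs[i] > xs[i + 1] for i in range(len(xs) - 1))


# Step 3: build an input satisfying the constraint at a larger size and
# spot-check it against random inputs (a sampling check, not a proof).
n = 50
candidate = list(range(n, 0, -1))  # strictly decreasing, so it satisfies Step 2
rng = random.Random(0)
worst_random = max(insertion_sort_cost(rng.sample(range(n), n)) for _ in range(200))
print(satisfies_generalised(candidate), insertion_sort_cost(candidate), worst_random)
```

Brute force only scales to tiny inputs, which is exactly why generalising small-size constraints to larger sizes is attractive.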

Submission history

From: Daniel Koh
[v1] Mon, 9 Jun 2025 19:33:30 UTC (1,462 KB)
[v2] Tue, 16 Sep 2025 10:35:33 UTC (1,871 KB)



‘AI Learning Day’ spotlights smart campus and ecosystem co-creation

Published September 17, 2025

The Editors

Now that artificial intelligence (AI) can help you retrieve literature, support your research, and even act as a “super assistant”, university education is undergoing a profound transformation.

On 9 September, XJTLU’s Centre for Knowledge and Information (CKI) hosted its third AI Learning Day, themed “AI-Empowered, Ecosystem-Co-created”. The event showcased the latest milestones of the University’s “Education + AI” strategy and offered in-depth discussions on the role of AI in higher education.

In her opening remarks, Professor Qiuling Chao, Vice President of XJTLU, said: “AI offers us an opportunity to rethink education, helping us create a learning environment that is fairer, more efficient and more personalised. I hope today’s event will inspire everyone to explore how AI technologies can be applied in your own practice.”


In his keynote speech, Professor Youmin Xi, Executive President of XJTLU, elaborated on the University’s vision for future universities. He stressed that future universities would evolve into human-AI symbiotic ecosystems, where learning would be centred on project-based co-creation and human-AI collaboration. The role of educators, he noted, would shift from transmitters of knowledge to mentors for both learning and life.


At the event, Professor Xi’s digital twin, created by the XJTLU Virtual Engineering Centre in collaboration with the team led by Qilei Sun from the Academy of Artificial Intelligence, delivered Teachers’ Day greetings to all staff.


“Education + AI” in diverse scenarios

This event also highlighted four case studies from different areas of the University. Dr Ling Xia from the Global Cultures and Languages Hub suggested that in the AI era, curricula should undergo de-skilling (assigning repetitive tasks to AI), re-skilling, and up-skilling, thereby enabling students to focus on in-depth learning in critical thinking and research methodologies.

Dr Xiangyun Lu from International Business School Suzhou (IBSS) demonstrated how AI teaching assistants and the University’s Junmou AI platform can offer students a customised and highly interactive learning experience, particularly for those facing challenges such as information overload and language barriers.

Dr Juan Li from the School of Science shared the concept of the “AI amplifier” for research. She explained that the “double amplifier” effect works in two stages: AI first amplifies students’ efficiency by automating tasks like literature searches and coding. These empowered students then become the second amplifier, freeing mentors from routine work so they can focus on high-level strategy. This human-AI partnership allows a small research team to achieve the output of a much larger one.

Jing Wang, Deputy Director of the XJTLU Learning Mall, showed how AI agents are already being used to support scheduling, meeting bookings, news updates and other administrative and learning tasks. She also announced that from this semester, all students would have access to the XIPU AI Agent platform.

Students and teachers are having a discussion at one of the booths

AI education system co-created by staff and students

The event’s AI interactive zone also drew significant attention from students and staff. From the Junmou AI platform to the E-Support chatbot, and from AI-assisted creative design to 3D printing, 10 exhibition booths demonstrated the integration of AI across campus life.

These innovative applications sparked lively discussions and thoughtful reflections among participants. In an interview, Thomas Durham from IBSS noted that, although he had rarely used AI before, the event was highly inspiring and motivated him to explore its use in both professional and personal life. He also shared his perspective on AI’s role in learning, stating: “My expectation for the future of AI in education is that it should help students think critically. My worry is that AI’s convenience and efficiency might make students’ understanding too superficial, since AI does much of the hard work for them. Hopefully, critical thinking will still be preserved.”

Year One student Zifei Xu was particularly inspired by the interdisciplinary collaboration on display at the event, remarking that it offered her a glimpse of a more holistic and future-focused education.

Dr Xin Bi, XJTLU’s Chief Officer of Data and Director of the CKI, noted that, supported by robust digital infrastructure such as the Junmou AI platform, more than 26,000 students and 2,400 staff are already using the University’s AI platforms. XJTLU’s digital transformation is advancing from informatisation and digitisation towards intelligentisation, with AI expected to empower teaching, research and administration, and to help staff and students leap from knowledge to wisdom.


“Looking ahead, we will continue to advance the deep integration of AI in education, research, administration and services, building a data-driven intelligent operations centre and fostering a sustainable AI learning ecosystem,” said Dr Xin Bi.

By Qinru Liu

Edited by Patricia Pieterse

Translated by Xiangyin Han
