AI Research

Evaluating progress of LLMs on scientific problem-solving

Published

April 3, 2025

Programmatic and model-based evaluations

Tasks in CURIE are varied and have ground-truth annotations in heterogeneous forms, e.g., as JSONs, LaTeX equations, YAML files, or free-form text. Evaluating free-form generation is challenging because answers are often descriptive, and even when a format is specified, as in most of our cases, the response to each field can take different forms. For example, materials grid points may sometimes be specified as “[p, q, r]” and at other times as “p × q × r”. Hence, in addition to the programmatic evaluation metrics, such as ROUGE-L, intersection-over-union (used for BIOGR), and identity ratio (used in PDB), we propose two model-based evaluation metrics.
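To make the formatting problem concrete, here is a minimal Python sketch, not the CURIE implementation. The helper names `parse_grid` and `box_iou` are hypothetical, and the box layout `(x_min, y_min, x_max, y_max)` is an assumption made for illustration. It shows how two surface forms of the same grid can be normalized before comparison, and how a standard intersection-over-union score is computed for two axis-aligned boxes.

```python
import re


def parse_grid(text: str) -> tuple[int, ...]:
    """Pull the integers out of forms such as "[4, 4, 2]" or "4 × 4 × 2"."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def box_iou(a, b) -> float:
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_x * overlap_y
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Both spellings of the same grid normalize to the same tuple.
same_grid = parse_grid("[4, 4, 2]") == parse_grid("4 × 4 × 2")
```

A normalizer like this only covers the formats someone anticipated, which is part of why model-based metrics are attractive for free-form answers.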

(1) LMScore prompts an LLM to rate how closely the predictions match the ground truth on a 3-point scale: “good” if the prediction has few minor errors, “okay” if there are many minor errors, and “bad” if there are major errors. We take the weighted average of the log-likelihood scores of the tokens to produce a final confidence.
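One way such a score could be computed is sketched below in Python. This is an illustration under stated assumptions, not the published implementation: the label weights (1.0, 0.5, 0.0), the softmax normalization over the three labels, and the function name `lm_score` are all assumptions. The input is taken to be the judge model's log-likelihood for each of the three label tokens.

```python
import math

# Assumed weights for the three labels; the text does not state the values used.
LABEL_WEIGHTS = {"good": 1.0, "okay": 0.5, "bad": 0.0}


def lm_score(label_logprobs: dict[str, float]) -> float:
    """Turn per-label log-likelihoods into one confidence value in [0, 1]."""
    # Softmax over the three labels, shifted by the max for numerical stability.
    peak = max(label_logprobs.values())
    unnormalized = {k: math.exp(v - peak) for k, v in label_logprobs.items()}
    total = sum(unnormalized.values())
    # Weighted average of the label weights under the normalized probabilities.
    return sum(LABEL_WEIGHTS[k] * u / total for k, u in unnormalized.items())


example = lm_score({
    "good": math.log(0.7),
    "okay": math.log(0.2),
    "bad": math.log(0.1),
})
```

With these assumed weights, a judge that is 70% "good", 20% "okay" and 10% "bad" yields a score of about 0.8.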

(2) LLMSim is used for retrieval tasks where we ask the model to exhaustively extract many details, e.g., descriptors, properties and values of materials from a research document, and to output an unordered list of dictionaries or records. We use a chain-of-thought (CoT) prompt that asks the LLM to look at each ground-truth record and identify the predicted records that correctly match each field (key) and value of the ground truth. Once the ground-truth records are matched with predicted records, we can measure precision and recall for the retrieval task, and compute the mean average precision, recall and F1 scores across all documents.
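The scoring step after matching can be sketched in Python as follows. This is a minimal sketch, not the authors' code: it assumes the LLM matcher has already produced, for each document, a mapping from each ground-truth record index to the index of its matching predicted record (or `None` when nothing matched), and the function names `prf` and `corpus_scores` are hypothetical. The averaging shown is a plain mean over documents.

```python
from statistics import mean


def prf(matches: dict, n_pred: int) -> tuple:
    """Precision, recall and F1 for one document.

    matches maps ground-truth index -> matched predicted index, or None.
    n_pred is the number of records the model predicted.
    """
    n_gt = len(matches)
    # A predicted record counts once, even if several ground truths point to it.
    matched_pred = {p for p in matches.values() if p is not None}
    matched_gt = sum(p is not None for p in matches.values())
    precision = len(matched_pred) / n_pred if n_pred else 0.0
    recall = matched_gt / n_gt if n_gt else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def corpus_scores(docs: list) -> tuple:
    """Average per-document (precision, recall, F1) over (matches, n_pred) pairs."""
    per_doc = [prf(m, n) for m, n in docs]
    return tuple(mean(column) for column in zip(*per_doc))


# Three ground-truth records, four predictions, two of which were matched.
p, r, f = prf({0: 1, 1: None, 2: 0}, n_pred=4)
```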

Scientists create biological ‘artificial intelligence’ system

Published

July 8, 2025

The Editors

Credit: Pixabay/CC0 Public Domain

Australian scientists have successfully developed a research system that uses ‘biological artificial intelligence’ to design and evolve molecules with new or improved functions directly in mammal cells. The researchers said this system provides a powerful new tool that will help scientists develop more specific and effective research tools or gene therapies.

Named PROTEUS (PROTein Evolution Using Selection), the system harnesses ‘directed evolution’, a lab technique that mimics the natural power of evolution. However, rather than taking years or decades, this method accelerates cycles of evolution and natural selection, allowing researchers to create molecules with new functions in weeks.

This could have a direct impact on finding new, more effective medicines. For example, the system can be applied to make gene editing technologies like CRISPR more effective.

“This means PROTEUS can be used to generate new molecules that are highly tuned to function in our bodies, and we can use it to make new medicine that would be otherwise difficult or impossible to make with current technologies,” says co-senior author Professor Greg Neely, Head of the Dr. John and Anne Chong Lab for Functional Genomics at the University of Sydney.

“What is new about our work is that directed evolution primarily works in bacterial cells, whereas PROTEUS can evolve molecules in mammal cells.”

PROTEUS can be given a problem with an uncertain solution, much as a user feeds prompts into an artificial intelligence platform. For example, the problem could be how to efficiently turn off a human disease gene inside our body.

PROTEUS then uses directed evolution to explore millions of possible sequences that do not yet exist in nature and finds molecules with properties that are highly adapted to solve the problem. This means PROTEUS can help find a solution that would normally take a human researcher years to find, if at all.

The researchers reported they used PROTEUS to develop improved versions of proteins that can be more easily regulated by drugs, and nanobodies (mini versions of antibodies) that can detect DNA damage, an important process that drives cancer. However, they said PROTEUS isn’t limited to this and can be used to enhance the function of most proteins and molecules.

The findings were reported in Nature Communications. The research was performed at the Charles Perkins Centre, University of Sydney, with collaborators from the Centenary Institute.

Unlocking molecular machine learning

The original development of directed evolution, performed first in bacteria, was recognized by the 2018 Nobel Prize in Chemistry.

“The invention of directed evolution changed the trajectory of biochemistry. Now, with PROTEUS, we can program a mammalian cell with a genetic problem we aren’t sure how to solve. Letting our system run continuously means we can check in regularly to understand just how the system is solving our genetic challenge,” said lead researcher Dr. Christopher Denes from the Charles Perkins Centre and School of Life and Environmental Sciences.

The biggest challenge Dr. Denes and the team faced was how to make sure the mammalian cell could withstand the multiple cycles of evolution and mutations and remain stable, without the system “cheating” and coming up with a trivial solution that doesn’t answer the intended question.

They found the key was using chimeric virus-like particles, a design that combines the outer shell of one virus with the genes of another, which blocked the system from cheating.

The design used parts of two significantly different virus families, creating the best of both worlds. The resulting system allowed the cells to process many different possible solutions in parallel, with improved solutions winning and becoming more dominant while incorrect solutions disappeared.

“PROTEUS is stable, robust and has been validated by independent labs. We welcome other labs to adopt this technique. By applying PROTEUS, we hope to empower the development of a new generation of enzymes, molecular tools and therapeutics,” Dr. Denes said.

“We made this system open source for the research community, and we are excited to see what people use it for. Our goals will be to enhance gene-editing technologies, or to fine-tune mRNA medicines for more potent and specific effects,” Professor Neely said.

More information:
Alexander J. Cole et al, A chimeric viral platform for directed evolution in mammalian cells, Nature Communications (2025). DOI: 10.1038/s41467-025-59438-2

Provided by
University of Sydney

Citation:
Scientists create biological ‘artificial intelligence’ system (2025, July 8)
retrieved 8 July 2025
from https://medicalxpress.com/news/2025-07-scientists-biological-artificial-intelligence.html

Hungarian Researchers Reveal Why Surprising Experiences Are Key to Learning

Published

July 8, 2025

Ádám Bráder

Hungarian researchers have used AI-inspired mathematical models to explore how human memory works. Their study shows that surprising experiences play a uniquely important role in learning, challenging older theories about what the brain should remember.

Surprising experiences play a crucial role in learning, say researchers from Hungary’s HUN-REN Wigner Research Centre and Germany’s Max Planck Institute. Using mathematical models developed in artificial intelligence research, they found that unusual events help the brain update its understanding of the world more efficiently than routine experiences.

The findings, published in Nature Reviews Psychology, challenge the traditional view that rare or unexpected memories are less ‘worth storing’. Instead, the study argues that it is precisely these moments—those that deviate just enough from the norm—that serve as anchors for deeper learning.

‘Memory isn’t flawless. Sometimes, we remember things that never actually happened,’ the researchers wrote in a statement by the Hungarian Research Network (HUN-REN). But these recurring ‘mistakes’ can actually help uncover the principles that govern how memory works—and why certain details stick while others fade.

The team, led by Gergő Orbán of the HUN-REN Wigner Centre, and working with Dávid Gergely Nagy and Charley Wu in Tübingen, applied concepts from machine learning to better understand how different human memory systems interact. Instead of simply cataloguing memory errors, their goal was to uncover the logic behind them—specifically how they relate to learning and data compression strategies used by the brain.

‘Information theory helps us understand what’s worth remembering and what’s better forgotten,’ the researchers explained. Traditional information theory might suggest that very rare events aren’t useful to remember—but human memory doesn’t behave this way. On the contrary, people tend to retain surprising experiences more vividly.
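The tension described here can be illustrated with two textbook quantities, sketched below in Python. This is a generic information-theory illustration, not the model used in the study, and the function names are hypothetical. The surprise of an event with probability p is -log2(p) bits, while its contribution to the average information of a source is p times that amount. A rare event is therefore highly surprising yet adds little to the average, which is one possible reading of why a purely compression-minded system might treat it as not worth storing.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprise of an event with probability p, in bits."""
    return -math.log2(p)


def entropy_contribution(p: float) -> float:
    """How much the event adds to the average information of its source."""
    return p * surprisal_bits(p)


# A routine event (p = 0.5) versus a rare one (p = 0.01).
routine = (surprisal_bits(0.5), entropy_contribution(0.5))
rare = (surprisal_bits(0.01), entropy_contribution(0.01))
```

Comparing the two tuples shows the rare event has the larger surprisal but the smaller contribution to the average.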

The authors conclude that these standout moments play a crucial role in updating what we know. While routine memories help us predict future outcomes, surprising events act as catalysts that refresh our knowledge and adjust our expectations.

In practical terms, the findings also offer valuable insight into how we learn—or teach—most effectively. The researchers argue that machine learning models don’t just help us understand what we’ll remember or forget, but also guide us in optimizing when to repeat a concept and when it’s time to move on to something new.

Ádám Bráder graduated from the Faculty of Humanities of Eötvös Loránd University in 2021 as an English major specializing in English in the Media and Applied Linguistics. From 2017, he worked as an assistant editor at TV2’s news programme. After graduating, he continued his work as an online journalist, which led to him joining the Hungarian Conservative team in 2022.