
How a Research Lab Made Entirely of LLM Agents Developed Molecules That Can Block a Virus

GPT-4o-based agents led by a human collaborated to develop experimentally validated nanobodies against SARS-CoV-2. Join me in exploring how we are entering an era of automated reasoning with actionable outputs, and in imagining how much further this will go once robotic lab technicians can also carry out the wet-lab parts of a research project!

In an era where artificial intelligence continues to reshape how we code, write, and even reason, a new frontier has emerged: AI conducting real scientific research, a goal that several companies (from major established players like Google to dedicated spin-offs) are pursuing. And we are not talking just about simulations, automated summarization, data crunching, or theoretical outputs, but about producing experimentally validated materials, such as the biological designs with potential clinical relevance in the case I bring you today.

That future just got much closer; very close indeed!

In a groundbreaking paper just published in Nature by researchers from Stanford and the Chan Zuckerberg Biohub, a novel system called the Virtual Lab demonstrated that a human researcher working with a team of large language model (LLM) agents can design new nanobodies (tiny, antibody-like proteins that bind other proteins to block their function) to target fast-mutating variants of SARS-CoV-2. This was not just a narrow chatbot interaction or a tool-assisted paper; it was an open-ended, multi-phase research process led and executed by AI agents, each with specialized expertise and a defined role, resulting in real-world validated biological molecules that could readily move on to downstream studies toward actual treatment of disease (in this case, Covid-19).

Let’s delve into how this serious, replicable research works, and into the approach it presents to AI-human (and, really, AI-AI) collaborative science.

From “simple” applications to a fully staffed AI lab

Although there are some precedents to this, the new system is unlike anything before it. And one of the coolest things is that it is not based on a specially trained LLM or tool; rather, it uses GPT-4o instructed with prompts that make it play the roles of the different kinds of people typically involved in a research team.

Until recently, the role of LLMs in science was limited to question-answering, summarizing, writing support, coding and perhaps some direct data analysis. Useful, yes, but not transformative. The Virtual Lab presented in this new Nature paper changes that by elevating LLMs from assistants to autonomous researchers that interact with one another and with a human user (who brings the research question, runs experiments when required, and eventually concludes the project) in structured meetings to explore, hypothesize, code, analyze, and iterate.

The core idea at the heart of this work was indeed to simulate an interdisciplinary lab staffed by AI agents. Each agent has a scientific role—say, immunologist, computational biologist, or machine learning specialist—and is instantiated from GPT-4o with a “persona” crafted via careful prompt engineering. These agents are led by a Principal Investigator (PI) Agent and monitored by a Scientific Critic Agent, both virtual agents.
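To make this concrete, here is a minimal sketch of how such persona agents might be instantiated, assuming the OpenAI Python SDK; the role descriptions and prompt wording are my own illustrations, not the paper’s actual prompts.

```python
# Minimal sketch of persona-based agents built on a general-purpose LLM.
# Assumes the OpenAI Python SDK (pip install openai); prompts are
# illustrative, not the actual prompts used in the Virtual Lab paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONAS = {
    "Principal Investigator": (
        "You are the PI of a computational biology lab. You set agendas, "
        "synthesize the team's input, and make final decisions."
    ),
    "Immunologist": (
        "You are an immunologist. You assess designs for antigen binding, "
        "epitope coverage, and immunological plausibility."
    ),
    "Scientific Critic": (
        "You are a skeptical reviewer. Challenge assumptions, point out "
        "errors, and demand evidence for every claim."
    ),
}

def ask_agent(role: str, agenda: str, transcript: str = "") -> str:
    """Query one persona agent about the current meeting agenda."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PERSONAS[role]},
            {"role": "user",
             "content": f"Agenda: {agenda}\n\nDiscussion so far:\n{transcript}"},
        ],
    )
    return response.choices[0].message.content

# Example: the Critic reviews the Immunologist's contribution.
# idea = ask_agent("Immunologist", "Should we design antibodies or nanobodies?")
# critique = ask_agent("Scientific Critic", "Review this proposal.", transcript=idea)
```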

The Critic Agent challenges assumptions and pinpoints errors, acting as the lab’s skeptical reviewer. As the paper explored, this was a key element of the workflow: without it, errors and oversights crept in to the detriment of the project.

The human researcher sets high-level agendas, injects domain constraints, and ultimately runs the outputs (especially the wet-lab experiments). But the “thinking” (maybe I should start considering removing the quotation marks?) is done by the agents.

How it all worked

The Virtual Lab was tasked with solving a real and urgent biomedical challenge: designing new nanobody binders for the KP.3 and JN.1 variants of SARS-CoV-2, which had evolved resistance to existing treatments. Instead of starting from scratch, the AI agents decided (yes, they made this decision entirely by themselves!) to mutate existing nanobodies that were effective against the ancestral strain but no longer worked as well.

As all interactions are tracked and documented, we can see exactly how the team moved forward with the project.

First of all, the human defined only the PI and Critic agents. The PI agent then created the scientific team by spawning specialized agents: in this case, an Immunologist, a Machine Learning Specialist, and a Computational Biologist. In a team meeting, the agents debated whether to design antibodies or nanobodies, and whether to design from scratch or mutate existing ones. They chose nanobody mutation, justified by faster timelines and available structural data. They then discussed which tools to use and how to implement them, settling on the ESM protein language model coupled with AlphaFold-Multimer for structure prediction and Rosetta for binding energy calculations, all implemented in Python. Very interestingly, the code had to be reviewed by the Critic agent multiple times and was refined through multiple asynchronous meetings.

From the meetings and multiple runs of code, an exact strategy was devised for proposing the final set of mutations to be tested. Briefly, for the bioinformaticians reading this post: the PI agent designed an iterative pipeline that uses ESM to score all point mutations of a nanobody sequence by log-likelihood ratio, selects the top mutants and predicts their structures in complex with the target protein using AlphaFold-Multimer, scores the interfaces via interface pLDDT (ipLDDT), then uses Rosetta to estimate binding energy, and finally combines the scores into a ranking of all proposed mutations. This was repeated in cycles to introduce additional mutations as needed.
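For readers who want to see the shape of that logic in code, here is a minimal sketch of the final ranking step, with the heavy tools (ESM, AlphaFold-Multimer, Rosetta) stubbed out as placeholder numbers; the equal-weight combination of z-scores is my assumption, as the paper defines its own way of combining the metrics.

```python
# Sketch of the mutation-ranking step: combine per-mutant scores from
# ESM (log-likelihood ratio), AlphaFold-Multimer (interface pLDDT), and
# Rosetta (binding energy, lower is better) into one ranking.
# The scores below are placeholders and the equal weighting is an
# assumption; the paper defines its own scoring combination.
from dataclasses import dataclass

@dataclass
class Candidate:
    mutation: str       # e.g. "Y104W"
    esm_llr: float      # ESM log-likelihood ratio (higher = more plausible)
    iplddt: float       # AlphaFold-Multimer interface pLDDT (0-100)
    rosetta_dg: float   # Rosetta binding energy (more negative = tighter)

def zscores(values):
    """Standardize a list of scores so different metrics are comparable."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [(v - mean) / sd for v in values]

def rank(candidates):
    """Rank candidates by an (assumed) equal-weight sum of z-scores."""
    z_esm = zscores([c.esm_llr for c in candidates])
    z_plddt = zscores([c.iplddt for c in candidates])
    z_dg = zscores([-c.rosetta_dg for c in candidates])  # flip: lower dG is better
    combined = [sum(t) for t in zip(z_esm, z_plddt, z_dg)]
    return sorted(zip(combined, candidates), key=lambda t: -t[0])

# Placeholder data standing in for real pipeline outputs:
cands = [
    Candidate("Y104W", 1.8, 82.0, -45.2),
    Candidate("S57R",  0.9, 88.5, -41.0),
    Candidate("G44D", -0.3, 74.1, -38.7),
]
for score, c in rank(cands):
    print(f"{c.mutation}: combined z = {score:+.2f}")
```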

The computational pipeline generated 92 nanobody sequences that were synthesized and tested in a real-world lab, where most of them turned out to be proteins that could actually be produced and handled. Two of them gained affinity for the SARS-CoV-2 proteins they were designed to bind, against both the modern variants and the ancestral forms.

These success rates are similar to those of analogous projects run in the traditional form (that is, executed by humans), but the project took much, much less time to conclude. And it hurts to say it, but I’m fairly sure the Virtual Lab also entailed much lower overall costs, as it involved far fewer people (and hence salaries).

Like a human group of scientists: meetings, roles, and collaboration

We saw above how the Virtual Lab mimics how human science happens: via structured interdisciplinary meetings. Each meeting is either a “Team Meeting”, where multiple agents discuss broad questions (the PI starts, others contribute, the Critic reviews, and the PI summarizes and decides), or an “Individual Meeting”, where a single agent (with or without the Critic) works on a specific task, e.g., writing code or scoring outputs.

To avoid hallucinations and inconsistency, the system also uses parallel meetings; that is, the same task is run multiple times with different randomness (i.e., at high “temperature”). Interestingly, the outcomes of these several meetings are then condensed in a single low-temperature merge meeting, which is much more deterministic and can quite safely decide which conclusions, among all those coming from the various meetings, make the most sense.
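As a sketch of that diverge-then-converge idea (again assuming the OpenAI SDK; the prompts and function names are mine, not the paper’s): run the same agenda several times at high temperature, then let a near-deterministic low-temperature “merge meeting” reconcile the drafts.

```python
# Sketch: run the same task N times at high temperature, then merge the
# divergent answers deterministically at low temperature. Assumes the
# OpenAI SDK; prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

def run_meeting(agenda: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,
        messages=[{"role": "user", "content": agenda}],
    )
    return response.choices[0].message.content

def parallel_meetings(agenda: str, n: int = 5) -> str:
    # Diverge: explore the question with high-temperature sampling.
    drafts = [run_meeting(agenda, temperature=1.0) for _ in range(n)]
    # Converge: a near-deterministic merge meeting reconciles the drafts.
    merge_prompt = (
        "Here are several independent answers to the same agenda. "
        "Keep only the conclusions that are consistent and well justified:\n\n"
        + "\n\n---\n\n".join(drafts)
    )
    return run_meeting(merge_prompt, temperature=0.0)

# answer = parallel_meetings("Which tools should we use to rank nanobody mutants?")
```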

Clearly, these ideas can be applied to any other kind of multi-agent interaction, and for any purpose!

How much did the humans do?

Surprisingly little, for the computational part at least, as the experiments cannot be fully automated yet (though keep reading for some reflections about robots in labs!).

In this Virtual Lab round, LLM agents wrote 98.7% of the total words (over 120,000 tokens), while the human researcher contributed just 1,596 words across the entire project. The agents wrote all the scripts for ESM, AlphaFold-Multimer post-processing, and Rosetta XML workflows. The human only helped run the code and facilitated the real-world experiments. The Virtual Lab pipeline was built in 1-2 days of prompting and meetings, and the nanobody design computation ran in about a week.

Why this matters (and what comes next)

The Virtual Lab could serve as the prototype for a fundamentally new research model, and indeed for a fundamentally new way to work, in which everything that can be done on a computer is automated and humans take only the most critical decisions. LLMs are clearly shifting from passive tools to active collaborators that, as the Virtual Lab shows, can drive complex, interdisciplinary projects from idea to implementation.

The next ambitious leap? Replacing the hands of the human technicians who ran the experiments with robotic ones. Clearly, the next frontier is automating physical interaction with the real world, which is essentially what robots are for. Imagine, then, the full pipeline as applied to a research lab:

  • A human PI defines a high-level biological goal.
  • The team researches existing information, scans databases, and brainstorms ideas.
  • A set of AI agents selects computational tools if required, writes and runs code and/or analyses, and finally proposes experiments.
  • Then, robotic lab technicians, rather than human technicians, carry out the protocols: pipetting, centrifuging, plating, imaging, data collection.
  • The results flow back into the Virtual Lab, closing the loop.
  • Agents analyze, adapt, iterate.

This would make the research process truly end-to-end autonomous. From problem definition to experiment execution to result interpretation, all components would be run by an integrated AI-robotics system with minimal human intervention—just high-level steering, supervision, and global vision.

Robotic biology labs are already being prototyped. Emerald Cloud Lab, Strateos, and Transcriptic (now part of Colabra) offer robotic wet-lab-as-a-service. Future House is a non-profit building AI agents to automate research in biology and other complex sciences. In academia, some autonomous chemistry labs exist whose robots can explore chemical space on their own. Biofoundries use programmable liquid handlers and robotic arms for synthetic biology workflows. Adaptyv Bio automates protein expression and testing at scale.

Such automated laboratory systems, coupled with emerging systems like the Virtual Lab, could radically transform how science and technology progress. The intelligent layer drives the project and gives the robots work to do, and their output then feeds back into the thinking tool in a closed-loop discovery engine that could run 24/7 without fatigue or scheduling conflicts, conduct hundreds or thousands of micro-experiments in parallel, and rapidly explore vast hypothesis spaces that are simply not feasible for human labs. Moreover, the labs, virtual labs, and managers don’t even need to be physically together, making it possible to optimize how resources are distributed.

There are challenges, of course. Real-world science is messy and nonlinear. Robotic protocols must be incredibly robust. Unexpected errors still need judgment. But as robotics and AI continue to evolve, those gaps will certainly shrink.

Final thoughts

We humans were always confident that technology, in the form of smart computers and robots, would push us out of our highly repetitive physical jobs, while creativity and thinking would remain our domain of mastery for decades, perhaps centuries. However, despite quite some automation via robots, the AI of the 2020s has shown that technology can also outdo us at some of our most brain-intensive jobs.

In the near future, LLMs won’t just answer our questions or merely help us with our work. They will ask, argue, debate, and decide. And sometimes, they will discover!

References and further reading

The Nature paper analyzed here:

https://www.nature.com/articles/s41586-025-09442-9

Other scientific discoveries by AI systems:

https://pub.towardsai.net/two-new-papers-by-deepmind-exemplify-how-artificial-intelligence-can-help-human-intelligence-ae5143f07d49




Physicians Lose Cancer Detection Skills After Using Artificial Intelligence



Artificial intelligence shows great promise in helping physicians improve their diagnostic accuracy for important patient conditions. In the realm of gastroenterology, AI has been shown to help human physicians better detect small polyps (adenomas) during colonoscopy. Although adenomas are not yet cancerous, they are at risk of turning into cancer. Thus, early detection and removal of adenomas during routine colonoscopy can reduce a patient’s risk of developing future colon cancers.

But as physicians become more accustomed to AI assistance, what happens when they no longer have access to AI support? A recent European study has shown that physicians’ skills in detecting adenomas can deteriorate significantly after they become reliant on AI.

The European researchers tracked the results of over 1,400 colonoscopies performed in four different medical centers. They measured the adenoma detection rate (ADR) for physicians working normally without AI vs. those who used AI to help them detect adenomas during the procedure. They also tracked the ADR of physicians who had used AI regularly for three months and then resumed performing colonoscopies without AI assistance.

The researchers found that the ADR was 28% before AI assistance and 28.4% with AI assistance (a slight increase, but not statistically significant). However, when physicians accustomed to AI assistance ceased using AI, their ADR fell significantly to 22.4%. Assuming the patients in the various study groups were medically similar, this suggests that physicians accustomed to AI support might miss over a fifth of adenomas without computer assistance!
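To see where that “fifth” comes from: the relative drop from the AI-assisted rate is (28.4 − 22.4) / 28.4 ≈ 21%, i.e., just over one in five adenomas that would otherwise have been detected.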

This is the first published example of so-called medical “deskilling” caused by routine use of AI. The study authors summarized their findings as follows: “We assume that continuous exposure to decision support systems such as AI might lead to the natural human tendency to over-rely on their recommendations, leading to clinicians becoming less motivated, less focused, and less responsible when making cognitive decisions without AI assistance.”

Consider the following non-medical analogy: Suppose self-driving car technology advanced to the point that cars could safely decide when to accelerate, brake, turn, change lanes, and avoid sudden unexpected obstacles. If you relied on self-driving technology for several months, then suddenly had to drive without AI assistance, would you lose some of your driving skills?

Although this particular study took place in the field of gastroenterology, I would not be surprised if we eventually learn of similar AI-related deskilling in other branches of medicine, such as radiology. At present, radiologists do not routinely use AI while reading mammograms to detect early breast cancers. But when AI becomes approved for routine use, I can imagine that human radiologists could succumb to a similar performance loss if they were suddenly required to work without AI support.

I anticipate more studies will be performed to investigate the issue of deskilling across multiple medical specialties. Physicians, policymakers, and the general public will want to ask the following questions:

1) As AI becomes more routinely adopted, how are we tracking patient outcomes (and physician error rates) before AI, after routine AI use, and whenever AI is discontinued?

2) How long does the deskilling effect last? What methods can help physicians minimize deskilling, and/or recover lost skills most quickly?

3) Can AI be implemented in medical practice in a way that augments physician capabilities without deskilling?

Deskilling is not always bad. My 6th grade schoolteacher kept telling us that we needed to learn long division because we wouldn’t always have a calculator with us. But because of the ubiquity of smartphones and spreadsheets, I haven’t done long division with pencil and paper in decades!

I do not see AI completely replacing human physicians, at least not for several years. Thus, it will be incumbent on the technology and medical communities to discover and develop best practices that optimize patient outcomes without endangering patients through deskilling. This will be one of the many interesting and important challenges facing physicians in the era of AI.




AI exposes 1,000+ fake science journals



A team of computer scientists led by the University of Colorado Boulder has developed a new artificial intelligence platform that automatically seeks out “questionable” scientific journals.

The study, published Aug. 27 in the journal “Science Advances,” tackles an alarming trend in the world of research.

Daniel Acuña, lead author of the study and associate professor in the Department of Computer Science, gets a reminder of that several times a week in his email inbox: These spam messages come from people who purport to be editors at scientific journals, usually ones Acuña has never heard of, and offer to publish his papers — for a hefty fee.

Such publications are sometimes referred to as “predatory” journals. They target scientists, convincing them to pay hundreds or even thousands of dollars to publish their research without proper vetting.

“There has been a growing effort among scientists and organizations to vet these journals,” Acuña said. “But it’s like whack-a-mole. You catch one, and then another appears, usually from the same company. They just create a new website and come up with a new name.”

His group’s new AI tool automatically screens scientific journals, evaluating their websites and other online data for certain criteria: Do the journals have an editorial board featuring established researchers? Do their websites contain a lot of grammatical errors?

Acuña emphasizes that the tool isn’t perfect. Ultimately, he thinks human experts, not machines, should make the final call on whether a journal is reputable.

But in an era when prominent figures are questioning the legitimacy of science, stopping the spread of questionable publications has become more important than ever before, he said.

“In science, you don’t start from scratch. You build on top of the research of others,” Acuña said. “So if the foundation of that tower crumbles, then the entire thing collapses.”

The shakedown

When scientists submit a new study to a reputable publication, that study usually undergoes a practice called peer review. Outside experts read the study and evaluate it for quality — or, at least, that’s the goal.

A growing number of companies have sought to circumvent that process to turn a profit. In 2009, Jeffrey Beall, a librarian at CU Denver, coined the phrase “predatory” journals to describe these publications.

Often, they target researchers outside of the United States and Europe, such as in China, India and Iran — countries where scientific institutions may be young, and the pressure and incentives for researchers to publish are high.

“They will say, ‘If you pay $500 or $1,000, we will review your paper,'” Acuña said. “In reality, they don’t provide any service. They just take the PDF and post it on their website.”

A few different groups have sought to curb the practice. Among them is a nonprofit organization called the Directory of Open Access Journals (DOAJ). Since 2003, volunteers at the DOAJ have flagged thousands of journals as suspicious based on six criteria. (Reputable publications, for example, tend to include a detailed description of their peer review policies on their websites.)

But keeping pace with the spread of those publications has been daunting for humans.

To speed up the process, Acuña and his colleagues turned to AI. The team trained its system using the DOAJ’s data, then asked the AI to sift through a list of nearly 15,200 open-access journals on the internet.

Among those journals, the AI initially flagged more than 1,400 as potentially problematic.

Acuña and his colleagues asked human experts to review a subset of the suspicious journals. The AI made mistakes, according to the humans: an estimated 350 of the flagged publications (roughly a quarter of them) were likely legitimate. That still left more than 1,000 journals that the researchers identified as questionable.

“I think this should be used as a helper to prescreen large numbers of journals,” he said. “But human professionals should do the final analysis.”

A firewall for science

Acuña added that the researchers didn’t want their system to be a “black box” like some other AI platforms.

“With ChatGPT, for example, you often don’t understand why it’s suggesting something,” Acuña said. “We tried to make ours as interpretable as possible.”

The team discovered, for example, that questionable journals published an unusually high number of articles. They also featured authors with more affiliations than those in more legitimate journals, and authors who cited their own research, rather than the research of other scientists, to an unusually high degree.
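To illustrate the kind of interpretable model this points to, here is a minimal sketch using a scikit-learn logistic regression over signals like those described above; the feature names, data, and model choice are placeholders, not the team’s actual system.

```python
# Sketch of an interpretable "questionable journal" classifier, assuming
# scikit-learn. Features mirror the signals described in the article
# (publication volume, author affiliations, self-citation rate, website
# errors); the data and model choice are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["articles_per_year", "mean_affiliations",
            "self_citation_rate", "site_error_rate"]

# Placeholder training data: rows of feature values, 1 = questionable.
X = np.array([
    [1200, 4.1, 0.35, 0.12],
    [950,  3.8, 0.28, 0.09],
    [80,   1.6, 0.05, 0.01],
    [120,  1.9, 0.07, 0.02],
])
y = np.array([1, 1, 0, 0])

model = LogisticRegression().fit(X, y)

# Interpretability: inspect which features push a journal toward "questionable".
for name, coef in zip(FEATURES, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")

# Flag a new journal; human experts make the final call, as Acuña suggests.
print("P(questionable) =", model.predict_proba([[1100, 4.0, 0.30, 0.10]])[0, 1])
```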

The new AI system isn’t publicly accessible, but the researchers hope to make it available to universities and publishing companies soon. Acuña sees the tool as one way that researchers can protect their fields from bad data — what he calls a “firewall for science.”

“As a computer scientist, I often give the example of when a new smartphone comes out,” he said. “We know the phone’s software will have flaws, and we expect bug fixes to come in the future. We should probably do the same with science.”

Co-authors on the study included Han Zhuang at the Eastern Institute of Technology in China and Lizheng Liang at Syracuse University in the United States.
