AI Research
NIH researchers develop AI agent that improves accuracy of gene set analysis by leveraging expert-curated databases

Monday, July 28, 2025
Researchers at the National Institutes of Health (NIH) have developed an artificial intelligence (AI) agent powered by a large language model (LLM) that creates more accurate and informative descriptions of biological processes and their functions in gene set analysis than current systems.
The system, called GeneAgent, cross-checks its own initial predictions—also known as claims—for accuracy against information from established, expert-curated databases and returns a verification report detailing its successes and failures. The AI agent can help researchers interpret high-throughput molecular data and identify relevant biological pathways or functional modules, which can lead to a better understanding of how different diseases and conditions affect groups of genes individually and together.
AI-generated content is produced by LLMs trained on enormous amounts of text data from across the internet. LLMs use those data to recognize patterns and predict what words might follow each other in a sentence. However, LLMs are not designed to verify truth, meaning AI-generated content can be false, misleading, or fabricated, a phenomenon called AI hallucinations. Additionally, LLMs are prone to circular reasoning—fact-checking their generated results against their own data—which makes them sound more confident in the output even when the information is false.
Staving off AI hallucinations is important when using LLM tools for gene set analysis—the process of generating collective functional descriptions of grouped genes and their potential interactions. Previous studies that taught LLMs to answer genomic questions or summarize biological processes in a given gene set did not explicitly address hallucinations in the generated content.
GeneAgent mitigates this issue by taking its own claims and independently comparing them to established knowledge compiled in external, expert-curated databases. The research team first tested GeneAgent on 1,106 gene sets sourced from existing databases with known functions and process names. For each gene set, GeneAgent first generated an initial list of functional claims. It then independently used its self-verification agent module to cross-check these claims against the curated databases and create a verification report that noted whether each of its claims was supported, partially supported, or refuted.
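The verification step described above can be pictured with a toy sketch. This is not GeneAgent's implementation: the real agent uses an LLM and queries expert-curated domain databases. Here the "database" is a hypothetical Python dict mapping a process name to its annotated genes, and a claim is scored by the fraction of its genes that the dict annotates to that process. The 0.5 cutoff for "partially supported" is an arbitrary assumption.

```python
from dataclasses import dataclass

# Hypothetical toy "curated database": process name -> genes annotated to it.
# GeneAgent itself queries real expert-curated databases; these entries are illustrative.
CURATED_DB: dict[str, set[str]] = {
    "DNA repair": {"BRCA1", "BRCA2", "RAD51", "ATM"},
    "glycolysis": {"HK1", "PFKM", "PKM", "ENO1"},
}


@dataclass
class Claim:
    process: str     # biological process named in the claim
    genes: set[str]  # genes the claim says take part in that process


def verify_claim(claim: Claim, db: dict[str, set[str]],
                 partial_threshold: float = 0.5) -> str:
    """Label a claim by how many of its genes the database annotates to the process."""
    annotated = db.get(claim.process, set())
    if not claim.genes:
        return "refuted"
    support = len(claim.genes & annotated) / len(claim.genes)
    if support == 1.0:
        return "supported"
    if support >= partial_threshold:   # assumed cutoff, not taken from the paper
        return "partially supported"
    return "refuted"                   # also covers processes missing from the toy database


def verification_report(claims: list[Claim], db: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Pair each claim's process with its verdict, in input order."""
    return [(c.process, verify_claim(c, db)) for c in claims]


claims = [
    Claim("DNA repair", {"BRCA1", "RAD51"}),       # 2 of 2 genes annotated
    Claim("glycolysis", {"HK1", "PKM", "TP53"}),   # 2 of 3 genes annotated
    Claim("DNA repair", {"MYC", "TP53"}),          # 0 of 2 genes annotated
]
for process, verdict in verification_report(claims, CURATED_DB):
    print(f"{process}: {verdict}")
```

The point of the sketch is the separation of roles: claims are produced first and then checked against an external source, rather than against the generator's own output.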
To assess the accuracy of the self-verification step, the researchers brought in two human experts to manually review 10 randomly selected gene sets containing a total of 132 claims and judge whether GeneAgent’s self-verification reports were correct, partially correct, or incorrect. The experts determined that 92% of GeneAgent’s self-verification decisions were correct, indicating strong self-verification performance, especially compared with GPT-4. Their detailed review confirmed the model’s effectiveness in minimizing hallucinations and generating more reliable analytical narratives.
The research team also examined a real-world application of GeneAgent to animal-model gene sets. When applied to seven novel gene sets derived from mouse melanoma cell lines, GeneAgent offered valuable insight into novel functions of specific genes. This could support the discovery of potential new drug targets for diseases such as cancer.
While LLM-based tools such as GeneAgent are still limited by the information they can access and cannot reason as humans do, GeneAgent’s capacity for self-driven fact-checking shows remarkable promise in mitigating AI hallucinations.
About the National Library of Medicine (NLM): NLM is a leader in research in biomedical informatics and data science and the world’s largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. It creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information. Additional information is available at https://www.nlm.nih.gov.
About the National Institutes of Health (NIH): NIH, the nation’s medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit www.nih.gov.
NIH…Turning Discovery Into Health®
Reference
Wang, Z., Jin, Q., Wei, CH. et al. GeneAgent: self-verification language agent for gene-set analysis using domain databases. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02748-6
The Machine Learning Lessons I’ve Learned This Month

Most days in machine learning are the same: coding, waiting for results, interpreting them, and returning to coding, plus the occasional presentation of one’s progress. But things mostly being the same does not mean that there’s nothing to learn. Quite the contrary! Two to three years ago, I started a daily habit of writing down lessons that I learned from my ML work. Looking back through this month’s lessons, I found three practical ones that stand out:
- Keep logging simple
- Use an experimental notebook
- Keep overnight runs in mind
Keep logging simple
For years, I used Weights & Biases (W&B)* as my go-to experiment logger. In fact, I was once in the top 5% of all active users. The stats in the figure below show that, at that time, I had trained close to 25,000 models, used a cumulative 5,000 hours of compute, and run more than 500 hyperparameter searches. I used it for papers, for big projects like weather prediction with large datasets, and for tracking countless small-scale experiments.
And W&B really is a great tool: if you want beautiful dashboards and are collaborating** with a team, W&B shines. Until recently, while reconstructing data from trained neural networks, I ran multiple hyperparameter sweeps, and W&B’s visualization capabilities were invaluable: I could directly compare reconstructions across runs.
But I realized that for most of my research projects, W&B was overkill. I rarely revisited individual runs, and once a project was done, the logs just sat there and I never touched them again. So when I later refactored the data reconstruction project, I explicitly removed the W&B integration. Not because anything was wrong with it, but because it wasn’t necessary.
Now, my setup is much simpler. I just log selected metrics to CSV and text files, writing directly to disk. For hyperparameter searches, I rely on Optuna. Not even the distributed version with a central server — just local Optuna, saving study states to a pickle file. If something crashes, I reload and continue. Pragmatic and sufficient (for my use cases).
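The crash-and-resume pattern described here does not need a central server. The sketch below swaps Optuna for a plain random search written with the standard library, so it runs without extra dependencies; it is not Optuna's API and not the author's code. It pickles the whole search state, including the random number generator, after every trial, so a restarted script continues exactly where it stopped. The objective function and parameter ranges are hypothetical placeholders for a real training run.

```python
import os
import pickle
import random


def objective(lr: float, dropout: float) -> float:
    """Hypothetical stand-in for a training run; returns a fake validation loss."""
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2


def load_state(path: str) -> dict:
    """Resume from a pickled checkpoint if one exists, else start a fresh search."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"trials": [], "rng": random.Random(0)}


def save_state(state: dict, path: str) -> None:
    """Write to a temp file first, then rename, so a crash mid-write keeps the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def run_search(path: str, n_trials: int) -> tuple[dict, float]:
    """Run trials until n_trials are recorded, checkpointing after each one."""
    state = load_state(path)
    rng = state["rng"]  # same object as in state, so its progress is saved too
    while len(state["trials"]) < n_trials:
        params = {"lr": rng.uniform(1e-4, 1e-1), "dropout": rng.uniform(0.0, 0.5)}
        state["trials"].append((params, objective(**params)))
        save_state(state, path)
    return min(state["trials"], key=lambda trial: trial[1])
```

Because the generator state travels with the checkpoint, stopping after five trials and resuming to ten yields the same trials as an uninterrupted run of ten.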
The key insight here is this: logging is not the work. It’s a support system. Spending 99% of your time deciding what to log (gradients? weights? distributions? at what frequency?) can easily distract you from the actual research. For me, simple, local logging covers all needs, with minimal setup effort.
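To make the "CSV files written directly to disk" idea concrete, here is a minimal sketch of such a logger using only Python's standard `csv` module. The class name `CSVLogger` and the column names are hypothetical. One design choice worth noting: fixing the columns up front means a misspelled metric name raises an error instead of silently producing an unexpected file.

```python
import csv
from pathlib import Path


class CSVLogger:
    """Append one row of metrics per call to a CSV file on local disk."""

    def __init__(self, path, fieldnames):
        self.path = Path(path)
        self.fieldnames = list(fieldnames)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write the header only for a new (or empty) file, so reruns append cleanly.
        if not self.path.exists() or self.path.stat().st_size == 0:
            with self.path.open("w", newline="") as f:
                csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()

    def log(self, **metrics):
        # Unknown keys raise ValueError (DictWriter's default); missing keys become empty cells.
        with self.path.open("a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(metrics)


# Usage: one file per run, one row per epoch.
# logger = CSVLogger("runs/exp_01/metrics.csv", ["epoch", "train_loss", "val_loss"])
# logger.log(epoch=1, train_loss=0.91, val_loss=1.02)
```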
Maintain experimental lab notebooks
In December 1939, William Shockley wrote an idea in his lab notebook: replace vacuum tubes with semiconductors. Seventeen years later, in 1956, Shockley and two colleagues at Bell Labs were awarded the Nobel Prize in Physics for the invention of the transistor.
While most of us aren’t writing Nobel-worthy entries into our notebooks, we can still learn from the principle. Granted, in machine learning, our laboratories don’t have the chemicals or test tubes we usually picture when we think of a laboratory. Instead, our labs are often our computers; the same device that I use to write these lines has trained countless models over the years. And these labs are inherently portable, especially when we develop remotely on high-performance compute clusters. Even better, thanks to highly skilled administrative staff, these clusters run 24/7, so there’s always time to run an experiment!
But the question is: which experiment? Here, a former colleague introduced me to the idea of maintaining a lab notebook, and lately I’ve returned to it in the simplest form possible. Before starting long-running experiments, I write down:
what I’m testing, and why I’m testing it.
Then, when I come back later — usually the next morning — I can immediately see which results are ready and what I had hoped to learn. It’s simple, but it changes the workflow. Instead of just “rerun until it works,” these dedicated experiments become part of a documented feedback loop. Failures are easier to interpret. Successes are easier to replicate.
Run experiments overnight
This is a small but painful lesson that I (re-)learned this month.
On a Friday evening, I discovered a bug that might affect my experiment results. I patched it and reran the experiments to validate. By Saturday morning, the runs had finished — but when I inspected the results, I realized I had forgotten to include a key ablation. Which meant … another full day of waiting.
In ML, overnight time is precious. For us programmers, it’s rest. For our experiments, it’s work. If we don’t have an experiment running while we sleep, we’re effectively wasting free compute cycles.
That doesn’t mean you should run experiments just for the sake of it. But whenever there is a meaningful one to launch, the evening is the perfect time to start it. Clusters are often under-utilized then, so resources become available more quickly, and, most importantly, you will have results to analyse the next morning.
A simple trick is to plan this deliberately. As Cal Newport mentions in his book “Deep Work”, good workdays start the night before. If you know tomorrow’s tasks today, you can set up the right experiments in time.
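One way to avoid the forgotten-ablation problem from the Friday-evening story is to write the evening's queue down as data and check it before launching. The sketch below is a hypothetical helper, not part of any particular tool: it expands a baseline config into the baseline plus one-at-a-time ablations, repeats each over a few seeds, and refuses to proceed if a run you meant to include is missing. Actually submitting the jobs (for example, to a cluster scheduler) is left out.

```python
from itertools import product


def plan_overnight_queue(base: dict, ablations: dict[str, dict], seeds: list[int]) -> list[dict]:
    """Baseline plus one-at-a-time ablations, each repeated for every seed."""
    variants = [("baseline", {})] + list(ablations.items())
    queue = []
    for (variant, override), seed in product(variants, seeds):
        queue.append({
            "variant": variant,
            "name": f"{variant}-seed{seed}",
            "config": {**base, **override, "seed": seed},
        })
    return queue


def check_required(queue: list[dict], required: list[str]) -> None:
    """Fail loudly before launch if any intended variant is absent from the queue."""
    planned = {job["variant"] for job in queue}
    missing = sorted(set(required) - planned)
    if missing:
        raise ValueError(f"missing planned runs: {missing}")


# Hypothetical example config and ablations.
base = {"lr": 1e-3, "use_augmentation": True, "dropout": 0.1}
ablations = {
    "no_augmentation": {"use_augmentation": False},
    "no_dropout": {"dropout": 0.0},
}
queue = plan_overnight_queue(base, ablations, seeds=[0, 1])
check_required(queue, ["baseline", "no_augmentation", "no_dropout"])
for job in queue:
    print(job["name"], job["config"])
```

Writing the `required` list during the day, while the plan is fresh, is what makes the check useful at night.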
* This isn’t bashing W&B (it would have been the same with, e.g., MLflow); rather, I’m asking users to evaluate what their project goals are and then spend most of their time pursuing those goals with utmost focus.
** In my eyes, mere collaboration is not enough to warrant using such shared dashboards. You need to gain more insight from such shared tools than the time you spend setting them up.
How is artificial intelligence affecting job searches?

Artificial intelligence programs like ChatGPT can do thinking or writing or creating for you. Pretty amazing, but also a little terrifying. What happens to the people who used to do those jobs?
Olivia Fair graduated four years ago. “I’ve applied to probably over a hundred jobs in the past, I don’t know, six months,” she said. “And yeah, none of them are landing.”
She’s had a series of short-term jobs – one was in TV production, transcribing interviews. “But now they don’t have a bunch of people transcribing,” she said. “They have maybe one person overseeing all of that, and AI doing the rest. Which I think is true for a lot of entry-level positions. And it can be a very useful tool for those people doing that work. But then there’s less people needed.”
According to Laura Ullrich, director of economic research at Indeed, the job-listings website, job postings have declined year over year by 6.7 percent. “This is a tough year,” she said. “Younger job seekers, specifically those who are recent grads, are having a harder time finding work.”
Asked if there is a correlation between the rise in AI and the decline in jobs for recent graduates, Ullrich said, “I think there is a cause-and-effect, but it’s maybe not as significant as a lot of people would think. If you look specifically at tech jobs, job postings are down 36% compared to pre-pandemic numbers. But that decline started happening prior to AI becoming commonly used.”
Ullrich said in 2021-22, as the effects of the COVID pandemic began to ebb, there was a hiring boom in some sectors, including tech: “Quite frankly, I think some companies overhired,” she said.
The uncertain national situation (tariffs, taxes, foreign policy) doesn’t help, either. Ullrich said, “Some other people have used the analogy of, like, driving through fog. If it’s foggy, you slow down a bit. But if it’s really foggy, you pull over. And unfortunately, some companies have pulled over to sit and wait to see what is gonna happen.”
That sounds a little more nuanced than some recent headlines, which make it sound pretty clear that AI is taking jobs.
“I read today an interview with a guy who said, you know, ‘By 2027, we will be jobless, lonely, crime on the streets,'” said David Autor, a labor economist at MIT. “And I said, ‘How do I take the other side of that bet?’ ‘Cause that’s just not true. I’m sure of that. My view is, look, there is great potential and great risk. I think that it’s not nearly as imminent on either direction as most people think.”
I said, “But what it does seem to do is relieve the newcomers, the beginning, incoming novices we don’t need anymore.”
“This is really a concern,” Autor said. “Judgment, expertise, it’s acquired slowly. It’s a product of immersion, right? You know, how do I care for this patient, or land this plane, or remodel this building? And it’s possible that we could strip out so much of the supporting work, that people never get the expertise. I don’t think it’s an insurmountable concern. But we shouldn’t take for granted that it will solve itself.”
Let’s cut to the chase. What are the jobs we’re going to lose? Laura Ullrich said, “We analyzed 2,800 specific skills, and 30% of them could be, at least partially, done by AI.” (In other words, 70% of job skills are not currently at risk from AI.)
So, which jobs is AI likely to take first? Mostly jobs done in front of a screen:
- Coding
- Accounting
- Copy writing
- Translation
- Customer service
- Paralegal work
- Illustration
- Graphic design
- Songwriting
- Information management
As David Autor puts it: “What will market demand be for this thing? How much should we order? How much should we keep in stock?”
AI will have a much harder time taking jobs requiring empathy, creative thinking, or physicality:
- Healthcare
- Teaching
- Social assistance
- Mental health
- Police and fire
- Engineering
- Construction
- Wind and solar
- Tourism
- Trades (like plumbing and electrical)
And don’t forget about the new job categories that AI will create. According to Autor, “A lot of the work that we do is in things that we just didn’t do, you know, 50 or 100 years ago – all this work in solar and wind generation, all types of medical specialties that were unthinkable.”
I asked, “You can’t sit here and tell me what the new fields and jobs will be?”
“No. We’re bad at predicting where new work will appear, what skills it will need, how much there will be,” Autor said, adding, “There will be new things, absolutely.”
“So, it sounds like you don’t think we are headed to becoming a nation of people who cannot find any work, who spend the day on the couch watching Netflix?”
Autor said, “No, I don’t see that. Of course, people will be displaced, certain types of occupations will disappear. People will lose careers. That’s going to happen. But we might actually get much better at medicine. We might figure out a way to generate energy more cheaply and with less pollution. We might figure out a better way to do agriculture that isn’t land-intensive and so ecologically intensive.”
Whatever is going to happen will likely take a while.
Until then, Laura Ullrich has some advice for young job seekers: “The number one piece of advice I would give is, move forward. So, whether that is getting another job, getting a part-time job, finding a post-graduate internship – reach out to the professors that you had. They have a whole network of former students, right? Reach out to other alumni who graduated from the school you went to, or majored in the same thing you majored in. It might be what gets you a job this year.”
So far, Olivia Fair is doing all of the above. I asked her, “You’re interested in creativity and writing and production. So let me hear, as a human, your pitch, why you’d be better than AI doing those jobs?”
“Okay,” Fair replied. “Hmm. I’m a person, and not a robot?”
Story produced by Gabriel Falcon. Editor: Chad Cardin.
Bay Area home sales are cooling — but AI-bolstered SF is heating up

The Bay Area housing market is in something of a lull, with sales down slightly this year compared with 2024. But sales in San Francisco are on the rise, a trend real estate agents attribute to the artificial intelligence boom and renewed optimism about the city’s future.
The number of Bay Area homes, including condominiums and co-ops, sold from January to July is down more than 2% from the same period last year, according to data from online real estate brokerage Redfin. But in San Francisco, sales are up 5%, rising from about 2,870 in 2024 to 3,010 in 2025.
The rise of AI companies in San Francisco, and the city’s affordable housing shortage, have already contributed to a surge in rents. While the growth in sales hasn’t yet translated to rising home prices (the city’s typical home value of $1.27 million is about 1% lower than it was last year, according to real estate company Zillow), some real estate agents believe that could soon change.
“I don’t want to say (the market’s) hot, but it’s very, very, very warm,” said Ruth Krishnan, a San Francisco real estate agent with Compass.
When mortgage rates spiked in 2022, home sales in San Francisco — and just about everywhere else — plummeted. Markets heated up again after those rates dipped in 2023, with prices soaring in Silicon Valley, where return-to-office policies and strong tech stock growth drove competition among buyers. This year, the market cooled again, thanks to a combination of tech layoffs, volatility in the stock market and the unlikelihood that mortgage rates will fall much further.
But San Francisco is proving to be a partial exception. Sales have gradually recovered over the past two years, even nearing pre-pandemic levels. Krishnan said renewed enthusiasm about the city and its new mayor, as well as AI companies’ move to the city, have led to a jump in sales. Some buyers may also be capitalizing on the dip in San Francisco home prices, expecting that prices will soon increase, she added.
The single-family home market is the primary driver behind the increase in sales, Redfin data shows, with condo sales practically flat from last year. Condo listings in San Francisco and San Mateo County have actually dipped, said Redfin senior economist Asad Khan, possibly indicating that some condo sellers have simply given up on finding a buyer. A May report by the company showed that more than a third of for-sale condos in the San Francisco metropolitan area were at risk of selling at a loss.
The condo market could eventually shift as competition for homes near tech offices heats up, said Patrick Carlisle, chief market analyst at Compass. But the biggest impact — at least at first — will probably be on rents, he added.
Besides San Francisco, only a few mid- and large-size Bay Area cities have seen home sales rise from last year, with sales in Vacaville and Oakland rising by about 10% and 5%, respectively. But both cities are much further from their pre-pandemic sales numbers than San Francisco.