Connect with us

AI Research

4 new studies about agentic AI from the MIT Initiative on the Digital Economy

Published

on


Over time, artificial intelligence tools are being given more autonomy. Beyond serving as human assistants, they are being programmed to be agents themselves — negotiating contracts, making decisions, exploring legal arguments, and so on.

This evolution raises important questions about how well AI can perform the kinds of tasks that have historically depended on human judgment. As AI takes over some tasks from people, will it demonstrate the requisite reasoning and decision-making skills?

MIT Sloan professor of management, IT, and marketing Sinan Aral and postdoctoral fellow Harang Ju have been exploring these questions and more in several areas of new research that range from how AI agents negotiate to how they can be made more flexible in their interpretation of rules. Aral is the director of the MIT Initiative on the Digital Economy, where Ju is a member of the research team. 

“A lot of people in industry and computer science research are creating fancy agents, but very few are looking at the interactions between humans and these tools,” Ju said. “That’s where we come in. That’s the theme of our work.”

“We are already well into the Agentic Age [of AI],” Aral said. “Companies are developing and deploying autonomous, multimodal AI agents in a vast array of tasks. But our understanding of how to work with AI agents to maximize productivity and performance, as well as the societal implications of this dramatic turn toward agentic AI, is nascent, if not nonexistent.

“At the MIT Initiative on the Digital Economy,” he continued, “we have doubled down on analyzing rigorous, large-scale experiments to help managers and policymakers unlock the promise of agentic AI while avoiding its pitfalls.”

Below are four recent insights from this research program, which aims to more fully explore the frontiers of AI development.

AI can be taught to handle exceptions 

In a new paper co-authored by Matthew DosSantos DiSorbo, Aral and Ju presented  people and AI alike with a simple scenario: To bake a birthday cake for a friend, you are tasked with buying flour for $10 or less. When you arrive at the store, you find that flour sells for $10.01. What do you do?

Most humans (92%) went ahead with the purchase. Almost universally, across thousands of iterations, AI models did the opposite, citing the fact that the price was too high.

“With the status quo, you tell models what to do and they do it,” Ju said. “But we’re increasingly using this technology in ways where it encounters situations in which it can’t just do what you tell it to, or where just doing that isn’t always the right thing. Exceptions come into play.” Paying an extra cent for the flour for a friend’s cake, he noted, makes sense; paying an extra cent per item does not necessarily make sense when Walmart is ordering a large number of items from suppliers.

The researchers found that providing models with information about both how and why humans opted to purchase the flour — essentially giving them insight into human reasoning — corrected this problem, giving the models a degree of flexibility. The AI models then made decisions like people, justifying their choices with comments like “It’s only a penny more” and “One cent is not going to break the bank.” The models were able to generalize this flexibility of mind to cases beyond purchasing flour for a cake, like hiring, lending, university admissions, and customer service.   

Read the working paper: Teaching AI to Handle Exceptions 

The performance of human-AI pairs depends on how the AI is designed 

How does work change when people collaborate with AI instead of with other people? Does productivity increase? Does performance improve? Do processes change?

To tackle these questions, Aral and Ju developed a new experimental platform called Pairit (formerly MindMeld), which pairs people with either another person or an AI agent to perform collaborative tasks. In one situation documented in a recent paper, participants were asked to create marketing campaigns for a real organization’s year-end annual report, including generating ad images, writing copy, and editing headlines. The entire task unfolded in a controlled and observable environment.

“We believe the Pairit platform will revolutionize AI research,” Aral said. “It injects randomness into human-AI collaboration to discover causal drivers of productivity, performance, and quality improvements in human-AI teams.” 

Aral said the scientific community can use the platform to discover process, reskilling, and intangible investment strategies that unlock productivity gains from AI, and Aral and Ju plan to make the platform freely available to researchers to study AI agents across diverse settings. 

In their study, Aral and Ju found that human-AI pairs excelled at some tasks and underperformed human-human pairs on others. Humans paired with AI were better at creating text but worse at creating images, though campaigns from both groups performed equally well when deployed in real ads on social media site X. 

Looking beyond performance, the researchers found that the actual process of how people worked changed when they were paired with AI . Communication (as measured by messages sent between partners) increased for human-AI pairs, with less time spent on editing text and more time spent on generating text and visuals. Human-AI pairs sent far fewer social messages, such as those typically intended to build rapport.

“The human-AI teams focused more on the task at hand and, understandably, spent less time socializing, talking about emotions, and so on,” Ju said. “You don’t have to do that with agents, which leads directly to performance and productivity improvements.”

As a final part of the study, the researchers varied the assigned personality of the AI agents using the Big Five personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.  

The AI personality pairing experiments revealed that programming AI personalities to complement human personalities greatly enhanced collaboration. For example, conscientious humans paired with “open” AI agents improved image quality, while extroverted humans paired with “conscientious” AI agents reduced the quality of text, images, and clicks. Men and women worked better with different types of AI personalities. While men were more productive and produced better-performing ads with “agreeable” AI, they were less productive and produced lower-quality work with “neurotic” AI. Women were more productive and produced better-quality work with “neurotic” AI but were not pushed to be their best with “agreeable” AI. 

Different AI personalities also worked better in different cultures. For example, working with “extroverted” AI boosted performance among Latin American workers but degraded performance with East Asian workers, Aral said. “Neurotic” AI boosted human performance in Asia but degraded performance in Latin America and the Middle East.

Aral and Ju said these effects were “so strong and so meaningful” that they built a company, Pairium AI, “designed to build the personalization layer of the Agentic Age.” Pairium AI is building technology, like the Pairit tool, that pairs humans with different types of AI to get the most out of both humans and the AI.

Read the working paper: Collaborating with AI agents 

Negotiating with AI bots requires novel approaches 

A new paper by Aral and Ju along with three other MIT researchers — professor , doctoral student Michelle Vaccaro, and doctoral student Michael Caosun — examines how to create the most effective AI negotiation bot.

For their study, the researchers developed an international competition, attracting “300 or 400 of the world’s top negotiation experts from companies and universities to iteratively design and refine prompts for a negotiation bot,” Ju said. “This allowed us to really efficiently explore the space of negotiation strategy using AI.”

They found that bots with killer instincts — those focused exclusively on taking as much of the pie as possible — were less effective than those that expressed warmth during negotiation; the latter type was more likely to keep counterparts at the table and thus more likely to reach a deal.

That said, to capture value in the process of negotiation, bots had to possess a degree of dominance alongside their warmth; warmth alone was a losing strategy. The most successful bot negotiators thus confirmed fundamental principles in existing negotiation theory.

The competition also revealed novel tactics that apply only to AI bots — things like prompt injection, in which one bot pushes another bot to reveal its negotiation strategy. Given this, the researchers noted that a new theory of negotiation that pertains specifically to AI must be developed alongside theory previously developed around how humans negotiate with each other.

Read the working paper: Advancing AI negotiations 

Trust varies in AI search results 

It is well known that generative AI sometimes “hallucinates” by inventing information in response to questions. Yet generative AI is an increasingly popular tool applied to internet searches. New research by Aral and MIT Sloan PhD student Haiwen Li studied how much trust people place in results returned by generative AI. They found that on average, people trust conventional search results more than those produced by generative AI — though these levels of trust vary by demographics. People with a college degree or higher, those who work in the tech sector, and Republicans tend to place more trust in generative AI.

The researchers also explored how different interventions affect this trust. When a generative AI search provides reference links for its results, people trust the tool more, even if those links have been fabricated. Offering information about how the models work boosts trust as well. However, the practice of “uncertainty highlighting,” where the model highlights information in different colors depending on its confidence in the result, decreases trust in results. 

Levels of trust, in turn, are related to a person’s willingness to share that information with others: More trust indicates a greater willingness to share.

Read the working paper: Human Trust in AI Search 


AI Executive Academy

In person at MIT Sloan




Source link

Continue Reading
Click to comment

Leave a Reply

Your email address will not be published. Required fields are marked *

AI Research

Australia’s China AI quandary is a dealmaker’s opportunity

Published

on


It is not surprising that reactions to Chinese ambassador Xiao Qian’s suggestion that Australia and China cooperate more on artificial intelligence as part of an expanded Free Trade Agreement have been hawkish. However, it highlights the need for Australian organisations to broaden their view on the AI world.

It would take a dramatic shift in policy position for Australia to suddenly start collaborating with China on AI infrastructure such as data centres and the equipment that runs them. But it would be wrong to assume that advances in capability will always come from America first.

Loading…



Source link

Continue Reading

AI Research

Joint UT, Yale research develops AI tool for heart analysis – The Daily Texan

Published

on


A study published on June 23 in collaboration with UT and Yale researchers developed an artificial intelligence tool capable of performing and analyzing the heart using echocardiography. 

The app, PanEcho, can analyze echocardiograms, or pictures of the heart, using ultrasounds. The tool was developed and trained on nearly one million echocardiographic videos. It can perform 39 echocardiographic tasks and accurately detect conditions such as systolic dysfunction and severe aortic stenosis.

“Our teammates helped identify a total of 39 key measurements and labels that are part of a complete echocardiographic report — basically what a cardiologist would be expected to report on when they’re interpreting an exam,” said Gregory Holste, an author of the study and a doctoral candidate in the Department of Electrical and Computer Engineering. “We train the model to predict those 39 labels. Once that model is trained, you need to evaluate how it performs across those 39 tasks, and we do that through this robust multi site validation.” 

Holste said out of the functions PanEcho has, one of the most impressive is its ability to measure left ventricular ejection fraction, or the proportion of blood the left ventricle of the heart pumps out, far more accurately than human experts. Additionally, Holste said PanEcho can analyze the heart as a whole, while humans are limited to looking at the heart from one view at a time. 

“What is most unique about PanEcho is that it can do this by synthesizing information across all available views, not just curated single ones,” Holste said. “PanEcho integrates information from the entire exam — from multiple views of the heart to make a more informed, holistic decision about measurements like ejection fraction.” 

PanEcho is available for open-source use to allow researchers to use and experiment with the tool for future studies. Holste said the team has already received emails from people trying to “fine-tune” the application for different uses. 

“We know that other researchers are working on adapting PanEcho to work on pediatric scans, and this is not something that PanEcho was trained to do out of the box,” Holste said. “But, because it has seen so much data, it can fine-tune and adapt to that domain very quickly. (There are) very exciting possibilities for future research.”



Source link

Continue Reading

AI Research

Google launches AI tools for mental health research and treatment

Published

on


Google announced two new artificial intelligence initiatives on July 7, 2025, designed to support mental health organizations in scaling evidence-based interventions and advancing research into anxiety, depression, and psychosis treatments.

The first initiative involves a comprehensive field guide developed in partnership with Grand Challenges Canada and McKinsey Health Institute. According to the announcement from Dr. Megan Jones Bell, Clinical Director for Consumer and Mental Health at Google, “This guide offers foundational concepts, use cases and considerations for using AI responsibly in mental health treatment, including for enhancing clinician training, personalizing support, streamlining workflows and improving data collection.”

The field guide addresses the global shortage of mental health providers, particularly in low- and middle-income countries. According to analysis from the McKinsey Health Institute cited in the document, “closing this gap could result in more years of life for people around the world, as well as significant economic gains.”

Summary

Who: Google for Health, Google DeepMind, Grand Challenges Canada, McKinsey Health Institute, and Wellcome Trust, targeting mental health organizations and task-sharing programs globally.

What: Two AI initiatives including a practical field guide for scaling mental health interventions and a multi-year research investment for developing new treatments for anxiety, depression, and psychosis.

When: Announced July 7, 2025, with ongoing development and research partnerships extending multiple years.

Where: Global implementation with focus on low- and middle-income countries where mental health provider shortages are most acute.

Why: Address the global shortage of mental health providers and democratize access to quality, evidence-based mental health support through AI-powered scaling solutions and advanced research.

The 73-page guide outlines nine specific AI use cases for mental health task-sharing programs, including applicant screening tools, adaptive training interfaces, real-time guidance companions, and provider-client matching systems. These tools aim to address challenges such as supervisor shortages, inconsistent feedback, and protocol drift that limit the effectiveness of current mental health programs.

Task-sharing models allow trained non-mental health professionals to deliver evidence-based mental health services, expanding access in underserved communities. The guide demonstrates how AI can standardize training, reduce administrative burdens, and maintain quality while scaling these programs.

According to the field guide documentation, “By standardizing training and avoiding the need for a human to be involved at every phase of the process, AI can help mental health task-sharing programs effectively scale evidence-based interventions throughout communities, maintaining a high standard of psychological support.”

The second initiative represents a multi-year investment from Google for Health and Google DeepMind in partnership with Wellcome Trust. The funding, which includes research grant funding from the Wellcome Trust, will support research projects developing more precise, objective, and personalized measurement methods for anxiety, depression, and psychosis conditions.

The research partnership aims to explore new therapeutic interventions, potentially including novel medications. This represents an expansion beyond current AI applications into fundamental research for mental health treatment development.

The field guide acknowledges that “the application of AI in task-sharing models is new and only a few pilots have been conducted.” Many of the outlined use cases remain theoretical and require real-world validation across different cultural contexts and healthcare systems.

For the marketing community, these developments signal growing regulatory attention to AI applications in healthcare advertising. Recent California guidance on AI healthcare supervision and Google’s new certification requirements for pharmaceutical advertising demonstrate increased scrutiny of AI-powered health technologies.

The field guide emphasizes the importance of regulatory compliance for AI mental health tools. Several proposed use cases, including triage facilitators and provider-client matching systems, could face classification as medical devices requiring regulatory oversight from authorities like the FDA or EU Medical Device Regulation.

Organizations considering these AI tools must evaluate technical infrastructure requirements, including cloud versus edge computing approaches, data privacy compliance, and integration with existing healthcare systems. The guide recommends starting with pilot programs and establishing governance committees before full-scale implementation.

Technical implementation challenges include model selection between proprietary and open-source systems, data preparation costs ranging from $10,000 to $90,000, and ongoing maintenance expenses of 10 to 30 percent of initial development costs annually.

The initiatives build on growing evidence that task-sharing approaches can improve clinical outcomes while reducing costs. Research cited in the guide shows that mental health task-sharing programs are cost-effective and can increase the number of people treated while reducing mental health symptoms, particularly in low-resource settings.

Real-world implementations highlighted in the guide include The Trevor Project’s AI-powered crisis counselor training bot, which trained more than 1,000 crisis counselors in approximately one year, and Partnership to End Addiction’s embedded AI simulations for peer coach training.

These organizations report improved training efficiency and enhanced quality of coach conversations through AI implementation, suggesting practical benefits for established mental health programs.

The field guide warns that successful AI adoption requires comprehensive planning across technical, ethical, governance, and sustainability dimensions. Organizations must establish clear policies for responsible AI use, conduct risk assessments, and maintain human oversight throughout implementation.

According to the World Health Organization principles referenced in the guide, responsible AI in healthcare must protect autonomy, promote human well-being, ensure transparency, foster responsibility and accountability, ensure inclusiveness, and promote responsive and sustainable development.

Timeline

  • July 7, 2025: Google announces two AI initiatives for mental health research and treatment
  • January 2025California issues guidance requiring physician supervision of healthcare AI systems
  • May 2024: FDA reports 981 AI and machine learning software devices authorized for medical use
  • Development ongoing: Field guide created through 10+ discovery interviews, expert summit with 20+ specialists, 5+ real-life case studies, and review of 100+ peer-reviewed articles



Source link

Continue Reading

Trending