Tools & Platforms
It’s Time for Humanity’s Best Exam for AI

Better benchmarks can unlock the social benefits of AI technology.
The rhythm of artificial intelligence (AI) development has become unsettlingly familiar. A new model is unveiled, and with it comes a predictable flurry of media attention. One cluster of articles dissects its intricate training data and architecture; another marvels, often breathlessly, at its newfound capabilities; and a third, almost inevitably, scrutinizes its performance on a battery of standardized tests. These benchmarks have become our primary yardsticks for AI progress. Yet, they predominantly paint a picture skewed toward raw technical prowess and potential peril, leaving the public with a pervasive feeling that each impressive step forward for AI might translate into two regrettable steps back for the rest of us.
Many of these evaluations concentrate on the technical capacity of the model or its computational horsepower. Others, with growing urgency, assess the likelihood of misuse—could this advanced AI empower rogue actors to design a bioweapon or destabilize critical infrastructure through sophisticated cyberattacks? A significant portion of evaluations also measures AI against human performance in specific job tasks, fueling widespread anxieties about automation and diminished human agency. The reporting on these tests, frequently framed by alarming headlines, understandably casts AI advancements more as a societal regression than a leap forward. The very branding of prominent benchmarks, such as the ominously titled Humanity’s Last Exam, amplifies these negative connotations. That benchmark and others like it tend to measure a model’s capacity to complete bespoke tests, aid bad actors engaging in harmful conduct, or some combination of the two. It is difficult, if not impossible, to read coverage of such an assessment and come away with a hopeful, or even neutral, view of AI’s trajectory.
This is not to argue that assessing risks or understanding the deep mechanics of AI is unimportant. Vigilance and technical scrutiny are crucial components of responsible development. The current benchmarking landscape, however, is dangerously imbalanced. Those of us who recognize AI’s immense transformative potential to address some of the world’s most intractable problems—including revolutionizing medical diagnostics, accelerating climate solutions, and personalizing education for every child—currently lack a prominent, public-facing benchmark designed to track, celebrate, and encourage these positive developments.
It is time we introduce “Humanity’s Best Exam”—a benchmark that strives to capture a model’s capacity to address public policy problems and otherwise serve the general welfare.
Imagine a new form of evaluation that challenges AI systems not with abstract logic puzzles but with tangible goals vital to human flourishing. Consider a benchmark that tasks AI models with identifying early-stage diabetic retinopathy from retinal scans with over 95 percent accuracy, a leap that could surpass current screening efficacy and save millions from preventable blindness. Picture a test that spurs the design of three novel antibiotic compounds that are effective against stubborn, drug-resistant bacteria within a single year. In the realm of climate science, Humanity’s Best Exam might push AI to develop a groundbreaking, cost-effective catalyst for the direct air capture of carbon dioxide, improving efficiency by a significant margin—say, 20 percent—over existing technologies. Or it could encourage the creation of predictive models for localized flash floods that offer vulnerable regions a critical six-hour lead time with 90 percent accuracy. Or, in education, the challenge could be to generate personalized six-month learning plans for diverse student profiles in foundational STEM subjects, demonstrably elevating learning outcomes by an average of two grade levels.
The creation and widespread adoption of Humanity’s Best Exam would serve several critical, society-shaping purposes.
First, it would powerfully harness the intense competitive spirit of AI laboratories for the global good. AI developers are profoundly motivated by benchmark performance—the race to the top of the leaderboards is fierce. Channeling this potent drive toward solving clearly defined societal problems could positively redirect research priorities and resource allocation within these influential organizations.
Second, such a benchmark would be instrumental in reshaping the public discourse surrounding artificial intelligence. The narrative around any powerful new technology is inevitably shaped by the information that is most readily available and most prominently featured. If the most visible AI assessments continue to highlight dangers and disruptions, public perception will remain tinged with fear and skepticism. Humanity’s Best Exam would provide a steady stream of positive, concrete examples of AI’s potential, offering a more balanced and hopeful counter-narrative. This perspective is essential for fostering a more informed and constructive public conversation, which is, in turn, vital for democratic oversight of this transformative technology.
Finally, a benchmark focused on positive societal impact would provide invaluable guidance for policymakers, investors, and researchers. As a law professor whose research centers on accelerating AI innovation through thoughtful legal and policy reforms, I see a pressing need for clearer signals to guide governance away from reactive, fear-driven legislation and toward proactive, enabling frameworks. Humanity’s Best Exam would illuminate areas where AI is poised to deliver significant societal returns, helping policymakers to direct strategic funding more effectively and to develop supportive, rather than stifling, regulatory environments. Investors would gain a clearer view of emerging opportunities where AI can create substantial financial and social value. Researchers across numerous disciplines could more easily identify how cutting-edge AI capabilities can be leveraged within their fields, potentially sparking new collaborations and accelerating vital research.
But who would build and oversee such an ambitious undertaking, and how could we navigate the inherent challenges? The establishment of Humanity’s Best Exam would necessitate a dedicated, independent, and broadly representative multi-stakeholder governing consortium. This body should ideally include experts from leading academic institutions, established nonprofits with proven experience in managing “grand challenges”—akin to the XPrize Foundation model that involves hosting competitions to achieve societally beneficial breakthroughs—relevant international organizations, domain specialists from fields such as public health, environmental science, and education, as well as ethicists and, critically, representatives from civil society organizations to ensure public accountability. Funding could be drawn from a diverse portfolio, including major philanthropic sources, government grants earmarked for scientific and societal advancement, and perhaps even a coalition of AI laboratories and technology firms committed to socially beneficial AI development.
To address the valid concern that defining “societal benefit” can be subjective, a primary task for this consortium would be to establish a transparent and evolving framework for identifying and prioritizing challenge areas, perhaps drawing inspiration from established global agendas such as the United Nations’ Sustainable Development Goals. The specific tasks within the benchmark would need to be rigorously defined, objectively measurable, and, crucially, regularly updated by diverse expert panels. This dynamism is key to preventing the benchmark from becoming stale, to avoiding the pitfalls of “teaching to the test” in a way that stifles genuine innovation, and to ensuring continued relevance as AI capabilities and societal needs evolve. Although no benchmark can ever be entirely immune to attempts at superficial optimization, focusing on complex, real-world problems with multifaceted success criteria makes simplistic gaming far more difficult than it is on narrower, purely technical tests. Furthermore, a portion of the assessment could incorporate qualitative reviews by expert panels, evaluating the robustness, safety, ethical considerations, and real-world applicability of the proposed AI tools.
The current, almost myopic focus on AI’s potential downsides, although born of a necessary caution, is inadvertently creating an innovation ecosystem shrouded in anxiety. We are meticulously documenting every conceivable way AI could go wrong, while failing to systematically champion, encourage, and measure its profound potential to go spectacularly right.
It is time to correct this imbalance. A crucial first step would be for leading philanthropic organizations, forward-thinking academic consortia, and ethically minded AI developers to convene a foundational summit. The purpose of such a gathering would be to begin outlining the charter, initial problem sets, and robust governance structure for Humanity’s Best Exam. This is far more than a mere intellectual exercise; it is a necessary reorientation of our collective focus and a deliberate effort to harness the awesome power of artificial intelligence for the betterment of all. Let us not only brace for AI’s potential last exam but actively architect its very best.
How Malawi is taking AI technology to small-scale farmers who don’t have smartphones

MULANJE, Malawi — Alex Maere survived the destruction of Cyclone Freddy when it tore through southern Malawi in 2023. His farm didn’t.
The 59-year-old saw decades of work disappear with the precious soil that the floods stripped from his small-scale farm in the foothills of Mount Mulanje.
He was used to producing a healthy 850 kilograms (1,870 pounds) of corn each season to support his three daughters and two sons. He salvaged just 8 kilograms (17 pounds) from the wreckage of Freddy.
“This is not a joke,” he said, remembering how his farm in the village of Sazola became a wasteland of sand and rocks.
Freddy jolted Maere into action. He decided he needed to change his age-old tactics if he was to survive.
He is now one of thousands of small-scale farmers in the southern African country using a generative AI chatbot designed by the non-profit Opportunity International for farming advice.
The Malawi government is backing the project, having seen the agriculture-dependent nation hit recently by a series of cyclones and an El Niño-induced drought. Malawi’s food crisis, which is largely down to the struggles of small-scale farmers, is a central issue for its national elections next week.
More than 80% of Malawi’s population of 21 million rely on agriculture for their livelihoods and the country has one of the highest poverty rates in the world, according to the World Bank.
The AI chatbot suggested Maere grow potatoes last year alongside his staple corn and cassava to adjust to his changed soil. He followed the instructions to the letter, he said, and cultivated half a soccer field’s worth of potatoes and made more than $800 in sales, turning around his and his children’s fortunes.
“I managed to pay for their school fees without worries,” he beamed.
Artificial intelligence has the potential to uplift agriculture in sub-Saharan Africa, where an estimated 33-50 million smallholder farms like Maere’s produce up to 70-80% of the food supply, according to the U.N.’s International Fund for Agricultural Development. Yet productivity in Africa — with the world’s fastest-growing population to feed — is lagging behind despite vast tracts of arable land.
As AI’s use surges across the globe, it is helping African farmers access new information to identify crop diseases, forecast drought, design fertilizers to boost yields, and even locate an affordable tractor. Private investment in agriculture-related tech in sub-Saharan Africa went from $10 million in 2014 to $600 million in 2022, according to the World Bank.
But not without challenges.
Africa has hundreds of languages for AI tools to learn. Even then, few farmers have smartphones and many can’t read. Electricity and internet service are patchy at best in rural areas, and often non-existent.
“One of the biggest challenges to sustainable AI use in African agriculture is accessibility,” said Daniel Mvalo, a Malawian technology specialist. “Many tools fail to account for language diversity, low literacy and poor digital infrastructure.”
The AI tool in Malawi tries to bridge those gaps. The app is called Ulangizi, which means advisor in the country’s Chichewa language. It is WhatsApp-based and works in Chichewa and English. You can type or speak your question, and it replies with an audio or text response, said Richard Chongo, Opportunity International’s country director for Malawi.
“If you can’t read or write, you can take a picture of your crop disease and ask, ‘What is this?’ And the app will respond,” he said.
But to work in Malawi, AI still needs a human touch. For Maere’s area, that is the job of 33-year-old Patrick Napanja, a farmer support agent who brings a smartphone with the app for those who have no devices. Chongo calls him the “human in the loop.”
“I used to struggle to provide answers to some farming challenges, now I use the app,” said Napanja.
Farmer support agents like Napanja generally have around 150-200 farmers to help and try to visit them in village groups once a week. But sometimes, most of an hour-long meeting is taken up waiting for responses to load because of the area’s poor connectivity, he said. Other times, they have to trudge up nearby hills to get a signal.
They are the simple but stubborn obstacles millions face in taking advantage of technology that others have at their fingertips.
For African farmers living on the edge of poverty, the impact of bad advice or AI “hallucinations” can be far more devastating than for those using it to organize their emails or put together a work presentation.
Mvalo, the tech specialist, warned that inaccurate AI advice like a chatbot misidentifying crop diseases could lead to action that ruins the crop as well as a struggling farmer’s livelihood.
“Trust in AI is fragile,” he said. “If it fails even once, many farmers may never try it again.”
The Malawian government has invested in Ulangizi and it is programmed to align with the agriculture ministry’s own official farming advice, making it more relevant for Malawians, said Webster Jassi, the agriculture extension methodologies officer at the ministry.
But he said Malawi faces challenges in getting the tool to enough communities to make an extensive difference. Those communities don’t just need smartphones, but also to be able to afford internet access.
For Malawi, the potential may be in combining AI with traditional collaboration among communities.
“Farmers who have access to the app are helping fellow farmers,” Jassi said, and that is improving productivity.
___
For more on Africa and development: https://apnews.com/hub/africa-pulse
The Associated Press receives financial support for global health and development coverage in Africa from the Gates Foundation. The AP is solely responsible for all content. Find AP’s standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org.
AI-Driven IT Modernization for Cost Savings

In the rapidly evolving world of technology, Dell Technologies Inc. is embarking on an ambitious internal transformation known as Project Maverick, aimed at revamping its systems to better harness artificial intelligence. This initiative, shrouded in secrecy since its inception in 2024, seeks to modernize the company’s sprawling IT infrastructure, which has long been criticized for inefficiency and outdated processes. Internal documents reveal a comprehensive strategy that includes migrating legacy systems to cloud-based platforms and integrating AI tools to streamline operations across the board.
At the heart of Project Maverick is a push to eliminate silos within Dell’s vast organization, which employs tens of thousands worldwide. The plan outlines the creation of a unified data architecture that would allow seamless access to information, enabling faster decision-making and innovation. Executives involved in the project emphasize that this overhaul is not just about technology but about fostering a culture of agility in an era where AI is reshaping competitive dynamics.
Unveiling the Strategic Vision
According to details from Business Insider, which obtained exclusive internal documents, Project Maverick involves a multi-year timeline with phased implementations starting in early 2025. The initiative allocates significant resources to AI integration, including the adoption of machine learning models for predictive analytics in supply chain management and customer service. This move comes as Dell, a major player in hardware and services, faces pressure to keep pace with rivals like Hewlett Packard Enterprise Co. and IBM Corp., who have already made strides in AI-centric infrastructures.
One key component is the overhaul of Dell’s enterprise resource planning systems, which are currently fragmented across different business units. The documents highlight plans to consolidate these into a single, AI-enhanced platform that can automate routine tasks and provide real-time insights. Insiders note that this could reduce operational costs by up to 20%, based on preliminary projections, while boosting employee productivity through intelligent automation.
Challenges in Implementation
The project isn’t without hurdles. Legacy systems, some dating back decades, pose significant migration challenges, including data compatibility issues and potential downtime risks. Business Insider reports that Dell has assembled a cross-functional team of over 500 engineers and strategists to tackle these obstacles, drawing on expertise from recent acquisitions and partnerships with AI leaders like NVIDIA Corp.
Moreover, cultural resistance within the organization could slow progress. Long-time employees accustomed to traditional workflows may need extensive training to adapt to new AI-driven tools. The plan includes robust change management programs, such as workshops and pilot programs, to ensure buy-in at all levels.
Broader Implications for the Industry
Beyond Dell’s walls, Project Maverick signals a broader trend among tech giants to internalize AI capabilities for competitive advantage. As noted in related coverage from Forbes, Dell’s focus on hybrid AI infrastructures offers choice and scalability, which could influence how other firms approach their own transformations. This initiative aligns with Dell’s public announcements at events like Dell Technologies World 2025, where the company showcased its AI Factory concept for enterprise use.
Financially, the overhaul is expected to drive long-term growth. Analysts project that successful implementation could enhance Dell’s margins in its infrastructure solutions group, which has seen a surge in AI server sales. The Next Platform highlights how Dell’s datacenter business now outpaces its PC segment, fueled by AI demand, underscoring the strategic importance of Project Maverick.
Looking Ahead to AI Dominance
As Dell pushes forward with this secretive plan, industry watchers are keenly observing its outcomes. The integration of AI into core systems could position Dell as a leader in providing end-to-end AI solutions, from hardware to software ecosystems. However, the true test will be in execution, balancing innovation with operational stability.
In conclusion, Project Maverick represents a pivotal step for Dell in navigating the AI-driven future. By addressing internal inefficiencies head-on, the company aims to emerge stronger, more adaptive, and ready to capitalize on emerging opportunities in technology’s next frontier.
White House AI Task Force Positions AI as Top Education Priority

When Trump administration officials met with ed-tech leaders at the White House last week to discuss the nation’s vision for artificial intelligence in American life, they repeatedly underscored one central message: Education must be at the heart of the nation’s AI strategy.
Established by President Trump’s April 2025 executive order, the White House Task Force on AI Education is chaired by director of science and technology policy Michael Kratsios, and is tasked with promoting AI literacy and proficiency among America’s youth and educators, organizing a nationwide AI challenge and forging public-private partnerships to provide AI education resources to K-12 students.
“The robots are here. Our future is no longer science fiction,” First Lady Melania Trump said in opening remarks. “But, as leaders and parents, we must manage AI’s growth responsibly. During this primitive stage, it is our duty to treat AI as we would our own children: empowering but with watchful guidance.”
MAINTAINING U.S. COMPETITIVENESS
In a recording of the meeting Sept. 4, multiple speakers, including Department of Agriculture Secretary Brooke Rollins and Special Advisor for AI and Crypto David Sacks, stressed that AI will define the future of U.S. work and international competitiveness, with explicit framing against rivals like China.
“The United States will lead the world in artificial intelligence, period, full stop, not China, not any of our other foreign adversaries, but America,” Rollins said in the recording. “We are making sure that our young people are ready to win that race.”
In order to do so, though, Sacks noted that K-12 and higher education systems must adapt quickly.
“AI is going to be the ultimate boost for our workers,” Sacks said. “And it is important that they learn from an early age how to use AI.”
The Department of Education signaled that federal funding will also shift to incentivize schools’ adoption of AI. Secretary Linda McMahon said applications that include AI-based solutions will be “more strongly considered” and could receive “bonus points” in the review process.
EMBRACING CHANGE MANAGEMENT
Several officials at the meeting urged schools and communities not to view AI as a threat, but as a tool for growth.
“It’s not one of those things to be afraid of,” McMahon said. “Let’s embrace it. Let’s develop AI-based solutions to real-world problems and cultivate an AI-informed, future-ready workforce.”
Secretary Chris Wright of the Department of Energy linked the success of AI adoption to larger infrastructure challenges.
“We will not win in AI if we don’t massively grow our electricity production,” he said. “Perhaps the killer app, the most important use of AI, is for education and to fix one of the greatest American shortcomings, our K-12 education system.”
WORKFORCE DEVELOPMENT
Workforce training and reskilling emerged as another priority, with Labor Secretary Lori Chavez-DeRemer describing apprenticeships and on-the-job training as essential to preparing workers for an AI-driven economy.
“On-the-job training programs will help build the mortgage-paying jobs that AI will create while also enhancing the unique skills required to succeed in various industries,” Chavez-DeRemer said. She tied these efforts to the president’s goal of 1 million new apprenticeships nationwide.
Alex Kotran, chief executive officer of the education nonprofit aiEDU, told Government Technology that members of the task force spent a notable amount of time discussing rural schools and the importance of reaching underserved students, especially in regard to preparing rural students for the modern workforce.
PRIVATE-SECTOR COMMITMENTS
In addition to White House officials, attendees included high-level technology executives and entrepreneurs committed to expanding U.S. AI education.
During the recorded meeting, IBM CEO Arvind Krishna pledged to train 2 million American workers in AI skills over the next three years, noting that “no organization can do it alone.” Similarly, Google CEO Sundar Pichai highlighted efforts to use AI to personalize learning worldwide, envisioning a future “where every student, regardless of their background or location, can learn anything in the world in a way that works best for them.”
In a recent co-authored blog post on Microsoft’s website, the company’s Vice Chair and President Brad Smith and LinkedIn CEO Ryan Roslansky said that empowering teachers and students with modern-day AI tools, continuously developing AI skills and creating economic opportunity by connecting new skills to jobs are the top priorities in U.S. AI education.
“We believe delivering on the real promise of AI depends on how broadly it’s diffused,” they wrote. “This requires investment and innovation in AI education, training, and job certification.”
In its efforts to increase exposure to educational AI tools, Microsoft committed to providing a year’s subscription to Copilot for college students free of charge, expanding access to Microsoft AI tools in schools, $1.25 million in educator grants for teachers pioneering AI-powered learning, free LinkedIn Learning AI courses, and AI training for job seekers and certifications for community colleges.
LOOKING AHEAD
In a phone call with Government Technology last week, Kotran expressed excitement following the task force meeting, to which he was invited, saying he was heartened that education appears to be taking center stage in the nation’s capital.
“The White House Task Force meeting today, I think, represents an opening to actually harness the power of the White House,” he said. “But also the federal government to just motivate all the other actors that are part of the education system to make the change that’s going to be required.”
But, he emphasized, the private sector must support educators and school leaders in their adoption of AI, considering recent cuts to education funding. The measure of whether the task force is successful, according to Kotran, will depend on whether the private sector supports states with AI tools and implementation.
“It’s not going to be enough for a school to have one elective class called ‘introduction to AI,’” Kotran said. “The only chance we have to make progress on AI readiness is for companies, the private sector, philanthropies, to put resources on the table.”