Tools & Platforms

We have let down teens if we ban social media but embrace AI


If you are in your 70s, you didn’t fight in the second world war. Such a statement should be uncontroversial, given that even the oldest septuagenarian today was born after the war ended. But there remains a cultural association between this age group and the era of Vera Lynn and the Blitz.

A similar category error exists when we think about parents and technology. Society seems to have agreed that social media and the internet are unknowable mysteries to parents, so the state must step in to protect children from the tech giants, with Australia releasing details of an imminent ban. Yet the parents of today’s teenagers are increasingly millennial digital natives. Somehow, we have decided that people who grew up using MySpace or Habbo Hotel are today unable to navigate how their children use TikTok or Fortnite.

Simple tools to restrict children’s access to the internet already exist, from adjusting router settings to requiring parental permission to install smartphone apps, but the consensus among politicians seems to be that these require a PhD in electrical engineering, leading to blanket illiberal restrictions. If you customised your Facebook page while at university, you should be able to tweak a few settings. So, rather than asking everyone to verify their age and identify themselves online, why can’t we trust parents to, well, parent?



Failing to keep up with generational shifts could also result in wider problems. As with the pensioners we’ve bumped from serving in Vietnam to storming Normandy, there is a danger in focusing on the wrong war. While politicians crack down on social media, they rush to embrace AI built on large language models, and yet it is this technology that will have the largest effect on today’s teens, not least as teachers wonder how they will be able to set ChatGPT-proof homework.

Rather than simply banning things, we need to be encouraging open conversations about social media, AI and any future technologies, both across society and within families.


Nvidia Is Not Happy With the Gain AI Act, Says As Much



In a move drawing considerable attention across the tech industry, Nvidia Corporation has publicly critiqued the recently proposed GAIN AI Act, emphasizing its potential to stifle competition in the rapidly evolving artificial intelligence sector.

The GAIN AI Act, which stands for Guaranteeing Access and Innovation for National Artificial Intelligence Act, was introduced as part of the U.S. National Defense Authorization Act, with the goal of ensuring that the United States is the dominant market force for AI.

It has not yet passed and remains a hotly debated policy topic both in the United States and abroad because of the restrictions it seeks to enact.

Backers say it aims to protect American market interests by prioritizing domestic orders for advanced AI chips and processors, securing supply chains for critical AI hardware and, in theory, reducing U.S. reliance on foreign manufacturers.

So it’s no huge surprise that Nvidia, the American chipmaker and currently the world’s most valuable company, would take aim at a bill that could restrict its ability to sell its most advanced chips abroad.

The company said as much during a recent industry forum.

“We never deprive American customers in order to serve the rest of the world. In trying to solve a problem that does not exist, the proposed bill would restrict competition worldwide in any industry that uses mainstream computing chips,” an Nvidia spokesperson said.

Is the GAIN AI Act a good idea for innovation?

It depends on who you ask.

Essentially, the bill seeks to strengthen national security and economic competitiveness by ensuring that key AI components remain accessible to American companies and government agencies before they are supplied abroad.

Its language takes a hard line on what the priority should be for the United States government.

“It should be the policy of the United States and the Department of Commerce to deny licenses for the export of the most powerful AI chips, including such chips with total processing power of 4,800 or above and to restrict the export of advanced artificial intelligence chips to foreign entities so long as United States entities are waiting and unable to acquire those same chips,” the legislation reads.

Nvidia’s critique reflects broader industry anxieties about regulatory environments that might hinder innovation. As global competition intensifies, particularly with formidable advances in AI from regions such as China, firms like Nvidia are closely watching how regulatory frameworks are taking shape at home and abroad.

But Nvidia’s view is far from universal. Some American market players argue that the bill is exactly what domestic industry needs.

“Advanced AI chips are the jet engine that is going to enable the U.S. AI industry to lead for the next decade,” Brad Carson, president of Americans for Responsible Innovation (ARI), an AI policy advocacy group, said in a widely distributed statement.

“Globally, these chips are currently supply-constrained, which means that every advanced chip sold abroad is a chip the U.S. cannot use to accelerate American R&D and economic growth,” Carson said. “As we compete to lead on this dual-use technology, including the GAIN AI Act in the NDAA would be a major win for U.S. economic competitiveness and national security.”

‘Doomer science fiction’

Nvidia didn’t stop there. It then took aim at an earlier attempt to protect American leadership in advanced chips, a policy called the AI Diffusion rule, which was ultimately rescinded.

The company minced no words in a follow-up statement, saying that past attempts by legislators to control market forces through protectionist policies were ultimately a bad idea.

“The AI Diffusion Rule was a self-defeating policy, based on doomer science fiction, and should not be revived,” it read.

“Our sales to customers worldwide do not deprive U.S. customers of anything—and in fact expand the market for many U.S. businesses and industries,” it said. “The pundits feeding fake news to Congress about chip supply are attempting to overturn President Trump’s AI Action Plan and surrender America’s chance to lead in AI and computing worldwide.”

The challenge will be creating laws that are as dynamic as the technologies they aim to govern, fostering a climate where innovation and ethical accountability are not mutually exclusive, but rather mutually reinforcing.

We’ve tried this before

Nvidia’s mention of the AI Diffusion rule was no accident. That ill-fated policy had many of the same political goals but ultimately stumbled at the finish line and was a relatively toothless attempt to rein in some of the world’s most competitive companies.

The Biden administration’s AI Diffusion rule, enacted in January 2025, represented a significant shift in U.S. export controls targeting cutting-edge artificial intelligence technology.

Designed to curb the spread of advanced AI tools to rival nations, the regulation mandated licensing for the sale of high-end AI chips and imposed strict caps on computing power accessible to foreign recipients. Its goal was to slow the diffusion of sensitive AI capabilities that could enhance military or strategic applications abroad.

However, the Trump-era approach to export controls, which focused on a more targeted, bilateral framework, was poised to replace the Biden administration’s broader strategy.

President Trump had announced plans to rescind the AI Diffusion rule, criticizing it as overly bureaucratic and potentially hindering U.S. innovation. Instead, his administration favored engaging in country-specific agreements to control export practices, aiming for a more adaptable, case-by-case approach.

Though the AI Diffusion rule was ultimately rolled back, the Bureau of Industry and Security (BIS) signaled a renewed emphasis on enforcing existing regulations. The agency issued a notice reinforcing actions against companies with a “high probability” of violations, warning that increased scrutiny would be applied to entities with knowledge of potential breaches.

Whether this latest attempt to advance American interests meets a similar fate remains to be seen.




Why Do Language Models Hallucinate? OpenAI Scientists Say LLMs Rewarded For Being Too Cocky



Insider Brief

  • A team of OpenAI scientists reports that language models hallucinate because their training and evaluation processes reward confident guesses over admitting uncertainty.
  • Hallucinations are predictable statistical errors that arise during pretraining and persist because benchmarks penalize responses that express doubt.
  • Fixing the problem requires changing mainstream evaluations to credit uncertainty, aligning incentives toward more trustworthy AI systems.

Stopping language models from hallucinating starts with figuring out why they hallucinate in the first place.

Now, a new study from researchers at OpenAI and Georgia Tech finds that language models often generate confident falsehoods because they are rewarded for guessing rather than admitting uncertainty. The paper, posted on the pre-print server arXiv, argues that hallucinations — plausible but incorrect statements — are not mysterious quirks of artificial intelligence but predictable outcomes of the way these systems are trained and evaluated.

The researchers frame hallucinations as a statistical inevitability, drawing on decades of computational learning theory. During training, language models learn from massive text corpora using a method called density estimation: predicting the probability of sequences of words. Even with perfect training data, the statistical objectives used in this process guarantee that errors will appear.

Don’t get too judgmental. We’ve all been guilty of using a similar strategy, the OpenAI team suggests in a recent blog post on the research.

“Think about it like a multiple-choice test,” the team writes in the post. “If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know.’”

The researchers liken this to a binary classification problem: if a model is asked to decide whether a given sentence is valid or erroneous, some misclassifications are unavoidable. When applied to language generation, these misclassifications surface as hallucinations. For instance, when prompted with questions about obscure birthdays or dissertation titles, models produce wrong but confident answers because the correct information either appears only once in training data or not at all, according to the study.

The paper introduces the idea of a “singleton rate,” or the proportion of facts that appear only once in the training set. This measure predicts hallucination frequency: if 20% of birthdays are singletons, then models are expected to hallucinate on about 20% of such queries.
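To make the singleton-rate idea concrete, here is a minimal Python sketch of our own (the data, names and exact definition are illustrative, following the article’s description rather than the paper’s formal one). It counts how many distinct facts in a toy training set occur exactly once and reports that share as a rough floor on hallucination frequency for queries about such facts.

```python
from collections import Counter

def singleton_rate(facts):
    """Share of distinct facts that appear exactly once in the corpus."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Toy "training data": each entry is a (person, birthday) fact the model saw.
training_facts = [
    ("Ada", "Dec 10"), ("Ada", "Dec 10"),    # seen twice
    ("Grace", "Dec 9"),                      # singleton
    ("Alan", "Jun 23"), ("Alan", "Jun 23"),  # seen twice
    ("Kurt", "Apr 28"),                      # singleton
    ("Emmy", "Mar 23"),                      # singleton
]

print(f"singleton rate: {singleton_rate(training_facts):.0%}")  # 3 of 5 -> 60%
# By the paper's argument, a model trained on this corpus should be expected
# to hallucinate on roughly this share of birthday-style queries.
```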

Why Training Encourages Mistakes

Errors also persist because language models are designed to be calibrated, according to the team. A well-calibrated model assigns probabilities to responses that reflect their likelihood, much like a weather forecast. But this calibration creates a trade-off: a model that always refuses to answer would avoid mistakes, but it would fail as a language generator. To balance usefulness with accuracy, models inevitably produce some errors.

Beyond statistical limits, models struggle with poor representations. For example, older systems such as trigram models, which predict each word based only on the previous two words, misfired on grammar, while modern transformer models can still falter on tasks like counting letters when text is tokenized into larger chunks. Distribution shifts — where test inputs differ from training data — and the presence of errors in the training corpus itself (“garbage in, garbage out”) further amplify hallucination risks, according to the study.

The second phase of model development, known as post-training, is meant to refine outputs and reduce hallucinations. Techniques such as reinforcement learning from human or AI feedback attempt to align models with human expectations. Yet, the study argues, post-training often makes hallucinations worse.

The reason lies in how benchmarks evaluate model performance. Most benchmarks use binary grading; in other words, answers are scored as simply right or wrong. Under this scheme, admitting uncertainty with “I don’t know” earns no more credit than giving a wrong answer. Models, like students gaming multiple-choice exams, learn that guessing maximizes their expected score.

This creates what the researchers call an “epidemic” of overconfidence. A model that always provides an answer, even when wrong, will outperform a more cautious system under prevailing test rules, the researchers report.
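The incentive at work here can be shown with a few lines of arithmetic. The sketch below is our own illustration, not code from the paper: it compares the expected score of answering versus abstaining when a benchmark grades on accuracy alone, where a correct answer earns one point and both wrong answers and abstentions earn zero.

```python
def expected_score_binary(p_correct: float, answer: bool) -> float:
    """Expected score under accuracy-only grading: 1 point for a correct
    answer, 0 for a wrong answer, and 0 for abstaining ("I don't know")."""
    return p_correct if answer else 0.0

# Abstaining scores exactly like a wrong answer, so even a wildly unsure
# model maximizes its expected score by guessing.
for p in (0.1, 0.5, 0.9):
    print(f"confidence {p:.0%}: "
          f"guess = {expected_score_binary(p, True):.2f}, "
          f"abstain = {expected_score_binary(p, False):.2f}")
```

However low its confidence, the guessing policy never scores worse than abstaining, which is exactly the bluffing incentive the researchers describe.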

The study catalogues widely used benchmarks such as MMLU-Pro, GPQA, and SWE-bench, finding that most either lack a mechanism for crediting uncertainty or explicitly penalize it. Even new hallucination-specific benchmarks have struggled to gain traction because mainstream leaderboards still rely on binary accuracy metrics.

This misalignment means that companies optimizing for leaderboard performance inadvertently reinforce hallucinatory behavior. A model trained to signal uncertainty truthfully would lose ground in rankings to one that guesses confidently. In practice, the incentive structure tilts toward bluffing.

Proposed Solutions

The team argues that hallucination cannot be solved by adding new evaluations alone. Instead, they call for modifying mainstream benchmarks to incorporate “confidence targets.” Under this system, models would be explicitly told that uncertain answers are acceptable and even preferable in some cases.

For example, a benchmark might specify: only answer if you are more than 75% confident, otherwise respond with “I don’t know.” Correct answers earn a point, wrong answers lose two points, and abstentions neither gain nor lose. This mirrors certain standardized human exams that penalize wrong guesses more heavily than skipped questions.
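Extending that arithmetic, the sketch below (again our own, using the point values quoted in the article) shows how a wrong-answer penalty changes the calculus. With +1 for a correct answer, -2 for a wrong one and 0 for abstaining, answering has positive expected value only when confidence exceeds two-thirds, so a stated 75% confidence target sits above the break-even point and guessing on long shots now costs points.

```python
def expected_score(p_correct: float, answer: bool,
                   reward: float = 1.0, penalty: float = 2.0) -> float:
    """Expected score under the article's illustrative scheme: +1 for a
    correct answer, -2 for a wrong answer, 0 for abstaining."""
    if not answer:
        return 0.0
    return p_correct * reward - (1.0 - p_correct) * penalty

# Answering beats abstaining only when confidence exceeds
# penalty / (reward + penalty) = 2 / 3 under these particular values.
for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"confidence {p:.0%}: answer = {expected_score(p, True):+.2f}, "
          f"abstain = {expected_score(p, False):+.2f}")
```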

Such changes would encourage what the researchers call “behavioral calibration”—the habit of expressing uncertainty appropriately. Unlike probabilistic confidence scores, which can be unwieldy in natural language, behavioral calibration emphasizes practical, human-like communication of doubt.

Limits of the Framework

The study acknowledges its limitations. It focuses on plausible falsehoods rather than nonsensical strings, and it simplifies open-ended generation into binary categories of valid or erroneous. The framework does not account for subtle pragmatic factors such as hedging or asking clarifying questions.

It also cautions that retrieval-augmented models, which use search engines to ground responses, are not immune. According to the researchers, if benchmarks still penalize uncertainty, even these systems will prefer to guess when search fails. Similarly, reasoning-based models that chain through logical steps may reduce certain errors but remain vulnerable to the same incentive misalignments.

What Comes Next

Future work will likely focus on refining evaluation schemes, experimenting with explicit confidence penalties across popular benchmarks, and studying how users respond to models that admit uncertainty more often. There is also interest in developing richer forms of pragmatic competence, enabling models to hedge or ask clarifying questions instead of presenting false facts.

The study suggests that hallucinations are not evidence of a fundamental flaw in large language models but rather artifacts of the systems built around them. With recalibrated incentives, the researchers argue, AI systems can become more reliable partners — less like students bluffing on exams and more like cautious collaborators who know when to say, “I don’t know.”

On a broader level, the findings highlight a fundamental tension in artificial intelligence development: the push for models to appear competent under evaluation clashes with the need for trustworthy systems in real-world use. By rewarding guessing, the field has inadvertently created machines that bluff.

Correcting this requires a socio-technical shift, not just better algorithms. Benchmarks and leaderboards — the currency of AI progress — must evolve to reward honesty about uncertainty. Without such reforms, hallucinations will remain entrenched, regardless of technical improvements in architecture or training scale.

For a deeper, more technical dive, please review the paper on arXiv. It’s important to note that arXiv is a pre-print server, which allows researchers to receive quick feedback on their work. However, neither the paper nor this article is an official peer-reviewed publication. Peer review is an important step in the scientific process that verifies the work.

The research team included Adam Tauman Kalai, Ofir Nachum and Edwin Zhang, all of OpenAI, and Santosh S. Vempala of Georgia Tech.




Australian government must resist temptation to build sovereign artificial intelligence to compete against tech giants



I keep thinking about a scene from the movie The Social Network. Facebook founders Mark Zuckerberg and Eduardo Saverin meet serial entrepreneur Sean Parker (of Napster infamy and played by Justin Timberlake) at a New York restaurant.

Saverin asks Parker to “settle an argument” about whether it’s “time to monetise the site”. Parker cautions against contaminating Facebook with ads, observing: “You don’t even know what the thing is yet, how big it can get, how far it can go.”
