AI Research

Research leaders urge tech industry to monitor AI’s ‘thoughts’

Published

2 months ago

July 15, 2025

AI researchers from OpenAI, Google DeepMind, Anthropic, as well as a broad coalition of companies and nonprofit groups, are calling for deeper investigation into techniques for monitoring the so-called thoughts of AI reasoning models in a position paper published Tuesday.

A key feature of AI reasoning models, such as OpenAI’s o3 and DeepSeek’s R1, are their chains-of-thought or CoTs — an externalized process in which AI models work through problems, similar to how humans use a scratch pad to work through a difficult math question. Reasoning models are a core technology for powering AI agents, and the paper’s authors argue that CoT monitoring could be a core method to keep AI agents under control as they become more widespread and capable.

“CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions,” said the researchers in the position paper. “Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make the best use of CoT monitorability and study how it can be preserved.”

The position paper asks leading AI model developers to study what makes CoTs “monitorable” — in other words, what factors can increase or decrease transparency into how AI models really arrive at answers. The paper’s authors say that CoT monitoring may be a key method for understanding AI reasoning models, but note that it could be fragile, cautioning against any interventions that could reduce their transparency or reliability.

The paper’s authors also call on AI model developers to track CoT monitorability and study how the method could one day be implemented as a safety measure.

Notable signatories of the paper include OpenAI chief research officer Mark Chen, Safe Superintelligence CEO Ilya Sutskever, Nobel laureate Geoffrey Hinton, Google DeepMind cofounder Shane Legg, xAI safety adviser Dan Hendrycks, and Thinking Machines co-founder John Schulman. First authors include leaders from the UK AI Security Institute and Apollo Research, and other signatories come from METR, Amazon, Meta, and UC Berkeley.

The paper marks a moment of unity among many of the AI industry’s leaders in an attempt to boost research around AI safety. It comes at a time when tech companies are caught in a fierce competition — which has led Meta to poach top researchers from OpenAI, Google DeepMind, and Anthropic with million-dollar offers. Some of the most highly sought-after researchers are those building AI agents and AI reasoning models.

“We’re at this critical time where we have this new chain-of-thought thing. It seems pretty useful, but it could go away in a few years if people don’t really concentrate on it,” said Bowen Baker, an OpenAI researcher who worked on the paper, in an interview with TechCrunch. “Publishing a position paper like this, to me, is a mechanism to get more research and attention on this topic before that happens.”

OpenAI publicly released a preview of the first AI reasoning model, o1, in September 2024. In the months since, the tech industry was quick to release competitors that exhibit similar capabilities, with some models from Google DeepMind, xAI, and Anthropic showing even more advanced performance on benchmarks.

However, there’s relatively little understood about how AI reasoning models work. While AI labs have excelled at improving the performance of AI in the last year, that hasn’t necessarily translated into a better understanding of how they arrive at their answers.

Anthropic has been one of the industry’s leaders in figuring out how AI models really work — a field called interpretability. Earlier this year, CEO Dario Amodei announced a commitment to crack open the black box of AI models by 2027 and invest more in interpretability. He called on OpenAI and Google DeepMind to research the topic more, as well.

Early research from Anthropic has indicated that CoTs may not be a fully reliable indication of how these models arrive at answers. At the same time, OpenAI researchers have said that CoT monitoring could one day be a reliable way to track alignment and safety in AI models.

The goal of position papers like this is to signal boost and attract more attention to nascent areas of research, such as CoT monitoring. Companies like OpenAI, Google DeepMind, and Anthropic are already researching these topics, but it’s possible that this paper will encourage more funding and research into the space.

Source link

Related Topics:AI safety Anthropic OpenAI Research

Up Next

AI may diminish demand for high-wage skills like data analysis, research finds

Don't Miss

Small Nonprofits Bleed Funding as Faulty AI Grant Tools Mislead

Maxwell Zeff

Click to comment

AI Research

EY-Parthenon practice unveils neurosymbolic AI capabilities to empower businesses to identify, predict and unlock revenue at scale | EY

Published

25 minutes ago

September 10, 2025

EY-Parthenon practice unveils neurosymbolic AI capabilities to empower businesses to identify, predict and unlock revenue at scale rn

Jeff Schumacher, architect behind the groundbreaking AI solution, to steer EY Growth Platforms.

Ernst & Young LLP (EY) announced the launch of EY Growth Platforms (EYGP), a disruptive artificial intelligence (AI) solution powered by neurosymbolic AI. By combining machine learning with logical reasoning, EYGP empowers organizations to uncover transformative growth opportunities and revolutionize their commercial models for profitability. The neurosymbolic AI workflows that power EY Growth Platforms consistently uncover hundred-million-dollar+ growth opportunities for global enterprises, with the potential to enhance revenue.

This represents a rapid development in enterprise technology—where generative AI and neurosymbolic AI combine to redefine how businesses create value. This convergence empowers enterprises to reimagine growth at impactful scale, producing outcomes that are traceable, trustworthy and statistically sound.

EYGP serves as a powerful accelerator for the innovative work at EY-Parthenon, helping clients with their most complex strategic opportunities to realize greater value and transform their business, by reimagining their business from the ground up—including building and scaling new corporate ventures, or executing high-stakes transactions.

“In today’s uncertain economic climate, leading companies aren’t just adapting—they’re taking control,” says Mitch Berlin, EY Americas Vice Chair, EY-Parthenon. “EY Growth Platforms gives our clients the predictive power and actionable foresight they need to confidently steer their revenue trajectory. EY Growth Platforms is a game changer, poised to become the backbone of enterprise growth.”

How EY Growth Platforms work

Neurosymbolic AI merges the statistical power of neural networks with the structured logic of symbolic reasoning, driving powerful pattern recognition to deliver predictions and decisions that are practical, actionable and grounded in real-world outcomes. EYGP harnesses this powerful technology to simulate real-time market scenarios and their potential impact, uncovering the most effective business strategies tailored to each client. It expands beyond the limits of generative AI, becoming a growth operating system for companies to tackle complex go-to-market challenges and unlock scalable revenue.

At the core of EYGP is a unified data and reasoning engine that ingests structured and unstructured data from internal systems, external signals, and deep EY experience and data sets. Developed over three years, this robust solution is already powering proprietary AI applications and intelligent workflows for EY-Parthenon clients across the consumer product goods, industrials and financial services sectors without the need for extensive data cleaning or digital transformation.

Use cases for EY Growth Platforms

With the ability to operate in complex high-stakes scenarios, EYGP is driving a measurable impact across industries such as:

Financial services: In a tightly regulated industry, transparency and accountability are nonnegotiable. Neurosymbolic AI enhances underwriting, claims processing and compliance with transparency and rigor, validating that decisions are aligned with regulatory standards and optimized for customer outcomes.
Consumer products: Whether powering real-time recommendations, adaptive interfaces or location-aware services, neurosymbolic AI drives hyperpersonalized experiences at a one-to-one level. By combining learned patterns with structured knowledge, it delivers precise, context-rich insights tailored to individual behavior, preferences and environments.
Industrial products: Neurosymbolic AI helps industrial conglomerates optimize the entire value chain — from sourcing and production to distribution and service. By integrating structured domain knowledge with real-time operational data, it empowers leaders to make smarter decisions — from facility placement and supply routing to workforce allocation tailored to specific geographies and market-specific conditions.

The platform launch follows the appointment of Jeff Schumacher as the EYGP Leader for EY-Parthenon. Schumacher brings over 25 years of experience in business strategy, innovation and digital disruption, having helped establish over 100 early growth companies. He is the founder of neurosymbolic AI company, Growth Protocol, the technology that EY has exclusive licensing agreement with.

“Neurosymbolic AI is not another analytics tool, it’s a growth engine,” says Jeff Schumacher, EY Growth Platforms Leader, EY-Parthenon. “With EY Growth Platforms, we’re putting a dynamic, AI-powered operating system in the hands of leaders, giving them the ability to rewire how their companies make money. This isn’t incremental improvement; it’s a complete reset of the commercial model.”

EYGP is currently offered in North America, Europe, and Australia. For more information, visit ey.com/NeurosymbolicAI/

– ends –

About EY

EY is building a better working world by creating new value for clients, people, society and the planet, while building trust in capital markets.

Enabled by data, AI and advanced technology, EY teams help clients shape the future with confidence and develop answers for the most pressing issues of today and tomorrow.

EY teams work across a full spectrum of services in assurance, consulting, tax, strategy and transactions. Fueled by sector insights, a globally connected, multi-disciplinary network and diverse ecosystem partners, EY teams can provide services in more than 150 countries and territories.

All in to shape the future with confidence.

EY refers to the global organization, and may refer to one or more, of the member firms of Ernst & Young Global Limited, each of which is a separate legal entity. Ernst & Young Global Limited, a UK company limited by guarantee, does not provide services to clients. Information about how EY collects and uses personal data and a description of the rights individuals have under data protection legislation are available via ey.com/privacy. EY member firms do not practice law where prohibited by local laws. For more information about our organization, please visit ey.com.

This news release has been issued by EYGM Limited, a member of the global EY organization that also does not provide any services to clients.

Source link

AI Research

MFour Mobile Research Now MFour Data Research, Reflecting Traction in Validated Surveys & AI training data

Published

39 minutes ago

September 10, 2025

MFour Data Research

Name highlights evolution from mobile survey pioneer to leading provider of validated, ethically sourced data powering insights in the AI era.

IRVINE, Calif., Sept. 10, 2025 /PRNewswire/ — MFour Mobile Research, Inc., a pioneer in mobile-based consumer survey research, today announced its new name: MFour Data Research, Inc.

The change reflects the company’s expanded focus on delivering both high-quality validated survey insights and quality consumer behavior data – including connected app, web, location, and purchase journeys – all connected (and anonymized) through a single consumer ID.

Founded in 2011, MFour originally led the industry by using smartphones to improve the quality and accuracy of survey data. Over the past decade, customer demand has grown for ethically sourced, first-party behavior data that can power deeper consumer journey insights and support the next generation of AI-driven decision-making.

“Our new name reflects the products and data we sell today and where our customers are headed,” said Chris St. Hilaire, CEO and Founder of MFour. “Mobile surveys were just the beginning. Today, we combine validated survey data with app, web, location, and purchase behaviors — sourced directly from opted-in consumers through our Fair Trade Data® model. That makes us uniquely positioned to deliver the trusted, transparent datasets companies need in an AI-driven world.”

MFour Data Research’s solutions are anchored by its 4.5-star Surveys On The Go® app, which generates billions of verified data points annually, and the MFour Studio™ platform, where brands and institutions access connected survey and behavior datasets. From Fortune 100 companies to disruptive startups, organizations rely on MFour to provide clarity, accuracy, and confidence in understanding consumer journeys.

SOURCE MFour Data Research

Source link

AI Research

Qodo Unveils Top Deep Research Agent for Coding, Outperforming Leading AI Labs on Multi-Repository Benchmark

Published

1 hour ago

September 10, 2025

PRNewswire

Qodo Aware Deep Research achieves 80% accuracy on new coding benchmark, surpassing OpenAI’s Codex at 74%, Anthropic’s Claude Code at 64%, and Google’s Gemini CLI at 45%

Qodo, the agentic code quality platform, announced Qodo Aware, a new flagship product in its enterprise platform that brings agentic understanding and context engineering to large codebases. It features the industry’s first deep research agent designed specifically for navigating enterprise-scale codebases. In benchmark testing, Qodo Aware’s deep research agent demonstrated superior accuracy and speed compared to leading AI coding agents when answering questions that require context from multiple repositories.

AI has made generating code easy, but ensuring quality at scale is now even harder. Modern software systems span hundreds or thousands of interconnected code repositories, making it nearly impossible for developers to maintain a comprehensive understanding of their organization’s entire codebase. While current AI coding tools excel at single-repository tasks, they cannot traverse the complex web of dependencies and relationships: the 2025 State of AI Code Quality report found that more than 60% of developers say AI coding tools miss relevant context. Qodo Aware addresses this limitation with a context engine that powers deep research agents that can automatically navigate across repository boundaries.

“Developers don’t typically work in isolation, they need to understand how changes in one service affect systems across their entire organization and how those systems evolved to their current state,” said Itamar Friedman, co-founder and CEO of Qodo. “Our deep research agent can analyze impact, dependencies and historical context across thousands of files and hundreds of repositories in seconds, something that could take a principal engineer hours or days to trace manually. This eliminates the traditional speed-quality tradeoff that enterprises face when adopting AI for development, while adding the crucial dimension of understanding not just what the code does, but why it was built that way.”

Also Read: AiThority Interview with Tim Morrs, CEO at SpeakUp

Qodo Aware features three distinct modes, each powered by specialized agents for different use cases. The Deep Research agent performs comprehensive multi-step analysis across repositories, making it ideal for complex architectural questions and system-wide tasks. For quicker code Q&As, the Ask agent provides rapid responses through agentic context retrieval, and the Issue Finder agent searches across repos for bugs, code duplication, security risks, and other hidden issues. These agents can be used to get direct answers, or integrated into existing coding agents, like Cursor and Claude Code, as a powerful context retrieval layer, enhancing their ability to understand large-scale codebases.

Qodo Aware uses a sophisticated indexing and context retrieval approach that combines Language Server Protocol (LSP) analysis, knowledge graphs, and vector embeddings to create deep semantic understanding of code relationships. For enterprises, this means developers can safely modify complex systems without fear of breaking unknown dependencies, reducing deployment risks and accelerating release cycles. Teams report cutting investigation time for complex issues from days to minutes, even when working across massive, interconnected codebases with more than 100M lines of code.

Along with these capabilities, Qodo is releasing a new multi-repository dataset for evaluating coding deep research agents. The dataset includes real-world questions that require information that spans multiple open source code repositories to correctly answer. On the new DeepCodeBench benchmark, Qodo Aware achieved 80% accuracy, while OpenAI Codex scored 74%, Claude Code reached 64%, and Gemini CLI correctly solved 45%. Importantly, Qodo Aware Deep Research took less than half the time of Codex to answer, enabling faster iteration cycles for developers.

Qodo Aware has been integrated directly into existing Qodo development tools – including Qodo Gen IDE agent, Qodo Command CLI agent, and Qodo Merge code review agent – bringing context to workflows across the entire software development lifecycle.. It is also available as a standalone product accessible via Model Context Protocol (MCP) and API, enabling integration with any AI assistant or coding agent. Qodo Aware can be deployed within enterprise single-tenant environments, ensuring code never leaves organizational boundaries, while maintaining the governance and compliance standards enterprises require. It supports GitHub, GitLab, and Bitbucket, with all indexing and processing occurring within customer-controlled infrastructure.

Source link