AI Research

The End of Chain-of-Thought? CoreThink and University of California Researchers Propose a Paradigm Shift in AI Reasoning

Published

September 5, 2025

For years, the race in artificial intelligence has been about scale. Bigger models, more GPUs, longer prompts. OpenAI, Anthropic, and Google have led the charge with massive large language models (LLMs), reinforcement learning fine-tuning, and chain-of-thought prompting—techniques designed to simulate reasoning by spelling out step-by-step answers.

But a new technical white paper from CoreThink AI and University of California researchers, titled “CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs,” argues that this paradigm may be reaching its ceiling. The authors make a provocative claim: LLMs are powerful statistical text generators, but they are not reasoning engines. And chain-of-thought, the method most often used to suggest otherwise, is more performance theater than genuine logic.

In response, the team introduces General Symbolics, a neuro-symbolic reasoning layer designed to plug into existing models. Their evaluations show dramatic improvements across a wide range of reasoning benchmarks—achieved without retraining or additional GPU cost. If validated, this approach could mark a turning point in how AI systems are designed for logic and decision-making.

What Is Chain-of-Thought — and Why It Matters

Chain-of-thought (CoT) prompting has become one of the most widely adopted techniques in modern AI. By asking a model to write out its reasoning steps before delivering an answer, researchers found they could often improve benchmark scores in areas like mathematics, coding, and planning. On the surface, it seemed like a breakthrough.

Yet the report underscores the limitations of this approach. CoT explanations may look convincing, but studies show they are often unfaithful to what the model actually computed, rationalizing outputs after the fact rather than revealing true logic. This creates real-world risks. In medicine, a plausible narrative may mask reliance on spurious correlations, leading to dangerous misdiagnoses. In law, fabricated rationales could be mistaken for genuine justifications, threatening due process and accountability.

The paper further highlights inefficiency: CoT chains often grow excessively long on simple problems, while collapsing into shallow reasoning on complex ones. The result is wasted computation and, in many cases, reduced accuracy. The authors conclude that chain-of-thought is “performative, not mechanistic”—a surface-level display that creates the illusion of interpretability without delivering it.

Symbolic AI: From Early Dreams to New Revivals

The critique of CoT invites a look back at the history of symbolic AI. In its earliest decades, AI research revolved around rule-based systems that encoded knowledge in explicit logical form. Expert systems like MYCIN attempted to diagnose illnesses by applying hand-crafted rules, and fraud detection systems relied on vast logic sets to catch anomalies.

Symbolic AI had undeniable strengths: every step of its reasoning was transparent and traceable. But these systems were brittle. Encoding tens of thousands of rules required immense labor, and they struggled when faced with novel situations. Critics like Hubert Dreyfus argued that human intelligence depends on tacit, context-driven know-how that no rule set could capture. By the 1990s, symbolic approaches gave way to data-driven neural networks.

In recent years, there has been a renewed effort to combine the strengths of both worlds through neuro-symbolic AI. The idea is straightforward: let neural networks handle messy, perceptual inputs like images or text, while symbolic modules provide structured reasoning and logical guarantees. But most of these hybrids have struggled with integration. Symbolic backbones were too rigid, while neural modules often undermined consistency. The result was complex, heavy systems that failed to deliver the promised interpretability.

General Symbolics: A New Reasoning Layer

CoreThink’s General Symbolics Reasoner (GSR) aims to overcome these limitations with a different approach. Instead of translating language into rigid formal structures or high-dimensional embeddings, GSR operates entirely within natural language itself. Every step of reasoning is expressed in words, ensuring that context, nuance, and modality are preserved. This means that differences like “must” versus “should” are carried through the reasoning process, rather than abstracted away.

The framework works by parsing inputs natively in natural language, applying logical constraints through linguistic transformations, and producing verbatim reasoning traces that remain fully human-readable. When contradictions or errors appear, they are surfaced directly in the reasoning path, allowing for transparency and debugging. To remain efficient, the system prunes unnecessary steps, enabling stable long-horizon reasoning without GPU scaling.
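To make that description concrete, here is a minimal Python sketch of a reasoning trace that stays in natural language. It is a toy illustration under stated assumptions, not CoreThink's implementation: the `Step` class, the `MODALS` list, the `parse` and `build_trace` helpers, and the simple "subject, modal, action" sentence shape are all hypothetical. It illustrates three ideas from the paragraph above: modal words such as "must" and "should" are carried through rather than abstracted away, repeated steps are pruned, and a contradiction is surfaced directly in the trace.

```python
from dataclasses import dataclass

# Hypothetical modal vocabulary; longer phrases come first so that
# "must not" is matched before "must".
MODALS = ("must not", "must", "should not", "should", "may")

# Hypothetical conflict table: pairs of modals that cannot both hold
# for the same subject and action.
CONFLICTS = {("must", "must not"), ("must not", "must")}


@dataclass
class Step:
    text: str      # the original sentence, kept verbatim for the trace
    subject: str
    modal: str
    action: str


def parse(sentence: str) -> Step:
    """Split '<subject> <modal> <action>' while keeping the original wording."""
    lowered = sentence.lower().rstrip(".")
    for modal in MODALS:
        marker = f" {modal} "
        if marker in lowered:
            subject, action = lowered.split(marker, 1)
            return Step(sentence, subject.strip(), modal, action.strip())
    raise ValueError(f"no modal word found in: {sentence!r}")


def build_trace(sentences):
    """Return a human-readable trace: pruned steps plus surfaced contradictions."""
    seen, trace = [], []
    for sentence in sentences:
        step = parse(sentence)
        key = (step.subject, step.modal, step.action)
        # Prune steps that repeat an earlier one exactly.
        if any((p.subject, p.modal, p.action) == key for p in seen):
            continue
        trace.append(f"{step.text}  [modal={step.modal}]")
        # Surface contradictions in the trace itself instead of hiding them.
        for p in seen:
            same_claim = p.subject == step.subject and p.action == step.action
            if same_claim and (p.modal, step.modal) in CONFLICTS:
                trace.append(f"CONTRADICTION: '{p.text}' vs '{step.text}'")
        seen.append(step)
    return trace


if __name__ == "__main__":
    rules = [
        "The service must log requests.",
        "The service should log errors.",
        "The service must log requests.",      # duplicate, pruned
        "The service must not log requests.",  # conflicts with the first rule
    ]
    for line in build_trace(rules):
        print(line)
```

A real system would need far richer language handling than string matching; the point of the sketch is only that every line of the trace remains readable text that a person can inspect and debug.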

Because it acts as a layer rather than requiring retraining, GSR can be applied to existing base models. In evaluations, it consistently delivered accuracy improvements of between 30 and 60 percent across reasoning tasks, all without increasing training costs.

Benchmark Results

The improvements are best illustrated through benchmarks. On LiveCodeBench v6, which evaluates competition-grade coding problems, CoreThink achieved a 66.6 percent pass rate—substantially higher than leading models in its category. In SWE-Bench Lite, a benchmark for real-world bug fixing drawn from GitHub repositories, the system reached 62.3 percent accuracy, the highest result yet reported. And on ARC-AGI-2, one of the most demanding tests of abstract reasoning, it scored 24.4 percent, far surpassing frontier models like Claude and Gemini, which remain below 6 percent.

These numbers reflect more than raw accuracy. In detailed case studies, the symbolic layer changed how the models approached problems. For a bug in scikit-learn’s ColumnTransformer, for instance, a baseline model proposed a superficial patch that masked the error. The CoreThink-augmented system instead identified the synchronization problem at the root and fixed it comprehensively. On a difficult LeetCode challenge, the base model misapplied dynamic programming and failed entirely, while the symbolic reasoning layer corrected the flawed state representation and produced a working solution.

How It Fits into the Symbolic Revival

General Symbolics joins a growing movement of attempts to bring structure back into AI reasoning. Classic symbolic AI showed the value of transparency but could not adapt to novelty. Traditional neuro-symbolic hybrids promised balance but often became unwieldy. Planner stacks that bolted search onto LLMs offered early hope but collapsed under complexity as tasks scaled.

Recent advances point to the potential of new hybrids. DeepMind’s AlphaGeometry, for instance, has demonstrated that symbolic structures can outperform pure neural models on geometry problems. CoreThink’s approach extends this trend. In its ARC-AGI pipeline, deterministic object detection and symbolic pattern abstraction are combined with neural execution, producing results far beyond those of LLM-only systems. In tool use, the symbolic layer helps maintain context and enforce constraints, allowing for more reliable multi-turn planning.
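The article does not describe how the "deterministic object detection" stage works. As a rough illustration of what a deterministic step on an ARC-style grid can look like, the sketch below groups same-colored, edge-adjacent cells into objects with a breadth-first flood fill. The function name `find_objects`, the choice of 0 as the background color, and 4-way adjacency are assumptions made for this example, not details from the paper.

```python
from collections import deque


def find_objects(grid, background=0):
    """Group same-colored, edge-adjacent cells into objects (4-connectivity).

    `grid` is a list of equal-length rows of integer color codes.
    Returns a list of dicts with the object's color and its sorted cells,
    in the order the objects are first met in a row-by-row scan.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            color = grid[r][c]
            cells = []
            queue = deque([(r, c)])
            seen.add((r, c))
            # Breadth-first flood fill over neighbors of the same color.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and grid[ny][nx] == color:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append({"color": color, "cells": sorted(cells)})
    return objects


if __name__ == "__main__":
    example = [
        [1, 1, 0],
        [0, 0, 2],
        [3, 0, 2],
    ]
    for obj in find_objects(example):
        print(obj)
```

Because the procedure involves no learned component, the same grid always yields the same objects, which is the property that makes such a step easy to audit before any pattern abstraction or neural execution is applied.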

The key distinction is that General Symbolics does not rely on rigid logic or massive retraining. By reasoning directly in language, it remains flexible while preserving interpretability. This makes it lighter than earlier hybrids and, crucially, practical for integration into enterprise applications.

Why It Matters

If chain-of-thought is an illusion of reasoning, then the AI industry faces a pressing challenge. Enterprises cannot depend on systems that only appear to reason, especially in high-stakes environments like medicine, law, and finance. The paper suggests that real progress will come not from scaling models further, but from rethinking the foundations of reasoning itself.

General Symbolics is one such foundation. It offers a lightweight, interpretable layer that can enhance existing models without retraining, producing genuine reasoning improvements rather than surface-level narratives. For the broader AI community, it marks a possible paradigm shift: a return of symbolic reasoning, not as brittle rule sets, but as a flexible companion to neural learning.

As the authors put it: “We don’t need to add more parameters to get better reasoning—we need to rethink the foundations.”

Antoine Tardif

Now Artificial Intelligence (AI) for smarter prison surveillance in West Bengal – The CSR Journal

Published

September 7, 2025

Ujjal Roy

OpenAI business to burn $115 billion through 2029, The Information reports

Published

September 6, 2025

OpenAI CEO Sam Altman walks on the day of a meeting of the White House Task Force on Artificial Intelligence (AI) Education in the East Room at the White House in Washington, D.C., U.S., September 4, 2025.

Brian Snyder | Reuters

OpenAI has sharply raised its projected cash burn through 2029 to $115 billion as it ramps up spending to power the artificial intelligence behind its popular ChatGPT chatbot, The Information reported on Friday.

The new forecast is $80 billion higher than the company previously expected, the news outlet said, without citing a source for the report.

OpenAI, which has become one of the world’s biggest renters of cloud servers, projects it will burn more than $8 billion this year, some $1.5 billion higher than its projection from earlier this year, the report said.

The company did not immediately respond to a Reuters request for comment.

To control its soaring costs, OpenAI will seek to develop its own data center server chips and facilities to power its technology, The Information said.

OpenAI is set to produce its first artificial intelligence chip next year in partnership with U.S. semiconductor giant Broadcom, the Financial Times reported on Thursday, saying OpenAI plans to use the chip internally rather than make it available to customers.

The company deepened its tie-up with Oracle in July with a planned 4.5 gigawatts of data center capacity, building on its Stargate initiative, a project of up to $500 billion and 10 gigawatts that includes Japanese technology investor SoftBank. OpenAI has also added Alphabet’s Google Cloud among its suppliers for computing capacity.

The company’s cash burn will more than double to over $17 billion next year, $10 billion higher than OpenAI’s earlier projection, with a burn of $35 billion in 2027 and $45 billion in 2028, The Information said.
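The report gives the new figures together with the size of each increase, so the earlier projections can be recovered by subtraction. The short Python sketch below works through that arithmetic using only numbers quoted in the article. Because the article describes the 2025 and 2026 figures as "more than" $8 billion and "over" $17 billion, the derived values are approximations, and the variable names are chosen for this illustration.

```python
# Figures in billions of US dollars, as quoted in the article.
# The 2025 and 2026 values are reported as "more than" / "over",
# so they are treated here as approximate.
new_total_through_2029 = 115.0
total_increase = 80.0

new_2025, increase_2025 = 8.0, 1.5
new_2026, increase_2026 = 17.0, 10.0

# Earlier projection = new projection minus the reported increase.
implied_prior_total = new_total_through_2029 - total_increase
implied_prior_2025 = new_2025 - increase_2025
implied_prior_2026 = new_2026 - increase_2026

# Check the "more than double" claim for 2026 versus 2025.
growth_2026 = new_2026 / new_2025

print(f"Implied earlier total through 2029: about ${implied_prior_total:g}B")
print(f"Implied earlier 2025 projection:    about ${implied_prior_2025:g}B")
print(f"Implied earlier 2026 projection:    about ${implied_prior_2026:g}B")
print(f"2026 burn relative to 2025:         about {growth_2026:.2f}x")
```

On these rounded inputs, the earlier forecast works out to roughly $35 billion through 2029, about $6.5 billion for this year and about $7 billion for next year.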

Who is Shawn Shen? The Cambridge alumnus and ex-Meta scientist offering $2M to poach AI researchers

Published

September 6, 2025

Apeksha Tanwar

Shawn Shen, co-founder and Chief Executive Officer of the artificial intelligence (AI) startup Memories.ai, has made headlines for offering compensation packages worth up to $2 million to attract researchers from top technology companies. In a recent interview with Business Insider, Shen explained that many scientists are leaving Meta, the parent company of Facebook, due to constant reorganisations and shifting priorities.

“Meta is constantly doing reorganizations. Your manager and your goals can change every few months. For some researchers, it can be really frustrating and feel like a waste of time,” Shen told Business Insider, adding that this is a key reason why researchers are seeking roles at startups. He also cited Meta Chief Executive Officer Mark Zuckerberg’s philosophy that “the biggest risk is not taking any risks” as a motivation for his own move into entrepreneurship.

With Memories.ai, a company developing AI capable of understanding and remembering visual data, Shen is aiming to build a niche team of elite researchers. His company has already recruited Chi-Hao Wu, a former Meta research scientist, as Chief AI Officer, and is in talks with other researchers from Meta’s Superintelligence Lab as well as Google DeepMind.

From full scholarships to Cambridge classrooms

Shen’s academic journey is rooted in engineering, supported consistently by merit-based scholarships. He studied at Dulwich College from 2013 to 2016 on a full scholarship, completing his A-Level qualifications.

He then pursued higher education at the University of Cambridge, where he was awarded full scholarships throughout. Shen earned a Bachelor of Arts (BA) in Engineering (2016–2019), followed by a Master of Engineering (MEng) at Trinity College (2019–2020). He later continued at Cambridge as a Meta PhD Fellow, completing his Doctor of Philosophy (PhD) in Engineering between 2020 and 2023.

Early career: Internships in finance and research

Alongside his academic pursuits, Shen gained early experience through internships and analyst roles in finance. He worked as a Quantitative Research Summer Analyst at Killik & Co in London (2017) and as an Investment Banking Summer Analyst at Morgan Stanley in Shanghai (2018).

Shen also interned as a Research Scientist at the Computational and Biological Learning Lab at the University of Cambridge (2019), building the foundations for his transition into advanced AI research.

From Meta’s Reality Labs to academia

After completing his PhD, Shen joined Meta (Reality Labs Research) in Redmond, Washington, as a Research Scientist (2022–2024). His time at Meta exposed him to cutting-edge work in generative AI, but also to the frustrations of frequent corporate restructuring. This experience eventually drove him toward building his own company.

In April 2024, Shen began his academic career as an Assistant Professor at the University of Bristol, before launching Memories.ai in October 2024.

Betting on talent with $2M offers

Explaining his company’s aggressive hiring packages, Shen told Business Insider: “It’s because of the talent war that was started by Mark Zuckerberg. I used to work at Meta, and I speak with my former colleagues often about this. When I heard about their compensation packages, I was shocked; it’s really in the tens of millions range. But it shows that in this age, AI researchers who make the best models and stand at the frontier of technology are really worth this amount of money.”

Shen noted that Memories.ai is looking to recruit three to five researchers in the next six months, followed by up to ten more within a year. The company is prioritising individuals willing to take a mix of equity and cash, with Shen emphasising that these recruits would be treated as founding members rather than employees.

By betting heavily on talent, Shen believes Memories.ai will be in a strong position to secure additional funding and establish itself in the competitive AI landscape.

His bold $2 million offers may raise eyebrows, but they also underline a larger truth: in today’s technology race, the fiercest competition is not for customers or capital, it’s for talent.
