AI Research
AI is rewriting the rules of the insurance industry
Despite its traditionally risk-averse nature, the insurance industry is being fundamentally reshaped by AI.
AI has already become vital for the insurance industry, touching everything from complex risk calculations to the way insurers talk to their customers. However, while nearly eight out of ten companies are dipping their toes in the AI water, a similar number admit it hasn’t actually made them any more money.
Such figures reveal a simple truth: just buying the fancy new tech isn’t enough. The real winners will be the ones who figure out how to weave it into the very fabric of who they are and everything they do.
You can see the most dramatic changes right at the heart of the business: handling claims. That mountain of paperwork and endless phone calls, a process that could drag on for weeks, is finally being bulldozed by AI.
A deployment by New York-based insurer Lemonade back in 2021 resulted in settling over a third of its claims in just three seconds, with no human input. Or look at a major US travel insurer that handles 400,000 claims a year; it went from a completely manual system to one that was 57% automated, cutting down processing times from weeks to just minutes.
However, this isn’t just about moving faster; it’s about getting it right. AI can slash the kind of costly human errors that lead to claims leakage in the insurance industry by as much as 30%. The knock-on effect is a huge productivity leap, with adjusters able to handle 40-50% more cases. This frees up the real experts to stop being paper-pushers and start focusing on the tricky cases where a human touch and genuine empathy make all the difference.
It’s a similar story for the underwriters, the people who calculate the risks. AI is giving them superpowers, letting them analyse colossal amounts of data from all sorts of places – like telematics or credit scores – that a person could never sift through alone. It can even draft an initial risk report with incredible accuracy by looking at past data and policies in the blink of an eye.
In practice, this helps create pricing that is fairer and more accurately reflects a person’s unique situation. Zurich, for example, used a modern platform to build a risk management tool that made their assessments 90% more accurate.
Suddenly, underwriting isn’t about looking in the rearview mirror anymore—it’s a living, breathing process that can adapt on the fly to new, complex threats like cyberattacks or the effects of climate change.
But this isn’t just about back-office wizardry. When deployed in the insurance industry, AI is completely changing the conversation between insurers and the people they serve. It’s allowing a move away from simply reacting to problems to proactively helping customers.
AI chatbots can offer 24/7 support, getting smarter with every question they answer. This lets the human team focus on the more difficult conversations. The real game-changer, though, is making things personal.
By understanding a customer’s policy and behaviour, AI can gently nudge them with a renewal reminder or suggest a product that actually fits their life, like usage-based car insurance. It’s about showing customers you actually get them, which builds the kind of loyalty that’s been so hard to come by in an industry where over 30% of claimants feel dissatisfied, and 60% blame slow settlements.
This protective instinct also helps the whole system. AI is a brilliant fraud detective for the insurance industry and beyond, spotting weird patterns in data that a person would miss, and has the potential to cut fraud-related losses by up to 40%. It keeps everyone honest and protects the business and its customers.
What’s pouring fuel on this fire of change? A new breed of low-code platforms. They are the accelerators, letting insurers build and launch new apps and services much faster than before. In a world where customer tastes and rules can change overnight, that kind of speed is everything.
The best part of such tools is that they democratise access and put the power to innovate into more hands. They allow regular business users – or ‘citizen developers’ – to build the tools they need without having to be coding geniuses. These platforms often come with strong security and controls, meaning this newfound speed doesn’t have to mean sacrificing safety or compliance, which is non-negotiable for an industry like insurance.
When you step back and look at the big picture, it’s clear that getting on board with AI isn’t just a tech project; it’s a make-or-break business strategy. Those who jumped in early are already pulling away from the pack, seeing things like a 14% jump in customer retention and a 48% rise in Net Promoter Scores.
The market for this technology is set to explode to over $14 billion by 2034, and some believe AI could add $1.1 trillion in value to the industry every year. But the biggest roadblocks aren’t about the technology itself; they’re about people and old habits.
Data, especially in an industry like insurance, is often stuck in old systems, which stops AI from seeing the whole picture. To get past this, you need more than clever software. You need leaders with a clear vision, a willingness to change the company culture, and a commitment to training their people.
The winners in this new era won’t be the ones tinkering with AI in a corner—they’ll be the ones who lead from the top, with a clear plan to make it a part of their DNA. This will require an understanding that it’s not just about doing old things better, but about finding entirely new ways to bring value and build trust.
Learn more about how AI is rewriting the rules of the insurance industry at the upcoming webinar “From Complexity to Clarity: AI + Agility Layer for Intelligent Insurance” on July 16, 2025, at 7PM BST / 2PM ET. Industry experts from Appian and EXL will share real-world examples and practical insights into how leading carriers are implementing these technologies. Registration is available at the webinar link.
Featured speakers include:
- Vikram Machado, Senior Vice President & Practice Leader – Life, Annuities, Retirements & Group Insurance, EXL
- Vikrant Saraswat, Vice President – AI Consulting, EXL
- Jack Moroney, Enterprise Account Executive – Insurance & Financial Services, Appian
- Andrew Kearns, Insurance Industry Lead, Appian
- Michaela Morari, Senior Solution Consultant – Insurance & Financial Services, Appian
See also: UK and Singapore form alliance to guide AI in finance
AI Research
Artificial Intelligence News for the Week of July 11; Updates from Capgemini, Cerebras, Cloudian & More
Solutions Review Executive Editor Tim King curated this list of notable artificial intelligence news for the week of July 11, 2025.
Keeping tabs on all the most relevant artificial intelligence news can be a time-consuming task. As a result, our editorial team aims to provide a summary of the top headlines from the last week in this space. Solutions Review editors will curate vendor product news, mergers and acquisitions, venture capital funding, talent acquisition, and other noteworthy artificial intelligence news items.
For early access to all the expert insights published on Solutions Review, join Insight Jam, a community dedicated to enabling the human conversation on AI.
Artificial Intelligence News for the Week of July 11, 2025
Accenture and Microsoft Expand Cybersecurity Partnership with GenAI Solutions
Accenture and Microsoft have deepened their partnership to deliver generative AI-powered cybersecurity solutions. The collaboration focuses on modernizing security operations, automating data protection, and enhancing identity and access management. By combining Accenture’s cybersecurity expertise with Microsoft’s security technologies, the alliance aims to help organizations tackle advanced threats, optimize security tools, and reduce operational costs.
Read the full article: Accenture & Microsoft Cyber Collaboration
Capgemini to Acquire WNS, Creating a Global Agentic AI Powerhouse
Capgemini has announced its acquisition of WNS for $3.3 billion, aiming to become a global leader in agentic AI-powered intelligent operations. The deal will combine Capgemini’s and WNS’s strengths in digital business process services (BPS), blending vertical sector expertise with scale to address the rapidly growing demand for AI-driven transformation.
Read the full press release: Capgemini to acquire WNS
Cerebras Launches Qwen3-235B: The World’s Fastest Frontier AI Model with 131K Context
Cerebras has unveiled Qwen3-235B, a groundbreaking AI reasoning model now available on the Cerebras Inference Cloud. Boasting a massive 131,000-token context window, Qwen3-235B delivers code generation and reasoning at 30 times the speed and one-tenth the cost of leading closed-source alternatives.
Read the full press release: Cerebras Launches Qwen3-235B
CapStorm Launches CapStorm:AI for Secure, Self-Hosted Data Insights
CapStorm has unveiled CapStorm:AI, a self-hosted AI solution that allows organizations to interact with their Salesforce and SQL data using natural language. The platform delivers real-time dashboards and insights without coding, keeping all data within the organization’s environment for maximum security and control. CapStorm:AI works with leading SQL databases and cloud data warehouses, empowering users to unlock actionable intelligence from complex datasets.
Read the full press release: CapStorm Launches CapStorm:AI
Cloudian Unveils Unified AI Inferencing and Data Storage Platform
Cloudian has launched a breakthrough platform that integrates high-performance object storage with AI inferencing capabilities, dramatically simplifying enterprise AI infrastructure. The new solution combines Cloudian HyperStore’s industry-leading storage—delivering up to 35GB/s per node—with integrated support for the Milvus vector database, enabling real-time, low-latency AI inferencing on petabyte-scale datasets.
Read the full press release: Cloudian Delivers Integrated AI Inferencing and Data Storage Solution
Cognizant Debuts Agent Foundry to Scale Agentic AI Across Enterprises
Cognizant has launched Agent Foundry, a new framework designed to help enterprises deploy and orchestrate autonomous AI agents at scale. The offering combines modular design, reusable assets, and multi-platform interoperability, enabling organizations to embed agentic capabilities into their workflows for adaptive operations and real-time decision-making. Agent Foundry supports the full lifecycle of agent deployment, from discovery to enterprise-wide scaling.
Read the full press release: Cognizant Introduces Agent Foundry
AI-Driven Cloud Demand Powers Record Q2 Growth in Global IT and Business Services
The latest ISG Index™ reveals that surging demand for cloud services—driven by enterprise AI initiatives—propelled the global IT and business services market to a record $29.2 billion in Q2, up 17% year-over-year. Cloud-based “as-a-service” (XaaS) offerings soared 28%, fueled by infrastructure investments from major hyperscalers, while managed services saw steady growth.
Read the full press release: AI-Driven Cloud Demand Fuels Q2 Growth in Global IT and Business Services Market: ISG Index
ManageEngine Report: Shadow AI as a Strategic Advantage
A new report from ManageEngine reveals that while 97 percent of IT leaders see significant risks in “shadow AI” (unauthorized AI tool use), 91 percent of employees believe the risks are minimal or outweighed by rewards. The report highlights the rapid adoption of unapproved AI tools—60 percent of employees use them more than they did a year ago—and identifies data leakage as a primary concern.
Read the full report summary: ManageEngine Shadow AI Report
National Academy for AI Instruction Launches with Microsoft, OpenAI, Anthropic, and AFT
The American Federation of Teachers (AFT), with support from Microsoft, OpenAI, and Anthropic, is launching the National Academy for AI Instruction in Manhattan. This $23 million initiative will train educators to harness AI technology in the classroom, with OpenAI contributing $10 million, Microsoft $12.5 million, and Anthropic $500,000 in the first year.
Read the full press release: AFT to launch National Academy for AI Instruction
SambaNova Launches First Turnkey AI Inference Solution for Data Centers
SambaNova has introduced SambaManaged, a turnkey AI inference solution for data centers that can be deployed in just 90 days—far faster than the industry norm. The modular system, powered by SambaNova’s SN40L AI chips, enables existing data centers to offer high-performance AI inference services with minimal infrastructure changes. This innovation addresses the growing demand for rapid, scalable AI infrastructure and is already being adopted by major public companies.
Read the full press release: SambaNova Launches Turnkey AI Inference Solution
WEKA Debuts NeuralMesh Axon for Exascale AI Deployments
WEKA has introduced NeuralMesh Axon, a breakthrough storage system designed for exascale AI workloads. Leveraging a fusion architecture, NeuralMesh Axon delivers up to 20x faster AI performance and 90 percent GPU utilization, addressing the challenges of large-scale AI training and inference. The system integrates seamlessly with GPU servers and AI factories, enabling organizations to accelerate AI model development, reduce costs, and maximize infrastructure efficiency.
Read the full press release: WEKA Debuts NeuralMesh Axon
Expert Insights
Watch this space each week as our editors share upcoming events, new thought leadership, and the best resources from Insight Jam, Solutions Review’s enterprise tech community where the human conversation around AI is happening. The goal? To help you gain forward-thinking analysis and stay on-trend through expert advice, best practices, predictions, and vendor-neutral software evaluation tools.
Take the Tech Leader Survey – Spring 2025 Now
In partnership with Skiilify Co-Founder and distinguished Northeastern University Professor Paula Caligiuri, PhD, we’ve just launched our latest enterprise tech leader Survey to uncover how thought leaders are thinking about disruption in this AI moment.
The Digital Analyst with John Santaferraro Featuring IBM’s Bruno Aziza: Deep Blue, Deep Learning & the Future of AI
Bruno reveals why only 16 percent of organizations have achieved enterprise-scale AI adoption, shares battle-tested strategies from companies like PepsiCo and NatWest, and explains why the future belongs to leaders who can orchestrate agents at scale rather than just build them.
NEW Episode of Insight AI Featuring Doug Shannon: AGI on the Horizon
They break down what an AGI on the horizon means for knowledge workers, consultants, and anyone who thought their job was safe from automation. The conversation gets real about the five stages of AI grief most people are experiencing, why Apple is flailing while Meta throws $100 million at talent, and how to find your uniquely human value before the machines come for your paycheck.
Understanding & Preparing for the 7 Levels of AI Agents by Douglas Laney
The following framework for agentic AI stems from a computer science base with theoretical psychology and theoretical philosophy perspectives. Each of the seven levels represents a step-change in technology, capability, and autonomy. The framework shows how organizations gain more potential to innovate and thrive while transforming through data-powered and AI-based digital economic systems.
GLEWs Views: AI Transparency Moves Beyond Moratorium by Gregory Lewandowski
Following the Senate’s removal of a proposed AI development moratorium from major legislation in July 2025, Anthropic announced a targeted transparency framework for frontier AI companies. Their framework targets only the largest AI developers while establishing specific disclosure obligations around safety practices. This represents a significant shift in how the AI industry approaches self-regulation in the absence of comprehensive federal legislation.
6 Must-Have Human-Centric Skills for the AI Age by Tim King
Yet despite this shift, most organizations are not prepared. A proprietary study of over 200 senior tech professionals (get the research by my team and me here)—including AI practitioners, cybersecurity leaders, and IT executives—reveals a stark disconnect: while nearly all respondents believe human-centered skills are vital for the AI age, the vast majority admit their organizations lack the structure, time, or training mechanisms to develop them.
Mini Jam Highlights: Has AI Completely Replaced Process Automation?
Our AI industry experts debate whether AI agents have completely replaced traditional process automation (RPA) or if the future lies in a hybrid approach combining both technologies. This panel discussion reveals the hidden costs of AI implementation, the importance of solving real business problems over chasing use cases, and how the shift from SaaS to “Agent as a Service” is reshaping enterprise technology strategies.
Mini Jam Highlights: Best Cybersecurity Use Cases for AI Agents
Our cybersecurity experts reveal the most effective AI agent use cases transforming enterprise security operations, from compliance automation to vulnerability management and threat detection. They cover real-world implementations including CIS control optimization, SOC analyst assessment systems, and proactive vulnerability identification, while addressing the critical balance between AI autonomy and human oversight in security operations. Essential viewing for security leaders evaluating AI agent deployment strategies.
Mini Jam Highlights: Building and Deploying AI Agent Systems at Scale
Our AI and data experts dive deep into the architecture and infrastructure powering enterprise AI agent systems at scale, from low-latency decision making to vector databases and real-time streaming. This comprehensive technical discussion reveals the challenges of building reliable, traceable, and scalable agentic AI systems, including the critical role of human feedback loops and the current limitations preventing full AI agent autonomy. Essential viewing for technical leaders architecting AI agent deployments.
Mini Jam Highlights On-Demand: How AI Agents Will Transform Business Culture Forever
Our AI industry experts explore how agentic AI will fundamentally reshape business culture, workforce dynamics, and professional roles in the coming years. They discuss the shift from traditional employment to collaborative business partnerships, the rise of new AI-focused roles, and how companies must adapt their culture as AI agents automate routine tasks.
For consideration in future artificial intelligence news roundups, send your announcements to the editor: tking@solutionsreview.com.
AI Research
Simulation-based pipeline tailors training data for dexterous robots | MIT News
When ChatGPT or Gemini gives what seems to be an expert response to your burning questions, you may not realize how much information it relies on to give that reply. Like other popular generative artificial intelligence (AI) models, these chatbots rely on backbone systems called foundation models that train on billions, or even trillions, of data points.
In a similar vein, engineers are hoping to build foundation models that train a range of robots on new skills like picking up, moving, and putting down objects in places like homes and factories. The problem is that it’s difficult to collect and transfer instructional data across robotic systems. You could teach your system by teleoperating the hardware step-by-step using technology like virtual reality (VR), but that can be time-consuming. Training on videos from the internet is less instructive, since the clips don’t provide a step-by-step, specialized task walk-through for particular robots.
A simulation-driven approach called “PhysicsGen” from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Robotics and AI Institute customizes robot training data to help robots find the most efficient movements for a task. The system can multiply a few dozen VR demonstrations into nearly 3,000 simulations per machine. These high-quality instructions are then mapped to the precise configurations of mechanical companions like robotic arms and hands.
PhysicsGen creates data that generalize to specific robots and conditions via a three-step process. First, a VR headset tracks how humans manipulate objects like blocks using their hands. These interactions are mapped in a 3D physics simulator at the same time, visualizing the key points of our hands as small spheres that mirror our gestures. For example, if you flipped a toy over, you’d see 3D shapes representing different parts of your hands rotating a virtual version of that object.
The pipeline then remaps these points to a 3D model of the setup of a specific machine (like a robotic arm), moving them to the precise “joints” where a system twists and turns. Finally, PhysicsGen uses trajectory optimization — essentially simulating the most efficient motions to complete a task — so the robot knows the best ways to do things like repositioning a box.
Each simulation is a detailed training data point that walks a robot through potential ways to handle objects. When implemented into a policy (or the action plan that the robot follows), the machine has a variety of ways to approach a task, and can try out different motions if one doesn’t work.
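The article doesn’t spell out how that data multiplication works in code, but the general shape is easy to sketch. The following Python snippet (NumPy only) is a rough, hypothetical illustration of the idea: retarget tracked hand keypoints to robot joint targets, smooth each variant with a toy trajectory-optimization step, and perturb a small set of demonstrations into many robot-ready trajectories. The function names, the linear retargeting stand-in, the cost model, and the numbers are assumptions made for illustration, not the researchers’ actual implementation.

```python
# Hypothetical sketch of PhysicsGen-style data multiplication:
# a handful of human demonstrations are perturbed and re-optimized
# into many robot-specific trajectories. All names and models here
# are illustrative stand-ins, not the paper's implementation.
import numpy as np

def retarget_to_robot(hand_keypoints: np.ndarray, joint_count: int) -> np.ndarray:
    """Map tracked hand keypoints (T x K x 3) to robot joint targets (T x J).

    Real retargeting would solve inverse kinematics against the specific
    robot model; a fixed linear projection stands in for it here.
    """
    T = hand_keypoints.shape[0]
    rng = np.random.default_rng(0)
    projection = rng.standard_normal((hand_keypoints.shape[1] * 3, joint_count))
    return hand_keypoints.reshape(T, -1) @ projection * 0.01

def optimize_trajectory(joint_targets: np.ndarray, smooth_weight: float = 0.1) -> np.ndarray:
    """Toy 'trajectory optimization': keep the path close to the retargeted
    demonstration while penalizing jerky accelerations."""
    traj = joint_targets.copy()
    for _ in range(50):                                   # simple gradient steps
        tracking = traj - joint_targets                   # stay near the demo
        smooth = np.zeros_like(traj)
        smooth[1:-1] = 2 * traj[1:-1] - traj[:-2] - traj[2:]  # acceleration penalty
        traj -= 0.1 * (tracking + smooth_weight * smooth)
    return traj

def multiply_demonstrations(demo: np.ndarray, n_variants: int = 100) -> list:
    """Turn one VR demonstration into many robot-ready trajectories by
    perturbing the demo and re-optimizing each variant."""
    rng = np.random.default_rng(1)
    variants = []
    for _ in range(n_variants):
        perturbed = demo + rng.normal(scale=0.02, size=demo.shape)
        targets = retarget_to_robot(perturbed, joint_count=7)
        variants.append(optimize_trajectory(targets))
    return variants

# Illustrative run: 24 demos of 200 timesteps with 21 hand keypoints each
demos = [np.random.rand(200, 21, 3) for _ in range(24)]
dataset = [traj for d in demos for traj in multiply_demonstrations(d)]
print(len(dataset))  # 24 demos x 100 variants = 2,400 simulated trajectories
```

In the real pipeline, the retargeting step maps keypoints to the precise joints of a specific machine and the optimization respects contact physics, which is what makes the generated trajectories dynamically feasible rather than merely smooth.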
“We’re creating robot-specific data without needing humans to re-record specialized demonstrations for each machine,” says Lujie Yang, an MIT PhD student in electrical engineering and computer science and CSAIL affiliate who is the lead author of a new paper introducing the project. “We’re scaling up the data in an autonomous and efficient way, making task instructions useful to a wider range of machines.”
Generating so many instructional trajectories for robots could eventually help engineers build a massive dataset to guide machines like robotic arms and dexterous hands. For example, the pipeline might help two robotic arms collaborate on picking up warehouse items and placing them in the right boxes for deliveries. The system may also guide two robots to work together in a household on tasks like putting away cups.
PhysicsGen’s potential also extends to converting data designed for older robots or different environments into useful instructions for new machines. “Despite being collected for a specific type of robot, we can revive these prior datasets to make them more generally useful,” adds Yang.
Addition by multiplication
PhysicsGen turned just 24 human demonstrations into thousands of simulated ones, helping both digital and real-world robots reorient objects.
Yang and her colleagues first tested their pipeline in a virtual experiment where a floating robotic hand needed to rotate a block into a target position. The digital robot executed the task with 81 percent accuracy after training on PhysicsGen’s massive dataset, a 60 percent improvement over a baseline that only learned from human demonstrations.
The researchers also found that PhysicsGen could improve how virtual robotic arms collaborate to manipulate objects. Their system created extra training data that helped two pairs of robots successfully accomplish tasks as much as 30 percent more often than a purely human-taught baseline.
In an experiment with a pair of real-world robotic arms, the researchers observed similar improvements as the machines teamed up to flip a large box into its designated position. When the robots deviated from the intended trajectory or mishandled the object, they were able to recover mid-task by referencing alternative trajectories from their library of instructional data.
Senior author Russ Tedrake, who is the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, adds that this imitation-guided data generation technique combines the strengths of human demonstration with the power of robot motion planning algorithms.
“Even a single demonstration from a human can make the motion planning problem much easier,” says Tedrake, who is also a senior vice president of large behavior models at the Toyota Research Institute and CSAIL principal investigator. “In the future, perhaps the foundation models will be able to provide this information, and this type of data generation technique will provide a type of post-training recipe for that model.”
The future of PhysicsGen
Soon, PhysicsGen may be extended to a new frontier: diversifying the tasks a machine can execute.
“We’d like to use PhysicsGen to teach a robot to pour water when it’s only been trained to put away dishes, for example,” says Yang. “Our pipeline doesn’t just generate dynamically feasible motions for familiar tasks; it also has the potential of creating a diverse library of physical interactions that we believe can serve as building blocks for accomplishing entirely new tasks a human hasn’t demonstrated.”
Creating lots of widely applicable training data may eventually help build a foundation model for robots, though MIT researchers caution that this is a somewhat distant goal. The CSAIL-led team is investigating how PhysicsGen can harness vast, unstructured resources — like internet videos — as seeds for simulation. The goal: transform everyday visual content into rich, robot-ready data that could teach machines to perform tasks no one explicitly showed them.
Yang and her colleagues also aim to make PhysicsGen even more useful for robots with diverse shapes and configurations in the future. To make that happen, they plan to leverage datasets with demonstrations of real robots, capturing how robotic joints move instead of human ones.
The researchers also plan to incorporate reinforcement learning, where an AI system learns by trial and error, to make PhysicsGen expand its dataset beyond human-provided examples. They may augment their pipeline with advanced perception techniques to help a robot perceive and interpret its environment visually, allowing the machine to analyze and adapt to the complexities of the physical world.
For now, PhysicsGen shows how AI can help us teach different robots to manipulate objects within the same category, particularly rigid ones. The pipeline may soon help robots find the best ways to handle soft items (like fruits) and deformable ones (like clay), but those interactions aren’t easy to simulate yet.
Yang and Tedrake wrote the paper with two CSAIL colleagues: co-lead author and MIT PhD student Hyung Ju “Terry” Suh SM ’22 and MIT PhD student Bernhard Paus Græsdal. Robotics and AI Institute researchers Tong Zhao ’22, MEng ’23, Tarik Kelestemur, Jiuguang Wang, and Tao Pang PhD ’23 are also authors. Their work was supported by the Robotics and AI Institute and Amazon.
The researchers recently presented their work at the Robotics: Science and Systems conference.
AI Research
Elon Musk’s New Grok 4 Takes on ‘Humanity’s Last Exam’ as the AI Race Heats Up
Elon Musk has launched xAI’s Grok 4—calling it the “world’s smartest AI” and claiming it can ace Ph.D.-level exams and outpace rivals such as Google’s Gemini and OpenAI’s o3 on tough benchmarks
Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, “the smartest AI in the world” and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences.
During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity’s Last Exam (HLE)—a 2,500-question benchmark designed to evaluate an AI’s academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google’s Gemini-Pro (which achieved 26.9 percent with the tools) and OpenAI’s o3 model (which got 24.9 percent, also with the tools). The results from xAI’s internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called “Mana”) on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE’s leaderboard with a 45 percent score or greater on the exam within a month of its release. (Meanwhile xAI has claimed a score of only 44.4.)
During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the “weirdest” profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year—and possibly “new physics” by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. The regular version of Grok 4 costs $30 a month; SuperGrok Heavy—the deluxe package with multiple agents and research tools—runs at $300.
Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI’s o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2—benchmarks that measure progress toward “humanlike” general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4’s results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. “Before we report performance for any lab, it’s not verified unless we verify it,” Kamradt says. “We approved the [testing results] slide that [the xAI team] showed in the launch.”
According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects (read a full breakdown of the benchmarks here). Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. “Grok has been strong on math and programming in my tests, and I’ve been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,” Olteanu says. “Its context window, however, isn’t very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.” (Multimodal abilities refer to a model’s capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.)
On a more nuanced front, issues with Grok 4 have surfaced since its release. Several posters on X—owned by Musk himself—as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk’s stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of “white genocide”—incidents that xAI publicly acknowledged, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures.
At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good—probably. “I somewhat reconciled myself to the fact that, even if it wasn’t going to be good, I’d at least like to be alive to see it happen,” he said.