AI Insights
AI Agents Do Well in Simulations, Falter in Real-World Test

In a bid to test whether artificial intelligence (AI) agents can operate autonomously in the real economy, Andon Labs and Anthropic deployed Claude 3.7 Sonnet, nicknamed “Claudius,” to run an actual small, automated vending store at Anthropic’s San Francisco office for a month.
When Cybercriminals Weaponize Artificial Intelligence at Scale

Anthropic’s August threat intelligence report reads like a cybersecurity novel, except it is, terrifyingly, not fiction. The report describes how cybercriminals used Claude to orchestrate attacks on 17 organizations, with ransom demands exceeding $500,000. This may be the most sophisticated AI-driven attack campaign to date.
But beyond the alarming headlines lies a more fundamental shift: the emergence of “agentic cybercrime,” in which AI doesn’t just assist attackers but acts as their co-pilot, strategic advisor, and operational commander all at once.
The End of Traditional Cybercrime Economics
The Anthropic report highlights a harsh reality that IT leaders have long feared: the economics of cybercrime have fundamentally changed. What previously required teams of specialized attackers working for weeks can now be accomplished by a single individual in a matter of hours with AI assistance.
Consider the “vibe hacking” operation detailed in the report. One cybercriminal used Claude Code to automate reconnaissance across thousands of systems, create custom malware with anti-detection capabilities, perform real-time network penetration, and analyze stolen financial data to calculate psychologically optimized ransom amounts.
Rather than simply following instructions, the AI made tactical decisions about which data to exfiltrate and crafted victim-specific extortion strategies designed to maximize psychological pressure.
The Democratization of Sophisticated Attacks
One of the most unnerving revelations in Anthropic’s report involves North Korean IT workers who have infiltrated Fortune 500 companies by using AI to simulate technical competence they lack. Although these workers are unable to write basic code or communicate professionally in English, they are successfully holding down full-time engineering positions at major corporations, with AI handling everything from technical interviews to daily work deliverables.
The report also discloses that 61 percent of the workers’ AI usage focused on frontend development, 26 percent on programming tasks, and 10 percent on interview preparation. They are essentially human proxies for AI systems, channeling hundreds of millions of dollars to North Korea’s weapons programs while their employers remain unaware.
Similarly, the report reveals how criminals with little technical skill are developing and selling sophisticated ransomware-as-a-service packages for $400 to $1,200 on dark web forums. Features that previously required years of specialized knowledge, such as ChaCha20 encryption, anti-EDR techniques, and Windows internals exploitation, are now generated on demand with the aid of AI.
Defense Speed Versus Attack Velocity
Traditional cybersecurity operates on human timetables, with threat detection, analysis, and response cycles measured in hours or days. AI-powered attacks, on the other hand, operate at machine speed, with reconnaissance, exploitation, and data exfiltration occurring in minutes.
The cybercriminal highlighted in Anthropic’s report automated network scanning across thousands of endpoints, identified vulnerabilities with “high success rates,” and moved through compromised networks faster than human defenders could respond. When initial attack vectors failed, the AI immediately generated alternatives, creating a dynamic adversary that adapted in real time.
This speed delta creates an impossible situation for traditional security operations centers (SOCs). Human analysts cannot keep up with the velocity and persistence of AI-augmented attackers operating 24/7 across multiple targets simultaneously.
Asymmetry of Intelligence
What makes these AI-powered attacks particularly dangerous isn’t only their speed – it’s their intelligence. The criminals highlighted in the report utilized AI to analyze stolen data and develop “profit plans” by incorporating multiple monetization strategies. Claude evaluated financial records to gauge optimal ransom amounts, analyzed organizational structures to locate key decision-makers, and crafted sector-specific threats based on regulatory vulnerabilities.
This combination of strategic thinking and operational execution has created a new category of threat. These aren’t amateurs running predefined scripts and playbooks; they’re adaptive adversaries that learn and evolve throughout each campaign.
The Acceleration of the Arms Race
The current challenge is summed up as: “All of these operations were previously possible but would have required dozens of sophisticated people weeks to carry out the attack. Now all you need is to spend $1 and generate 1 million tokens.”
The asymmetry is significant. Human defenders must deal with procurement cycles, compliance requirements, and organizational approval before deploying new security technologies. Cybercriminals simply create new accounts when existing ones are blocked – a process that takes about “13 seconds.”
But this predicament also presents an opportunity. The same AI capabilities being weaponized can be harnessed for defense, and in many cases defensive AI has natural advantages.
Attackers can move fast, but defenders have access to something criminals don’t – historical data, organizational context, and the ability to establish baseline behaviors across entire IT environments. AI defense systems can monitor thousands of endpoints simultaneously, correlate subtle anomalies across network traffic, and respond to threats faster than human attackers can ever hope to.
Modern AI security platforms, such as AI SOC agents that work like human SOC analysts, have demonstrated this principle in practice. By automating alert triage, investigation, and response, these systems process security events at machine speed while retaining the context and judgment that pure automation lacks.
Defensive AI doesn’t need to be perfect; it just needs to be faster and more persistent than human attackers. When combined with human expertise for strategic oversight, this creates a formidable defensive posture for organizations.
Building AI-Native Security Operations
The Anthropic report underscores that incremental improvements to traditional security tools won’t be enough against AI-augmented adversaries. Organizations need AI-native security operations that match the scale, speed, and intelligence of modern AI attacks.
This means leveraging AI agents that autonomously investigate suspicious activities, correlate threat intelligence across multiple sources, and respond to attacks faster than humans can. It requires SOCs that use AI for real-time threat hunting, automated incident response, and continuous vulnerability assessment.
This new approach demands a shift from reactive to predictive security postures. AI defense systems must anticipate attack vectors, identify potential compromises before they fully manifest, and adapt defensive strategies based on emerging threat patterns.
The Anthropic report makes clear that attackers don’t wait for perfect tools. They learn to exploit the capabilities that already exist and can cause damage every day, even if the AI revolution were to stop. Organizations cannot afford to be more cautious than their adversaries.
The AI cybersecurity arms race is already here. The question isn’t whether organizations will face AI-augmented attacks, but whether they’ll be prepared when those attacks happen.
Success demands embracing AI as a core component of security operations, not an experimental add-on. It means leveraging AI agents that operate autonomously while maintaining human oversight for strategic decisions. Most importantly, it requires matching the speed of adoption that attackers have already achieved.
The cybercriminals highlighted in the Anthropic report represent the new threat landscape. Their success demonstrates the magnitude of the challenge and the urgency of the needed response. In this new reality, the organizations that survive and thrive will be those that adopt AI-native security operations with the same speed and determination that their adversaries have already demonstrated.
The race is on. The question is whether defenders will move fast enough to win it.
Westwood joins 40 other municipalities using artificial intelligence to examine roads

The borough of Westwood has started using artificial intelligence to determine whether its roads need to be repaired or repaved.
Elected officials see it as a way to save money on manpower and to ensure that all road decisions are objective.
Instead of relying on his own two eyes, the superintendent of Public Works is now allowing an app on his phone to record images of Westwood’s roads as he drives them.
The app collects data on every pothole, faded striping, and 13 other types of road defects.
The road management app is from a New Jersey company called Vialytics.
Westwood is one of 40 municipalities in the state using the software, which also rates road quality and provides easy-to-use data.
“Now you’re relying on the facts here not just my opinion of the street. It’s helped me a lot already. A lot of times you’ll have residents who just want their street paved. Now I can go back to people and say there’s nothing wrong with your street that it needs to be repaved,” said Rick Woods, superintendent of Public Works.
Superintendent Woods says he can even create work orders from the road as soon as a defect is detected.
Borough officials believe the Vialytics app will pay for itself in manpower and offer elected officials objective data when determining how to use taxpayer dollars for roads.
How AI Simulations Match Up to Real Students—and Why It Matters

AI-simulated students consistently outperform real students—and make different kinds of mistakes—in math and reading comprehension, according to a new study.
That could cause problems for teachers, who increasingly use general prompt-based artificial intelligence platforms to save time on daily instructional tasks. Sixty percent of K-12 teachers report using AI in the classroom, according to a June Gallup study, with more than 1 in 4 regularly using the tools to generate quizzes and more than 1 in 5 using AI for tutoring programs. The findings suggest that the underlying large language models may create inaccurate portrayals of how real students think and learn, even when prompted to cater to students of a particular grade or ability level.
“We were interested in finding out whether we can actually trust the models when we try to simulate any specific types of students. What we are showing is that the answer is in many cases, no,” said Ekaterina Kochmar, co-author of the study and an assistant professor of natural-language processing at the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates, the first university dedicated entirely to AI research.
How the study tested AI “students”
Kochmar and her colleagues prompted 11 large language models (LLMs), including those underlying generative AI platforms like ChatGPT, Qwen, and SocraticLM, to answer 249 mathematics and 240 reading questions from the National Assessment of Educational Progress (NAEP), using the persona of typical students in grades 4, 8, and 12. The researchers then compared the models’ answers to NAEP’s database of real student answers to the same questions to measure how closely the AI-simulated students’ answers mirrored actual student performance.
The LLMs that underlie AI tools do not think but generate the most likely next word in a given context based on massive pools of training data, which might include real test items, state standards, and transcripts of lessons. By and large, Kochmar said, the models are trained to favor correct answers.
“In any context, for any task, [LLMs] are actually much more strongly primed to answer it correctly,” Kochmar said. “That’s why it’s very difficult to force them to answer anything incorrectly. And we’re asking them to not only answer incorrectly but fall in a particular pattern—and then it becomes even harder.”
For example, while a student might miss a math problem because he misunderstood the order of operations, an LLM would have to be specifically prompted to misuse the order of operations.
None of the tested LLMs created simulated students whose math and reading performance aligned with that of real students in 4th, 8th, or 12th grade. Without specific grade-level prompts, the proxy students performed significantly better than real students in both math and reading, scoring, for example, 33 to 40 percentile points higher than the average real student in reading.
Kochmar also found that simulated students “fail in different ways than humans.” While specifying grades in prompts did make simulated students perform more like real students in terms of how many answers they got correct, their errors did not necessarily follow patterns tied to particular human misconceptions, such as misapplying the order of operations in math.
The researchers found no prompt that fully aligned simulated and real student answers across different grades and models.
What this means for teachers
For educators, the findings highlight both the potential and the pitfalls of relying on AI-simulated students, underscoring the need for careful use and professional judgment.
“When you think about what a model knows, these models have probably read every book about pedagogy, but that doesn’t mean that they know how to make choices about how to teach,” said Robbie Torney, the senior director of AI programs at Common Sense Media, which studies children and technology.
Torney was not connected to the current study, but last month released a study of AI-based teaching assistants that similarly found alignment problems. AI models produce answers based on their training data, not professional expertise, he said. “That might not be bad per se, but it might also not be a good fit for your learners, for your curriculum, and it might not be a good fit for the type of conceptual knowledge that you’re trying to develop.”
This doesn’t mean teachers shouldn’t use general prompt-based AI to develop tools or tests for their classes, the researchers said, but educators need to prompt AI carefully and use their own professional judgment when deciding whether AI outputs match their students’ needs.
“The great advantage of the current technologies is that it is relatively easy to use, so anyone can access [them],” Kochmar said. “It’s just at this point, I would not trust the models out of the box to mimic students’ actual ability to solve tasks at a specific level.”
Torney said educators need more training to understand not just the basics of how to use AI tools but their underlying infrastructure. “To be able to optimize use of these tools, it’s really important for educators to recognize what they don’t have, so that they can provide some of those things to the models and use their professional judgement.”