AI agents are hitting a liability wall. Mixus has a plan to overcome it using human overseers on high-risk workflows

Published

1 week ago

June 28, 2025

Join the event trusted by enterprise leaders for nearly two decades. VB Transform brings together the people building real enterprise AI strategy. Learn more

While enterprises face the challenges of deploying AI agents in critical applications, a new, more pragmatic model is emerging that puts humans back in control as a strategic safeguard against AI failure.

One such example is Mixus, a platform that uses a “colleague-in-the-loop” approach to make AI agents reliable for mission-critical work.

This approach is a response to the growing evidence that fully autonomous agents are a high-stakes gamble.

The high cost of unchecked AI

The problem of AI hallucinations has become a tangible risk as companies explore AI applications. In a recent incident, the AI-powered code editor Cursor saw its own support bot invent a fake policy restricting subscriptions, sparking a wave of public customer cancellations.

Similarly, the fintech company Klarna famously reversed course on replacing customer service agents with AI after admitting the move resulted in lower quality. In a more alarming case, New York City’s AI-powered business chatbot advised entrepreneurs to engage in illegal practices, highlighting the catastrophic compliance risks of unmonitored agents.

These incidents are symptoms of a larger capability gap. According to a May 2025 Salesforce research paper, today’s leading agents succeed only 58% of the time on single-step tasks and just 35% of the time on multi-step ones, highlighting “a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios.”

The colleague-in-the-loop model

To bridge this gap, a new approach focuses on structured human oversight. “An AI agent should act at your direction and on your behalf,” Mixus co-founder Elliot Katz told VentureBeat. “But without built-in organizational oversight, fully autonomous agents often create more problems than they solve.”

This philosophy underpins Mixus’s colleague-in-the-loop model, which embeds human verification directly into automated workflows. For example, a large retailer might receive weekly reports from thousands of stores that contain critical operational data (e.g., sales volumes, labor hours, productivity ratios, compensation requests from headquarters). Human analysts must spend hours manually reviewing the data and making decisions based on heuristics. With Mixus, the AI agent automates the heavy lifting, analyzing complex patterns and flagging anomalies like unusually high salary requests or productivity outliers.

For high-stakes decisions like payment authorizations or policy violations — workflows defined by a human user as “high-risk” — the agent pauses and requires human approval before proceeding. The division of labor between AI and humans has been integrated into the agent creation process.

“This approach means humans only get involved when their expertise actually adds value — typically the critical 5-10% of decisions that could have significant impact — while the remaining 90-95% of routine tasks flow through automatically,” Katz said. “You get the speed of full automation for standard operations, but human oversight kicks in precisely when context, judgment, and accountability matter most.”

In a demo that the Mixus team showed to VentureBeat, creating an agent is an intuitive process that can be done with plain-text instructions. To build a fact-checking agent for reporters, for example, co-founder Shai Magzimof simply described the multi-step process in natural language and instructed the platform to embed human verification steps with specific thresholds, such as when a claim is high-risk and can result in reputational damage or legal consequences.

One of the platform’s core strengths is its integrations with tools like Google Drive, email, and Slack, allowing enterprise users to bring their own data sources into workflows and interact with agents directly from their communication platform of choice, without having to switch contexts or learn a new interface (for example, the fact-checking agent was instructed to send approval requests to the editor’s email).

The platform’s integration capabilities extend further to meet specific enterprise needs. Mixus supports the Model Context Protocol (MCP), which enables businesses to connect agents to their bespoke tools and APIs, avoiding the need to reinvent the wheel for existing internal systems. Combined with integrations for other enterprise software like Jira and Salesforce, this allows agents to perform complex, cross-platform tasks, such as checking on open engineering tickets and reporting the status back to a manager on Slack.

Human oversight as a strategic multiplier

The enterprise AI space is currently undergoing a reality check as companies move from experimentation to production. The consensus among many industry leaders is that humans in the loop are a practical necessity for agents to perform reliably.

AI Agents will likely follow a self driving trajectory, where you need a human in the loop for a long tail of tasks for a while. The big difference is we’ll get a growing number of autonomous agents along the way, where full self driving is an all or nothing proposition. https://t.co/5dR7cGS7jn— Aaron Levie (@levie) June 20, 2025

Mixus’s collaborative model changes the economics of scaling AI. Mixus predicts that by 2030, agent deployment may grow 1000x and each human overseer will become 50x more efficient as AI agents become more reliable. But the total need for human oversight will still grow.

“Each human overseer manages exponentially more AI work over time, but you still need more total oversight as AI deployment explodes across your organization,” Katz said.

For enterprise leaders, this means human skills will evolve rather than disappear. Instead of being replaced by AI, experts will be promoted to roles where they orchestrate fleets of AI agents and handle the high-stakes decisions flagged for their review.

In this framework, building a strong human oversight function becomes a competitive advantage, allowing companies to deploy AI more aggressively and safely than their rivals.

“Companies that master this multiplication will dominate their industries, while those chasing full automation will struggle with reliability, compliance, and trust,” Katz said.

Daily insights on business use cases with VB Daily
If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

Source link

Funding & Business

UK’s Schroder Family Facing Defining Moment for City Dynasty

Published

25 minutes ago

July 7, 2025

Leonard Kehnscherper

A once-unimaginable sale or breakup of the clan’s 221-year-old City of London firm is looking more and more feasible to onlookers.

Source link

Tools & Platforms

Committee Encourages Georgia Courts To Adopt, Govern AI

Published

32 minutes ago

July 7, 2025

The Editors

By Matt Perez · July 7, 2025, 4:53 PM EDT

Georgia should begin pilot programs tailored to specific use cases of artificial intelligence across each class of court or jurisdiction, an ad hoc committee established by retired Chief Justice Michael P….

Want to continue reading?

Unlock these benefits today when you sign-up for a FREE 7-day trial:

Gain a competitive edge with exclusive data visualization tools to tailor to your practice
Stay informed with daily newsletters and custom alerts across 14+ coverage areas relevant to you
Streamline your business of law needs with integrated news and research in a single destination

Already have an account? Sign In Now

Source link

Books, Courses & Certifications

How INRIX accelerates transportation planning with Amazon Bedrock

Published

45 minutes ago

July 7, 2025

Arunsingh Jeyasingh Jacob

This post is co-written with Shashank Saraogi, Nat Gale, and Durran Kelly from INRIX.

The complexity of modern traffic management extends far beyond mere road monitoring, encompassing massive amounts of data collected worldwide from connected cars, mobile devices, roadway sensors, and major event monitoring systems. For transportation authorities managing urban, suburban, and rural traffic flow, the challenge lies in effectively processing and acting upon this vast network of information. The task requires balancing immediate operational needs, such as real-time traffic redirection during incidents, with strategic long-term planning for improved mobility and safety.

Traditionally, analyzing these complex data patterns and producing actionable insights has been a resource-intensive process requiring extensive collaboration. With recent advances in generative AI, there is an opportunity to transform how we process, understand, and act upon transportation data, enabling more efficient and responsive traffic management systems.

In this post, we partnered with Amazon Web Services (AWS) customer INRIX to demonstrate how Amazon Bedrock can be used to determine the best countermeasures for specific city locations using rich transportation data and how such countermeasures can be automatically visualized in street view images. This approach allows for significant planning acceleration compared to traditional approaches using conceptual drawings.

INRIX pioneered the use of GPS data from connected vehicles for transportation intelligence. For over 20 years, INRIX has been a leader for probe-based connected vehicle and device data and insights, powering automotive, enterprise, and public sector use cases. INRIX’s products range from tickerized datasets that inform investment decisions for the financial services sector to digital twins for the public rights-of-way in the cities of Philadelphia and San Francisco. INRIX was the first company to develop a crowd-sourced traffic network, and they continue to lead in real-time mobility operations.

In June 2024, the State of California’s Department of Transportation (Caltrans) selected INRIX for a proof of concept for a generative AI-powered solution to improve safety for vulnerable road users (VRUs). The problem statement sought to harness the combination of Caltrans’ asset, crash, and points-of-interest (POI) data and INRIX’s 50 petabyte (PB) data lake to anticipate high-risk locations and quickly generate empirically validated safety measures to mitigate the potential for crashes. Trained on real-time and historical data and industry research and manuals, the solution provides a new systemic, safety-based methodology for risk assessment, location prioritization, and project implementation.

Solution overview

INRIX announced INRIX Compass in November 2023. INRIX Compass is an application that harnesses generative AI and INRIX’s 50 PB data lake to solve transportation challenges. This solution uses INRIX Compass countermeasures as the input, AWS serverless architecture, and Amazon Nova Canvas as the image visualizer. Key components include:

Countermeasures generation:
Image visualization
- API Gateway and AWS Lambda process requests from API Gateway and Amazon Bedrock
- Amazon Bedrock with model access to Amazon Nova Canvas provide image generation and in-painting

The following diagram shows the architecture of INRIX Compass.

INRIX Compass for countermeasures

By using INRIX Compass, users can ask natural language queries such as, Where are the top five locations with the highest risk for vulnerable road users? and Can you recommend a suite of proven safety countermeasures at each of these locations? Furthermore, users can probe deeper into the roadway characteristics that contribute to risk factors, and find similar locations in the roadway network that meet those conditions. Behind the scenes, Compass AI uses RAG and Amazon Bedrock powered foundation models (FMs) to query the roadway network to identify and prioritize locations with systemic risk factors and anomalous safety patterns. The solution provides prioritized recommendations for operational and design solutions and countermeasures based on industry knowledge.

The following image shows the interface of INRIX Compass.

Image visualization for countermeasures

The generation of countermeasure suggestions represents the initial phase in transportation planning. Image visualization requires the crucial next step of preparing conceptual drawings. This process has traditionally been time-consuming due to the involvement of multiple specialized teams, including:

Transportation engineers who assess technical feasibility and safety standards
Urban planners who verify alignment with city development goals
Landscape architects who integrate environmental and aesthetic elements
CAD or visualization specialists who create detailed technical drawings
Safety analysts who evaluate the potential impact on road safety
Public works departments who oversee implementation feasibility
Traffic operations teams who assess impact on traffic flow and management

These teams work collaboratively, creating and iteratively refining various visualizations based on feedback from urban designers and other stakeholders. Each iteration cycle typically involves multiple rounds of reviews, adjustments, and approvals, often extending the timeline significantly. The complexity is further amplified by city-specific rules and design requirements, which often necessitate significant customization. Additionally, local regulations, environmental considerations, and community feedback must be incorporated into the design process. Consequently, this lengthy and costly process frequently leads to delays in implementing safety countermeasures. To streamline this challenge, INRIX has pioneered an innovative approach to the visualization phase by using generative AI technology. This prototyped solution enables rapid iteration of conceptual drawings that can be efficiently reviewed by various teams, potentially reducing the design cycle from weeks to days. Moreover, the system incorporates a few-shot learning approach with reference images and carefully crafted prompts, allowing for seamless integration of city-specific requirements into the generated outputs. This approach not only accelerates the design process but also supports consistency across different projects while maintaining compliance with local standards.

The following image shows the congestion insights by INRIX Compass.

Amazon Nova Canvas for conceptual visualizations

INRIX developed and prototyped this solution using Amazon Nova models. Amazon Nova Canvas delivers advanced image processing through text-to-image generation and image-to-image transformation capabilities. The model provides sophisticated controls for adjusting color schemes and manipulating layouts to achieve desired visual outcomes. To promote responsible AI implementation, Amazon Nova Canvas incorporates built-in safety measures, including watermarking and content moderation systems.

The model supports a comprehensive range of image editing operations. These operations encompass basic image generation, object removal from existing images, object replacement within scenes, creation of image variations, and modification of image backgrounds. This versatility makes Amazon Nova Canvas suitable for a wide range of professional applications requiring sophisticated image editing.

The following sample images show an example of countermeasures visualization.

In-painting implementation in Compass AI

Amazon Nova Canvas integrates with INRIX Compass’s existing natural language analytics capabilities. The original Compass system generated text-based countermeasure recommendations based on:

Historical transportation data analysis
Current environmental conditions
User-specified requirements

The INRIX Compass visualization feature specifically uses the image generation and in-painting capabilities of Amazon Nova Canvas. In-painting enables object replacement through two distinct approaches:

A binary mask precisely defines the areas targeted for replacement.
Text prompts identify objects for replacement, allowing the model to interpret and modify the specified elements while maintaining visual coherence with the surrounding image context. This functionality provides seamless integration of new elements while preserving the overall image composition and contextual relevance. The developed interface accommodates both image generation and in-painting approaches, providing comprehensive image editing capabilities.

The implementation follows a two-stage process for visualizing transportation countermeasures. Initially, the system employs image generation functionality to create street-view representations corresponding to specific longitude and latitude coordinates where interventions are proposed. Following the initial image creation, the in-painting capability enables precise placement of countermeasures within the generated street view scene. This sequential approach provides accurate visualization of proposed modifications within the actual geographical context.

An Amazon Bedrock API facilitates image editing and generation through the Amazon Nova Canvas model. The responses contain the generated or modified images in base64 format, which can be decoded and processed for further use in the application. The generative AI capabilities of Amazon Bedrock enable rapid iteration and simultaneous visualization of multiple countermeasures within a single image. RAG implementation can further extend the pipeline’s capabilities by incorporating county-specific regulations, standardized design patterns, and contextual requirements. The integration of these technologies significantly streamlines the countermeasure deployment workflow. Traditional manual visualization processes that previously required extensive time and resources can now be executed efficiently through automated generation and modification. This automation delivers substantial improvements in both time-to-deployment and cost-effectiveness.

Conclusion

The partnership between INRIX and AWS showcases the transformative potential of AI in solving complex transportation challenges. By using Amazon Bedrock FMs, INRIX has turned their massive 50 PB data lake into actionable insights through effective visualization solutions. This post highlighted a single specific transportation use case, but Amazon Bedrock and Amazon Nova power a wide spectrum of applications, from text generation to video creation. The combination of extensive data and advanced AI capabilities continues to pave the way for smarter, more efficient transportation systems worldwide.

For more information, check out the documentation for Amazon Nova Foundation Models, Amazon Bedrock, and INRIX Compass.

About the authors

Arun is a Senior Solutions Architect at AWS, supporting enterprise customers in the Pacific Northwest. He’s passionate about solving business and technology challenges as an AWS customer advocate, with his recent interest being AI strategy. When not at work, Arun enjoys listening to podcasts, going for short trail runs, and spending quality time with his family.

Alicja Kwasniewska, PhD, is an AI leader driving generative AI innovations in enterprise solutions and decision intelligence for customer engagements in North America, advertisement and marketing verticals at AWS. She is recognized among the top 10 women in AI and 100 women in data science. Alicja published in more than 40 peer-reviewed publications. She also serves as a reviewer for top-tier conferences, including ICML,NeurIPS,and ICCV. She advises organizations on AI adoption, bridging research and industry to accelerate real-world AI applications.

Shashank is the VP of Engineering at INRIX, where he leads multiple verticals, including generative AI and traffic. He is passionate about using technology to make roads safer for drivers, bikers, and pedestrians every day. Prior to working at INRIX, he held engineering leadership roles at Amazon and Lyft. Shashank brings deep experience in building impactful products and high-performing teams at scale. Outside of work, he enjoys traveling, listening to music, and spending time with his family.

Nat Gale is the Head of Product at INRIX, where he manages the Safety and Traffic product verticals. Nat leads the development of data products and software that help transportation professionals make smart, more informed decisions. He previously ran the City of Los Angeles’ Vision Zero program and was the Director of Capital Projects and Operations for the City of Hartford, CT.

Durran is a Lead Software Engineer at INRIX, where he designs scalable backend systems and mentors engineers across multiple product lines. With over a decade of experience in software development, he specializes in distributed systems, generative AI, and cloud infrastructure. Durran is passionate about writing clean, maintainable code and sharing best practices with the developer community. Outside of work, he enjoys spending quality time with his family and deepening his Japanese language skills.

Source link