AI Research

Pangea Launches New Research Division and Red Teaming Services to Combat Growing AI Security Attacks

Published

2 months ago

July 15, 2025

New offerings led by champion AI ethical hacker include specialized security assessments and industry’s most comprehensive AI Attack Taxonomy

PALO ALTO, Calif., July 15, 2025 /PRNewswire/ — Pangea, a leading provider of AI security guardrails, today announced the launch of Pangea Labs, a dedicated research division, and specialized Red Teaming services to help organizations defend against sophisticated AI attacks. The dual launch includes the debut of the industry’s most comprehensive AI Prompt Injection Attack Taxonomy—a living framework that maps prompt injection methods and countermeasures developed by the new research team.

Introducing Pangea’s AI Research Division, Pangea Labs

Under the guidance of Chief Product Officer Rob Truesdell, Pangea Labs will research emerging AI attack techniques and conduct red team exercises to identify vulnerabilities in AI systems before malicious actors can exploit them. The division will translate cutting-edge research, such as Pangea’s Prompt Injection Challenge research report, into actionable security enhancements and services. The team’s focus areas include:

Advanced prompt injection techniques and countermeasures
AI model manipulation and jailbreaking methods
Enterprise AI security best practices
Emerging threat intelligence and attack pattern analysis

Joining Pangea Labs as the first AI Red Team Specialist is Joey Melo, an ethical hacker and professional penetration tester who distinguished himself as the only contestant to successfully escape all three virtual rooms in Pangea’s 2025 Prompt Injection Challenge. Melo holds multiple offensive security certifications including BSCP, OSCP, and OSCE3, and recently achieved 100% completion in the HackAPrompt 2.0 competition, successfully jailbreaking all 39 AI security challenges across multiple models.

Melo joins Dr. Jim Hoagland, whose years-long foundational research has been instrumental in developing Pangea’s comprehensive understanding of AI attack vectors.

Pangea Red Teaming Services Now Available

Building on the expertise of Pangea Labs, Pangea now offers specialized AI Red Teaming services that go beyond traditional penetration testing. These comprehensive security assessments simulate malicious cyberattacks specifically targeting AI systems, helping organizations identify vulnerabilities, assess the effectiveness of security controls, and improve incident response capabilities.

Unlike standard security testing, Pangea’s Red Teaming offering employs a broader scope, mimicking real-world attack scenarios and adversary tactics that specifically target AI implementations. The services leverage the research-backed methodologies developed by Dr. Hoagland and the practical attack expertise demonstrated by Melo, providing organizations with comprehensive visibility into their AI security posture, using the most current threat intelligence available.

“As generative AI becomes deeply embedded in enterprise workflows, the attack surface is expanding exponentially,” said Oliver Friedrichs, Founder and CEO of Pangea. “The launch of Pangea Labs alongside our Red Teaming services represents our commitment to staying ahead of these threats through rigorous research and real-world attack simulation. Our research team’s proven ability to think like an attacker—combined with our platform’s defensive capabilities—creates an unmatched advantage for our customers’ security postures.”

New AI Attack Taxonomy

Pangea’s newly published AI Prompt Injection Attack Taxonomy represents the most up-to-date classification system available, providing security teams with a comprehensive roadmap of attack vectors and defensive strategies. Built on Dr. Hoagland’s extensive research foundation and enhanced by Joey Melo’s practical attack expertise, this living framework will be continuously updated as new threats emerge, ensuring organizations stay ahead of evolving AI security risks.

“Traditional security frameworks weren’t designed for the unique challenges of AI systems,” said Rob Truesdell, Pangea’s Chief Product Officer. “Our taxonomy provides teams with the structured knowledge they need to identify vulnerabilities before attackers do. By understanding the full spectrum of AI attack methods, development teams can build more resilient systems from the ground up.”

Pangea was recently recognized by 150+ CISOs as a top cybersecurity startup and already provides the industry’s most comprehensive protection against AI attacks, helping organizations implement robust security controls across their AI implementations.

For more information, visit pangea.cloud.

About Pangea

Pangea’s AI Guardrail Platform empowers security teams to ship secure AI applications quickly and protect workforce AI use with the industry’s most comprehensive set of AI guardrails, easily deployed via gateways or into applications with just a few lines of code. Pangea stops LLM security threats ranging from prompt injection to sensitive data leakage, covering 8 out of 10 OWASP Top Ten Risks for LLM apps, while accelerating engineering velocity and unlocking AI runtime visibility and control for security teams.

Media Contact: Growth Stack Media | 415-574-0738 | [email protected]

SOURCE Pangea Cyber

Source link

AI Research

Guardrails for Responsible AI

Published

19 minutes ago

September 17, 2025

Cristina Blanca-Sancho

Clarivate explores how responsible AI guardrails and content filtering can support safe, ethical use of generative AI in academic research — without compromising scholarly freedom. As AI becomes embedded in research workflows, this blog outlines a suggested path to shaping industry standards for academic integrity, safety, and innovation.

Generative AI has opened new possibilities for academic research, enabling faster discovery, summarization, and synthesis of knowledge, as well as supporting the scholarly discourse. Yet, as these tools become embedded in scholarly workflows, the segment faces a complex challenge: how do we balance responsible AI use and the prevention of harmful outputs with the need to preserve academic freedom and research integrity?

This is an industry-wide problem that affects every organization deploying Large Language Models (LLMs) in academic contexts. There is no simple solution, but there is a pressing need for collaboration across vendors, libraries, and researchers to address it.

There are different ways to technically address the problem. The two most important ones are guardrails and content filtering.

Guardrails

Guardrails are proactive mechanisms designed to prevent undesired behaviour from the model. They are often implemented at a deeper level in the system architecture and can, for example, include instructions in an application’s system prompt to steer the model away from risky topics or to make sure that the language is suitable for the application where it’s being used.

The goal of guardrails is to prevent the model from ever generating harmful or inappropriate content in the first place or misbehaving, with the caveat that the definition of what constitutes ‘inappropriate’ is highly subjective and often dependent on cultural differences and context.

Guardrails are critical for security and compliance, but they can also contribute to over-blocking. For instance, defences against prompt injection — where malicious instructions are hidden in user input — may reject queries that appear suspicious, even if they are legitimate academic questions. It can block certain types of outputs (e.g., hate speech, self-harm advice) or exclude the training data from the output. This tension between safety and openness is one of the hardest problems to solve.

The guardrails used in our products play a very significant role in shaping the model’s output. For example, we carefully design the prompts that guide the LLM, instructing it to rely exclusively on scholarly sources through a Retrieval-Augmented Generation (RAG) architecture or preventing the tools from answering non-scholarly questions such as “Which electric vehicle should I buy”? These techniques limit products’ reliance on the LLM broader training data, significantly minimizing the risk of problematic content impacting user results.

Content filtering

Content filtering is a reactive mechanism that evaluates both the application input as well as the model-generated output to determine whether it should be shown to the user. It uses automated classification models to detect and block (or flag) unwanted or harmful content. Essentially, content filters are processes that can block content from getting to the LLM, as well as block the LLMs responses from being delivered. The goal of content filtering is to catch and block inappropriate content that might slip through the model’s generation process.

However, content filtering is not a single switch; it is a multi-layered process designed to prevent harmful, illegal, or unsafe outputs. Here are the main steps in the pipeline where filtering occurs:

At the LLM level (e.g. GPT, Claude, Gemini, Llama, etc.)

Most modern LLM stacks include a provider-side safety layer that evaluates both the prompt (input) and the model’s draft answer (output) before the application ever sees it. It’s designed to reduce harmful or illegal uses (e.g., violence, self-harm, sexual exploitation, hateful conduct, or instructions to commit wrongdoing), but this same functionality can unintentionally suppress legitimate, research-relevant topics — particularly in history, politics, medicine, and social sciences.

At the LLM cloud provider level (e.g., Azure, AWS Bedrock, etc.)

Organizations, vendors and developers often use LLMs APIs via cloud providers like Azure or Bedrock when they need to control where their data is processed, meet strict compliance and privacy requirements like GDPR, and run everything within private network environments for added security.

These cloud providers implement baseline safety systems to block prompts or outputs that violate their acceptable use policies. These filters are often broad, covering sensitive topics such as violence, self-harm, or explicit content. While essential for safety, these filters can inadvertently block legitimate academic queries — such as research on war crimes or historical atrocities.

This can result in frustrating messages alerting users that the request failed – even when the underlying content is academically valid. At Clarivate, while we recognize these tools may be imperfect, we continue to believe they are essential to incorporate in our arsenal and enable us to balance the benefits with the risks when using this technology. Our commitment to building responsible AI remains steadfast as we continue to monitor and adapt our dynamic controls based on our learnings, feedback and cutting-edge research.

Finding the right safety level

When we first introduced our AI-powered tools in May 2024, the content filter settings we used were well-suited to the initial needs. However, as adoption of these tools significantly increased, we found that the filters could sometimes be over-sensitive, with users sometimes encountering errors when exploring sensitive or controversial topics, even when the intent was clearly scholarly.

In response, we have adjusted our settings, and early results are promising: Searches previously blocked (e.g., on genocide or civil rights history) now return results, while genuinely harmful queries (e.g., instructions for building weapons) remain blocked.

The central Clarivate Academic AI Platform provides a consistent framework for safety, governance, and content management across all our tools. This shared foundation ensures a uniform standard of responsible AI use. Because content filtering is applied at the model level, we validate any adjustments carefully across solutions, rolling them out gradually and testing against production-like data to maintain reliability and trust.

Our goal is to strike a better balance between responsible AI use and academic freedom.

Working together to balance safety and openness – a community effort

Researchers expect AI tools to support inquiry, not censor it. Yet every vendor using LLMs faces the same constraints: provider-level filters, regulatory requirements, and the ethical imperative to prevent harm.

There is no silver bullet. Overly strict filters undermine research integrity; overly permissive settings risk abuse. The only way forward is collaboration — between vendors, libraries, and the academic community — to define standards, share best practices, and advocate for provider-level flexibility that recognises the unique needs of scholarly environments.

At Clarivate, we are committed to transparency and dialogue. We’ve made content filtering a key topic for our Academia AI Advisory Council and are actively engaging with customers to understand their priorities. But this conversation must extend beyond any single company. If we want AI to truly serve scholarship, we need to push this topic with academic AI in mind, balancing safety and openness within the unique context of scholarly discourse. With this goal, we are creating an Academic AI working group that will help us navigate this and other challenges originating from this new technology. If you are interested in joining this group or know someone who might be, please contact us at academiaai@clarivate.com.

Discover Clarivate Academic AI solutions

Source link

AI Research

(Policy Address 2025) HK earmarks HK$3B for AI research and talent recruitment – The Standard (HK)

Published

3 hours ago

September 17, 2025

The Standard 英文虎報

(Policy Address 2025) HK earmarks HK$3B for AI research and talent recruitment The Standard (HK)

Source link

AI Research

Spatially-Aware Image Focus for Visual Reasoning

Published

3 hours ago

September 17, 2025

Zhangquan Chen, Ruihui Zhao, Chuwei Luo, Mingze Sun, Xinlei Yu, Yangyang Kang, Ruqi Huang

[Submitted on 8 Aug 2025 (v1), last revised 16 Sep 2025 (this version, v4)]

View a PDF of the paper titled SIFThinker: Spatially-Aware Image Focus for Visual Reasoning, by Zhangquan Chen and 6 other authors

View PDF
HTML (experimental)

Abstract:Current multimodal large language models (MLLMs) still face significant challenges in complex visual tasks (e.g., spatial understanding, fine-grained perception). Prior methods have tried to incorporate visual reasoning, however, they fail to leverage attention correction with spatial cues to iteratively refine their focus on prompt-relevant regions. In this paper, we introduce SIFThinker, a spatially-aware “think-with-images” framework that mimics human visual perception. Specifically, SIFThinker enables attention correcting and image region focusing by interleaving depth-enhanced bounding boxes and natural language. Our contributions are twofold: First, we introduce a reverse-expansion-forward-inference strategy that facilitates the generation of interleaved image-text chains of thought for process-level supervision, which in turn leads to the construction of the SIF-50K dataset. Besides, we propose GRPO-SIF, a reinforced training paradigm that integrates depth-informed visual grounding into a unified reasoning pipeline, teaching the model to dynamically correct and focus on prompt-relevant regions. Extensive experiments demonstrate that SIFThinker outperforms state-of-the-art methods in spatial understanding and fine-grained visual perception, while maintaining strong general capabilities, highlighting the effectiveness of our method. Code: this https URL.

Submission history

From: Zhangquan Chen [view email]
[v1]
Fri, 8 Aug 2025 12:26:20 UTC (5,223 KB)
[v2]
Thu, 14 Aug 2025 10:34:22 UTC (5,223 KB)
[v3]
Sun, 24 Aug 2025 13:04:46 UTC (5,223 KB)
[v4]
Tue, 16 Sep 2025 09:40:13 UTC (5,223 KB)

Source link