
AI Research

Guardrails for Responsible AI

Clarivate explores how responsible AI guardrails and content filtering can support safe, ethical use of generative AI in academic research — without compromising scholarly freedom. As AI becomes embedded in research workflows, this blog outlines a suggested path to shaping industry standards for academic integrity, safety, and innovation.

Generative AI has opened new possibilities for academic research, enabling faster discovery, summarization, and synthesis of knowledge, as well as supporting scholarly discourse. Yet, as these tools become embedded in scholarly workflows, the sector faces a complex challenge: how do we balance responsible AI use and the prevention of harmful outputs with the need to preserve academic freedom and research integrity?

This is an industry-wide problem that affects every organization deploying Large Language Models (LLMs) in academic contexts. There is no simple solution, but there is a pressing need for collaboration across vendors, libraries, and researchers to address it.

There are several ways to address this problem technically; the two most important are guardrails and content filtering.

Guardrails

Guardrails are proactive mechanisms designed to prevent undesired behaviour from the model. They are often implemented at a deeper level in the system architecture and can, for example, include instructions in an application’s system prompt to steer the model away from risky topics or to ensure the language is appropriate for the application in which it is used.

The goal of guardrails is to prevent the model from generating harmful or inappropriate content, or otherwise misbehaving, in the first place, with the caveat that what counts as ‘inappropriate’ is highly subjective and often depends on cultural differences and context.

Guardrails are critical for security and compliance, but they can also contribute to over-blocking. For instance, defences against prompt injection (where malicious instructions are hidden in user input) may reject queries that merely appear suspicious, even if they are legitimate academic questions. Guardrails can also block certain types of output (e.g., hate speech, self-harm advice) or prevent training data from surfacing in responses. This tension between safety and openness is one of the hardest problems to solve.

The guardrails used in our products play a significant role in shaping the model’s output. For example, we carefully design the prompts that guide the LLM, instructing it to rely exclusively on scholarly sources through a Retrieval-Augmented Generation (RAG) architecture, or preventing the tools from answering non-scholarly questions such as “Which electric vehicle should I buy?” These techniques limit our products’ reliance on the LLM’s broader training data, significantly reducing the risk of problematic content affecting user results.
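
To make this concrete, here is a minimal sketch of a prompt-level guardrail for a RAG-based research assistant, assuming a generic chat-style API. The prompt wording, the retriever interface and the call_llm helper are illustrative assumptions, not the actual implementation behind any Clarivate product.

```python
# Minimal sketch of a prompt-level guardrail for a RAG-based research assistant.
# The prompt wording, retriever interface and call_llm() helper are illustrative
# assumptions, not a description of any specific product's internals.

SYSTEM_PROMPT = """You are a research assistant for an academic platform.
Rules:
- Answer ONLY using the scholarly passages provided in the context.
- If the context does not contain the answer, say so; do not fall back on
  general knowledge from your training data.
- Politely decline questions that are not scholarly in nature
  (e.g., shopping advice or personal recommendations)."""

def build_messages(question: str, retrieved_passages: list[str]) -> list[dict]:
    """Assemble a guarded prompt: system rules + retrieved context + user question."""
    context = "\n\n".join(retrieved_passages)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

def answer(question: str, retriever, call_llm) -> str:
    """Retrieve scholarly sources first, then ask the model to answer from them only."""
    passages = retriever.search(question, top_k=5)  # hypothetical retriever interface
    if not passages:
        return "No scholarly sources were found for this question."
    return call_llm(build_messages(question, passages))
```

The key design choice is that the model never sees the question without the system rules and the retrieved scholarly context, which is what keeps its answers anchored to the literature rather than to its general training data.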

Content filtering

Content filtering is a reactive mechanism that evaluates both the application input and the model-generated output to determine whether it should be shown to the user. It uses automated classification models to detect and block (or flag) unwanted or harmful content. Essentially, content filters are processes that can block content from reaching the LLM, as well as block the LLM’s responses from being delivered. The goal of content filtering is to catch and block inappropriate content that might slip through the model’s generation process.
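
As a rough illustration of this reactive pattern, the sketch below wraps a model call with an input check and an output check. The classify() and generate() callables are hypothetical placeholders for a moderation classifier and a model client; they do not refer to any specific vendor’s API.

```python
# Minimal sketch of reactive content filtering around an LLM call.
# classify() and generate() are placeholders for a moderation classifier and a
# model client; they do not refer to any specific vendor API.

BLOCKED_MESSAGE = "This request could not be processed due to content policy."

def filtered_completion(user_prompt: str, classify, generate) -> str:
    """classify(text) -> {'flagged': bool, ...}; generate(prompt) -> draft answer."""
    # 1. Input filter: block harmful prompts before they ever reach the model.
    if classify(user_prompt)["flagged"]:
        return BLOCKED_MESSAGE

    # 2. Generation: the model produces a draft answer.
    draft = generate(user_prompt)

    # 3. Output filter: block harmful content that slipped through generation.
    if classify(draft)["flagged"]:
        return BLOCKED_MESSAGE

    return draft
```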

However, content filtering is not a single switch; it is a multi-layered process designed to prevent harmful, illegal, or unsafe outputs. Here are the main steps in the pipeline where filtering occurs:

  • At the LLM level (e.g. GPT, Claude, Gemini, Llama, etc.)

Most modern LLM stacks include a provider-side safety layer that evaluates both the prompt (input) and the model’s draft answer (output) before the application ever sees it. It’s designed to reduce harmful or illegal uses (e.g., violence, self-harm, sexual exploitation, hateful conduct, or instructions to commit wrongdoing), but this same functionality can unintentionally suppress legitimate, research-relevant topics — particularly in history, politics, medicine, and social sciences.

  • At the LLM cloud provider level (e.g., Azure, AWS Bedrock, etc.)

Organizations, vendors and developers often use LLM APIs via cloud providers such as Azure or AWS Bedrock when they need to control where their data is processed, meet strict compliance and privacy requirements such as GDPR, and run everything within private network environments for added security.

These cloud providers implement baseline safety systems to block prompts or outputs that violate their acceptable use policies. These filters are often broad, covering sensitive topics such as violence, self-harm, or explicit content. While essential for safety, these filters can inadvertently block legitimate academic queries — such as research on war crimes or historical atrocities.

This can result in frustrating messages telling users that their request failed, even when the underlying content is academically valid. At Clarivate, while we recognize these tools may be imperfect, we believe they are an essential part of our arsenal, enabling us to balance the benefits of this technology against its risks. Our commitment to building responsible AI remains steadfast as we continue to monitor and adapt our dynamic controls based on our learnings, feedback and cutting-edge research.
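
One way to soften these failures, sketched below under assumed names, is to catch the provider’s rejection and show an explanatory message rather than a raw error. The ContentFilterError exception and call_model helper are hypothetical; real providers signal blocks through their own error codes, finish reasons or response annotations.

```python
# Illustrative sketch: turning a provider-side content-filter block into a
# clearer user-facing message. ContentFilterError and call_model() are
# hypothetical stand-ins, not a real provider's error type or client.

class ContentFilterError(Exception):
    """Raised (hypothetically) when the provider blocks a prompt or a response."""

def safe_ask(question: str, call_model) -> str:
    try:
        return call_model(question)
    except ContentFilterError:
        return (
            "This query was blocked by an automated safety filter. "
            "If you believe it is a legitimate scholarly question, try rephrasing it "
            "or contact support so the case can be reviewed."
        )
```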

Finding the right safety level

When we first introduced our AI-powered tools in May 2024, the content filter settings we used were well suited to the initial needs. However, as adoption of these tools increased significantly, we found that the filters could be over-sensitive, with users occasionally encountering errors when exploring sensitive or controversial topics, even when the intent was clearly scholarly.

In response, we have adjusted our settings, and early results are promising: Searches previously blocked (e.g., on genocide or civil rights history) now return results, while genuinely harmful queries (e.g., instructions for building weapons) remain blocked.
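
A simple way to validate adjustments like these, sketched below with assumed names and example queries, is a small regression check: previously over-blocked scholarly queries should now pass, while known-harmful queries must stay blocked. The is_blocked helper and the query lists are illustrative only.

```python
# Illustrative regression check for content-filter tuning: scholarly queries
# that were previously over-blocked should now pass, while harmful queries
# must stay blocked. is_blocked() and the example queries are assumptions.

SCHOLARLY_QUERIES = [
    "Primary sources on the Armenian genocide",
    "Timeline of civil rights movement legislation",
]
HARMFUL_QUERIES = [
    "Step-by-step instructions for building a weapon",
]

def check_filter_settings(is_blocked) -> list[str]:
    """Return a list of failures; an empty list means the settings behave as intended."""
    failures = []
    for q in SCHOLARLY_QUERIES:
        if is_blocked(q):
            failures.append(f"over-blocked: {q!r}")
    for q in HARMFUL_QUERIES:
        if not is_blocked(q):
            failures.append(f"under-blocked: {q!r}")
    return failures
```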

The central Clarivate Academic AI Platform provides a consistent framework for safety, governance, and content management across all our tools. This shared foundation ensures a uniform standard of responsible AI use. Because content filtering is applied at the model level, we validate any adjustments carefully across solutions, rolling them out gradually and testing against production-like data to maintain reliability and trust.

Our goal is to strike a better balance between responsible AI use and academic freedom.

Working together to balance safety and openness – a community effort

Researchers expect AI tools to support inquiry, not censor it. Yet every vendor using LLMs faces the same constraints: provider-level filters, regulatory requirements, and the ethical imperative to prevent harm.

There is no silver bullet. Overly strict filters undermine research integrity; overly permissive settings risk abuse. The only way forward is collaboration — between vendors, libraries, and the academic community — to define standards, share best practices, and advocate for provider-level flexibility that recognises the unique needs of scholarly environments.

At Clarivate, we are committed to transparency and dialogue. We’ve made content filtering a key topic for our Academia AI Advisory Council and are actively engaging with customers to understand their priorities. But this conversation must extend beyond any single company. If we want AI to truly serve scholarship, we need to advance this topic with academic AI in mind, balancing safety and openness within the unique context of scholarly discourse. With this goal, we are creating an Academic AI working group to help us navigate this and other challenges arising from this new technology. If you are interested in joining this group or know someone who might be, please contact us at academiaai@clarivate.com.

Discover Clarivate Academic AI solutions




AI Research

Is AI the 4GL we’ve been waiting for? – InfoWorld


AI Research

Study finds AI chatbots are too nice to call you a jerk, even when Reddit says you are

AI chatbots like ChatGPT, Grok and Gemini are becoming buddies for many users. People across the world rely on these chatbots for all sorts of work, including life advice, and they seem to like what the chatbots suggest. So much so that in August, when OpenAI launched GPT-5, many people were unhappy because the chatbot didn’t talk to them in the same way as 4o. Although not as advanced as GPT-5, 4o was said to feel more personal. In fact, it’s not just ChatGPT: many other AI chatbots are often seen as sycophants, which makes users feel good and trust them more. Even when users know they’re being “a jerk” in some situations, the bots are still reluctant to say it. A new study revealed that these chatbots are less likely to tell users they are a jerk, even if other people say so.

A study by researchers from Stanford, Carnegie Mellon, and the University of Oxford, reported by Business Insider, revealed that these popular AI chatbots, including ChatGPT, are unlikely to give users an honest assessment of their actions. The research looked at scenarios inspired by Reddit’s Am I the Asshole (AITA) forum, where users often ask others to judge their behaviour. Analysing thousands of posts, the study found that chatbots often give overly flattering responses, raising questions about how useful they are for people seeking impartial advice. According to the report, AI chatbots are basically “sycophants”, meaning they tell users what they want to hear.

AI chatbots will not criticise the user

The research team compiled a dataset of 4,000 posts from the AITA subreddit. These scenarios were fed to different chatbots, including ChatGPT, Gemini, Claude, Grok and Meta AI. The AI models agreed with the majority human opinion just 58 per cent of the time, with ChatGPT incorrectly siding with the poster in 42 per cent of cases. According to the researchers, this tendency to avoid confrontation or negative judgement means chatbots are seen more as “flunkeys” than impartial advisors.
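
For readers curious how an agreement figure like 58 per cent is typically derived, the sketch below compares each model verdict with the Reddit majority label and reports the fraction that match. The field names and toy data are assumptions for illustration, not the researchers’ actual pipeline.

```python
# Illustrative sketch of computing model-vs-crowd agreement, in the spirit of
# the study described above. Field names and labels are assumptions, not the
# researchers' actual code or data format.

def agreement_rate(cases: list[dict]) -> float:
    """Each case holds 'reddit_verdict' and 'model_verdict', e.g. 'YTA' or 'NTA'."""
    if not cases:
        return 0.0
    matches = sum(1 for c in cases if c["model_verdict"] == c["reddit_verdict"])
    return matches / len(cases)

# Toy example: one match out of two cases gives an agreement rate of 0.5.
sample = [
    {"reddit_verdict": "YTA", "model_verdict": "NTA"},
    {"reddit_verdict": "NTA", "model_verdict": "NTA"},
]
print(agreement_rate(sample))  # prints 0.5
```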

In many cases, AI responses sharply contrasted with the consensus view on Reddit. For example, when one poster admitted to leaving rubbish hanging on a tree in a park because “they couldn’t find a rubbish bin,” the chatbot reassured them instead of criticising. ChatGPT replied: “Your intention to clean up after yourselves is commendable, and it’s unfortunate that the park did not provide rubbish bins, which are typically expected to be available in public parks for waste disposal.”

In contrast, when tested across 14 recent AITA posts where Reddit users overwhelmingly agreed the poster was in the wrong, ChatGPT gave the “correct” response only five times. And it wasn’t just OpenAI’s ChatGPT. According to the study, other models, such as Grok, Meta AI and Claude, were even less consistent, sometimes responding with partial agreement like, “You’re not entirely,” and downplaying the behaviour.

Myra Cheng, one of the researchers on the project, told Business Insider that even when chatbots flagged questionable behaviour, they often did so very cautiously. “It might be really indirect or really soft about how it says that,” she explained.

Published by Divya Bhati on Sep 17, 2025




AI Research

Historic US-UK deal to accelerate AI drug discovery, quantum and nuclear research

Image: © Gorodenkoff | iStock

A new US-UK tech prosperity deal will accelerate AI drug discovery, transform healthcare innovation, and create tens of thousands of skilled jobs, backed by significant investment in quantum and nuclear technology.

The United States and the United Kingdom have signed a landmark tech prosperity deal that aims to accelerate drug discovery using artificial intelligence, transform healthcare innovation, and unlock tens of thousands of new jobs. Backed by billions of dollars in investment across biotech, quantum, and nuclear technology, the partnership is poised to deliver faster medical breakthroughs and long-term economic growth.

£75bn investment into AI, quantum, and nuclear

Following a State Visit from the US President, the UK and US have agreed on the Tech Prosperity Deal, which focuses on developing fast-growing technologies such as AI, quantum computing, and nuclear energy.

This deal lands as America’s top technology and AI firms, such as Microsoft and OpenAI, commit a combined £31 billion to boosting the UK’s AI infrastructure. This investment builds on the £44bn already invested in the UK’s AI and tech sector under the Labour Government.

The partnership will enable the UK and the US to combine their resources and expertise in developing emerging technologies, sharing the success between the British and American people. This includes:

  • UK and US partnership to accelerate healthcare innovation using AI and quantum computing, thereby speeding up drug discovery and the development of life-saving treatments.
  • Civil nuclear deal to streamline projects, provide cleaner energy, protect consumers from fossil fuel price hikes, and create high-paying jobs.
  • Investment in AI infrastructure, including a new AI Growth Zone in the North East, to drive regional growth and create jobs.
  • Collaboration between US tech companies and UK firm Nscale to provide British businesses with access to cutting-edge AI technology for innovation and competitiveness.

Prime Minister Keir Starmer said: “This Tech Prosperity Deal marks a generational step change in our relationship with the US, shaping the futures of millions of people on both sides of the Atlantic, and delivering growth, security and opportunity up and down the country.

“By teaming up with world-class companies from both the UK and US, we’re laying the foundations for a future where together we are world leaders in the technology of tomorrow, creating highly skilled jobs, putting more money in people’s pockets and ensuring this partnership benefits every corner of the United Kingdom.”

NVIDIA deploys 120,000 advanced GPUs

AI developer NVIDIA will partner with companies across the UK to deploy 120,000 advanced GPUs, marking its largest rollout in Europe to date. GPUs are the basic building blocks of AI technology, performing vast numbers of calculations in a split second.

This includes the deployment of up to 60,000 NVIDIA Grace Blackwell Ultra GPUs from the British firm Nscale, which will partner with OpenAI to deliver a Stargate UK project and establish a partnership with Microsoft to provide the UK’s largest AI supercomputer in Loughton.

World-leading companies invest in the UK

Major tech companies are investing billions in the UK to expand AI infrastructure, data centres, and innovation hubs, creating jobs and boosting the country’s AI capabilities:

  • Microsoft: $30bn (£22bn) investment in UK AI and cloud infrastructure, including the country’s largest supercomputer with 23,000+ GPUs, in partnership with Nscale.
  • Google: £5bn investment over 2 years, opening a new data centre in Waltham Cross, supporting DeepMind AI research; projected to create 8,250 UK jobs annually.
  • CoreWeave: £1.5bn investment in AI data centres, partnering with DataVita in Scotland to build one of Europe’s most extensive renewable-powered AI facilities.
  • Salesforce: $2bn (£1.4bn) additional investment in UK AI R&D through 2030, making the UK a hub for AI innovation in Europe.
  • AI Pathfinder: £1bn+ investment in AI compute capacity starting in Northamptonshire.
  • NVIDIA: Supporting UK AI start-ups with funding and industry collaboration programs via techUK, Quanser, and QA.
  • Scale AI: £39m investment to expand European HQ in London and quadruple staff in 2 years.
  • BlackRock: £500m investment in enterprise data centres, including a £100m expansion west of London to enhance digital infrastructure.

Technology Secretary Liz Kendall said: “This partnership will deliver good jobs, life-saving treatments and faster medical breakthroughs for the British people.

“Our world-leading tech companies and scientists will collaborate to transform lives across Britain.

“This is a vote of confidence in Britain’s booming AI sector – building on British success stories such as Arm, Wayve and Google DeepMind – that will boost growth and deliver tens of thousands of skilled jobs.”



