AI Research

Gemma Scope: helping the safety community shed light on the inner workings of language models

July 31, 2024

The Editors

Models

Published: 31 July 2024
Authors: Language Model Interpretability team

Announcing a comprehensive, open suite of sparse autoencoders for language model interpretability.

To create an artificial intelligence (AI) language model, researchers build a system that learns from vast amounts of data without human guidance. As a result, the inner workings of language models are often a mystery, even to the researchers who train them. Mechanistic interpretability is a research field focused on deciphering these inner workings. Researchers in this field use sparse autoencoders as a kind of ‘microscope’ that lets them see inside a language model, and get a better sense of how it works.

Today, we’re announcing Gemma Scope, a new set of tools to help researchers understand the inner workings of Gemma 2, our lightweight family of open models. Gemma Scope is a collection of hundreds of freely available, open sparse autoencoders (SAEs) for Gemma 2 9B and Gemma 2 2B. We’re also open sourcing Mishax, a tool we built that enabled much of the interpretability work behind Gemma Scope.

We hope today’s release enables more ambitious interpretability research. Further research has the potential to help the field build more robust systems, develop better safeguards against model hallucinations, and protect against risks from autonomous AI agents like deception or manipulation.

Try our interactive Gemma Scope demo, courtesy of Neuronpedia.

Interpreting what happens inside a language model

When you ask a language model a question, it turns your text input into a series of ‘activations’. These activations map the relationships between the words you’ve entered, helping the model make connections between different words, which it uses to write an answer.

As the model processes text input, activations at different layers in the model’s neural network represent multiple increasingly advanced concepts, known as ‘features’.

For example, a model’s early layers might learn to recall facts, such as that Michael Jordan plays basketball, while later layers may recognize more complex concepts, such as the factuality of the text.

A stylised representation of using a sparse autoencoder to interpret a model’s activations as it recalls the fact that the City of Light is Paris. We see that French-related concepts are present, while unrelated ones are not.

However, interpretability researchers face a key problem: the model’s activations are a mixture of many different features. In the early days of mechanistic interpretability, researchers hoped that features in a neural network’s activations would line up with individual neurons, i.e., nodes of information. But unfortunately, in practice, neurons are active for many unrelated features. This means that there is no obvious way to tell which features are part of the activation.

This is where sparse autoencoders come in.

A given activation will only be a mixture of a small number of features, even though the language model is likely capable of detecting millions or even billions of them – i.e., the model uses features sparsely. For example, a language model will consider relativity when responding to an inquiry about Einstein and consider eggs when writing about omelettes, but probably won’t consider relativity when writing about omelettes.

Sparse autoencoders leverage this fact to discover a set of possible features, and break down each activation into a small number of them. Researchers hope that the best way for the sparse autoencoder to accomplish this task is to find the actual underlying features that the language model uses.
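The decomposition can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Gemma Scope implementation: the two-dimensional activation space, the three feature directions, and their labels are all hypothetical, and the weights are set by hand here, whereas a real sparse autoencoder learns them from a model's activations.

```python
import math

# Hypothetical toy dictionary: three unit-length feature directions in a
# 2-dimensional activation space (more features than dimensions).
ANGLES = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
FEATURE_DIRECTIONS = [(math.cos(a), math.sin(a)) for a in ANGLES]
FEATURE_NAMES = ["einstein", "omelette", "paris"]  # illustrative labels only


def encode(activation):
    """Return one non-negative strength per feature (ReLU of a dot product)."""
    strengths = []
    for dx, dy in FEATURE_DIRECTIONS:
        pre_activation = activation[0] * dx + activation[1] * dy
        strengths.append(max(0.0, pre_activation))
    return strengths


def decode(strengths):
    """Rebuild the activation as a weighted sum of feature directions."""
    x = sum(s * d[0] for s, d in zip(strengths, FEATURE_DIRECTIONS))
    y = sum(s * d[1] for s, d in zip(strengths, FEATURE_DIRECTIONS))
    return (x, y)


def active_features(strengths, tol=1e-9):
    """Names of the features that fire on this activation."""
    return [n for n, s in zip(FEATURE_NAMES, strengths) if s > tol]


# An activation made of 2.0 units of the first feature and nothing else.
activation = (2.0, 0.0)
strengths = encode(activation)
reconstruction = decode(strengths)

print(active_features(strengths))
print(reconstruction)
```

Only one of the three features fires, and the weighted sum of feature directions reproduces the original activation. Activations that mix several features would interfere with each other in this hand-built toy; handling that well is part of what training is for.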

Importantly, at no point in this process do we – the researchers – tell the sparse autoencoder which features to look for. As a result, we are able to discover rich structures that we did not predict. However, because we don’t immediately know the meaning of the discovered features, we look for meaningful patterns in examples of text where the sparse autoencoder says the feature ‘fires’.

Here’s an example in which the tokens where the feature fires are highlighted in gradients of blue according to their strength:

Example activations for a feature found by our sparse autoencoders. Each bubble is a token (word or word fragment), and the variable blue color illustrates how strongly the feature is present. In this case, the feature is apparently related to idioms.

What makes Gemma Scope unique

Prior research with sparse autoencoders has mainly focused on investigating the inner workings of tiny models or a single layer in larger models. But more ambitious interpretability research involves decoding layered, complex algorithms in larger models.

We trained sparse autoencoders at every layer and sublayer output of Gemma 2 2B and 9B to build Gemma Scope, producing more than 400 sparse autoencoders with more than 30 million learned features in total (though many features likely overlap). This tool will enable researchers to study how features evolve throughout the model and interact and compose to make more complex features.

Gemma Scope is also trained with our new, state-of-the-art JumpReLU SAE architecture. The original sparse autoencoder architecture struggled to balance the twin goals of detecting which features are present, and estimating their strength. The JumpReLU architecture makes it easier to strike this balance appropriately, significantly reducing error.
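The difference between activation functions can be illustrated with a short sketch. It assumes the commonly described form of JumpReLU, in which a pre-activation passes through unchanged when it exceeds a threshold and is set to zero otherwise. The single threshold value and the sample inputs are hypothetical, chosen only for illustration.

```python
def relu(z):
    """Standard ReLU: keeps every positive value, including weak noise."""
    return max(0.0, z)


def shifted_relu(z, threshold):
    """ReLU with a negative bias: filters weak values but shrinks strong ones."""
    return max(0.0, z - threshold)


def jump_relu(z, threshold):
    """JumpReLU (assumed form): zero below the threshold, unchanged above it."""
    return z if z > threshold else 0.0


THRESHOLD = 0.5  # hypothetical value for illustration
pre_activations = [0.05, 0.3, 1.2]

relu_out = [relu(z) for z in pre_activations]
shifted_out = [shifted_relu(z, THRESHOLD) for z in pre_activations]
jump_out = [jump_relu(z, THRESHOLD) for z in pre_activations]

print("relu:        ", relu_out)
print("shifted relu:", shifted_out)
print("jump relu:   ", jump_out)
```

In this sketch, plain ReLU lets the two weak values through, and the shifted ReLU removes them but also underestimates the strong one. The JumpReLU output drops the weak values while leaving the strong value at its full strength, which is one way to read the balance between detecting features and estimating their strength.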

Training so many sparse autoencoders was a significant engineering challenge, requiring a lot of computing power. We used about 15% of the training compute of Gemma 2 9B (excluding compute for generating distillation labels), saved about 20 Pebibytes (PiB) of activations to disk (about as much as a million copies of English Wikipedia), and produced hundreds of billions of sparse autoencoder parameters in total.
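As a rough check on the storage comparison, the arithmetic below works out what size per copy the "million copies" figure implies. The result is only the size implied by the two numbers quoted above, not an independently sourced measurement of English Wikipedia.

```python
PIB = 2 ** 50                 # bytes in one pebibyte
total_bytes = 20 * PIB        # about 20 PiB of saved activations
copies = 1_000_000            # "a million copies"

bytes_per_copy = total_bytes / copies
gb_per_copy = bytes_per_copy / 1e9   # decimal gigabytes

print(f"Implied size per copy: {gb_per_copy:.1f} GB")
```

The figures imply roughly 22.5 GB per copy.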

Pushing the field forward

In releasing Gemma Scope, we hope to make Gemma 2 the best model family for open mechanistic interpretability research and to accelerate the community’s work in this field.

So far, the interpretability community has made great progress in understanding small models with sparse autoencoders and developing relevant techniques, like causal interventions, automatic circuit analysis, feature interpretation, and evaluating sparse autoencoders. With Gemma Scope, we hope to see the community scale these techniques to modern models, analyze more complex capabilities like chain-of-thought, and find real-world applications of interpretability such as tackling problems like hallucinations and jailbreaks that only arise with larger models.

Acknowledgements

Gemma Scope was a collective effort of Tom Lieberum, Sen Rajamanoharan, Arthur Conmy, Lewis Smith, Nic Sonnerat, Vikrant Varma, Janos Kramar and Neel Nanda, advised by Rohin Shah and Anca Dragan. We would like to especially thank Johnny Lin, Joseph Bloom and Curt Tigges at Neuronpedia for their assistance with the interactive demo. We are grateful for the help and contributions from Phoebe Kirk, Andrew Forbes, Arielle Bier, Aliya Ahmad, Yotam Doron, Tris Warkentin, Ludovic Peran, Kat Black, Anand Rao, Meg Risdal, Samuel Albanie, Dave Orr, Matt Miller, Alex Turner, Tobi Ijitoye, Shruti Sheth, Jeremy Sie, Alex Tomala, Javier Ferrando, Oscar Obeso, Kathleen Kenealy, Joe Fernandez, Omar Sanseviero and Glenn Cameron.

AI Research

Exclusive | Cyberport may use Chinese GPUs at Hong Kong supercomputing hub to cut reliance on Nvidia

September 14, 2025

Xinmei Shen

Cyberport may add some graphics processing units (GPUs) made in China to its Artificial Intelligence Supercomputing Centre in Hong Kong, as the government-run incubator seeks to reduce its reliance on Nvidia chips amid worsening China-US relations, its chief executive said.

Cyberport has bought four GPUs made by four different mainland Chinese chipmakers and has been testing them at its AI lab to gauge which ones to adopt in the expanding facilities, Rocky Cheng Chung-ngam said in an interview with the Post on Friday. The park has been weighing the use of Chinese GPUs since it first began installing Nvidia chips last year, he said.

“At that time, China-US relations were already quite strained, so relying solely on [Nvidia] was no longer an option,” Cheng said. “That is why we felt that for any new procurement, we should in any case include some from the mainland.”

Cyberport’s AI supercomputing centre, established in December with its first phase offering 1,300 petaflops of computing power, will deliver another 1,700 petaflops by the end of this year, with all 3,000 petaflops currently relying on Nvidia’s H800 chips, he added.

Cyberport CEO Rocky Cheng Chung-ngam on September 12, 2025. Photo: Jonathan Wong

As all four Chinese solutions offer similar performance, Cyberport would take cost into account when determining which ones to order, according to Cheng, who declined to name the suppliers.

AI Research

Why do AI chatbots use so much energy?

September 14, 2025

Alice Sun

In recent years, ChatGPT has exploded in popularity, with nearly 200 million users pumping a total of over a billion prompts into the app every day. These prompts may seem to complete requests out of thin air.

But behind the scenes, artificial intelligence (AI) chatbots are using a massive amount of energy. In 2023, data centers, which are used to train and process AI, were responsible for 4.4% of electricity use in the United States. Across the world, these centers make up around 1.5% of global energy consumption. These numbers are expected to skyrocket, at least doubling by 2030 as the demand for AI grows.

“Just three years ago, we didn’t even have ChatGPT yet,” said Alex de Vries-Gao, an emerging technology sustainability researcher at Vrije Universiteit Amsterdam and founder of Digiconomist, a platform dedicated to exposing the unintended consequences of digital trends. “And now we’re talking about a technology that’s going to be responsible for almost half of the electricity consumption by data centers globally.”

But what makes AI chatbots so energy intensive? The answer lies in the massive scale of AI chatbots. In particular, there are two parts of AI that use the most energy: training and inference, said Mosharaf Chowdhury, a computer scientist at the University of Michigan.

To train AI chatbots, large language models (LLMs) are given enormous datasets so the AI can learn, recognize patterns and make predictions. In general, there is a “bigger is better” belief in AI training, de Vries-Gao said, where bigger models that take in more data are thought to make better predictions.

“So what happens when you are trying to do a training is that the models nowadays have gotten so large, they don’t fit in a single GPU [graphics processing unit]; they don’t fit in a single server,” Chowdhury told Live Science.

To give a sense of scale, 2023 research by de Vries-Gao estimated that a single Nvidia DGX A100 server demands up to 6.5 kilowatts of power. Training an LLM usually requires multiple servers, each of which has an average of eight GPUs, which then run for weeks or months. Altogether, this consumes mountains of energy: It’s estimated that training OpenAI’s GPT-4 used 50 gigawatt-hours of energy, equivalent to powering San Francisco for three days.
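To put these figures side by side, the back-of-the-envelope calculation below converts the estimated 50 gigawatt-hours into server-hours and GPU-hours. It is a rough sketch that assumes every server draws its full 6.5 kilowatts for the whole run and that all of the energy goes to the servers themselves, ignoring cooling and other overhead; neither assumption comes from the research cited above.

```python
training_energy_kwh = 50e6   # 50 gigawatt-hours expressed in kilowatt-hours
server_power_kw = 6.5        # peak draw of one server (assumed constant)
gpus_per_server = 8          # average GPUs per server

# Energy divided by power gives time: how many server-hours 50 GWh buys.
server_hours = training_energy_kwh / server_power_kw
gpu_hours = server_hours * gpus_per_server

# Average power implied by spreading 50 GWh over three days (72 hours).
implied_city_power_mw = (training_energy_kwh / 1000) / 72

print(f"Server-hours: {server_hours:,.0f}")
print(f"GPU-hours:    {gpu_hours:,.0f}")
print(f"Implied average city demand: {implied_city_power_mw:,.0f} MW")
```

Under these assumptions, the estimate corresponds to about 7.7 million server-hours, or roughly 61.5 million GPU-hours, and the three-day comparison implies an average city demand of about 694 megawatts.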

Inference also consumes a lot of energy. This is where an AI chatbot draws a conclusion from what it has learned and generates an output from a request. Although it takes considerably fewer computational resources to run an LLM after it’s trained, inference is energy intensive because of the sheer number of requests made to AI chatbots.

As of July 2025, OpenAI states that users of ChatGPT send over 2.5 billion prompts every day, meaning that multiple servers are used to produce instantaneous responses for those requests. That isn’t even to consider the other chatbots that are widely used, including Google’s Gemini, which representatives say will soon become the default option when users access Google Search.

“So even in inference, you can’t really save any energy,” Chowdhury said. “It’s not really massive data. I mean, the model is already massive, but we have a massive number of people using it.”

Researchers like Chowdhury and de Vries-Gao are now working to better quantify these energy demands to understand how to reduce them. For example, Chowdhury keeps an ML Energy Leaderboard that tracks the inference energy consumption of open-source models.

However, the specific energy demands of the other generative AI platforms are mostly unknown; big companies like Google, Microsoft, and Meta keep these numbers private, or provide statistics that give little insight into the actual environmental impact of these applications, de Vries-Gao said. This makes it difficult to determine how much energy AI really uses, what the energy demand will be in the coming years, and whether the world can keep up.

People who use these chatbots, however, can push for better transparency. This can not only help users make more energy-responsible choices with their own AI use but also push for more robust policies that hold companies accountable.

“One very fundamental problem with digital applications is that the impact is never transparent,” de Vries-Gao said. “The ball is with policymakers to encourage disclosure so that the users can start doing something.”

AI Research

AI Transformation (AX) using artificial intelligence (AI) is spreading throughout the domestic finan..

September 14, 2025

The Editors

AI transformation (AX), the use of artificial intelligence (AI) to reshape how a business operates, is spreading throughout South Korea’s domestic financial sector. Going beyond simple digital transformation (DX), the strategy is to embed AI across organizations and services to achieve management efficiency, work automation, and customer experience innovation at the same time. Financial companies have concluded that, with regulation and competition intensifying, it will be difficult to survive unless they raise their AI capabilities company-wide. The core of AX is internal process innovation and customer service differentiation. AI can cut costs and increase speed by quickly and accurately handling tasks that previously depended on people, such as loan review, risk management, investment product recommendation, and internal consultation support.

At customer contact points, AI bankers, voice bots, and customized chatbots provide high-quality consultation 24 hours a day to increase satisfaction with financial services. Industry sources say, “AX is not just a matter of technology, but a structural change that determines financial companies’ competitiveness and crisis response.”

First, major domestic banks and financial holding companies have begun introducing in-house AI assistants and private large language models (LLMs), setting up dedicated organizations, and establishing AI governance systems that cover all affiliates. They are trying to automate internal work and differentiate customer services at the same time by establishing strategic centers at the group level or by introducing collaboration tools and AI platforms company-wide.

KB Financial Group has established a ‘KB AI strategy’ and a ‘KB AI agent roadmap’ to introduce more than 250 AI agents across 39 of the group’s core business areas. It also set up the ‘KB GenAI Portal’, a first in the financial sector, to create an environment in which all executives and employees can use and develop AI without coding, and through it the group is improving productivity and changing how people work.

Shinhan Financial Group is raising work productivity with cloud-based collaboration tools (M365+Copilot), and its affiliates are introducing AI into frontline operations. Shinhan Bank has placed generative AI bankers at teller windows through its “AI Branch,” and in its “SOL” app, “AI Investment Mate” provides customized information to customers in card-news format.

Hana Bank, drawing on its foreign exchange expertise, operates an AI system that predicts when corporate foreign exchange customers are likely to leave. The system analyzes 253 variables based on past transaction data to estimate the likelihood that a customer will stop transacting, and automatically notifies branches so they can respond preemptively.

Woori Financial Group established an AI strategy center within the holding company under the leadership of Chairman Lim Jong-ryong and deployed dedicated AI organizations to all affiliates, including its banking, card, securities, and insurance units.

Internet-only banks are trying to differentiate themselves by focusing on conversational search and calculators, forgery and alteration detection, customized recommendations, and spreading an in-house AI culture. Because they have no offline branch network, they are actively strengthening AI innovation at customer contact points such as apps and mobile consultation.

Kakao Bank has upgraded its AI organization to a group and has more than 500 dedicated personnel. K-Bank achieved a 100% recognition rate with its AI-based identification card recognition solution and has begun setting standards by publishing academic papers. Toss Bank uses AI to detect ID forgery and alteration (99.5% accuracy), automate optical character recognition (OCR) for large volumes of documents, convert consultation calls from speech to text (STT), and build its own finance-specific language model.

Insurance companies are improving accuracy, approval rates, and processing speed by introducing AI across the entire process of risk assessment, underwriting, and claims payment. Because screening and payment processes in the insurance industry are long and complex, the effect of using AI is especially pronounced.

Samsung Fire & Marine Insurance has more than halved the share of manual reviews by automating its cancer diagnosis and surgical benefit review process through ‘AI medical review’. Its machine learning-based “Long-Term Insurance Sickness Screening System” raised the approval rate from 71% to 90%, and the company has secured patents for it.

Industry experts view this AI transformation as a paradigm shift in the financial industry, not just the introduction of a technology. They say AI must create new added value and customer experiences that go beyond cost reduction and efficiency. In particular, financial companies are expected to differentiate themselves only when AI and data are applied directly to resolving customer inconveniences.

However, preparing for ethical, security, and accountability issues is considered just as essential as the speed of AI adoption. Failure to manage risks such as the influence of large language models on financial decision-making, personal information protection, and algorithmic bias can lead to a loss of trust. This makes it all the more important to accumulate experience through small experiments and develop it into industry standards.

[Reporter Lee Soyeon]
