AI Research

Start building with Gemini 2.5 Flash

Published

5 months ago

April 17, 2025

Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash, and improve performance.

Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a “thinking” process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.

2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.

Our most cost-efficient thinking model

2.5 Flash continues to lead as the model with the best price-to-performance ratio.

A graph showing Gemini 2.5 Flash price-to-performance comparison

Gemini 2.5 Flash adds another model to Google’s pareto frontier of cost to quality.*

Fine-grained controls to manage thinking

We know that different use cases have different tradeoffs in quality, cost, and latency. To give developers flexibility, we’ve enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget sets a cap on how much 2.5 Flash can think, but the model does not use the full budget if the prompt does not require it.

Plot graphs show improvements in reasoning quality as thinking budget increases

Improvements in reasoning quality as thinking budget increases.

The model is trained to know how long to think for a given prompt, and therefore automatically decides how much to think based on the perceived task complexity.

If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0. You can also choose to set a specific token budget for the thinking phase using a parameter in the API or the slider in Google AI Studio and in Vertex AI. The budget can range from 0 to 24576 tokens for 2.5 Flash.

The following prompts demonstrate how much reasoning may be used in the 2.5 Flash’s default mode.

Prompts requiring low reasoning:

Example 1: “Thank you” in Spanish

Example 2: How many provinces does Canada have?

Prompts requiring medium reasoning:

Example 1: You roll two dice. What’s the probability they add up to 7?

Example 2: My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.

Prompts requiring high reasoning:

Example 1: A cantilever beam of length L=3m has a rectangular cross-section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). It is subjected to a uniformly distributed load w=5 kN/m along its entire length and a point load P=10 kN at its free end. Calculate the maximum bending stress (σ_max).

Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.

Each cell contains:

Or a formula like "=A1 + B1 * 2" using +, -, *,/ and other cells.

Requirements:

Resolve dependencies between cells.

Handle operator precedence (*/ before +-).

Detect cycles and raise ValueError("Cycle detected at ").

No eval(). Use only built-in libraries.

Start building with Gemini 2.5 Flash today

Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and in Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.

from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
  model="gemini-2.5-flash-preview-04-17",
  contents="You roll two dice. What’s the probability they add up to 7?",
  config=genai.types.GenerateContentConfig(
    thinking_config=genai.types.ThinkingConfig(
      thinking_budget=1024
    )
  )
)

print(response.text)

Python

Find detailed API references and thinking guides in our developer docs or get started with code examples from the Gemini Cookbook.

We will continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.

^*_{^{Model pricing is sourced from Artificial Analysis & Company Documentation}}

Source link

AI Research

Study shakes Silicon Valley: Researchers break AI

Published

1 hour ago

September 14, 2025

The Editors

Study shakes Silicon Valley: Researchers break AI | The Jerusalem Post

Study shows researchers can manipulate chatbots with simple psychology, raising serious concerns about AI’s vulnerability and potential dangers.

ChatGPT encouraged a teenager toward suicide

(photo credit: OpenAI)

ByDR. ITAY GAL

SEPTEMBER 14, 2025 09:13

Source link

AI Research

And Sci Fi Thought AI Was Going To… Take Over? – mindmatters.ai

Published

9 hours ago

September 13, 2025

News

And Sci Fi Thought AI Was Going To… Take Over? mindmatters.ai

Source link

AI Research

Measuring Machine Intelligence Using Turing Test 2.0

Published

9 hours ago

September 13, 2025

News

In 1950, British mathematician Alan Turing (1912–1954) proposed a simple way to test artificial intelligence. His idea, known as the Turing Test, was to see if a computer could carry on a text-based conversation so well that a human judge could not reliably tell it apart from another human. If the computer could “fool” the judge, Turing argued, it should be considered intelligent.

For decades, Turing’s test shaped public understanding of AI. Yet as technology has advanced, many researchers have asked whether imitating human conversation really proves intelligence — or whether it only shows that machines can mimic certain human behaviors. Large language models like ChatGPT can already hold convincing conversations. But does that mean they understand what they are saying?

In a Mind Matters podcast interview, Dr. Georgios Mappouras tells host Robert J. Marks that the answer is no. In a recent paper, The General Intelligence Threshold, Mappouras introduces what he calls Turing Test 2.0. This updated approach sets a higher bar for intelligence than simply chatting like a human. It asks whether machines can go beyond imitation to produce new knowledge.

From information to knowledge

At the heart of Mappouras’s proposal is a distinction between two kinds of information, non-functional vs. functional:

Non-functional information is raw data or observations that don’t lead to new insights by themselves. One example would be noticing that an apple falls from a tree.
Functional information is knowledge that can be applied to achieve something new. When Isaac Newton connected the falling apple to the force of gravity, he transformed ordinary observation into scientific law.

True intelligence, Mappouras argues, is the ability to transform non-functional information into functional knowledge. This creative leap is what allows humans to build skyscrapers, develop medicine, and travel to the moon. A machine that merely rearranges words or retrieves facts cannot be said to have reached the same level.

The General Intelligence Threshold

Mappouras calls this standard the General Intelligence Threshold. His threshold sets a simple challenge: given existing knowledge and raw information, can the system generate new insights that were not directly programmed into it?

This threshold does not require constant displays of brilliance. Even one undeniable breakthrough — a “flash of genius” — would be enough to demonstrate that a machine possesses general intelligence. Just as a person may excel in math but not physics, a machine would only need to show creativity once to prove its potential.

Creativity and open problems

One way to apply the new test is through unsolved problems in mathematics. Throughout history, breakthroughs such as Andrew Wiles’s proof of Fermat’s Last Theorem or Grigori Perelman’s solution to the Poincaré Conjecture marked milestones of human creativity. If AI could solve open problems like the Riemann Hypothesis or the Collatz Conjecture — problems that no one has ever solved before — it would be strong evidence that the system had crossed the threshold into true intelligence.

Large language models already solve equations and perform advanced calculations, but solving a centuries-old unsolved problem would show something far deeper: the ability to create knowledge that has never existed before.

Beyond symbol manipulation

Mappouras also draws on philosopher John Searle’s famous “Chinese Room” thought experiment. In the scenario, a person who does not understand Chinese sits in a room with a rulebook for manipulating Chinese characters. By following instructions, the person produces outputs that convince outsiders he understands the language, even though he does not.

This scenario, Searle argued, shows that a computer might appear intelligent without real understanding. Mappouras agrees but goes further. For him, real intelligence is proven not just by producing outputs, but by acting on new knowledge. If the instructions in the Chinese Room included a way to escape, the person could only succeed if he truly understood what the words meant. In the same way, AI must demonstrate it can act meaningfully on information, not just shuffle symbols.

Image Credit: top images – Adobe Stock

Can AI pass the new test?

So far, Mappouras does not think modern AI has passed the General Intelligence Threshold. Systems like ChatGPT may look impressive, but their apparent creativity usually comes from patterns in the massive data sets on which they were trained. They have not shown the ability to produce new, independent knowledge disconnected from prior inputs.

That said, Mappouras emphasizes that success would not require constant novelty. One true act of creativity — an undeniable demonstration of new knowledge — would be enough. Until that happens, he remains cautious about claims that today’s AI is truly intelligent.

A shift in the debate

The debate over artificial intelligence is shifting. The original Turing Test asked whether machines could fool us into thinking they were human. Turing Test 2.0 asks a harder question: can they discover something new?

Mappouras believes this is the real measure of intelligence. Intelligence is not imitation — it is innovation. Whether machines will ever cross that line remains uncertain. But if they do, the world will not just be talking with computers. We will be learning from them.

Final thoughts: Today’s systems, tomorrow’s threshold

Models like ChatGPT and Grok are remarkable at conversation, summarization, and problem-solving within known domains, but their strengths still reflect pattern learning from vast training data. By Mappouras’s standard, they will cross the General Intelligence Threshold only when they produce a verifiable breakthrough — an insight not traceable to prior text or human scaffolding, such as an original solution to a major open problem. Until then, they remain powerful imitators and accelerators of human work — impressive, useful, and transformative, but not yet creators of genuinely new knowledge.