AI Research
Experiment with Gemini 2.0 Flash native image generation

In December we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we’re making it available for developer experimentation across all regions currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.
Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images.
Here are some examples of where 2.0 Flash’s multimodal outputs shine:
1. Text and images together
Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.
Story and illustration generation in Google AI Studio
2. Conversational image editing
Gemini 2.0 Flash helps you edit images through multiple turns of a natural language dialogue, which is great for iterating toward a perfect image or for exploring different ideas together.
Multi-turn conversation image editing maintaining context throughout the conversation in Google AI Studio
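In the Gemini API, this kind of back-and-forth maps naturally onto the google-genai chat interface, which carries conversation context across turns. Here is a minimal sketch, assuming the same experimental model and image-output config used later in this post; the prompts are invented for illustration:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")

# A chat session keeps prior turns in context, so follow-up
# messages can refine the previously generated image.
chat = client.chats.create(
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
)

chat.send_message("Draw a cozy reading nook with a window seat.")
response = chat.send_message("Make the lighting warmer and add a sleeping cat.")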
3. World understanding
Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the right image. This makes it well suited for creating detailed, realistic imagery, like illustrating a recipe. While it strives for accuracy, its knowledge, like that of all language models, is broad and general, not absolute or complete.
Interleaved text and image output for a recipe in Google AI Studio
4. Text rendering
Most image generation models struggle to accurately render long sequences of text, often producing poorly formatted, illegible, or misspelled characters. Internal benchmarks show that 2.0 Flash renders text more accurately than leading competitive models, making it well suited for creating advertisements, social posts, or even invitations.
Image outputs with long text rendering in Google AI Studio
Start making images with Gemini today
Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs.
from google import genai
from google.genai import types

# Create a client with your Gemini API key.
client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    # Ask the model to return interleaved text and image parts.
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)
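The interleaved text and image parts can then be read from the response. A minimal sketch following the pattern in the Gemini API docs; the output filename is illustrative:

from io import BytesIO
from PIL import Image

# Print text parts and save each returned image to disk.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save(f"scene_{i}.png")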
Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash lets you add text and image generation with just a single model. We're eager to see what developers create with native image output, and your feedback will help us finalize a production-ready version soon.
AI Research
Arista touts liquid cooling, optical tech to reduce power consumption for AI networking

Both technologies, co-packaged optics (CPO) and linear pluggable optics (LPO), will likely find a role in future AI and optical networks, experts say, as both promise to reduce power consumption and support improved bandwidth density. Each has advantages and disadvantages as well: CPOs are more complex to deploy given the amount of technology included in a CPO package, whereas LPOs promise more simplicity.
Bechtolsheim said that LPO can provide an additional 20% power savings over other optical forms. Early tests show good receiver performance even under degraded conditions, though transmit paths remain sensitive to reflections and crosstalk at the connector level, Bechtolsheim added.
At the recent Hot Interconnects conference, he said: “The path to energy-efficient optics is constrained by high-volume manufacturing,” stressing that advanced optics packaging remains difficult and risky without proven production scale.
“We are nonreligious about CPO, LPO, whatever it is. But we are religious about one thing, which is the ability to ship very high volumes in a very predictable fashion,” Bechtolsheim said at the investor event. “So, to put this in quantity numbers here, the industry expects to ship something like 50 million OSFP modules next calendar year. The current shipment rate of CPO is zero, okay? So going from zero to 50 million is just not possible. The supply chain doesn’t exist. So, even if the technology works and can be demonstrated in a lab, to get to the volume required to meet the needs of the industry is just an incredible effort.”
“We’re all in on liquid cooling to reduce power, eliminating fan power, supporting the linear pluggable optics to reduce power and cost, increasing rack density, which reduces data center footprint and related costs, and most importantly, optimizing these fabrics for the AI data center use case,” Bechtolsheim added.
“So what we call the ‘purpose-built AI data center fabric’ around Ethernet technology is to really optimize AI application performance, which is the ultimate measure for the customer in both the scale-up and the scale-out domains. Some of this includes full switch customization for customers. Other cases, it includes the power and cost optimization. But we have a large part of our hardware engineering department working on these things,” he said.
AI Research
Learning by Doing: AI, Knowledge Transfer, and the Future of Skills | American Enterprise Institute

In a recent blog, I discussed Stanford University economist Erik Brynjolfsson’s new study showing that young college graduates are struggling to gain a foothold in a job market shaped by artificial intelligence (AI). His analysis found that, since 2022, early-career workers in AI-exposed roles have seen employment growth lag 13 percent behind peers in less-exposed fields. At the same time, experienced workers in the same jobs have held steady or even gained ground. The conclusion: AI isn’t eliminating work outright, but it is affecting the entry-level rungs that young workers depend on as they begin climbing career ladders.
The potential consequences of these findings, assuming they bear out, become clearer when read alongside Enrique Ide's recent paper, Automation, AI, and the Intergenerational Transmission of Knowledge. Ide argues that when firms automate entry-level tasks, the tacit knowledge that new workers would otherwise absorb on the job (the kind of workplace norms and rhythms of team-based work that aren't necessarily written down) isn't passed on. Thus, productivity gains accrue to seasoned workers while would-be novices lose the hands-on training they need to build the foundation for career progress.
This short-circuiting of early career experiences, Ide says, has macroeconomic consequences. He estimates that automating even five percent of entry-level tasks reduces long-run US output growth by an estimated 0.05 percentage points per year; at 30 percent automation, growth slows by more than 0.3 points. Over a hundred-year timeline, this would reduce total output by 20 percent relative to a world without AI automation. In other words: automating the bottom rungs might lift firms' quarterly performance, but at the cost of generational growth.
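To see how a seemingly small annual drag compounds to a shortfall of that size, here is a back-of-the-envelope calculation; the baseline growth rate and drag value are illustrative assumptions, not figures from Ide's paper:

# Compounding of an annual growth-rate drag over a century.
baseline = 0.02   # assumed 2% annual baseline growth
drag = 0.002      # assumed drag of 0.2 percentage points per year
years = 100

ratio = ((1 + baseline - drag) / (1 + baseline)) ** years
print(f"Output after {years} years relative to no-drag baseline: {ratio:.2f}")
# -> about 0.82: a drag in this range compounds to nearly a
#    20 percent shortfall over a hundred years.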
This is where we need to pause and take a breath. While Ide’s results sound dramatic, it is critical to remember that the dynamics and consequences of AI adoption are unpredictable, and that a century is a very long time. For instance, who would have said in 2022 that one of the first effects of AI automation would be to benefit less tech-savvy boomer and Gen-X managers and harm freshly minted Gen-Z coders?
Given the history of positive, automation-induced wealth and employment effects, why would this time be different?
Finally, it’s important to remember that in a dynamic market-driven economy, skill requirements are always changing and firms are always searching for ways to improve their efficiency relative to competitors. This is doubly true as we enter the era of cognitive, as opposed to physical, automation. AI-driven automation is part of the pathway to a more prosperous economy and society for ourselves and for future generations. As my AEI colleague Jim Pethokoukis recently said, “A supposedly powerful general-purpose technology that left every firm’s labor demand utterly unchanged wouldn’t be much of a GPT.” Said another way, unless AI disrupts our economy and lives, it cannot deliver its promised benefits.
What then should we do? I believe the most important step we can take right now is to begin “stress-testing” our current workforce development policies and programs and building scenarios for how industry and government will respond should significant AI-related job disruptions occur. Such scenario planning could be shaped into a flexible “playbook” of options, geared to the types and numbers of affected workers, to guide policymakers. Such planning didn’t occur prior to the automation and trade shocks of the 1990s and 2000s, with lasting consequences for factory workers and American society. We should try to make sure this doesn’t happen again with AI.
Pessimism is easy and cheap. We should resist the lure of social media-monetized AI doomerism and focus on building the future we want to see by preparing for and embracing change.
AI Research
SBU Researchers Use AI to Advance Alzheimer’s Detection

Alzheimer’s disease is one of the most urgent public health challenges for aging Americans. Nearly seven million Americans over the age of 65 are currently living with the disease, and that number is projected to nearly double by 2060, according to the Alzheimer’s Association.
Early diagnosis and continuous monitoring are crucial to improving care and extending independence, but there isn’t enough high-quality, Alzheimer’s-specific data to train artificial intelligence systems that could help detect and track the disease.
Shan Lin, associate professor of Electrical and Computer Engineering at Stony Brook University, and PhD candidate Heming Fu are working with Guoliang Xing from The Chinese University of Hong Kong to create a network of data based on Alzheimer’s patients. Together they developed SHADE-AD (Synthesizing Human Activity Datasets Embedded with AD features), a generative AI framework designed to create synthetic, realistic data that reflects the motor behaviors of Alzheimer’s patients.

Movements like stooped posture, reliance on armrests when standing from sitting, or slowed gait may appear subtle, but can be early indicators of the disease. By identifying and replicating these patterns, SHADE-AD provides researchers and physicians with the data required to improve monitoring and diagnosis.
Unlike existing generative models, which are often trained on and output generic datasets drawn from healthy individuals, SHADE-AD was trained to embed Alzheimer’s-specific traits. The system generates three-dimensional “skeleton videos,” simplified figures that preserve details of joint motion. These 3D skeleton datasets were validated against real-world patient data, with the model proving capable of reproducing the subtle changes in speed, angle, and range of motion that distinguish Alzheimer’s behaviors from those of healthy older adults.
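To make the data format concrete, here is a hypothetical sketch of a skeleton sequence and the kind of speed and angle features described above; the array shape, joint indices, and feature definitions are invented for illustration and are not taken from SHADE-AD:

import numpy as np

# A "skeleton video" as per-frame 3D joint positions: (frames, joints, xyz).
T, J = 120, 18                      # assumed: 120 frames, 18 tracked joints
skeleton = np.random.rand(T, J, 3)  # placeholder data; real input comes from sensors

HIP = 8  # assumed index of the hip-center joint

def gait_speed(seq, fps=30.0):
    """Mean frame-to-frame hip displacement, a rough proxy for walking speed."""
    steps = np.linalg.norm(np.diff(seq[:, HIP, :], axis=0), axis=1)
    return steps.mean() * fps

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by joints a-b-c in one frame."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print(f"Gait speed proxy: {gait_speed(skeleton):.3f} units/s")
# Assumed arm joint indices, purely for demonstration.
print(f"Sample joint angle: {joint_angle(skeleton[0, 5], skeleton[0, 6], skeleton[0, 7]):.1f} deg")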
The results, presented at the 23rd ACM Conference on Embedded Networked Sensor Systems (SenSys 2025), have been significant. Activity recognition systems trained with SHADE-AD’s data achieved higher accuracy across all major tasks compared with systems trained on traditional data augmentation or general open datasets. In particular, SHADE-AD excelled at recognizing actions like walking and standing up, which often reveal the earliest signs of decline in Alzheimer’s patients.

Lin believes this work could have a significant impact on the daily lives of older adults and their families. Technologies built on SHADE-AD could one day allow doctors to detect Alzheimer’s sooner, track disease progression more accurately, and intervene earlier with treatments and support. “If we can provide tools that spot these changes before they become severe, patients will have more options, and families will have more time to plan,” he said.
With September recognized nationally as Healthy Aging Month, Lin sees this research as part of an effort to use technology to support older adults in living longer, healthier, and more independent lives. “Healthy aging isn’t only about treating illness, but also about creating systems that allow people to thrive as they grow older,” he said. “AI can be a powerful ally in that mission.”
— Beth Squire