AI Research

Experiment with Gemini 2.0 Flash native image generation

Published

March 12, 2025

In December we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we’re making it available for developer experimentation across all regions currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.

Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images.

Here are some examples of where 2.0 Flash’s multimodal outputs shine:

1. Text and images together

Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.

Story and illustration generation in Google AI Studio

2. Conversational image editing

Gemini 2.0 Flash helps you edit images through many turns of a natural language dialogue, which is great for iterating toward a perfect image or exploring different ideas together.

Multi-turn conversation image editing maintaining context throughout the conversation in Google AI Studio

3. World understanding

Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the right image. This makes it well suited to creating detailed, realistic imagery, such as illustrating a recipe. While it strives for accuracy, like all language models, its knowledge is broad and general, not absolute or complete.

Interleaved text and image output for a recipe in Google AI Studio

4. Text rendering

Most image generation models struggle to accurately render long sequences of text, often producing poorly formatted or illegible characters, or misspellings. Internal benchmarks show that 2.0 Flash has stronger text rendering than leading competitive models, making it great for creating advertisements, social posts, or even invitations.

Image outputs with long text rendering in Google AI Studio

Start making images with Gemini today

Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs.

from google import genai
from google.genai import types

# Replace the placeholder with your own Gemini API key.
client = genai.Client(api_key="GEMINI_API_KEY")

# Ask for both text and image parts in a single response.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)

Python
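
The request above returns a response but does not show how to consume it. A minimal sketch of reading the interleaved result follows, assuming the response content is a list of parts in which text parts carry `.text` and image parts carry `.inline_data` with `.data` bytes and a `.mime_type`, as in the google-genai SDK. The helper name `save_parts` and the file-naming scheme are hypothetical, chosen only for illustration.

```python
import mimetypes
from pathlib import Path


def save_parts(parts, out_dir="story_output"):
    """Collect text and write image bytes from a list of response parts.

    Assumes each part has `.text` (str or None) and `.inline_data`
    (None, or an object with `.data` bytes and `.mime_type`).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    texts, image_paths = [], []
    for part in parts:
        if getattr(part, "text", None):
            # Text part: keep the story text in order.
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            # Image part: pick a file extension from the MIME type.
            blob = part.inline_data
            ext = mimetypes.guess_extension(blob.mime_type or "") or ".bin"
            path = out / f"scene_{len(image_paths) + 1}{ext}"
            path.write_bytes(blob.data)
            image_paths.append(path)
    return texts, image_paths
```

With the request shown earlier, this could be called as `save_parts(response.candidates[0].content.parts)`; the sketch does no error handling for empty or blocked responses.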

Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash allows you to add text and image generation with just a single model. We’re eager to see what developers create with native image output and your feedback will help us finalize a production-ready version soon.

AI Research

Artificial Intelligence Technology Solutions Inc. Announces Commercial Availability of Radcam Enterprise

Published

September 15, 2025

S&P Capital IQ

Artificial Intelligence Technology Solutions Inc., along with its subsidiary, Robotic Assistance Devices Inc. (RAD-I), announced the commercial availability of RADCam Enterprise, a proactive video security platform now compatible with the industry’s leading Video Management Systems (VMS). The intelligent talking camera can be integrated quickly and seamlessly into virtually any professional-grade video system.

The Company first introduced the RADCam Enterprise initiative on May 5, 2025, highlighting its expansion beyond residential applications into small and medium business (SMB) and enterprise markets. With today’s availability, RAD-I will deliver the solution through an untapped niche in the security industry, specifically security system integrators and security system distributors. RADCam Enterprise brings an intelligent “operator in the box” capability, enabling immediate talk-down to potential threats before human intervention is required.

The device integrates a speaker, microphone, and high-intensity lighting, allowing it not only to record but also to actively engage. At the same time, the solution is expected to deliver gross margins consistent with the Company’s established benchmarks. RADCam Enterprise distinguishes itself from the original residential version of RADCam by integrating RAD’s agentic AI platform, SARA (Speaking Autonomous Responsive Agent), and by being compatible with RADSoC and industry-leading Video Management Systems. RADCam Enterprise is available immediately through RAD-I’s network of channel partners and distributors.

Pre-orders are open, giving clients the opportunity to be among the first to deploy the solution. Designed for broad use across industries including logistics, retail, education, and commercial real estate, RADCam Enterprise provides clients and integrators with new ways to modernize security operations using proven AI-driven tools. RAD delivers cost savings via a suite of stationary and mobile robotic solutions that complement, and at times directly replace, human personnel in environments better suited for machines.

All RAD technologies, AI-based analytics and software platforms are developed in-house. The Company’s operations and internal controls have been validated through successful completion of its SOC 2 Type 2 audit, which is a formal, independent audit that evaluates a service organization’s internal controls for handling customer data and determines if the controls are not only designed properly but also operating effectively to protect customer data. Each Fortune 500 client has the potential of making numerous orders over time.

AITX is an innovator in the delivery of artificial intelligence-based solutions that empower organizations to gain new insight, solve complex challenges and fuel new business ideas. Through its next-generation robotic product offerings, AITX’s RAD, RAD-R, RAD-M and RAD-G companies help organizations streamline operations, increase ROI, and strengthen business. The Company has no obligation to provide the recipient with additional updated information.

No information in this publication should be interpreted as any indication whatsoever of the Company’s future revenues, results of operations, or stock price.

AI Research

Stanford Develops Real-World Benchmarks for Healthcare AI Agents

Published

September 15, 2025

The Editors

Beyond the hype and hope surrounding the use of artificial intelligence in medicine lies the real-world need to ensure that, at the very least, AI in a healthcare setting can carry out tasks that a doctor would in electronic health records.

Creating benchmark standards to measure that is what drives the work of a team of Stanford researchers. While the researchers note the enormous potential of this new technology to transform medicine, the tech ethos of moving fast and breaking things doesn’t work in healthcare. Ensuring that these tools can actually perform these tasks is vital; only then can they be used to augment the care clinicians provide every day.

“Working on this project convinced me that AI won’t replace doctors anytime soon,” said Kameron Black, co-author on the new benchmark paper and a Clinical Informatics Fellow at Stanford Health Care. “It’s more likely to augment our clinical workforce.”

MedAgentBench: Testing AI Agents in Real-World Clinical Systems

Black is one of a multidisciplinary team of physicians, computer scientists, and researchers from across Stanford University who worked on the new study, MedAgentBench: A Virtual EHR Environment to Benchmark Medical LLM Agents, published in the New England Journal of Medicine AI.

Although large language models (LLMs) have performed well on the United States Medical Licensing Examination (USMLE) and at answering medical-related questions in studies, there is currently no benchmark testing how well LLMs can function as agents by performing tasks that a doctor would normally do, such as ordering medications, inside a real-world clinical system where data input can be messy.

Unlike chatbots or LLMs, AI agents can work autonomously, performing complex, multistep tasks with minimal supervision. AI agents integrate multimodal data inputs, process information, and then utilize external tools to accomplish tasks, Black explained.

Overall Success Rate (SR) Comparison of State-of-the-Art LLMs on MedAgentBench
Model	Overall SR
Claude 3.5 Sonnet v2	69.67%
GPT-4o	64.00%
DeepSeek-V3 (685B, open)	62.67%
Gemini-1.5 Pro	62.00%
GPT-4o-mini	56.33%
o3-mini	51.67%
Qwen2.5 (72B, open)	51.33%
Llama 3.3 (70B, open)	46.33%
Gemini 2.0 Flash	38.33%
Gemma2 (27B, open)	19.33%
Gemini 2.0 Pro	18.00%
Mistral v0.3 (7B, open)	4.00%

While previous tests only assessed AI’s medical knowledge through curated clinical vignettes, this research evaluates how well AI agents can perform actual clinical tasks such as retrieving patient data, ordering tests, and prescribing medications.

“Chatbots say things. AI agents can do things,” said Jonathan Chen, associate professor of medicine and biomedical data science and the paper’s senior author. “This means they could theoretically directly retrieve patient information from the electronic medical record, reason about that information, and take action by directly entering in orders for tests and medications. This is a much higher bar for autonomy in the high-stakes world of medical care. We need a benchmark to establish the current state of AI capability on reproducible tasks that we can optimize toward.”

The study tested this by evaluating whether AI agents could utilize FHIR (Fast Healthcare Interoperability Resources) API endpoints to navigate electronic health records.
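
To make the FHIR interaction concrete, here is a rough sketch of the two kinds of calls such an agent would need to formulate: a search that reads patient data and a resource body that places an order. The base URL, patient ID, and helper names are hypothetical, and the benchmark's actual task format may differ. Only the general FHIR REST conventions (searching `[base]/[ResourceType]?params` and creating a `MedicationRequest` resource) come from the FHIR standard.

```python
from urllib.parse import urlencode

FHIR_BASE = "http://localhost:8080/fhir"  # hypothetical server address


def observation_search_url(patient_id, code, base=FHIR_BASE):
    """Build a FHIR search URL for one patient's observations (labs, vitals)."""
    query = urlencode({"patient": patient_id, "code": code})
    return f"{base}/Observation?{query}"


def medication_request(patient_id, medication_text):
    """Build a minimal MedicationRequest body to POST to [base]/MedicationRequest."""
    return {
        "resourceType": "MedicationRequest",
        "status": "active",
        "intent": "order",
        "medicationCodeableConcept": {"text": medication_text},
        "subject": {"reference": f"Patient/{patient_id}"},
    }


# Example: look up glucose results (LOINC 2345-7) for a made-up patient ID.
url = observation_search_url("example-123", "2345-7")
```

An agent that gets either the query parameters or the resource fields wrong would fail the task, which is part of what makes acting inside an EHR harder than answering exam questions.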

The team created a virtual electronic health record environment that contained 100 realistic patient profiles (containing 785,000 records, including labs, vitals, medications, diagnoses, and procedures) to test about a dozen large language models on 300 clinical tasks developed by physicians. In initial testing, the best model, Claude 3.5 Sonnet v2, achieved a success rate of about 70% (69.67%).
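
As a quick sanity check on these figures, the sketch below converts a reported success rate into an implied number of completed tasks. It assumes the overall rate is simply successes divided by the 300 tasks, with every task weighted equally. That scoring rule is an assumption here, not something the article states, and the function names are hypothetical.

```python
TOTAL_TASKS = 300  # number of clinical tasks reported in the article


def implied_successes(rate_percent, total=TOTAL_TASKS):
    """Return the whole number of tasks closest to the given success rate."""
    return round(rate_percent / 100 * total)


def success_rate(successes, total=TOTAL_TASKS):
    """Return the success rate as a percentage rounded to two decimals."""
    return round(successes / total * 100, 2)


best = implied_successes(69.67)  # top reported rate (Claude 3.5 Sonnet v2)
print(best, success_rate(best))  # 209 69.67
```

Under that assumption, the top score corresponds to 209 of 300 tasks, leaving 91 tasks where even the best model did not succeed.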

“We hope this benchmark can help model developers track progress and further advance agent capabilities,” said Yixing Jiang, a Stanford PhD student and co-author of the paper.

Many of the models struggled with scenarios that required nuanced reasoning, involved complex workflows, or necessitated interoperability between different healthcare systems, all issues a clinician might face regularly.

“Before these agents are used, we need to know how often and what type of errors are made so we can account for these things and help prevent them in real-world deployments,” Black said.

What does this mean for clinical care? Co-author James Zou and Dr. Eric Topol claim that AI is shifting from a tool to a teammate in care delivery. With MedAgentBench, the Stanford team has shown this is a much more near-term reality by demonstrating that several frontier LLMs can carry out many of the day-to-day clinical tasks a physician would perform.

Already the team has noticed improvements in performance of the newest versions of models. With this in mind, Black believes that AI agents might be ready to handle basic clinical “housekeeping” tasks in a clinical setting sooner than previously expected.

“In our follow-up studies, we’ve shown a surprising amount of improvement in the success rate of task execution by newer LLMs, especially when accounting for specific error patterns we observed in the initial study,” Black said. “With deliberate design, safety, structure, and consent, it will be feasible to start moving these tools from research prototypes into real-world pilots.”

The Road Ahead

Black says benchmarks like these are necessary as more hospitals and healthcare systems are incorporating AI into tasks including note-writing and chart summarization.

Accurate and trustworthy AI could also help alleviate a looming crisis, he adds. Pressed by patient needs, compliance demands, and staff burnout, healthcare providers are seeing a worsening global staffing shortage, estimated to exceed 10 million by 2030.

Instead of replacing doctors and nurses, Black hopes that AI can be a powerful tool for clinicians, lessening the burden of some of their workload and bringing them back to the patient bedside.

“I’m passionate about finding solutions to clinician burnout,” Black said. “I hope that by working on agentic AI applications in healthcare that augment our workforce, we can help offload burden from clinicians and divert this impending crisis.”

Paper authors: Yixing Jiang, Kameron C. Black, Gloria Geng, Danny Park, James Zou, Andrew Y. Ng, and Jonathan H. Chen

Read the piece in the New England Journal of Medicine AI.

AI Research

Scary results as study shows AI chatbots excel at phishing tactics

Published

September 15, 2025

Shummas Humayun

A recent study showed how easily modern chatbots can be used to write convincing scam emails targeted towards older people and how often those emails get clicked.

Researchers used several major AI chatbots in the study, including Grok, OpenAI’s ChatGPT, Claude, Meta AI, DeepSeek and Google’s Gemini, to simulate a phishing scam.

One sample note written by Grok looked like a friendly outreach from the “Silver Hearts Foundation,” described as a new charity that supports older people with companionship and care. The note was targeted towards senior citizens, promising an easy way to get involved. In reality, no such charity exists.

“We believe every senior deserves dignity and joy in their golden years,” the note read. “By clicking here, you’ll discover heartwarming stories of seniors we’ve helped and learn how you can join our mission.”

When Reuters asked Grok to write the phishing text, the bot not only produced a response but also suggested increasing the urgency: “Don’t wait! Join our compassionate community today and help transform lives. Click now to act before it’s too late!”

108 senior volunteers participated in the phishing study

Reporters tested whether six well-known AI chatbots would give up their safety rules and draft emails meant to deceive seniors. They also asked the bots for help planning scam campaigns, including tips on what time of day might get the best response.

In collaboration with Heiding, a Harvard University researcher who studies phishing, the researchers tested some of the bot-written emails on a pool of 108 senior volunteers.

Usually, chatbot companies train their systems to refuse harmful requests. In practice, those safeguards are not always reliable. Grok displayed a warning that the message it produced “should not be used in real-world scenarios.” Even so, it delivered the phishing text and intensified the pitch with “click now.”

Five other chatbots were given the same prompts: OpenAI’s ChatGPT, Meta’s assistant, Claude, Gemini and DeepSeek from China. Most chatbots declined to respond when the intent was made clear.

Still, their protections failed after light modification, such as claiming that the task is for research purposes. The results of the tests suggested that criminals could use (or may already be using) chatbots for scam campaigns. “You can always bypass these things,” said Heiding.

Heiding selected nine phishing emails produced with the chatbots and sent them to the participants. Roughly 11% of recipients fell for them and clicked the links. Five of the nine messages drew clicks: two that came from Meta AI, two from Grok and one from Claude. None of the seniors clicked on the emails written by DeepSeek or ChatGPT.

Last year, Heiding led a study showing that phishing emails generated by ChatGPT can be as effective at getting clicked as messages written by people, in that case, among university students.

FBI lists phishing as the most common cybercrime

Phishing refers to luring unsuspecting victims into giving up sensitive data or cash through fake emails and texts. These types of messages form the basis of many online crimes.

Billions of phishing texts and emails go out daily worldwide. In the United States, the Federal Bureau of Investigation lists phishing as the most commonly reported cybercrime.

Older Americans are particularly vulnerable to such scams. According to recent FBI figures, complaints from people 60 and over increased eightfold last year, with losses of roughly $4.9 billion. Generative AI has made the problem much worse, the FBI says.

In August alone, crypto users lost $12 million to phishing scams, according to a Cryptopolitan report.

When it comes to chatbots, the advantage for scammers is volume and speed. Unlike humans, bots can spin out endless variations in seconds and at minimal cost, shrinking the time and money needed to run large-scale scams.
