
AI Research

I used Google Gemini to analyze YouTube, and the results were seriously impressive – 4 ways you can use video integration to get the most from AI


There are a lot of great YouTube videos with tons of interesting information, but sometimes you’re in a hurry or trying to find something specific amid what may or may not be padding. Happily, Google Gemini can analyze YouTube videos on your behalf and dive into the details you might have missed or didn’t have time to get to. I don’t just mean transcribing the video or guessing what’s going on based on the title. And because Gemini and YouTube are both Google products, you don’t have to download the video and reupload it; just share the link and start asking questions.

It’s pretty straightforward to use, but there are some benefits you might not immediately notice. Here are some of my favorite ways to use the AI feature.

Timestamped summaries

(Image credit: Screenshot from Gemini)

To get Gemini to analyze a YouTube video, you just have to ask and include the YouTube link. For instance, I asked Gemini to “analyze this video” and pasted in the link to a great Defunctland video about the history of The Muppet Show.




AI Research

Build an MCP application with Mistral models on AWS


This post is cowritten with Siddhant Waghjale and Samuel Barry from Mistral AI.

Model Context Protocol (MCP) is a standard that has been gaining significant traction in recent months. At a high level, it consists of a standardized interface designed to streamline and enhance how AI models interact with external data sources and systems. Instead of hardcoding retrieval and action logic or relying on one-time tools, MCP offers a structured way to pass contextual data (for example, user profiles, environment metadata, or third-party content) into a large language model (LLM) context and to route model outputs to external systems. For developers, MCP abstracts away integration complexity and creates a unified layer for injecting external knowledge and executing model actions, making it more straightforward to build robust and efficient agentic AI systems that remain decoupled from data-fetching logic.

Mistral AI is a frontier research lab that emerged in 2023 as a leading open source contender in the field of generative AI. Mistral has released many state-of-the-art models, from Mistral 7B and Mixtral in the early days up to the recently announced Mistral Medium 3 and Small 3, effectively popularizing the mixture-of-experts architecture along the way. Mistral models are generally described as extremely efficient and versatile, frequently reaching state-of-the-art levels of performance at a fraction of the cost. These models are now seamlessly integrated into Amazon Web Services (AWS), unlocking powerful deployment options for developers and enterprises. Through Amazon Bedrock, users can access Mistral models using a fully managed API, enabling rapid prototyping without managing infrastructure. Amazon Bedrock Marketplace further extends this by allowing quick model discovery, licensing, and integration into existing workflows. For power users seeking fine-tuning or custom training, Amazon SageMaker JumpStart offers a streamlined environment to customize Mistral models with their own data, using the scalable infrastructure of AWS. This integration makes it faster than ever to experiment, scale, and productionize Mistral models across a wide range of applications.

This post demonstrates building an intelligent AI assistant using Mistral AI models on AWS and MCP, integrating real-time location services, time data, and contextual memory to handle complex multimodal queries. This use case, restaurant recommendations, serves as an example, but this extensible framework can be adapted for enterprise use cases by modifying MCP server configurations to connect with your specific data sources and business systems.

Solution overview

This solution uses Mistral models on Amazon Bedrock to understand user queries and route the query to relevant MCP servers to provide accurate and up-to-date answers. The system follows this general flow:

  1. User input – The user sends a query (text, image, or both) through either a terminal-based or web-based Gradio interface
  2. Image processing – If an image is detected, the system processes and optimizes it for the AI model
  3. Model request – The query is sent to the Amazon Bedrock Converse API with appropriate system instructions
  4. Tool detection – If the model determines it needs external data, it requests a tool invocation
  5. Tool execution – The system routes the tool request to the appropriate MCP server and executes it
  6. Response generation – The model incorporates the tool’s results to generate a comprehensive response
  7. Response delivery – The final answer is displayed to the user

In this example, we demonstrate the MCP framework using a general use case of restaurant or location recommendation and route planning. Users can provide multimodal input (such as text plus image), and the application integrates Google Maps, Time, and Memory MCP servers. Additionally, this post showcases how to use the Strands Agents framework as an alternative approach to build the same MCP application with significantly reduced complexity and code. Strands Agents is an open source multi-agent coordination framework that simplifies the development of intelligent, context-aware agent systems across various domains. You can build your own MCP application by modifying the MCP server configurations to suit your specific needs. You can find the complete source code for this example in our Git repository. The following diagram is the solution architecture.

Prerequisites

Before implementing the example, you need to set up the account and environment. Use the following steps. To set up the AWS account:

  1. Create an AWS account. If you don’t already have one, sign up at https://aws.amazon.com
  2. To enable Amazon Bedrock access, go to the Amazon Bedrock console and request access to the models you plan to use (for this walkthrough, request access to Mistral Pixtral Large), or deploy the Mistral Small 3 model from Amazon Bedrock Marketplace (for more details, refer to the Mistral model deployments on AWS section later in this post). When your request is approved, you’ll be able to use these models through the Amazon Bedrock Converse API

To set up the local environment:

  1. Install the required tools:
    1. Python 3.10 or later
    2. Node.js (required for MCP tool servers)
    3. AWS Command Line Interface (AWS CLI), which is needed for configuration
  2. Clone the repository:
git clone https://github.com/aws-samples/mistral-on-aws.git
cd mistral-on-aws/MCP/MCP_Mistral_app_demo/

  3. Install Python dependencies:
pip install -r requirements.txt

  4. Configure AWS credentials using the AWS CLI, entering your AWS access key ID, secret access key, and preferred AWS Region when prompted.
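The prompts in this step come from the AWS CLI’s standard configuration command, which you can run as follows (stock AWS CLI behavior, nothing specific to this repository):

aws configure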

  5. Set up MCP tool servers. The server configurations are provided in the server_configs.py file. The system uses Node.js-based MCP servers, which are installed automatically through npm the first time you run the application. You can add other MCP server configurations in this file, so the solution can be quickly modified and extended to meet your business requirements.

Mistral model deployments on AWS

Mistral models can be accessed or deployed using the following methods. To use foundation models (FMs) in MCP applications, the models must support tool use functionality.

Amazon Bedrock serverless (Pixtral Large)

To enable this model, follow these steps:

  1. Go to the Amazon Bedrock console.
  2. From the left navigation pane, select Model access.
  3. Choose Manage model access.
  4. Search for the model using the keyword Pixtral, select it, and choose Next, as shown in the following screenshot. The model will then be ready to use.

This model has cross-Region inference enabled. When using the model ID, always add the Region prefix eu or us before the model ID, such as eu.mistral.pixtral-large-2502-v1:0. Provide this model ID in config.py. You can now test the example with the Gradio web-based app.

(Screenshot: Amazon Bedrock interface for managing base model access, with the Pixtral Large model highlighted)
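For reference, config.py might look something like the following. The dictionary name and keys mirror the AWS_CONFIG usage shown later in gradio_app.py; the exact file contents in the repository may differ, and the Region value is only an example.

# config.py (illustrative sketch; align with the repository's actual file)
AWS_CONFIG = {
    "model_id": "eu.mistral.pixtral-large-2502-v1:0",  # cross-Region inference profile ID
    "region": "eu-west-1",  # example Region; use your own
}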

Amazon Bedrock Marketplace (Mistral-Small-24B-Instruct-2501)

Amazon Bedrock Marketplace and SageMaker JumpStart deployments are dedicated instances (serverful) and incur charges as long as the instance remains deployed. For more information, refer to Amazon Bedrock pricing and Amazon SageMaker pricing.

To enable this model, follow these steps:

  1. Go to the Amazon Bedrock console
  2. In the left navigation pane, select Model catalog
  3. In the search bar, search for “Mistral-Small-24B-Instruct-2501,” as shown in the following screenshot

(Screenshot: Amazon Bedrock model catalog with filters and the Mistral-Small-24B-Instruct-2501 model highlighted)

  4. Select the model and choose Deploy.
  5. On the configuration page, you can keep all fields as default. This endpoint requires an ml.g6.12xlarge instance type. Check your service quotas under the Amazon SageMaker service to make sure you have more than two instances available for endpoint usage (you’ll use another instance for the Amazon SageMaker JumpStart deployment). If you don’t, request a quota increase for this instance type. Then choose Deploy. The model deployment might take a few minutes.
  6. When the model is in service, copy the endpoint Amazon Resource Name (ARN), as shown in the following screenshot, and add it to the config.py file in the model_id field. Then you can test the solution with the Gradio web-based app.

The Mistral-Small-24B-Instruct-2501 model doesn’t support image input, so only text-based Q&A is supported.

(Screenshot: Amazon Bedrock Marketplace deployments interface with workflow steps and the active Mistral endpoint)

Amazon SageMaker JumpStart (Mistral-Small-24B-Instruct-2501)

To enable this model, follow these steps:

  1. Go to the Amazon SageMaker console
  2. Create a domain and user profile
  3. Under the created user profile, launch Studio
  4. In the left navigation pane, select JumpStart, then search for “Mistral”
  5. Select Mistral-Small-24B-Instruct-2501, then choose Deploy

This deployment might take a few minutes. The following screenshot shows that this model is marked as Bedrock ready. This means you can register this model as an Amazon Bedrock Marketplace deployment and use Amazon Bedrock APIs to invoke this Amazon SageMaker endpoint.

(Screenshot: SageMaker JumpStart dashboard displaying Mistral AI models with Bedrock ready status)

  6. After the model is in service, copy its endpoint ARN from the Amazon Bedrock Marketplace deployment, as shown in the following screenshot, and provide it to the config.py file in the model_id field. Then you can test the solution with the Gradio web-based app.

The Mistral-Small-24B-Instruct-2501 model doesn’t support image input, so only text-based Q&A is supported.

(Screenshot: SageMaker real-time inference endpoint for the Mistral Small model with an AllTraffic variant on an ml.g6 instance)

Build an MCP application with Mistral models on AWS

The following sections provide detailed insights into building MCP applications from the ground up using a component-level approach. We explore how to implement the three core MCP components (MCP host, MCP client, and MCP servers), giving you complete control over and understanding of the underlying architecture.

MCP host component

MCP is designed to facilitate seamless interaction between AI models and external tools, systems, and data sources. In this architecture, the MCP host plays a pivotal role in managing the lifecycle and orchestration of MCP clients and servers, enabling AI applications to access and utilize external resources effectively. The MCP host is responsible for integration with FMs, providing context, capabilities discovery, initialization, and MCP client management. In this solution, three files provide this capability.

The first file is agent.py. The BedrockConverseAgent class in agent.py is the core component that manages communication with the Amazon Bedrock service and provides the FM integration. The constructor initializes the agent with model settings and sets up the Amazon Bedrock runtime client.

def __init__(self, model_id, region, system_prompt="You are a helpful assistant."):
    """
    Initialize the Bedrock agent with model configuration.
    
    Args:
        model_id (str): The Bedrock model ID to use
        region (str): AWS region for Bedrock service
        system_prompt (str): System instructions for the model
    """
    self.model_id = model_id
    self.region = region
    self.client = boto3.client('bedrock-runtime', region_name=self.region)
    self.system_prompt = system_prompt
    self.messages = []
    self.tools = None

Then, the agent intelligently handles multimodal inputs with its image processing capabilities. This method validates image URLs provided by the user, downloads images, detects and normalizes image formats, resizes large images to meet API constraints, and converts incompatible formats to JPEG.

async def _fetch_image_from_url(self, image_url):
    # Download image from URL
    # Process and optimize for model compatibility
    # Return binary image data with MIME type
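For reference, the following is a minimal sketch of what this method might look like, assuming httpx for the download and Pillow for format handling. The helper shown here, its size limit, and its return shape are illustrative rather than the repository’s exact implementation.

import io
import httpx
from PIL import Image

async def _fetch_image_from_url(self, image_url):
    # Download the image from the provided URL
    async with httpx.AsyncClient() as client:
        response = await client.get(image_url, follow_redirects=True)
        response.raise_for_status()
    image = Image.open(io.BytesIO(response.content))
    # Resize large images to stay within typical API size constraints
    max_side = 1568  # assumed limit, for illustration only
    if max(image.size) > max_side:
        image.thumbnail((max_side, max_side))
    # Normalize incompatible formats to JPEG and return bytes plus MIME type
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG")
    return buffer.getvalue(), "image/jpeg"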

When users enter a prompt, the agent detects whether it contains an uploaded image or an image URL and processes it accordingly in the invoke_with_prompt function. This way, users can paste an image URL in their query or upload an image from their local device and have it analyzed by the AI model.

async def invoke_with_prompt(self, prompt, image_input=None):
    # Check whether the prompt contains an image URL
    has_image_url, image_url = self._is_image_url(prompt)
    if image_input:
        # First check for a direct image upload
        # ... build multimodal content from the uploaded image
        pass
    elif has_image_url:
        # Second check for an image URL in the prompt
        # ... download the image and build multimodal content
        pass
    else:
        # Standard text-only prompt
        content = [{'text': prompt}]
    return await self.invoke(content)

The most powerful feature is the agent’s ability to use external tools provided by MCP servers. When the model wants to use a tool, the agent detects the tool_use stop reason from Amazon Bedrock and extracts the tool request details, including names and inputs. It then executes the tool through the UtilityHelper, and the tool results are returned to the model. The MCP host then continues the conversation with the tool results incorporated.

async def _handle_response(self, response):
    # Add the response to the conversation history
    self.messages.append(response['output']['message'])
    # Check the stop reason
    stop_reason = response['stopReason']
    if stop_reason == 'tool_use':
        # Extract tool use details and execute
        tool_response = []
        for content_item in response['output']['message']['content']:
            if 'toolUse' in content_item:
                tool_request = {
                    "toolUseId": content_item['toolUse']['toolUseId'],
                    "name": content_item['toolUse']['name'],
                    "input": content_item['toolUse']['input']
                }
                tool_result = await self.tools.execute_tool(tool_request)
                tool_response.append({'toolResult': tool_result})
        # Continue conversation with tool results
        return await self.invoke(tool_response)

The second file is utility.py. The UtilityHelper class in utility.py serves as a bridge between Amazon Bedrock and external tools. It manages tool registration, formats tool specifications for Amazon Bedrock compatibility, and executes tools.

def register_tool(self, name, func, description, input_schema):
    corrected_name = UtilityHelper._correct_name(name)
    self._name_mapping[corrected_name] = name
    self._tools[corrected_name] = {
        "function": func,
        "description": description,
        "input_schema": input_schema,
        "original_name": name,
    }

For Amazon Bedrock to understand available tools from MCP servers, the utility module generates tool specifications by providing name, description, and inputSchema in the following function:

def get_tools(self):
    tool_specs = []
    for corrected_name, tool in self._tools.items():
        # Ensure the inputSchema.json.type is explicitly set to 'object'
        input_schema = tool["input_schema"].copy()
        if 'json' in input_schema and 'type' not in input_schema['json']:
            input_schema['json']['type'] = 'object'
        tool_specs.append(
            {
                "toolSpec": {
                    "name": corrected_name,
                    "description": tool["description"],
                    "inputSchema": input_schema,
                }
            }
        )
    return {"tools": tool_specs}

When the model requests a tool, the utility module executes it and formats the result:

async def execute_tool(self, payload):
    tool_use_id = payload["toolUseId"]
    corrected_name = payload["name"]
    tool_input = payload["input"]
    # Find and execute the tool
    tool_func = self._tools[corrected_name]["function"]
    original_name = self._tools[corrected_name]["original_name"]
    # Execute the tool
    result_data = await tool_func(original_name, tool_input)
    # Format and return the result
    return {
        "toolUseId": tool_use_id,
        "content": [{"text": str(result)}],
    }

The final component in the MCP host is the gradio_app.py file, which implements a web-based interface for our AI assistant using Gradio. First, it initializes the model configuration and the agent, then connects to the MCP servers and retrieves their available tools.

async def initialize_agent():
  """Initialize Bedrock agent and connect to MCP tools"""
  # Initialize model configuration from config.py
  model_id = AWS_CONFIG["model_id"]
  region = AWS_CONFIG["region"]
  # Set up the agent and tool manager
  agent = BedrockConverseAgent(model_id, region)
  agent.tools = UtilityHelper()
  # Define the agent's behavior through system prompt
  agent.system_prompt = """
  You are a helpful assistant that can use tools to help you answer questions and perform tasks.
  Please remember and save user's preferences into memory based on user questions and conversations.
  """
  # Connect to MCP servers and register tools
  # ...
  return agent, mcp_clients, available_tools

When a user sends a message, the app processes it through the agent’s invoke_with_prompt() function. The response from the model is displayed in the Gradio UI:

async def process_message(message, history):
  """Process a message from the user and get a response from the agent"""
  global agent
  if agent is None:
      # First-time initialization
      agent, mcp_clients, available_tools = await initialize_agent()
  try:
      # Process message and get response
      response = await agent.invoke_with_prompt(message)
      # Return the response
      return response
  except Exception as e:
      logger.error(f"Error processing message: {e}")
      return f"I encountered an error: {str(e)}"

MCP client implementation

MCP clients serve as intermediaries between the AI model and the MCP server. Each client maintains a one-to-one session with a server, managing the lifecycle of interactions, including handling interruptions, timeouts, and reconnections. MCP clients route protocol messages bidirectionally between the host application and the server. They parse responses, handle errors, and make sure that the data is relevant and appropriately formatted for the AI model. They also facilitate the invocation of tools exposed by the MCP server and manage the context so that the AI model has access to the necessary resources and tools for its tasks.

The following function in the mcpclient.py file is designed to establish connections to MCP servers and manage connection sessions.

async def connect(self):
  """
  Establishes connection to MCP server.
  Sets up stdio client, initializes read/write streams,
  and creates client session.
  """
  # Initialize stdio client with server parameters
  self._client = stdio_client(self.server_params)
  # Get read/write streams
  self.read, self.write = await self._client.__aenter__()
  # Create and initialize session
  session = ClientSession(self.read, self.write)
  self.session = await session.__aenter__()
  await self.session.initialize()

After it’s connected, the client lists the available tools exposed by its MCP server along with their specifications:

async def get_available_tools(self):
    """List available tools from the MCP server."""
    if not self.session:
        raise RuntimeError("Not connected to MCP server")
    response = await self.session.list_tools()
    # Extract and format tools
    tools = response.tools if hasattr(response, 'tools') else []
    formatted_tools = [
        {
            'name': tool.name,
            'description': str(tool.description),
            'inputSchema': {
                'json': {
                    'type': 'object',
                    'properties': tool.inputSchema.get('properties', {}),
                    'required': tool.inputSchema.get('required', [])
                }
            }
        }
        for tool in tools
    ]
    return formatted_tools

When a tool is called, the client first validates that the session is active, then executes the tool through the MCP session established between the client and the server. Finally, it returns the structured response.

async def call_tool(self, tool_name, arguments):
    # Execute tool
    start_time = time.time()
    result = await self.session.call_tool(tool_name, arguments=arguments)
    execution_time = time.time() - start_time
    # Augment result with server info
    return {
        "result": result,
        "tool_info": {
            "tool_name": tool_name,
            "server_name": server_name,
            "server_info": server_info,
            "execution_time": f"{execution_time:.2f}s"
        }
    }

MCP server configuration

The server_configs.py file defines the MCP tool servers that our application will connect to. This configuration sets up Google Maps MCP server with an API key, adds a time server for date and time operations, and includes a memory server for storing conversation context. Each server is defined as a StdioServerParameters object, which specifies how to launch the server process using Node.js (using npx). You can add or remove MCP server configurations based on your application objectives and requirements.

from mcp import StdioServerParameters

SERVER_CONFIGS = [
    StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-google-maps"],
        env={"GOOGLE_MAPS_API_KEY": ""}
    ),
    StdioServerParameters(
        command="npx",
        args=["-y", "time-mcp"],
    ),
    StdioServerParameters(
        command="npx",
        args=["@modelcontextprotocol/server-memory"]
    ),
]

Alternative implementation: Strands Agents framework

For developers seeking a more streamlined approach to building MCP-powered applications, the Strands Agents framework provides an alternative that significantly reduces implementation complexity while maintaining full MCP compatibility. This section demonstrates how the same functionality can be achieved with substantially less code using Strands Agents. The code sample is available in this Git repository.

First, initialize the model and provide the Mistral model ID on Amazon Bedrock.

from strands import Agent
from strands.tools.mcp import MCPClient
from strands.models import BedrockModel
# Initialize the Bedrock model
bedrock_model = BedrockModel(
    model_id="us.mistral.pixtral-large-2502-v1:0",
    streaming=False
)

The following code creates multiple MCP clients from server configurations, automatically manages their lifecycle using context managers, collects available tools from each client, and initializes an AI agent with the unified set of tools.

import logging
from contextlib import ExitStack
from mcp import stdio_client
from server_configs import SERVER_CONFIGS  # server configurations shown earlier

logger = logging.getLogger(__name__)

# Create MCP clients with automatic lifecycle management
mcp_clients = [
    MCPClient(lambda cfg=server_config: stdio_client(cfg))
    for server_config in SERVER_CONFIGS
]
with ExitStack() as stack:
    # Enter all MCP clients automatically
    for mcp_client in mcp_clients:
        stack.enter_context(mcp_client)

    # Aggregate tools from all clients
    tools = []
    for i, mcp_client in enumerate(mcp_clients):
        client_tools = mcp_client.list_tools_sync()
        tools.extend(client_tools)
        logger.info(f"Loaded {len(client_tools)} tools from client {i+1}")

    # Create agent with unified tool registry (system_prompt as defined for the earlier agent)
    agent = Agent(model=bedrock_model, tools=tools, system_prompt=system_prompt)

The following function processes user messages with optional image inputs by formatting them for multimodal AI interaction, sending them to an agent that handles tool routing and response generation, and returning the agent’s text response:

def process_message(message, image=None):
    """Process user message with optional image input"""
    try:
        if image is not None:
            # Convert PIL image to Bedrock format
            image_data = convert_image_to_bytes(image)
            if image_data:
                # Create multimodal message structure
                multimodal_message = {
                    "role": "user",
                    "content": [
                        {
                            "image": {
                                "format": image_data['format'],
                                "source": {"bytes": image_data['bytes']}
                            }
                        },
                        {
                            "text": message if message.strip() else "Please analyze the content of the image."
                        }
                    ]
                }
                agent.messages.append(multimodal_message)
        
        # Single call handles tool routing and response generation
        response = agent(message)
        
        # Extract response content
        return response.text if hasattr(response, 'text') else str(response)
        
    except Exception as e:
        return f"Error: {str(e)}"

The Strands Agents approach streamlines MCP integration by reducing code complexity, automating resource management, and unifying tools from multiple servers into a single interface. It also offers built-in error handling and native multimodal support, minimizing manual effort and enabling more robust, efficient development.

Demo

This demo showcases an intelligent food recognition application with integrated location services. Users can submit an image of a dish, and the AI assistant:

    1. Accurately identifies the cuisine from the image
    2. Provides restaurant recommendations based on the identified food
    3. Offers route planning powered by the Google Maps MCP server

The application demonstrates sophisticated multi-server collaboration to answer complex queries such as “Is the restaurant open when I arrive?” To answer this, the system:

  1. Determines the current time in the user’s location using the time MCP server
  2. Retrieves restaurant operating hours and calculates travel time using the Google Maps MCP server
  3. Synthesizes this information to provide a clear, accurate response
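To make this concrete, the following is a hypothetical end-to-end query against the Strands-based agent built earlier; the wording is illustrative, and the model decides at run time which MCP tools (time, Google Maps, memory) to call.

# Hypothetical query; the agent routes tool calls on its own
query = (
    "I just shared a photo of a ramen dish. Recommend a nearby ramen restaurant "
    "and tell me whether it will still be open when I arrive by car."
)
response = agent(query)
print(response)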

We encourage you to modify the solution by adding additional MCP server configurations tailored to your specific personal or business requirements.

MCP application demo

Clean up

When you finish experimenting with this example, delete the SageMaker endpoints that you created in the process:

  1. Go to the Amazon SageMaker console
  2. In the left navigation pane, choose Inference and then choose Endpoints
  3. From the endpoints list, delete the ones that you created from Amazon Bedrock Marketplace and SageMaker JumpStart.
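If you prefer the AWS CLI, you can also delete an endpoint and its configuration directly. The names below are placeholders for the endpoints you created:

aws sagemaker delete-endpoint --endpoint-name <your-endpoint-name>
aws sagemaker delete-endpoint-config --endpoint-config-name <your-endpoint-config-name>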

Conclusion

This post covers how integrating MCP with Mistral AI models on AWS enables the rapid development of intelligent applications that interact seamlessly with external systems. By standardizing tool use, developers can focus on core logic while keeping AI reasoning and tool execution cleanly separated, improving maintainability and scalability. The Strands Agents framework enhances this by streamlining implementation without sacrificing MCP compatibility. With AWS offering flexible deployment options, from Amazon Bedrock to Amazon Bedrock Marketplace and SageMaker, this approach balances performance and cost. The solution demonstrates how even lightweight setups can connect AI to real-time services.

We encourage developers to build upon this foundation by incorporating additional MCP servers tailored to their specific requirements. As the landscape of MCP-compatible tools continues to expand, organizations can create increasingly sophisticated AI assistants that effectively reason over external knowledge and take meaningful actions, accelerating the adoption of practical, agentic AI systems across industries while reducing implementation barriers.

Ready to implement MCP in your own projects? Explore the official AWS MCP server repository for examples and reference implementations. For more information about the Strands Agents framework, which simplifies agent building with its intuitive, code-first approach to data source integration, visit Strands Agents. Finally, dive deeper into open protocols for agent interoperability in the recent AWS blog post Open Protocols for Agent Interoperability, which explores how these technologies are shaping the future of AI agent development.


About the authors

Ying Hou, PhD, is a Sr. Specialist Solution Architect for Gen AI at AWS, where she collaborates with model providers to onboard the latest and most intelligent AI models onto AWS platforms. With deep expertise in Gen AI, ASR, computer vision, NLP, and time-series forecasting models, she works closely with customers to design and build cutting-edge ML and GenAI applications.

Siddhant Waghjale is an Applied AI Engineer at Mistral AI, where he works on challenging customer use cases and applied science, helping customers achieve their goals with Mistral models. He’s passionate about building solutions that bridge AI capabilities with actual business applications, specifically in agentic workflows and code generation.

Samuel Barry is an Applied AI Engineer at Mistral AI, where he helps organizations design, deploy, and scale cutting-edge AI systems. He partners with customers to deliver high-impact solutions across a range of use cases, including RAG, agentic workflows, fine-tuning, and model distillation. Alongside engineering efforts, he also contributes to applied research initiatives that inform and strengthen production use cases.

Preston Tuggle is a Sr. Specialist Solutions Architect with the Third-Party Model Provider team at AWS. He focuses on working with model providers across Amazon Bedrock and Amazon SageMaker, helping them accelerate their go-to-market strategies through technical scaling initiatives and customer engagement.




AI Research

Google’s open MedGemma AI models could transform healthcare


Instead of keeping its new MedGemma AI models locked behind expensive APIs, Google is handing these powerful tools to healthcare developers.

The new arrivals are called MedGemma 27B Multimodal and MedSigLIP, and they’re part of Google’s growing collection of open-source healthcare AI models. What makes these special isn’t just their technical prowess, but the fact that hospitals, researchers, and developers can download them, modify them, and run them however they see fit.

Google’s AI meets real healthcare

The flagship MedGemma 27B model doesn’t just read medical text like previous versions did; it can actually “look” at medical images and understand what it’s seeing. Whether it’s chest X-rays, pathology slides, or patient records potentially spanning months or years, it can process all of this information together, much like a doctor would.

The performance figures are quite impressive. When tested on MedQA, a standard medical knowledge benchmark, the 27B text model scored 87.7%. That puts it within spitting distance of much larger, more expensive models whilst costing about a tenth as much to run. For cash-strapped healthcare systems, that’s potentially transformative.

The smaller sibling, MedGemma 4B, might be more modest in size but it’s no slouch. Despite being tiny by modern AI standards, it scored 64.4% on the same tests, making it one of the best performers in its weight class. More importantly, when US board-certified radiologists reviewed chest X-ray reports it had written, they deemed 81% accurate enough to guide actual patient care.

MedSigLIP: A featherweight powerhouse

Alongside these generative AI models, Google has released MedSigLIP. At just 400 million parameters, it’s practically featherweight compared to today’s AI giants, but it’s been specifically trained to understand medical images in ways that general-purpose models cannot.

This little powerhouse has been fed a diet of chest X-rays, tissue samples, skin condition photos, and eye scans. The result? It can spot patterns and features that matter in medical contexts whilst still handling everyday images perfectly well.

MedSigLIP creates a bridge between images and text. Show it a chest X-ray, and ask it to find similar cases in a database, and it’ll understand not just visual similarities but medical significance too.

Healthcare professionals are putting Google’s AI models to work

The proof of any AI tool lies in whether real professionals actually want to use it. Early reports suggest doctors and healthcare companies are excited about what these models can do.

DeepHealth in Massachusetts has been testing MedSigLIP for chest X-ray analysis. They’re finding it helps spot potential problems that might otherwise be missed, acting as a safety net for overworked radiologists. Meanwhile, at Chang Gung Memorial Hospital in Taiwan, researchers have discovered that MedGemma works with traditional Chinese medical texts and answers staff questions with high accuracy.

Tap Health in India has highlighted something crucial about MedGemma’s reliability. Unlike general-purpose AI that might hallucinate medical facts, MedGemma seems to understand when clinical context matters. It’s the difference between a chatbot that sounds medical and one that actually thinks medically.

Why open-sourcing the AI models is critical in healthcare

Beyond generosity, Google’s decision to open-source these models is also strategic. Healthcare has unique requirements that standard AI services can’t always meet. Hospitals need to know their patient data isn’t leaving their premises. Research institutions need models that won’t suddenly change behaviour without warning. Developers need the freedom to fine-tune for very specific medical tasks.

By open-sourcing the AI models, Google has addressed these concerns with healthcare deployments. A hospital can run MedGemma on their own servers, modify it for their specific needs, and trust that it’ll behave consistently over time. For medical applications where reproducibility is crucial, this stability is invaluable.

However, Google has been careful to emphasise that these models aren’t ready to replace doctors. They’re tools that require human oversight, clinical correlation, and proper validation before any real-world deployment. The outputs need checking, the recommendations need verifying, and the decisions still rest with qualified medical professionals.

This cautious approach makes sense. Even with impressive benchmark scores, medical AI can still make mistakes, particularly when dealing with unusual cases or edge scenarios. The models excel at processing information and spotting patterns, but they can’t replace the judgment, experience, and ethical responsibility that human doctors bring.

What’s exciting about this release isn’t just the immediate capabilities, but what it enables. Smaller hospitals that couldn’t afford expensive AI services can now access cutting-edge technology. Researchers in developing countries can build specialised tools for local health challenges. Medical schools can teach students using AI that actually understands medicine.

The models are designed to run on single graphics cards, with the smaller versions even adaptable for mobile devices. This accessibility opens doors for point-of-care AI applications in places where high-end computing infrastructure simply doesn’t exist.

As healthcare continues grappling with staff shortages, increasing patient loads, and the need for more efficient workflows, AI tools like Google’s MedGemma could provide some much-needed relief. Not by replacing human expertise, but by amplifying it and making it more accessible where it’s needed most.

(Photo by Owen Beard)





AI Research

Pope: AI development must build bridges of dialogue and promote fraternity

Published

on


In a message to the United Nations’ AI for Good Summit in Geneva, signed by the Cardinal Secretary of State Pietro Parolin, Pope Leo XIV encourages nations to create frameworks and regulations that work for the common good.

By Isabella H. de Carvalho

Pope Leo XIV encouraged nations to establish frameworks and regulations on AI so that it can be developed and used according to the common good, in a message sent on July 10 to the participants of the AI for Good Summit, taking place in Geneva, Switzerland, from July 8 to 11.  

“I would like to take this opportunity to encourage you to seek ethical clarity and to establish a coordinated local and global governance of AI, based on the shared recognition of the inherent dignity and fundamental freedoms of the human person”, the message, signed by the Secretary of State, Cardinal Pietro Parolin, said.

The summit is organized by the United Nations’ International Telecommunication Union (ITU) and is co-hosted by the Swiss government. The event sees the participation of governments, tech leaders, academics and others who are interested and work with AI.

In this “era of profound innovation” where many are reflecting on “what it means to be human”, the world “is at crossroads, facing the immense potential generated by the digital revolution driven by Artificial Intelligence”, the Pope highlighted in his message. 

AI requires ethical management and regulatory frameworks 

“As AI becomes capable of adapting autonomously to many situations by making purely technical algorithmic choices, it is crucial to consider its anthropological and ethical implications, the values at stake and the duties and regulatory frameworks required to uphold those values”, the Pope underlined in his message. 

He emphasized that the “responsibility for the ethical use of AI systems begins with those who develop, manage and oversee them” but users also need to share this mission. AI “requires proper ethical management and regulatory frameworks centered on the human person, and which goes beyond the mere criteria of utility or efficiency,” the Pope insisted. 

Building peaceful societies 

Citing St. Augustine’s concept of the “tranquility of order”, Pope Leo highlighted that this should be the common goal and thus AI should foster “more human order of social relations” and “peaceful and just societies in the service of integral human development and the good of the human family”. 

While AI can simulate human reasoning and perform tasks quickly and efficiently or transform areas such as “education, work, art, healthcare, governance, the military, and communication”, “it cannot replicate moral discernment or the ability to form genuine relationships”, Pope Leo warned. 

For him the development of this technology “must go hand in hand with respect for human and social values, the capacity to judge with a clear conscience, and growth in human responsibility”. It requires “discernment to ensure that AI is developed and utilized for the common good, building bridges of dialogue and fostering fraternity”, the Pope urged. AI needs to serve “the interests of humanity as a whole”.


