Books, Courses & Certifications

Context extraction from image files in Amazon Q Business using LLMs

Published June 30, 2025

To convey complex information, organizations increasingly rely on visual documentation such as diagrams, charts, and technical illustrations. Although text documents are well integrated into modern knowledge management systems, the information contained in diagrams, charts, technical schematics, and other visual documentation often remains inaccessible to search and AI assistants. This creates significant gaps in organizational knowledge bases: teams must interpret visual data manually, and automation systems cannot use critical visual information for insights and decision-making. Although Amazon Q Business already handles images embedded within documents, the custom document enrichment (CDE) feature extends these capabilities by processing standalone image files (for example, JPGs and PNGs).

In this post, we walk through a step-by-step implementation of the CDE feature within an Amazon Q Business application. We show an AWS Lambda function configured within CDE to process various image file types, and we use an example scenario to show how this integration enhances the ability of Amazon Q Business to provide comprehensive insights. By following this practical guide, you can expand your organization’s searchable knowledge base, enabling more complete answers and insights that incorporate both textual and visual information sources.

Example scenario: Analyzing regional educational demographics

Consider a scenario where you’re working for a national educational consultancy that has charts, graphs, and demographic data across different AWS Regions stored in an Amazon Simple Storage Service (Amazon S3) bucket. The following image shows student distribution by age range across various cities using a bar chart. The insights in visualizations like this are valuable for decision-making but are traditionally locked within image formats in your S3 buckets and other storage.

With Amazon Q Business and CDE, we show you how to enable natural language queries against such visualizations. For example, your team could ask questions such as “Which city has the highest number of students in the 13–15 age range?” or “Compare the student demographics between City 1 and City 4” directly through the Amazon Q Business application interface.

You can bridge this gap using the Amazon Q Business CDE feature to:

Detect and process image files during the document ingestion process
Use Amazon Bedrock with AWS Lambda to interpret the visual information
Extract structured data and insights from charts and graphs
Make this information searchable using natural language queries

Solution overview

In this solution, we walk you through how to implement a CDE-based solution for your educational demographic data visualizations. The solution empowers organizations to extract meaningful information from image files using the CDE capability of Amazon Q Business. When Amazon Q Business encounters the S3 path during ingestion, CDE rules automatically trigger a Lambda function. The Lambda function identifies the image files and calls the Amazon Bedrock API, which uses multimodal large language models (LLMs) to analyze and extract contextual information from each image. The extracted text is then seamlessly integrated into the knowledge base in Amazon Q Business. End users can then quickly search for valuable data and insights from images based on their actual context. By bridging the gap between visual content and searchable text, this solution helps organizations unlock valuable insights previously hidden within their image repositories.

The following figure shows the high-level architecture diagram used for this solution.

Arch Diagram

For this use case, we use Amazon S3 as our data source. However, the same solution can be adapted to other data source types supported by Amazon Q Business, or implemented with custom data sources as needed.

To complete the solution, follow these high-level implementation steps:

Create an Amazon Q Business application and sync with an S3 bucket.
Configure the Amazon Q Business application CDE for the Amazon S3 data source.
Extract context from the images.

Prerequisites

The following prerequisites are needed for implementation:

An AWS account.
At least one Amazon Q Business Pro user that has admin permissions to set up and configure Amazon Q Business. For pricing information, refer to Amazon Q Business pricing.
AWS Identity and Access Management (IAM) permissions to create and manage IAM roles and policies.
A supported data source to connect, such as an S3 bucket containing your public documents.
Access to an Amazon Bedrock LLM in the required AWS Region.

Create an Amazon Q Business application and sync with an S3 bucket

To create an Amazon Q Business application and connect it to your S3 bucket, complete the following steps. These steps provide a general overview of how to create an Amazon Q Business application and synchronize it with an S3 bucket. For more comprehensive, step-by-step guidance, follow the detailed instructions in the blog post Discover insights from Amazon S3 with Amazon Q S3 connector.

Initiate your application setup through either the AWS Management Console or AWS Command Line Interface (AWS CLI).
Create an index for your Amazon Q Business application.
Use the built-in Amazon S3 connector to link your application with documents stored in your organization’s S3 buckets.
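
If you prefer to script the first two steps, the following is a minimal sketch using the AWS SDK for Python (Boto3). It assumes a client that behaves like boto3.client("qbusiness") and passes only display names; the function name and the application and index names are hypothetical, and a real application typically also needs identity and IAM role settings that are omitted here. The S3 connector configuration is not shown.

```python
from typing import Any, Dict


def create_app_and_index(client: Any, app_name: str, index_name: str) -> Dict[str, str]:
    """Create an Amazon Q Business application and one index; return their IDs.

    `client` is expected to behave like boto3.client("qbusiness").
    Identity and role settings are intentionally left out of this sketch.
    """
    app = client.create_application(displayName=app_name)
    index = client.create_index(
        applicationId=app["applicationId"],
        displayName=index_name,
    )
    return {"applicationId": app["applicationId"], "indexId": index["indexId"]}


# Usage (requires boto3 and AWS credentials; the names are placeholders):
# import boto3
# ids = create_app_and_index(boto3.client("qbusiness"), "image-insights-app", "image-insights-index")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before you point it at a real account.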

Configure the Amazon Q Business application CDE for the Amazon S3 data source

With the CDE feature of Amazon Q Business, you can modify, enhance, and filter documents from your Amazon S3 data sources during the ingestion process, making enterprise content more discoverable and valuable. When connecting Amazon Q Business to S3 repositories, you can use CDE to transform your raw data, applying modifications that improve search quality and information accessibility. This functionality extends to extracting context from binary files such as images through integration with Amazon Bedrock, so organizations can unlock insights from previously inaccessible content formats. By implementing CDE for Amazon S3 data sources, businesses can get more from their enterprise data within Amazon Q Business, creating a more comprehensive knowledge base that responds effectively to user queries.

To configure the Amazon Q Business application CDE for the Amazon S3 data source, complete the following steps:

Select your application and navigate to Data sources.
Choose your existing Amazon S3 data source or create a new one. Verify that Audio/Video under Multi-media content configuration is not enabled.
In the data source configuration, locate the Custom Document Enrichment section.
Configure the pre-extraction rules to trigger a Lambda function when specific S3 bucket conditions are satisfied. Check the following screenshot for an example configuration.

Reference Settings
Pre-extraction rules are executed before Amazon Q Business processes files from your S3 bucket.
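
For readers who configure the data source through the API instead of the console, the following sketch builds a pre-extraction hook as a plain dictionary of the kind passed as documentEnrichmentConfiguration. The helper name, the ARNs, the bucket name, and the choice to match on the _source_uri attribute with a CONTAINS condition are all assumptions for illustration; mirror whatever condition you set in the console.

```python
from typing import Any, Dict


def build_pre_extraction_config(
    lambda_arn: str, bucket_name: str, role_arn: str, uri_fragment: str
) -> Dict[str, Any]:
    """Build a document enrichment configuration with one pre-extraction hook.

    The hook is invoked only for documents whose source URI contains
    `uri_fragment` (an assumed condition; adjust to your own rule).
    """
    return {
        "preExtractionHookConfiguration": {
            "invocationCondition": {
                "key": "_source_uri",
                "operator": "CONTAINS",
                "value": {"stringValue": uri_fragment},
            },
            "lambdaArn": lambda_arn,      # Lambda function that CDE invokes
            "s3BucketName": bucket_name,  # bucket used to exchange documents with the function
            "roleArn": role_arn,          # role that allows the invocation and bucket access
        }
    }


config = build_pre_extraction_config(
    "arn:aws:lambda:us-east-1:111122223333:function:example-cde-fn",  # placeholder ARN
    "example-bucket",
    "arn:aws:iam::111122223333:role/example-cde-role",  # placeholder ARN
    ".png",
)
print(config["preExtractionHookConfiguration"]["invocationCondition"])
```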

Extract context from the images

To extract insights from an image file, the Lambda function makes an Amazon Bedrock API call using Anthropic’s Claude 3.7 Sonnet model. You can modify the code to use other Amazon Bedrock models based on your use case.

Constructing the prompt is a critical piece of the code. We recommend trying various prompts to get the desired output for your use case. Amazon Bedrock also offers a prompt optimization capability that you can use to refine the prompt for your specific use case.

Examine the following Lambda function code snippets, written in Python, to understand the Amazon Bedrock model setup along with a sample prompt to extract insights from an image.

In the following code snippet, we import the relevant Python libraries, define constants, and initialize AWS SDK for Python (Boto3) clients for Amazon S3 and the Amazon Bedrock runtime. For more information, refer to the Boto3 documentation.

import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name="us-east-1"))

The prompt passed to the Amazon Bedrock model, Anthropic’s Claude 3.7 Sonnet in this case, is broken into two parts: prompt_prefix and prompt_suffix. The prompt breakdown makes it more readable and manageable. Additionally, the Amazon Bedrock prompt caching feature can be used to reduce response latency as well as input token cost. You can modify the prompt to extract information based on your specific use case as needed.

# Parentheses are required so that the adjacent string literals are joined into one prompt
prompt_prefix = (
    """You are an expert image reader tasked with generating detailed descriptions for various """
    """types of images. These images may include technical diagrams,"""
    """ graphs and charts, categorization diagrams, data flow and process flow diagrams,"""
    """ hierarchical and timeline diagrams, infographics, """
    """screenshots and product diagrams/images from user manuals. """
    """The description of these images needs to be very detailed so that a user can ask """
    """questions based on the image, which can be answered by only looking at the descriptions """
    """that you generate.
Here is the image you need to analyze:


"""
)

prompt_suffix = """


Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in  tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other.  Capture all the crucial details that can be used to answer any followup questions. Write this description in  tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in  tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:



[Classify the image type here]



[List all extracted entities, texts, and numbers here]



[Provide a detailed description of the image here]



[If applicable, provide estimated number ranges for chart elements here]



Remember to be thorough and precise in your analysis. If you're unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

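To illustrate the prompt caching idea mentioned earlier, here is a minimal sketch of a Converse-style message that places the reusable instructions first, marks a cache checkpoint, and only then adds the image, so that the part before the checkpoint is identical across invocations. The function name is hypothetical, and this ordering differs from the prefix, image, suffix layout used in this post. Caching applies only to models that support it and only when the cached portion meets the model's minimum token count, so a short prompt might not be cached at all.

```python
from typing import Any, Dict, List


def build_cached_message(
    static_prompt: str, image_bytes: bytes, image_format: str
) -> List[Dict[str, Any]]:
    """Build one user message: static instructions, a cache checkpoint, then the image."""
    return [
        {
            "role": "user",
            "content": [
                # Identical on every call, so it is a candidate for caching
                {"text": static_prompt},
                # Everything before this block is eligible to be cached
                {"cachePoint": {"type": "default"}},
                # Changes on every call, so it stays after the checkpoint
                {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            ],
        }
    ]


messages = build_cached_message("Describe the image in detail.", b"\x89PNG", "png")
print([next(iter(block)) for block in messages[0]["content"]])
```
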
The lambda_handler is the main entry point for the Lambda function. When CDE invokes the function, it passes the data source’s information in the event object. In this case, the S3 bucket and the S3 object key are retrieved from the event object, and the file format is derived from the key’s extension. Further processing happens only if file_format matches one of the expected file types. For production-ready code, implement proper error handling for unexpected errors.

def lambda_handler(event, context):
    logger.info("Received event: %s", json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    # Default to the original object so that non-image files pass through unchanged
    new_key = s3ObjectKey
    if file_format in FILE_FORMATS:
        # Store the generated description as a text file under cde_output/
        new_key = 'cde_output/' + s3ObjectKey + '.txt'
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket = s3Bucket, Key = new_key, Body=afterCDE)
    return {
        "version" : "v0",
        "s3ObjectKey": new_key,
        "metadataUpdates": []
    }
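
To make the event and response shapes concrete without calling any AWS service, the following sketch isolates the key mapping. The function name and the trimmed sample event are hypothetical, and returning non-image keys unchanged is an assumption about the desired pass-through behavior.

```python
from typing import Any, Dict

IMAGE_FORMATS = ("jpg", "jpeg", "png")


def build_cde_response(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response a pre-extraction function could send back for this event.

    Image keys are redirected to a text file under cde_output/; any other key
    is returned unchanged.
    """
    key = event["s3ObjectKey"]
    extension = key.lower().rsplit(".", 1)[-1]
    if extension in IMAGE_FORMATS:
        key = "cde_output/" + key + ".txt"
    return {"version": "v0", "s3ObjectKey": key, "metadataUpdates": []}


# Trimmed, hypothetical event; a real event carries additional fields
sample_event = {"s3Bucket": "example-bucket", "s3ObjectKey": "charts/students.PNG", "metadata": {}}
print(build_cde_response(sample_event)["s3ObjectKey"])  # cde_output/charts/students.PNG.txt
```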

The generate_image_description function calls two other functions: first to construct the message that is passed to the Amazon Bedrock model and second to invoke the model. It returns the final text output extracted from the image file by the model invocation.

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
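    Generate a description for an image stored in Amazon S3.
    Inputs:
        s3Bucket: str - Name of the S3 bucket that holds the image
        s3ObjectKey: str - Key of the image object
        file_format: str - Image file extension (for example, png or jpg)
    Output:
        str - Generated image description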
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']
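
The return statement above reads only the first content block. As a slightly more defensive sketch, the hypothetical helper below joins every text block in a Converse-style response and returns an empty string when the expected keys are missing. The sample response is fabricated for illustration and is not real model output.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join every text block in a Converse-style response; skip non-text blocks."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    return "\n".join(block["text"] for block in blocks if "text" in block)


# Illustrative response shape only
fake_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Bar chart of students by age range."},
                {"text": "City,Age range,Students"},
            ],
        }
    }
}
print(extract_text(fake_response))
```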

The _llm_input function takes in the S3 object’s details passed as input along with the file type (png, jpg) and builds the message in the format expected by the model invoked by Amazon Bedrock.

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket = s3Bucket, Key = s3ObjectKey)
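    image_content = s3_response['Body'].read()
    # The Converse API accepts "jpeg" (not "jpg") as the image format value
    image_format = "jpeg" if file_format == "jpg" else file_format
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": image_format,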
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]

The _invoke_model function calls the Converse API using the Amazon Bedrock runtime client and returns the response generated by the model. In the inferenceConfig settings, maxTokens limits the length of the response, and a temperature of 0 makes the responses more deterministic (less random).

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
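        except Exception as e:
            # Log the failure; the loop retries until MAX_RETRIES attempts are used
            logger.warning("Bedrock call failed on attempt %d: %s", attempt + 1, e)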
    
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")
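
The loop above retries immediately. One common alternative, sketched here as a generic helper and not as part of the original function, is to wait with exponential backoff between attempts. The function name and default values are hypothetical, and the sleep function is injectable so that the behavior can be checked without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with delays of base_delay * 2**attempt."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as error:  # in real code, narrow this to throttling or timeout errors
            last_error = error
            if attempt < max_attempts - 1:
                # Wait longer after each failure: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Failed after {max_attempts} attempts") from last_error


# Usage idea (names from this post): call_with_backoff(lambda: _invoke_model(messages))
```

Keep the total wait time well below the Lambda function timeout so that a retry is not cut off mid-call.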

Putting all the preceding code pieces together, the full Lambda function code is shown in the following block:

# Example Lambda function for image processing
import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name="us-east-1"))

# Parentheses are required so that the adjacent string literals are joined into one prompt
prompt_prefix = (
    """You are an expert image reader tasked with generating detailed descriptions for various """
    """types of images. These images may include technical diagrams,"""
    """ graphs and charts, categorization diagrams, data flow and process flow diagrams,"""
    """ hierarchical and timeline diagrams, infographics, """
    """screenshots and product diagrams/images from user manuals. """
    """The description of these images needs to be very detailed so that a user can ask """
    """questions based on the image, which can be answered by only looking at the descriptions """
    """that you generate.
Here is the image you need to analyze:


"""
)

prompt_suffix = """


Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in  tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other.  Capture all the crucial details that can be used to answer any followup questions. Write this description in  tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in  tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:



[Classify the image type here]



[List all extracted entities, texts, and numbers here]



[Provide a detailed description of the image here]



[If applicable, provide estimated number ranges for chart elements here]



Remember to be thorough and precise in your analysis. If you're unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket = s3Bucket, Key = s3ObjectKey)
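    image_content = s3_response['Body'].read()
    # The Converse API accepts "jpeg" (not "jpg") as the image format value
    image_format = "jpeg" if file_format == "jpg" else file_format
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": image_format,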
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
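        except Exception as e:
            # Log the failure; the loop retries until MAX_RETRIES attempts are used
            logger.warning("Bedrock call failed on attempt %d: %s", attempt + 1, e)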
    
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
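    Generate a description for an image stored in Amazon S3.
    Inputs:
        s3Bucket: str - Name of the S3 bucket that holds the image
        s3ObjectKey: str - Key of the image object
        file_format: str - Image file extension (for example, png or jpg)
    Output:
        str - Generated image description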
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

def lambda_handler(event, context):
    logger.info("Received event: %s", json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    # Default to the original object so that non-image files pass through unchanged
    new_key = s3ObjectKey
    if file_format in FILE_FORMATS:
        # Store the generated description as a text file under cde_output/
        new_key = 'cde_output/' + s3ObjectKey + '.txt'
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket = s3Bucket, Key = new_key, Body=afterCDE)
    return {
        "version" : "v0",
        "s3ObjectKey": new_key,
        "metadataUpdates": []
    }

We strongly recommend testing and validating code in a nonproduction environment before deploying it to production. In addition to Amazon Q pricing, this solution will incur charges for AWS Lambda and Amazon Bedrock. For more information, refer to AWS Lambda pricing and Amazon Bedrock pricing.

After the Amazon S3 data is synced with the Amazon Q index, you can prompt the Amazon Q Business application to get the extracted insights as shown in the following section.

Example prompts and results

The following question and answer pairs refer to the Student Age Distribution graph at the beginning of this post.

Q: Which City has the highest number of students in the 13-15 age range?

Natural Language Query Response

Q: Compare the student demographics between City 1 and City 4?

Natural Language Query Response

In the original graph, the bars representing student counts lacked explicit numerical labels, which could make data interpretation challenging at scale. With Amazon Q Business and its integration capabilities, this limitation can be overcome. By using the CDE feature to process these visualizations with Amazon Bedrock LLMs, we’ve enabled a more interactive and insightful analysis experience. The service extracts the contextual information embedded in the graph, even when explicit labels are absent. End users can ask questions about the visualization and receive responses based on the underlying data. Rather than being limited by what’s explicitly labeled in the graph, users can explore deeper insights through natural language queries. This capability demonstrates how Amazon Q Business transforms static visualizations into queryable knowledge assets, enhancing the value of your existing data visualizations without requiring additional formatting or preparation work.

Best practices for Amazon S3 CDE configuration

When setting up CDE for your Amazon S3 data source, consider these best practices:

Use conditional rules to only process specific file types that need transformation.
Monitor Lambda execution with Amazon CloudWatch to track processing errors and performance.
Set appropriate timeout values for your Lambda functions, especially when processing large files.
Consider incremental syncing to process only new or modified documents in your S3 bucket.
Use document attributes to track which documents have been processed by CDE.
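
As one way to apply the last practice, the Lambda response can carry document attributes in metadataUpdates. The sketch below is illustrative only: the helper name and the attribute names cde_processed and cde_model_id are hypothetical, and custom attributes generally need to be defined for your index before they are useful for filtering.

```python
from typing import Any, Dict


def response_with_processed_flag(output_key: str, model_id: str) -> Dict[str, Any]:
    """Build a CDE Lambda response that tags the document as processed."""
    return {
        "version": "v0",
        "s3ObjectKey": output_key,
        "metadataUpdates": [
            # Hypothetical attribute names; choose names that fit your index
            {"name": "cde_processed", "value": {"stringValue": "true"}},
            {"name": "cde_model_id", "value": {"stringValue": model_id}},
        ],
    }


response = response_with_processed_flag(
    "cde_output/charts/students.png.txt",
    "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
)
print([update["name"] for update in response["metadataUpdates"]])
```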

Cleanup

Complete the following steps to clean up your resources:

Go to the Amazon Q Business application and select Remove and unsubscribe for users and groups.
Delete the Amazon Q Business application.
Delete the Lambda function.
Empty and delete the S3 bucket. For instructions, refer to Deleting a general purpose bucket.

Conclusion

This solution demonstrates how combining Amazon Q Business, custom document enrichment, and Amazon Bedrock can transform static visualizations into queryable knowledge assets, significantly enhancing the value of existing data visualizations without additional formatting work. By using these powerful AWS services together, organizations can bridge the gap between visual information and actionable insights, enabling users to interact with different file types in more intuitive ways.

Explore What is Amazon Q Business? and Getting started with Amazon Bedrock in the documentation to implement this solution for your specific use cases and unlock the potential of your visual data.

About the Authors

Amit Chaudhary is a Senior Solutions Architect at Amazon Web Services. His focus area is AI/ML, and he helps customers with generative AI, large language models, and prompt engineering. Outside of work, Amit enjoys spending time with his family.

Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, building generative AI resources, and analytics. In his spare time, he enjoys exploring the outdoors with his family.


Complete Guide with Curriculum & Fees

Published July 4, 2025

IndustryTrends

In 2025, AI education offers choices to suit every learning style, career goal, and budget. For those who want job-oriented training, the Logicmojo Advanced Data Science & AI Program has emerged as the top option, offering comprehensive training with proven placement results. It provides the live training, projects, and career support that professionals look for when moving into a high-paying AI position.

For independent learners seeking prestigious credentials, other good options include programs from Stanford, MIT, and DeepLearning.AI. Google and IBM certificates offer an inexpensive starting point for beginners, while, at the opposite end of the spectrum, a Carnegie Mellon certificate is considered the ultimate academic credential in AI.

Whatever you choose in 2025 to further your knowledge of AI will place you at the forefront of technological innovation. AI is expected to generate millions of jobs and has the potential to revolutionize every industry, so what you learn today will be a deciding factor in your career for at least the next few decades.


Artificial Intelligence and Machine Learning Bootcamp Powered by Simplilearn

Published July 4, 2025

John Terra

Artificial Intelligence and Machine Learning are noteworthy game-changers in today’s digital world. Technological wonders once limited to science fiction have become science fact, giving us innovations such as self-driving cars, intelligent voice-operated virtual assistants, and computers that learn and grow.

The two fields are making inroads into all areas of our lives, including the workplace, showing up in occupations such as Data Scientist and Digital Marketer. And for all the impressive things that Artificial Intelligence and Machine Learning have accomplished in the last ten years, there’s so much more in store.

Simplilearn wants today’s IT professionals to be better equipped to embrace these new technologies. Hence, it offers Machine Learning Bootcamp, held in conjunction with Caltech’s Center for Technology and Management Education (CTME) and in collaboration with IBM.

The bootcamp covers the key points of Artificial Intelligence and Machine Learning, exploring tools and concepts such as Python and TensorFlow. The course combines the academic excellence of Caltech and the industry prowess of IBM, creating a learning resource that strengthens your skillset and prepares you to navigate the world of AI/ML better.

Why is This a Great Bootcamp?

When you bring together an impressive lineup of Simplilearn, Caltech, and IBM, you expect nothing less than an excellent result. The AI and Machine Learning Bootcamp delivers as promised.

This six-month program deals with vital AI/ML concepts such as Deep Learning, Statistics, and Data Science With Python. Here is a breakdown of the diverse and valuable information the bootcamp offers:

Orientation. The orientation session prepares you for the rigors of an intense, six-month learning experience, where you dedicate from five to ten hours a week to learning the latest in AI/ML skills and concepts.
Introduction to Artificial Intelligence. There’s a difference between AI and ML, and here’s where you start to learn this. This offering is a beginner course covering the basics of AI and workflows, Deep Learning, Machine Learning, and other details.
Python for Data Science. Many data scientists prefer to use the Python programming language when working with AI/ML. This section deals with Python, its libraries, and using a Jupyter-based lab environment to write scripts.
Applied Data Science with Python. Your exposure to Python continues with this study of Python’s tools and techniques used for Data Analytics.
Machine Learning. Now we come to the other half of the AI/ML partnership. You will learn all about Machine Learning’s chief techniques and concepts, including heuristic aspects, supervised/unsupervised learning, and developing algorithms.
Deep Learning with Keras and TensorFlow. This section shows you how to use Keras and TensorFlow frameworks to master Deep Learning models and concepts and prepare Deep Learning algorithms.
Advanced Deep Learning and Computer Vision. This advanced course takes Deep Learning to a new level. This module covers topics like Computer Vision for OCR and Object Detection, and Computer Vision Basics with Python.
Capstone project. Finally, it’s time to take what you have learned and implement your new AI/ML skills to solve an industry-relevant issue.

The course also offers students a series of electives:

Statistics Essentials for Data Science. Statistics are a vital part of Data Science, and this elective teaches you how to make data-driven predictions via statistical inference.
NLP and Speech Recognition. This elective covers speech-to-text conversion, text-to-speech conversion, automated speech recognition, voice-assistance devices, and much more.
Reinforcement Learning. Learn how to solve reinforcement learning problems by applying different algorithms and strategies with TensorFlow and Python.
Caltech Artificial Intelligence and Machine Learning Bootcamp Masterclass. These masterclasses are conducted by qualified Caltech and IBM instructors.

This AI and ML Bootcamp gives students a bounty of AI/ML-related benefits like:

Campus immersion, which includes an exclusive visit to Caltech’s robotics lab.
A program completion certificate from Caltech CTME.
A Caltech CTME Circle membership.
The chance to earn up to 22 CEUs courtesy of Caltech CTME.
An online convocation by the Caltech CTME Program Director.
A physical certificate from Caltech CTME if you request one.
Access to hackathons and Ask Me Anything sessions from IBM.
More than 25 hands-on projects and integrated labs across industry verticals.
A Level Up session by Andrew McAfee, Principal Research Scientist at MIT.
Access to Simplilearn’s Career Service, which will help you get noticed by today’s top hiring companies.
Industry-certified certificates for IBM courses.
Industry masterclasses delivered by IBM.
Hackathons from IBM.
Ask Me Anything (AMA) sessions held with the IBM leadership.

And these are the skills the course covers, all essential tools for working with today’s AI and ML projects:

Statistics
Python
Supervised Learning
Unsupervised Learning
Recommendation Systems
NLP
Neural Networks
GANs
Deep Learning
Reinforcement Learning
Speech Recognition
Ensemble Learning
Computer Vision

About Caltech CTME

Located in California, Caltech is a world-famous, highly respected science and engineering institution featuring some of today’s brightest scientific and technological minds. Contributions from Caltech alumni have earned worldwide acclaim, including over three dozen Nobel prizes. Caltech CTME instructors offer this quality of learning to our students by holding bootcamp master classes.

About IBM

IBM was founded in 1911 and has earned a reputation as the top IT industry leader and master of IT innovation.

How to Thrive in the Brave New World of AI and ML

Machine Learning and Artificial Intelligence have enormous potential to change our world for the better, but the fields need people of skill and vision to help lead the way. Somehow, there must be a balance between technological advancement and how it impacts people (quality of life, carbon footprint, job losses due to automation, etc.).

The AI and Machine Learning Bootcamp helps teach and train students, equipping them to assume a role of leadership in the new world that AI and ML offer.

Teaching Developers to Think with AI – O’Reilly

Published

4 days ago

July 3, 2025

Andrew Stellman

Developers are doing incredible things with AI. Tools like Copilot, ChatGPT, and Claude have rapidly become indispensable for developers, offering unprecedented speed and efficiency in tasks like writing code, debugging tricky behavior, generating tests, and exploring unfamiliar libraries and frameworks. When it works, it’s effective, and it feels incredibly satisfying.

But if you’ve spent any real time coding with AI, you’ve probably hit a point where things stall. You keep refining your prompt and adjusting your approach, but the model keeps generating the same kind of answer, just phrased a little differently each time, and returning slight variations on the same incomplete solution. It feels close, but it’s not getting there. And worse, it’s not clear how to get back on track.

That moment is familiar to a lot of people trying to apply AI in real work. It’s what my recent talk at O’Reilly’s AI Codecon event was all about.

Over the last two years, while working on the latest edition of Head First C#, I’ve been developing a new kind of learning path, one that helps developers get better at both coding and using AI. I call it Sens-AI, and it came out of something I kept seeing:

There’s a learning gap with AI that’s creating real challenges for people who are still building their development skills.

My recent O’Reilly Radar article “Bridging the AI Learning Gap” looked at what happens when developers try to learn AI and coding at the same time. It’s not just a tooling problem—it’s a thinking problem. A lot of developers are figuring things out by trial and error, and it became clear to me that they needed a better way to move from improvising to actually solving problems.

From Vibe Coding to Problem Solving

Ask developers how they use AI, and many will describe a kind of improvisational prompting strategy: Give the model a task, see what it returns, and nudge it toward something better. It can be an effective approach because it’s fast, fluid, and almost effortless when it works.

That pattern is common enough to have a name: vibe coding. It’s a great starting point, and it works because it draws on real prompt engineering fundamentals—iterating, reacting to output, and refining based on feedback. But when something breaks, the code doesn’t behave as expected, or the AI keeps rehashing the same unhelpful answers, it’s not always clear what to try next. That’s when vibe coding starts to fall apart.

Senior developers tend to pick up AI more quickly than junior ones, but that’s not a hard-and-fast rule. I’ve seen brand-new developers pick it up quickly, and I’ve seen experienced ones get stuck. The difference is in what they do next. The people who succeed with AI tend to stop and rethink: They figure out what’s going wrong, step back to look at the problem, and reframe their prompt to give the model something better to work with.

When developers think critically, AI works better. (slide from my May 8, 2025, talk at O’Reilly AI Codecon)

The Sens-AI Framework

As I worked more closely with developers who were using AI tools, looking for ways to help them ramp up more easily, I paid attention to where they were getting stuck. I noticed that the pattern of an AI rehashing the same “almost there” suggestions kept coming up in training sessions and real projects. I saw it happen in my own work too. At first it felt like a weird quirk in the model’s behavior, but over time I realized it was a signal: The AI had used up the context I’d given it. The signal tells us that we need a better understanding of the problem, so we can give the model the information it’s missing. That realization was a turning point. Once I started paying attention to those breakdown moments, I began to see the same root cause across many developers’ experiences: not a flaw in the tools but a lack of framing, context, or understanding that the AI couldn’t supply on its own.

Over time—and after a lot of testing, iteration, and feedback from developers—I distilled the core of the Sens-AI learning path into five specific habits. They came directly from watching where learners got stuck, what kinds of questions they asked, and what helped them move forward. These habits form a framework that’s the intellectual foundation behind how Head First C# teaches developers to work with AI:

Context: Paying attention to what information you supply to the model, trying to figure out what else it needs to know, and supplying it clearly. This includes code, comments, structure, intent, and anything else that helps the model understand what you’re trying to do.
Research: Actively using AI and external sources to deepen your own understanding of the problem. This means running examples, consulting documentation, and checking references to verify what’s really going on.
Problem framing: Using the information you’ve gathered to define the problem more clearly so the model can respond more usefully. This involves digging deeper into the problem you’re trying to solve, recognizing what the AI still needs to know about it, and shaping your prompt to steer it in a more productive direction—and going back to do more research when you realize that it needs more context.
Refining: Iterating your prompts deliberately. This isn’t about random tweaks; it’s about making targeted changes based on what the model got right and what it missed, and using those results to guide the next step.
Critical thinking: Judging the quality of AI output rather than simply accepting it. Does the suggestion make sense? Is it correct, relevant, plausible? This habit is especially important because it helps developers avoid the trap of trusting confident-sounding answers that don’t actually work.

These habits let developers get more out of AI while keeping control over the direction of their work.

From Stuck to Solved: Getting Better Results from AI

I’ve watched a lot of developers use tools like Copilot and ChatGPT—during training sessions, in hands-on exercises, and when they’ve asked me directly for help. What stood out to me was how often they assumed the AI had done a bad job. In reality, the prompt just didn’t include the information the model needed to solve the problem. No one had shown them how to supply the right context. That’s what the five Sens-AI habits are designed to address: not by handing developers a checklist but by helping them build a mental model for how to work with AI more effectively.

In my AI Codecon talk, I shared a story about my colleague Luis, a seasoned engineer with over three decades of coding experience. He’s an advanced AI user who builds content for training other developers, works with large language models directly, uses sophisticated prompting techniques, and has built AI-based analysis tools.

Luis was building a desktop wrapper for a React app using Tauri, a Rust-based toolkit. He pulled in both Copilot and ChatGPT, cross-checking output, exploring alternatives, and trying different approaches. But the code still wasn’t working.

Each AI suggestion seemed to fix part of the problem but break another part. The model kept offering slightly different versions of the same incomplete solution, never quite resolving the issue. For a while, he vibe-coded through it, adjusting the prompt and trying again to see if a small nudge would help, but the answers kept circling the same spot. Eventually, he realized the AI had run out of context and changed his approach. He stepped back, did some focused research to better understand what the AI was trying (and failing) to do, and applied the same habits I emphasize in the Sens-AI framework.

That shift changed the outcome. Once he understood the pattern the AI was trying to use, he could guide it. He reframed his prompt, added more context, and finally started getting suggestions that worked, but only once he had given the model the missing pieces it needed to make sense of the problem.

Applying the Sens-AI Framework: A Real-World Example

Before I developed the Sens-AI framework, I ran into a problem that later became a textbook case for it. I was curious whether COBOL, a decades-old language developed for mainframes that I had never used before but wanted to learn more about, could handle the basic mechanics of an interactive game. So I did some experimental vibe coding to build a simple terminal app that would let the user move an asterisk around the screen using the W/A/S/D keys. It was a weird little side project—I just wanted to see if I could make COBOL do something it was never really meant for, and learn something about it along the way.

The initial AI-generated code compiled and ran just fine, and at first I made some progress. I was able to get it to clear the screen, draw the asterisk in the right place, handle raw keyboard input that didn’t require the user to press Enter, and get past some initial bugs that caused a lot of flickering.

But once I hit a more subtle bug—where ANSI escape codes like ";10H" were printing literally instead of controlling the cursor—ChatGPT got stuck. I’d describe the problem, and it would generate a slightly different version of the same answer each time. One suggestion used different variable names. Another changed the order of operations. A few attempted to reformat the STRING statement. But none of them addressed the root cause.

*The COBOL app with a bug, printing a raw escape sequence instead of moving the asterisk.*

The pattern was always the same: slight code rewrites that looked plausible but didn’t actually change the behavior. That’s what a rehash loop looks like. The AI wasn’t giving me worse answers—it was just circling, stuck on the same conceptual idea. So I did what many developers do: I assumed the AI just couldn’t answer my question and moved on to another problem.

At the time, I didn’t recognize the rehash loop for what it was. But revisiting the project after developing the Sens-AI framework, I saw the whole exchange in a new light. The rehash loop was a signal that the AI needed more context. It got stuck because I hadn’t told it what it needed to know.

When I started working on the framework, I remembered this old failure and thought it’d be a perfect test case. Now I had a set of steps that I could follow:

First, I recognized that the AI had run out of context. The model wasn’t failing randomly—it was repeating itself because it didn’t understand what I was asking it to do.
Next, I did some targeted research. I brushed up on ANSI escape codes and reread the AI’s earlier explanations of the code it had generated, this time more carefully. That’s when I noticed a detail I’d skimmed past the first time while vibe coding: The PIC ZZ COBOL syntax defines a numeric-edited field. I suspected that it could introduce leading spaces into strings and wondered if that could break an escape sequence.
Then I reframed the problem. I opened a new chat and explained what I was trying to build, what I was seeing, and what I suspected. I told the AI I’d noticed it was circling the same solution and treated that as a signal that we were missing something fundamental. I also told it that I’d done some research and had three leads I suspected were related: how COBOL displays multiple items in sequence, how terminal escape codes need to be formatted, and how spacing in numeric fields might be corrupting the output. The prompt didn’t provide answers; it just gave some potential research areas for the AI to investigate. That gave it what it needed to find the missing context and break out of the rehash loop.
Once the model was unstuck, I refined my prompt. I asked follow-up questions to clarify exactly what the output should look like and how to construct the strings more reliably. I wasn’t just looking for a fix—I was guiding the model toward a better approach.
And most of all, I used critical thinking. I read the answers closely, compared them to what I already knew, and decided what to try based on what actually made sense. The explanation checked out. I implemented the fix, and the program worked.
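The suspicion from the research step, that a zero-suppressed numeric field could slip a space into an escape sequence, can be reproduced outside COBOL. What follows is a minimal Python sketch, not the COBOL code from the article: the function names are hypothetical, and the padded formatter only imitates how a two-character numeric-edited field such as PIC ZZ shows a leading zero as a space. The unpadded version is simply the well-formed sequence, not necessarily the fix the article's code used.

```python
import re

ESC = "\x1b"  # the ASCII escape character that starts an ANSI sequence


def cursor_to_padded(row: int, col: int) -> str:
    """Imitate a two-character zero-suppressed field: 5 becomes ' 5'."""
    return f"{ESC}[{row:2d};{col:2d}H"


def cursor_to(row: int, col: int) -> str:
    """Build the cursor-position sequence with no padding in the numbers."""
    return f"{ESC}[{row};{col}H"


def is_valid_cursor_position(seq: str) -> bool:
    """True only if seq is exactly ESC [ <digits> ; <digits> H."""
    return re.fullmatch(r"\x1b\[\d+;\d+H", seq) is not None


broken = cursor_to_padded(5, 10)
fixed = cursor_to(5, 10)

print(repr(broken), is_valid_cursor_position(broken))  # '\x1b[ 5;10H' False
print(repr(fixed), is_valid_cursor_position(fixed))    # '\x1b[5;10H' True
```

The stray space after the bracket is enough to make the sequence malformed, which can lead a terminal to abandon it and print the remaining characters as ordinary text, consistent with the symptom described above.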

*My prompt that broke ChatGPT out of its rehash loop*
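The structure of a reframed prompt like the one described in the steps above (what I'm building, what I'm seeing, and a few leads to investigate) can be sketched as a small helper. This is an illustrative Python sketch only: the function name and the wording of the generated text are hypothetical and are not the actual prompt from the talk.

```python
def build_reframed_prompt(goal: str, symptom: str, leads: list[str]) -> str:
    """Assemble a prompt that states the goal, the symptom, and research leads."""
    lines = [
        f"What I'm trying to build: {goal}",
        f"What I'm seeing instead: {symptom}",
        "Earlier suggestions kept circling the same solution, "
        "so I think we're missing something fundamental.",
        "Leads I suspect are related (please investigate them):",
    ]
    # Number the leads so follow-up questions can refer to them.
    lines += [f"{i}. {lead}" for i, lead in enumerate(leads, start=1)]
    return "\n".join(lines)


prompt = build_reframed_prompt(
    goal="a COBOL terminal app that moves an asterisk with the W/A/S/D keys",
    symptom="escape codes like ';10H' print literally instead of moving the cursor",
    leads=[
        "how COBOL displays multiple items in sequence",
        "how terminal escape codes need to be formatted",
        "how spacing in numeric fields might be corrupting the output",
    ],
)
print(prompt)
```

The helper offers leads rather than answers, mirroring the point that the prompt gave the model areas to investigate instead of a solution.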

Once I took the time to understand the problem, and did just enough research to give the AI a few hints about what context it was missing, I was able to write a prompt that broke ChatGPT out of the rehash loop, and it generated code that did exactly what I needed. The generated code for the working COBOL app is available in this GitHub Gist.

*The working COBOL app that moves an asterisk around the screen*

Why These Habits Matter for New Developers

I built the Sens-AI learning path in Head First C# around the five habits in the framework. These habits aren’t checklists, scripts, or hard-and-fast rules. They’re ways of thinking that help people use AI more productively—and they don’t require years of experience. I’ve seen new developers pick them up quickly, sometimes faster than seasoned developers who didn’t realize they were stuck in shallow prompting loops.

The key insight into these habits came to me when I was updating the coding exercises in the most recent edition of Head First C#. I test the exercises using AI by pasting the instructions and starter code into tools like ChatGPT and Copilot. If they produce the correct solution, that means I’ve given the model enough information to solve it—which means I’ve given readers enough information too. But if it fails to solve the problem, something’s missing from the exercise instructions.

The process of using AI to test the exercises in the book reminded me of a problem I ran into in the first edition, back in 2007. One exercise kept tripping people up, and after reading a lot of feedback, I realized the problem: I hadn’t given readers all the information they needed to solve it. That helped connect the dots for me. The AI struggles with some coding problems for the same reason the learners were struggling with that exercise—because the context wasn’t there. Writing a good coding exercise and writing a good prompt both depend on understanding what the other side needs to make sense of the problem.

That experience helped me realize that to make developers successful with AI, we need to do more than just teach the basics of prompt engineering. We need to explicitly instill these thinking habits and give developers a way to build them alongside their core coding skills. If we want developers to succeed, we can’t just tell them to “prompt better.” We need to show them how to think with AI.

Where We Go from Here

If AI really is changing how we write software—and I believe it is—then we need to change how we teach it. We’ve made it easy to give people access to the tools. The harder part is helping them develop the habits and judgment to use them well, especially when things go wrong. That’s not just an education problem; it’s also a design problem, a documentation problem, and a tooling problem. Sens-AI is one answer, but it’s just the beginning. We still need clearer examples and better ways to guide, debug, and refine the model’s output. If we teach developers how to think with AI, we can help them become not just code generators but thoughtful engineers who understand what their code is doing and why it matters.
