Books, Courses & Certifications

Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

Published

6 days ago

June 30, 2025

Amazon SageMaker Inference has been a popular tool for deploying advanced machine learning (ML) and generative AI models at scale. As AI applications become increasingly complex, customers want to deploy multiple models in a coordinated group that collectively process inference requests for an application. In addition, with the evolution of generative AI applications, many use cases now require inference workflows—sequences of interconnected models operating in predefined logical flows. This trend drives a growing need for more sophisticated inference offerings.

To address this need, we are introducing a new capability in the SageMaker Python SDK that revolutionizes how you build and deploy inference workflows on SageMaker. We will take Amazon Search as an example to show case how this feature is used in helping customers building inference workflows. This new Python SDK capability provides a streamlined and simplified experience that abstracts away the underlying complexities of packaging and deploying groups of models and their collective inference logic, allowing you to focus on what matter most—your business logic and model integrations.

In this post, we provide an overview of the user experience, detailing how to set up and deploy these workflows with multiple models using the SageMaker Python SDK. We walk through examples of building complex inference workflows, deploying them to SageMaker endpoints, and invoking them for real-time inference. We also show how customers like Amazon Search plan to use SageMaker Inference workflows to provide more relevant search results to Amazon shoppers.

Whether you are building a simple two-step process or a complex, multimodal AI application, this new feature provides the tools you need to bring your vision to life. This tool aims to make it easy for developers and businesses to create and manage complex AI systems, helping them build more powerful and efficient AI applications.

In the following sections, we dive deeper into details of the SageMaker Python SDK, walk through practical examples, and showcase how this new capability can transform your AI development and deployment process.

Key improvements and user experience

The SageMaker Python SDK now includes new features for creating and managing inference workflows. These additions aim to address common challenges in developing and deploying inference workflows:

Deployment of multiple models – The core of this new experience is the deployment of multiple models as inference components within a single SageMaker endpoint. With this approach, you can create a more unified inference workflow. By consolidating multiple models into one endpoint, you can reduce the number of endpoints that need to be managed. This consolidation can also improve operational tasks, resource utilization, and potentially costs.
Workflow definition with workflow mode – The new workflow mode extends the existing Model Builder capabilities. It allows for the definition of inference workflows using Python code. Users familiar with the ModelBuilder class might find this feature to be an extension of their existing knowledge. This mode enables creating multi-step workflows, connecting models, and specifying the data flow between different models in the workflows. The goal is to reduce the complexity of managing these workflows and enable you to focus more on the logic of the resulting compound AI system.
Development and deployment options – A new deployment option has been introduced for the development phase. This feature is designed to allow for quicker deployment of workflows to development environments. The intention is to enable faster testing and refinement of workflows. This could be particularly relevant when experimenting with different configurations or adjusting models.
Invocation flexibility – The SDK now provides options for invoking individual models or entire workflows. You can choose to call a specific inference component used in a workflow or the entire workflow. This flexibility can be useful in scenarios where access to a specific model is needed, or when only a portion of the workflow needs to be executed.
Dependency management – You can use SageMaker Deep Learning Containers (DLCs) or the SageMaker distribution that comes preconfigured with various model serving libraries and tools. These are intended to serve as a starting point for common use cases.

To get started, use the SageMaker Python SDK to deploy your models as inference components. Then, use the workflow mode to create an inference workflow, represented as Python code using the container of your choice. Deploy the workflow container as another inference component on the same endpoints as the models or a dedicated endpoint. You can run the workflow by invoking the inference component that represents the workflow. The user experience is entirely code-based, using the SageMaker Python SDK. This approach allows you to define, deploy, and manage inference workflows using SDK abstractions offered by this feature and Python programming. The workflow mode provides flexibility to specify complex sequences of model invocations and data transformations, and the option to deploy as components or endpoints caters to various scaling and integration needs.

Solution overview

The following diagram illustrates a reference architecture using the SageMaker Python SDK.

The improved SageMaker Python SDK introduces a more intuitive and flexible approach to building and deploying AI inference workflows. Let’s explore the key components and classes that make up the experience:

ModelBuilder simplifies the process of packaging individual models as inference components. It handles model loading, dependency management, and container configuration automatically.
The CustomOrchestrator class provides a standardized way to define custom inference logic that orchestrates multiple models in the workflow. Users implement the handle() method to specify this logic and can use an orchestration library or none at all (plain Python).
A single deploy() call handles the deployment of the components and workflow orchestrator.
The Python SDK supports invocation against the custom inference workflow or individual inference components.
The Python SDK supports both synchronous and streaming inference.

CustomOrchestrator is an abstract base class that serves as a template for defining custom inference orchestration logic. It standardizes the structure of entry point-based inference scripts, making it straightforward for users to create consistent and reusable code. The handle method in the class is an abstract method that users implement to define their custom orchestration logic.

class CustomOrchestrator (ABC):
"""
Templated class used to standardize the structure of an entry point based inference script.
"""

    @abstractmethod
    def handle(self, data, context=None):
        """abstract class for defining an entrypoint for the model server"""
        return NotImplemented

With this templated class, users can integrate into their custom workflow code, and then point to this code in the model builder using a file path or directly using a class or method name. Using this class and the ModelBuilder class, it enables a more streamlined workflow for AI inference:

Users define their custom workflow by implementing the CustomOrchestrator class.
The custom CustomOrchestrator is passed to ModelBuilder using the ModelBuilder inference_spec parameter.
ModelBuilder packages the CustomOrchestrator along with the model artifacts.
The packaged model is deployed to a SageMaker endpoint (for example, using a TorchServe container).
When invoked, the SageMaker endpoint uses the custom handle() function defined in the CustomOrchestrator to handle the input payload.

In the follow sections, we provide two examples of custom workflow orchestrators implemented with plain Python code. For simplicity, the examples use two inference components.

We explore how to create a simple workflow that deploys two large language models (LLMs) on SageMaker Inference endpoints along with a simple Python orchestrator that calls the two models. We create an IT customer service workflow where one model processes the initial request and another suggests solutions. You can find the example notebook in the GitHub repo.

Prerequisites

To run the example notebooks, you need an AWS account with an AWS Identity and Access Management (IAM) role with least-privilege permissions to manage resources created. For details, refer to Create an AWS account. You might need to request a service quota increase for the corresponding SageMaker hosting instances. In this example, we host multiple models on the same SageMaker endpoint, so we use two ml.g5.24xlarge SageMaker hosting instances.

Python inference orchestration

First, let’s define our custom orchestration class that inherits from CustomOrchestrator. The workflow is structured around a custom inference entry point that handles the request data, processes it, and retrieves predictions from the configured model endpoints. See the following code:

class PythonCustomInferenceEntryPoint(CustomOrchestrator):
    def __init__(self, region_name, endpoint_name, component_names):
        self.region_name = region_name
        self.endpoint_name = endpoint_name
        self.component_names = component_names
    
    def preprocess(self, data):
        payload = {
            "inputs": data.decode("utf-8")
        }
        return json.dumps(payload)

    def _invoke_workflow(self, data):
        # First model (Llama) inference
        payload = self.preprocess(data)
        
        llama_response = self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=payload,
            ContentType="application/json",
            InferenceComponentName=self.component_names[0]
        )
        llama_generated_text = json.loads(llama_response.get('Body').read())['generated_text']
        
        # Second model (Mistral) inference
        parameters = {
            "max_new_tokens": 50
        }
        payload = {
            "inputs": llama_generated_text,
            "parameters": parameters
        }
        mistral_response = self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=json.dumps(payload),
            ContentType="application/json",
            InferenceComponentName=self.component_names[1]
        )
        return {"generated_text": json.loads(mistral_response.get('Body').read())['generated_text']}
    
    def handle(self, data, context=None):
        return self._invoke_workflow(data)

This code performs the following functions:

Defines the orchestration that sequentially calls two models using their inference component names
Processes the response from the first model before passing it to the second model
Returns the final generated response

This plain Python approach provides flexibility and control over the request-response flow, enabling seamless cascading of outputs across multiple model components.

Build and deploy the workflow

To deploy the workflow, we first create our inference components and then build the custom workflow. One inference component will host a Meta Llama 3.1 8B model, and the other will host a Mistral 7B model.

from sagemaker.serve import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# Create a ModelBuilder instance for Llama 3.1 8B
# Pre-benchmarked ResourceRequirements will be taken from JumpStart, as Llama-3.1-8b is a supported model.
llama_model_builder = ModelBuilder(
    model="meta-textgeneration-llama-3-1-8b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    inference_component_name=llama_ic_name,
    instance_type="ml.g5.24xlarge"
)

# Create a ModelBuilder instance for Mistral 7B model.
mistral_mb = ModelBuilder(
    model="huggingface-llm-mistral-7b",
    instance_type="ml.g5.24xlarge",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    inference_component_name=mistral_ic_name,
    resource_requirements=ResourceRequirements(
        requests={
           "memory": 49152,
           "num_accelerators": 2,
           "copies": 1
        }
    ),
    instance_type="ml.g5.24xlarge"
)

Now we can tie it all together to create one more ModelBuilder to which we pass the modelbuilder_list, which contains the ModelBuilder objects we just created for each inference component and the custom workflow. Then we call the build() function to prepare the workflow for deployment.

# Create workflow ModelBuilder
orchestrator= ModelBuilder(
    inference_spec=PythonCustomInferenceEntryPoint(
        region_name=region,
        endpoint_name=llama_mistral_endpoint_name,
        component_names=[llama_ic_name, mistral_ic_name],
    ),
    dependencies={
        "auto": False,
        "custom": [
            "cloudpickle",
            "graphene",
            # Define other dependencies here.
        ],
    },
    sagemaker_session=Session(),
    role_arn=role,
    resource_requirements=ResourceRequirements(
        requests={
           "memory": 4096,
           "num_accelerators": 1,
           "copies": 1,
           "num_cpus": 2
        }
    ),
    name=custom_workflow_name, # Endpoint name for your custom workflow
    schema_builder=SchemaBuilder(sample_input={"inputs": "test"}, sample_output="Test"),
    modelbuilder_list=[llama_model_builder, mistral_mb] # Inference Component ModelBuilders created in Step 2
)
# call the build function to prepare the workflow for deployment
orchestrator.build()

In the preceding code snippet, you can comment out the section that defines the resource_requirements to have the custom workflow deployed on a separate endpoint instance, which can be a dedicated CPU instance to handle the custom workflow payload.

By calling the deploy() function, we deploy the custom workflow and the inference components to your desired instance type, in this example ml.g5.24.xlarge. If you choose to deploy the custom workflow to a separate instance, by default, it will use the ml.c5.xlarge instance type. You can set inference_workflow_instance_type and inference_workflow_initial_instance_count to configure the instances required to host the custom workflow.

predictors = orchestrator.deploy(
    instance_type="ml.g5.24xlarge",
    initial_instance_count=1,
    accept_eula=True, # Required for Llama3
    endpoint_name=llama_mistral_endpoint_name
    # inference_workflow_instance_type="ml.t2.medium", # default
    # inference_workflow_initial_instance_count=1 # default
)

Invoke the endpoint

After you deploy the workflow, you can invoke the endpoint using the predictor object:

from sagemaker.serializers import JSONSerializer
predictors[-1].serializer = JSONSerializer()
predictors[-1].predict("Tell me a story about ducks.")

You can also invoke each inference component in the deployed endpoint. For example, we can test the Llama inference component with a synchronous invocation, and Mistral with streaming:

from sagemaker.predictor import Predictor
# create predictor for the inference component of Llama model
llama_predictor = Predictor(endpoint_name=llama_mistral_endpoint_name, component_name=llama_ic_name)
llama_predictor.content_type = "application/json"

llama_predictor.predict(json.dumps(payload))

When handling the streaming response, we need to read each line of the output separately. The following example code demonstrates this streaming handling by checking for newline characters to separate and print each token in real time:

mistral_predictor = Predictor(endpoint_name=llama_mistral_endpoint_name, component_name=mistral_ic_name)
mistral_predictor.content_type = "application/json"

body = json.dumps({
    "inputs": prompt,
    # specify the parameters as needed
    "parameters": parameters
})

for line in mistral_predictor.predict_stream(body):
    decoded_line = line.decode('utf-8')
    if '\n' in decoded_line:
        # Split by newline to handle multiple tokens in the same line
        tokens = decoded_line.split('\n')
        for token in tokens[:-1]:  # Print all tokens except the last one with a newline
            print(token)
        # Print the last token without a newline, as it might be followed by more tokens
        print(tokens[-1], end='')
    else:
        # Print the token without a newline if it doesn't contain '\n'
        print(decoded_line, end='')

So far, we have walked through the example code to demonstrate how to build complex inference logic using Python orchestration, deploy them to SageMaker endpoints, and invoke them for real-time inference. The Python SDK automatically handles the following:

Model packaging and container configuration
Dependency management and environment setup
Endpoint creation and component coordination

Whether you’re building a simple workflow of two models or a complex multimodal application, the new SDK provides the building blocks needed to bring your inference workflows to life with minimal boilerplate code.

Customer story: Amazon Search

Amazon Search is a critical component of the Amazon shopping experience, processing an enormous volume of queries across billions of products across diverse categories. At the core of this system are sophisticated matching and ranking workflows, which determine the order and relevance of search results presented to customers. These workflows execute large deep learning models in predefined sequences, often sharing models across different workflows to improve price-performance and accuracy. This approach makes sure that whether a customer is searching for electronics, fashion items, books, or other products, they receive the most pertinent results tailored to their query.

The SageMaker Python SDK enhancement offers valuable capabilities that align well with Amazon Search’s requirements for these ranking workflows. It provides a standard interface for developing and deploying complex inference workflows crucial for effective search result ranking. The enhanced Python SDK enables efficient reuse of shared models across multiple ranking workflows while maintaining the flexibility to customize logic for specific product categories. Importantly, it allows individual models within these workflows to scale independently, providing optimal resource allocation and performance based on varying demand across different parts of the search system.

Amazon Search is exploring the broad adoption of these Python SDK enhancements across their search ranking infrastructure. This initiative aims to further refine and improve search capabilities, enabling the team to build, version, and catalog workflows that power search ranking more effectively across different product categories. The ability to share models across workflows and scale them independently offers new levels of efficiency and adaptability in managing the complex search ecosystem.

Vaclav Petricek, Sr. Manager of Applied Science at Amazon Search, highlighted the potential impact of these SageMaker Python SDK enhancements: “These capabilities represent a significant advancement in our ability to develop and deploy sophisticated inference workflows that power search matching and ranking. The flexibility to build workflows using Python, share models across workflows, and scale them independently is particularly exciting, as it opens up new possibilities for optimizing our search infrastructure and rapidly iterating on our matching and ranking algorithms as well as new AI features. Ultimately, these SageMaker Inference enhancements will allow us to more efficiently create and manage the complex algorithms powering Amazon’s search experience, enabling us to deliver even more relevant results to our customers.”

The following diagram illustrates a sample solution architecture used by Amazon Search.

Clean up

When you’re done testing the models, as a best practice, delete the endpoint to save costs if the endpoint is no longer required. You can follow the cleanup section the demo notebook or use following code to delete the model and endpoint created by the demo:

mistral_predictor.delete_predictor()
llama_predictor.delete_predictor()
llama_predictor.delete_endpoint()
workflow_predictor.delete_predictor()

Conclusion

The new SageMaker Python SDK enhancements for inference workflows mark a significant advancement in the development and deployment of complex AI inference workflows. By abstracting the underlying complexities, these enhancements empower inference customers to focus on innovation rather than infrastructure management. This feature bridges sophisticated AI applications with the robust SageMaker infrastructure, enabling developers to use familiar Python-based tools while harnessing the powerful inference capabilities of SageMaker.

Early adopters, including Amazon Search, are already exploring how these capabilities can drive major improvements in AI-powered customer experiences across diverse industries. We invite all SageMaker users to explore this new functionality, whether you’re developing classic ML models, building generative AI applications or multi-model workflows, or tackling multi-step inference scenarios. The enhanced SDK provides the flexibility, ease of use, and scalability needed to bring your ideas to life. As AI continues to evolve, SageMaker Inference evolves with it, providing you with the tools to stay at the forefront of innovation. Start building your next-generation AI inference workflows today with the enhanced SageMaker Python SDK.

About the authors

Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of Generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Osho Gupta is a Senior Software Developer at AWS SageMaker. He is passionate about ML infrastructure space, and is motivated to learn & advance underlying technologies that optimize Gen AI training & inference performance. In his spare time, Osho enjoys paddle boarding, hiking, traveling, and spending time with his friends & family.

Joseph Zhang is a software engineer at AWS. He started his AWS career at EC2 before eventually transitioning to SageMaker, and now works on developing GenAI-related features. Outside of work he enjoys both playing and watching sports (go Warriors!), spending time with family, and making coffee.

Gary Wang is a Software Developer at AWS SageMaker. He is passionate about AI/ML operations and building new things. In his spare time, Gary enjoys running, hiking, trying new food, and spending time with his friends and family.

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In h is spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.

Vaclav Petricek is a Senior Applied Science Manager at Amazon Search, where he led teams that built Amazon Rufus and now leads science and engineering teams that work on the next generation of Natural Language Shopping. He is passionate about shipping AI experiences that make people’s lives better. Vaclav loves off-piste skiing, playing tennis, and backpacking with his wife and three children.

Wei Li is a Senior Software Dev Engineer in Amazon Search. She is passionate about Large Language Model training and inference technologies, and loves integrating these solutions into Search Infrastructure to enhance natural language shopping experiences. During her leisure time, she enjoys gardening, painting, and reading.

Brian Granger is a Senior Principal Technologist at Amazon Web Services and a professor of physics and data science at Cal Poly State University in San Luis Obispo, CA. He works at the intersection of UX design and engineering on tools for scientific computing, data science, machine learning, and data visualization. Brian is a co-founder and leader of Project Jupyter, co-founder of the Altair project for statistical visualization, and creator of the PyZMQ project for ZMQ-based message passing in Python. At AWS he is a technical and open source leader in the AI/ML organization. Brian also represents AWS as a board member of the PyTorch Foundation. He is a winner of the 2017 ACM Software System Award and the 2023 NASA Exceptional Public Achievement Medal for his work on Project Jupyter. He has a Ph.D. in theoretical physics from the University of Colorado.

Source link

Books, Courses & Certifications

Complete Guide with Curriculum & Fees

Published

3 days ago

July 4, 2025

IndustryTrends

The year 2025 for AI education provides choices catering to learning style, career goal, and budget. The Logicmojo Advanced Data Science & AI Program has emerged as the top one, offering comprehensive training with proven results in placement for those wishing to pursue job-oriented training. It offers the kind of live training, projects, and career support that fellow professionals seek when interested in turning into a high-paying AI position.

On the other hand, for the independent learner seeking prestige credentials, a few other good options might include programs from Stanford, MIT, and DeepLearning.AI. Google and IBM certificates are an inexpensive footing for a beginner, while, at the opposite end of the spectrum, a Carnegie Mellon certificate is considered the ultimate academic credential in AI.

Whatever choice you make in 2025 to further your knowledge in AI will place you at the forefront of technology innovation. AI, expected to generate millions of jobs, has the potential to revolutionize every industry, and so whatever you learn today will be the deciding factor in your career waters for at least the next few decades.

Source link

Books, Courses & Certifications

Artificial Intelligence and Machine Learning Bootcamp Powered by Simplilearn

Published

3 days ago

July 4, 2025

John Terra

Artificial Intelligence and Machine Learning are noteworthy game-changers in today’s digital world. Technological wonders once limited to science fiction have become science fact, giving us innovations such as self-driving cars, intelligent voice-operated virtual assistants, and computers that learn and grow.

The two fields are making inroads into all areas of our lives, including the workplace, showing up in occupations such as Data Scientist and Digital Marketer. And for all the impressive things that Artificial Intelligence and Machine Learning have accomplished in the last ten years, there’s so much more in store.

Simplilearn wants today’s IT professionals to be better equipped to embrace these new technologies. Hence, it offers Machine Learning Bootcamp, held in conjunction with Caltech’s Center for Technology and Management Education (CTME) and in collaboration with IBM.

The bootcamp covers the relevant points of Artificial Intelligence and Machine Learning, exploring tools and concepts such as Python and TensorFlow. The course optimizes the academic excellence of Caltech and the industry prowess of IBM, creating an unbeatable learning resource that supercharges your skillset and prepares you to navigate the world of AI/ML better.

Why is This a Great Bootcamp?

When you bring together an impressive lineup of Simplilearn, Caltech, and IBM, you expect nothing less than an excellent result. The AI and Machine Learning Bootcamp delivers as promised.

This six-month program deals with vital AI/ML concepts such as Deep Learning, Statistics, and Data Science With Python. Here is a breakdown of the diverse and valuable information the bootcamp offers:

Orientation. The orientation session prepares you for the rigors of an intense, six-month learning experience, where you dedicate from five to ten hours a week to learning the latest in AI/ML skills and concepts.
Introduction to Artificial Intelligence. There’s a difference between AI and ML, and here’s where you start to learn this. This offering is a beginner course covering the basics of AI and workflows, Deep Learning, Machine Learning, and other details.
Python for Data Science. Many data scientists prefer to use the Python programming language when working with AI/ML. This section deals with Python, its libraries, and using a Jupyter-based lab environment to write scripts.
Applied Data Science with Python. Your exposure to Python continues with this study of Python’s tools and techniques used for Data Analytics.
Machine Learning. Now we come to the other half of the AI/ML partnership. You will learn all about Machine Learning’s chief techniques and concepts, including heuristic aspects, supervised/unsupervised learning, and developing algorithms.
Deep Learning with Keras and Tensorflow. This section shows you how to use Keras and TensorFlow frameworks to master Deep Learning models and concepts and prepare Deep Learning algorithms.
Advanced Deep Learning and Computer Vision. This advanced course takes Deep Learning to a new level. This module covers topics like Computer Vision for OCR and Object Detection, and Computer Vision Basics with Python.
Capstone project. Finally, it’s time to take what you have learned and implement your new AI/ML skills to solve an industry-relevant issue.

The course also offers students a series of electives:

Statistics Essentials for Data Science. Statistics are a vital part of Data Science, and this elective teaches you how to make data-driven predictions via statistical inference.
NLP and Speech Recognition. This elective covers speech-to-text conversion, text-to-speech conversion, automated speech recognition, voice-assistance devices, and much more.
Reinforcement Learning. Learn how to solve reinforcement learning problems by applying different algorithms and strategies like TensorFlow and Python.
Caltech Artificial Intelligence and Machine Learning Bootcamp Masterclass. These masterclasses are conducted by qualified Caltech and IBM instructors.

This AI and ML Bootcamp gives students a bounty of AI/ML-related benefits like:

Campus immersion, which includes an exclusive visit to Caltech’s robotics lab.
A program completion certificate from Caltech CTME.
A Caltech CTME Circle membership.
The chance to earn up to 22 CEUs courtesy of Caltech CTME.
An online convocation by the Caltech CTME Program Director.
A physical certificate from Caltech CTME if you request one.
Access to hackathons and Ask Me Anything sessions from IBM.
More than 25 hands-on projects and integrated labs across industry verticals.
A Level Up session by Andrew McAfee, Principal Research Scientist at MIT.
Access to Simplilearn’s Career Service, which will help you get noticed by today’s top hiring companies.
Industry-certified certificates for IBM courses.
Industry masterclasses delivered by IBM.
Hackathons from IBM.
Ask Me Anything (AMA) sessions held with the IBM leadership.

And these are the skills the course covers, all essential tools for working with today’s AI and ML projects:

Statistics
Python
Supervised Learning
Unsupervised Learning
Recommendation Systems
NLP
Neural Networks
GANs
Deep Learning
Reinforcement Learning
Speech Recognition
Ensemble Learning
Computer Vision

About Caltech CTME

Located in California, Caltech is a world-famous, highly respected science and engineering institution featuring some of today’s brightest scientific and technological minds. Contributions from Caltech alumni have earned worldwide acclaim, including over three dozen Nobel prizes. Caltech CTME instructors offer this quality of learning to our students by holding bootcamp master classes.

About IBM

IBM was founded in 1911 and has earned a reputation as the top IT industry leader and master of IT innovation.

How to Thrive in the Brave New World of AI and ML

Machine Learning and Artificial Intelligence have enormous potential to change our world for the better, but the fields need people of skill and vision to help lead the way. Somehow, there must be a balance between technological advancement and how it impacts people (quality of life, carbon footprint, job losses due to automation, etc.).

The AI and Machine Learning Bootcamp helps teach and train students, equipping them to assume a role of leadership in the new world that AI and ML offer.

Source link

Books, Courses & Certifications

Teaching Developers to Think with AI – O’Reilly

Published

4 days ago

July 3, 2025

Andrew Stellman

Developers are doing incredible things with AI. Tools like Copilot, ChatGPT, and Claude have rapidly become indispensable for developers, offering unprecedented speed and efficiency in tasks like writing code, debugging tricky behavior, generating tests, and exploring unfamiliar libraries and frameworks. When it works, it’s effective, and it feels incredibly satisfying.

But if you’ve spent any real time coding with AI, you’ve probably hit a point where things stall. You keep refining your prompt and adjusting your approach, but the model keeps generating the same kind of answer, just phrased a little differently each time, and returning slight variations on the same incomplete solution. It feels close, but it’s not getting there. And worse, it’s not clear how to get back on track.

That moment is familiar to a lot of people trying to apply AI in real work. It’s what my recent talk at O’Reilly’s AI Codecon event was all about.

Over the last two years, while working on the latest edition of Head First C#, I’ve been developing a new kind of learning path, one that helps developers get better at both coding and using AI. I call it Sens-AI, and it came out of something I kept seeing:

There’s a learning gap with AI that’s creating real challenges for people who are still building their development skills.

My recent O’Reilly Radar article “Bridging the AI Learning Gap” looked at what happens when developers try to learn AI and coding at the same time. It’s not just a tooling problem—it’s a thinking problem. A lot of developers are figuring things out by trial and error, and it became clear to me that they needed a better way to move from improvising to actually solving problems.

From Vibe Coding to Problem Solving

Ask developers how they use AI, and many will describe a kind of improvisational prompting strategy: Give the model a task, see what it returns, and nudge it toward something better. It can be an effective approach because it’s fast, fluid, and almost effortless when it works.

That pattern is common enough to have a name: vibe coding. It’s a great starting point, and it works because it draws on real prompt engineering fundamentals—iterating, reacting to output, and refining based on feedback. But when something breaks, the code doesn’t behave as expected, or the AI keeps rehashing the same unhelpful answers, it’s not always clear what to try next. That’s when vibe coding starts to fall apart.

Senior developers tend to pick up AI more quickly than junior ones, but that’s not a hard-and-fast rule. I’ve seen brand-new developers pick it up quickly, and I’ve seen experienced ones get stuck. The difference is in what they do next. The people who succeed with AI tend to stop and rethink: They figure out what’s going wrong, step back to look at the problem, and reframe their prompt to give the model something better to work with.

When developers think critically, AI works better. (slide from my May 8, 2025, talk at O’Reilly AI Codecon)

The Sens-AI Framework

As I started working more closely with developers who were using AI tools to try to find ways to help them ramp up more easily, I paid attention to where they were getting stuck, and I started noticing that the pattern of an AI rehashing the same “almost there” suggestions kept coming up in training sessions and real projects. I saw it happen in my own work too. At first it felt like a weird quirk in the model’s behavior, but over time I realized it was a signal: The AI had used up the context I’d given it. The signal tells us that we need a better understanding of the problem, so we can give the model the information it’s missing. That realization was a turning point. Once I started paying attention to those breakdown moments, I began to see the same root cause across many developers’ experiences: not a flaw in the tools but a lack of framing, context, or understanding that the AI couldn’t supply on its own.

Over time—and after a lot of testing, iteration, and feedback from developers—I distilled the core of the Sens-AI learning path into five specific habits. They came directly from watching where learners got stuck, what kinds of questions they asked, and what helped them move forward. These habits form a framework that’s the intellectual foundation behind how Head First C# teaches developers to work with AI:

Context: Paying attention to what information you supply to the model, trying to figure out what else it needs to know, and supplying it clearly. This includes code, comments, structure, intent, and anything else that helps the model understand what you’re trying to do.
Research: Actively using AI and external sources to deepen your own understanding of the problem. This means running examples, consulting documentation, and checking references to verify what’s really going on.
Problem framing: Using the information you’ve gathered to define the problem more clearly so the model can respond more usefully. This involves digging deeper into the problem you’re trying to solve, recognizing what the AI still needs to know about it, and shaping your prompt to steer it in a more productive direction—and going back to do more research when you realize that it needs more context.
Refining: Iterating your prompts deliberately. This isn’t about random tweaks; it’s about making targeted changes based on what the model got right and what it missed, and using those results to guide the next step.
Critical thinking: Judging the quality of AI output rather than just simply accepting it. Does the suggestion make sense? Is it correct, relevant, plausible? This habit is especially important because it helps developers avoid the trap of trusting confident-sounding answers that don’t actually work.

These habits let developers get more out of AI while keeping control over the direction of their work.

From Stuck to Solved: Getting Better Results from AI

I’ve watched a lot of developers use tools like Copilot and ChatGPT—during training sessions, in hands-on exercises, and when they’ve asked me directly for help. What stood out to me was how often they assumed the AI had done a bad job. In reality, the prompt just didn’t include the information the model needed to solve the problem. No one had shown them how to supply the right context. That’s what the five Sens-AI habits are designed to address: not by handing developers a checklist but by helping them build a mental model for how to work with AI more effectively.

In my AI Codecon talk, I shared a story about my colleague Luis, a very experienced developer with over three decades of coding experience. He’s a seasoned engineer and an advanced AI user who builds content for training other developers, works with large language models directly, uses sophisticated prompting techniques, and has built AI-based analysis tools.

Luis was building a desktop wrapper for a React app using Tauri, a Rust-based toolkit. He pulled in both Copilot and ChatGPT, cross-checking output, exploring alternatives, and trying different approaches. But the code still wasn’t working.

Each AI suggestion seemed to fix part of the problem but break another part. The model kept offering slightly different versions of the same incomplete solution, never quite resolving the issue. For a while, he vibe-coded through it, adjusting the prompt and trying again to see if a small nudge would help, but the answers kept circling the same spot. Eventually, he realized the AI had run out of context and changed his approach. He stepped back, did some focused research to better understand what the AI was trying (and failing) to do, and applied the same habits I emphasize in the Sens-AI framework.

That shift changed the outcome. Once he understood the pattern the AI was trying to use, he could guide it. He reframed his prompt, added more context, and finally started getting suggestions that worked. The suggestions only started working once Luis gave the model the missing pieces it needed to make sense of the problem.

Applying the Sens-AI Framework: A Real-World Example

Before I developed the Sens-AI framework, I ran into a problem that later became a textbook case for it. I was curious whether COBOL, a decades-old language developed for mainframes that I had never used before but wanted to learn more about, could handle the basic mechanics of an interactive game. So I did some experimental vibe coding to build a simple terminal app that would let the user move an asterisk around the screen using the W/A/S/D keys. It was a weird little side project—I just wanted to see if I could make COBOL do something it was never really meant for, and learn something about it along the way.

The initial AI-generated code compiled and ran just fine, and at first I made some progress. I was able to get it to clear the screen, draw the asterisk in the right place, handle raw keyboard input that didn’t require the user to press Enter, and get past some initial bugs that caused a lot of flickering.

But once I hit a more subtle bug—where ANSI escape codes like ";10H" were printing literally instead of controlling the cursor—ChatGPT got stuck. I’d describe the problem, and it would generate a slightly different version of the same answer each time. One suggestion used different variable names. Another changed the order of operations. A few attempted to reformat the STRING statement. But none of them addressed the root cause.

*The COBOL app with a bug, printing a raw escape sequence instead of moving the asterisk.*

The pattern was always the same: slight code rewrites that looked plausible but didn’t actually change the behavior. That’s what a rehash loop looks like. The AI wasn’t giving me worse answers—it was just circling, stuck on the same conceptual idea. So I did what many developers do: I assumed the AI just couldn’t answer my question and moved on to another problem.

At the time, I didn’t recognize the rehash loop for what it was. I assumed ChatGPT just didn’t know the answer and gave up. But revisiting the project after developing the Sens-AI framework, I saw the whole exchange in a new light. The rehash loop was a signal that the AI needed more context. It got stuck because I hadn’t told it what it needed to know.

When I started working on the framework, I remembered this old failure and thought it’d be a perfect test case. Now I had a set of steps that I could follow:

First, I recognized that the AI had run out of context. The model wasn’t failing randomly—it was repeating itself because it didn’t understand what I was asking it to do.
Next, I did some targeted research. I brushed up on ANSI escape codes and started reading the AI’s earlier explanations more carefully. That’s when I noticed a detail I’d skimmed past the first time while vibe coding: When I went back through the AI explanation of the code that it generated, I saw that the PIC ZZ COBOL syntax defines a numeric-edited field. I suspected that could potentially cause it to introduce leading spaces into strings and wondered if that could break an escape sequence.
Then I reframed the problem. I opened a new chat and explained what I was trying to build, what I was seeing, and what I suspected. I told the AI I’d noticed it was circling the same solution and treated that as a signal that we were missing something fundamental. I also told it that I’d done some research and had three leads I suspected were related: how COBOL displays multiple items in sequence, how terminal escape codes need to be formatted, and how spacing in numeric fields might be corrupting the output. The prompt didn’t provide answers; it just gave some potential research areas for the AI to investigate. That gave it what it needed to find the additional context it needed to break out of the rehash loop.
Once the model was unstuck, I refined my prompt. I asked follow-up questions to clarify exactly what the output should look like and how to construct the strings more reliably. I wasn’t just looking for a fix—I was guiding the model toward a better approach.
And most of all, I used critical thinking. I read the answers closely, compared them to what I already knew, and decided what to try based on what actually made sense. The explanation checked out. I implemented the fix, and the program worked.

*My prompt that broke ChatGPT out of its rehash loop*

Once I took the time to understand the problem—and did just enough research to give the AI a few hints about what context it was missing—I was able to write a prompt that broke ChatGPT out of the rehash loop, and it generated code that did exactly what I needed. The generated code for the working COBOL app is available in this GitHub GIST.

*The working COBOL app that moves an asterisk around the screen*

Why These Habits Matter for New Developers

I built the Sens-AI learning path in Head First C# around the five habits in the framework. These habits aren’t checklists, scripts, or hard-and-fast rules. They’re ways of thinking that help people use AI more productively—and they don’t require years of experience. I’ve seen new developers pick them up quickly, sometimes faster than seasoned developers who didn’t realize they were stuck in shallow prompting loops.

The key insight into these habits came to me when I was updating the coding exercises in the most recent edition of Head First C#. I test the exercises using AI by pasting the instructions and starter code into tools like ChatGPT and Copilot. If they produce the correct solution, that means I’ve given the model enough information to solve it—which means I’ve given readers enough information too. But if it fails to solve the problem, something’s missing from the exercise instructions.

The process of using AI to test the exercises in the book reminded me of a problem I ran into in the first edition, back in 2007. One exercise kept tripping people up, and after reading a lot of feedback, I realized the problem: I hadn’t given readers all the information they needed to solve it. That helped connect the dots for me. The AI struggles with some coding problems for the same reason the learners were struggling with that exercise—because the context wasn’t there. Writing a good coding exercise and writing a good prompt both depend on understanding what the other side needs to make sense of the problem.

That experience helped me realize that to make developers successful with AI, we need to do more than just teach the basics of prompt engineering. We need to explicitly instill these thinking habits and give developers a way to build them alongside their core coding skills. If we want developers to succeed, we can’t just tell them to “prompt better.” We need to show them how to think with AI.

Where We Go from Here

If AI really is changing how we write software—and I believe it is—then we need to change how we teach it. We’ve made it easy to give people access to the tools. The harder part is helping them develop the habits and judgment to use them well, especially when things go wrong. That’s not just an education problem; it’s also a design problem, a documentation problem, and a tooling problem. Sens-AI is one answer, but it’s just the beginning. We still need clearer examples and better ways to guide, debug, and refine the model’s output. If we teach developers how to think with AI, we can help them become not just code generators but thoughtful engineers who understand what their code is doing and why it matters.

Source link

Funding & Business6 days ago

Kayak and Expedia race to build AI travel agents that turn social posts into itineraries

Jobs & Careers6 days ago

Mumbai-based Perplexity Alternative Has 60k+ Users Without Funding

Mergers & Acquisitions6 days ago

Donald Trump suggests US government review subsidies to Elon Musk’s companies

Funding & Business6 days ago

Rethinking Venture Capital’s Talent Pipeline

Jobs & Careers6 days ago

Why Agentic AI Isn’t Pure Hype (And What Skeptics Aren’t Seeing Yet)

Funding & Business3 days ago

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

Funding & Business7 days ago

From chatbots to collaborators: How AI agents are reshaping enterprise work

Jobs & Careers6 days ago

Telangana Launches TGDeX—India’s First State‑Led AI Public Infrastructure

Funding & Business6 days ago

Europe’s Most Ambitious Startups Aren’t Becoming Global; They’re Starting That Way

Jobs & Careers4 days ago

Ilya Sutskever Takes Over as CEO of Safe Superintelligence After Daniel Gross’s Exit

aistoriz.com

Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

Books, Courses & Certifications

Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

Key improvements and user experience

Solution overview

Prerequisites

Python inference orchestration

Build and deploy the workflow

Invoke the endpoint

Customer story: Amazon Search

Clean up

Conclusion

About the authors

Leave a Reply
Cancel reply

Leave a Reply

Books, Courses & Certifications

Complete Guide with Curriculum & Fees

Books, Courses & Certifications

Artificial Intelligence and Machine Learning Bootcamp Powered by Simplilearn

Become the Highest Paid AI Engineer!

Why is This a Great Bootcamp?

Become the Highest Paid AI Engineer!

About Caltech CTME

About IBM

How to Thrive in the Brave New World of AI and ML

Books, Courses & Certifications

Teaching Developers to Think with AI – O’Reilly

From Vibe Coding to Problem Solving

The Sens-AI Framework

From Stuck to Solved: Getting Better Results from AI

Applying the Sens-AI Framework: A Real-World Example

Why These Habits Matter for New Developers

Where We Go from Here

Trending

aistoriz.com

Build and deploy AI inference workflows with new enhancements to the Amazon SageMaker Python SDK

Key improvements and user experience

Solution overview

Prerequisites

Python inference orchestration

Build and deploy the workflow

Invoke the endpoint

Customer story: Amazon Search

Clean up

Conclusion

About the authors

You may like

Leave a Reply Cancel reply

Leave a Reply

Books, Courses & Certifications

Complete Guide with Curriculum & Fees

Books, Courses & Certifications

Artificial Intelligence and Machine Learning Bootcamp Powered by Simplilearn

Become the Highest Paid AI Engineer!

Why is This a Great Bootcamp?

Become the Highest Paid AI Engineer!

About Caltech CTME

About IBM

How to Thrive in the Brave New World of AI and ML

Books, Courses & Certifications

Teaching Developers to Think with AI – O’Reilly

From Vibe Coding to Problem Solving

The Sens-AI Framework

From Stuck to Solved: Getting Better Results from AI

Applying the Sens-AI Framework: A Real-World Example

Why These Habits Matter for New Developers

Where We Go from Here

Trending

Leave a Reply
Cancel reply