Books, Courses & Certifications

Optimize RAG in production environments using Amazon SageMaker JumpStart and Amazon OpenSearch Service

Published July 2, 2025

Generative AI has revolutionized customer interactions across industries by offering personalized, intuitive experiences powered by unprecedented access to information. This transformation is further enhanced by Retrieval Augmented Generation (RAG), a technique that allows large language models (LLMs) to reference external knowledge sources beyond their training data. RAG has gained popularity for its ability to improve generative AI applications by incorporating additional information, often preferred by customers over techniques like fine-tuning due to its cost-effectiveness and faster iteration cycles.

The RAG approach excels in grounding language generation with external knowledge, producing more factual, coherent, and relevant responses. This capability proves invaluable in applications such as question answering, dialogue systems, and content generation, where accuracy and informative outputs are crucial. For businesses, RAG offers a powerful way to use internal knowledge by connecting company documentation to a generative AI model. When an employee asks a question, the RAG system retrieves relevant information from the company’s internal documents and uses this context to generate an accurate, company-specific response. This approach enhances the understanding and usage of internal company documents and reports. By extracting relevant context from corporate knowledge bases, RAG models facilitate tasks like summarization, information extraction, and complex question answering on domain-specific materials, enabling employees to quickly access vital insights from vast internal resources. This integration of AI with proprietary information can significantly improve efficiency, decision-making, and knowledge sharing across the organization.

A typical RAG workflow consists of four key components: input prompt, document retrieval, contextual generation, and output. The process begins with a user query, which is used to search a comprehensive knowledge corpus. Relevant documents are then retrieved and combined with the original query to provide additional context for the LLM. This enriched input allows the model to generate more accurate and contextually appropriate responses. RAG’s popularity stems from its ability to use frequently updated external data, providing dynamic outputs without the need for costly and compute-intensive model retraining.
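The four stages described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: retrieval is a toy word-overlap ranking rather than a vector search, and `echo_llm` is a hypothetical stand-in for a real model call. None of these names come from the notebook.

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], top_k: int = 2) -> List[str]:
    """Document retrieval: rank documents by shared words (toy lexical match)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only documents that share at least one word with the query.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, context_docs: List[str]) -> str:
    """Contextual generation input: combine retrieved text with the user query."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}\nAnswer:"


def answer(query: str, corpus: List[str], generate: Callable[[str], str]) -> str:
    """Input prompt -> retrieval -> enriched prompt -> model output."""
    docs = retrieve(query, corpus)
    return generate(build_prompt(query, docs))


corpus = [
    "The travel policy allows economy flights for trips under six hours.",
    "Expense reports are due by the fifth business day of each month.",
    "The cafeteria is open from 8 AM to 3 PM on weekdays.",
]


def echo_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM endpoint: returns the prompt unchanged.
    return prompt


print(answer("When are expense reports due?", corpus, generate=echo_llm))
```

Because the knowledge lives in `corpus` rather than in model weights, updating the documents changes the answers without any retraining.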

To implement RAG effectively, many organizations turn to platforms like Amazon SageMaker JumpStart. This service offers numerous advantages for building and deploying generative AI applications, including access to a wide range of pre-trained models with ready-to-use artifacts, a user-friendly interface, and seamless scalability within the AWS ecosystem. By using pre-trained models and optimized hardware, SageMaker JumpStart enables rapid deployment of both LLMs and embedding models, minimizing the time spent on complex scalability configurations.

In the previous post, we showed how to build a RAG application on SageMaker JumpStart using Facebook AI Similarity Search (Faiss). In this post, we show how to use Amazon OpenSearch Service as a vector store to build an efficient RAG application.

Solution overview

To implement our RAG workflow on SageMaker, we use a popular open source Python library known as LangChain. With LangChain, the RAG components are simplified into independent blocks that you can bring together using a chain object that will encapsulate the entire workflow. The solution consists of the following key components:

LLM (inference) – We need an LLM that will do the actual inference and answer the end-user’s initial prompt. For our use case, we use Meta Llama 3 for this component. LangChain comes with a default wrapper class for SageMaker endpoints with which we can simply pass in the endpoint name to define an LLM object in the library.
Embeddings model – We need an embeddings model to convert our document corpus into vector embeddings. This is necessary for running a similarity search on the input text to find documents that are similar or that contain information to help augment our response. For this post, we use the BGE Hugging Face Embeddings model available in SageMaker JumpStart.
Vector store and retriever – To house the different embeddings we have generated, we use a vector store. In this case, we use OpenSearch Service, which allows for similarity search using k-nearest neighbors (k-NN) as well as traditional lexical search. Within our chain object, we define the vector store as the retriever. You can tune this depending on how many documents you want to retrieve.
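To make the retriever's role concrete, the following is a minimal sketch of a brute-force nearest-neighbor lookup over embeddings, with `k` as the tunable number of documents to return. The tiny three-dimensional vectors are made up for illustration; a real embedding model produces much larger vectors, and OpenSearch Service performs this search for you.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: List[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Score every document against the query and keep the k most similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative embeddings only; indexes stand in for document IDs.
doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.05, 0.0]

nearest = top_k(query_vector, doc_vectors, k=2)
print([doc_id for doc_id, _ in nearest])
```

Raising `k` gives the LLM more context at the cost of a longer prompt; lowering it keeps prompts short but risks missing relevant passages.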

The following diagram illustrates the solution architecture.

In the following sections, we walk through setting up OpenSearch, followed by exploring the notebook that implements a RAG solution with LangChain, Amazon SageMaker AI, and OpenSearch Service.

Benefits of using OpenSearch Service as a vector store for RAG

In this post, we showcase how you can use a vector store such as OpenSearch Service as a knowledge base and embedding store. OpenSearch Service offers several advantages when used for RAG in conjunction with SageMaker AI:

Performance – Efficiently handles large-scale data and search operations
Advanced search – Offers full-text search, relevance scoring, and semantic capabilities
AWS integration – Seamlessly integrates with SageMaker AI and other AWS services
Real-time updates – Supports continuous knowledge base updates with minimal delay
Customization – Allows fine-tuning of search relevance for optimal context retrieval
Reliability – Provides high availability and fault tolerance through a distributed architecture
Analytics – Provides analytical features for data understanding and performance improvement
Security – Offers robust features such as encryption, access control, and audit logging
Cost-effectiveness – Serves as an economical solution compared to proprietary vector databases
Flexibility – Supports various data types and search algorithms, offering versatile storage and retrieval options for RAG applications

You can use SageMaker AI with OpenSearch Service to create powerful and efficient RAG systems. SageMaker AI provides the machine learning (ML) infrastructure for training and deploying your language models, and OpenSearch Service serves as an efficient and scalable knowledge base for retrieval.

OpenSearch Service optimization strategies for RAG

Based on our learnings from the hundreds of RAG applications deployed using OpenSearch Service as a vector store, we’ve developed several best practices:

If you are starting from a clean slate and want to move quickly with something simple, scalable, and high-performing, we recommend using an Amazon OpenSearch Serverless vector store collection. With OpenSearch Serverless, you benefit from automatic scaling of resources, decoupling of storage, indexing compute, and search compute, with no node or shard management, and you only pay for what you use.
If you have a large-scale production workload and want to take the time to tune for the best price-performance and the most flexibility, you can use an OpenSearch Service managed cluster. In a managed cluster, you pick the node type, node size, number of nodes, and number of shards and replicas, and you have more control over when to scale your resources. For more details on best practices for operating an OpenSearch Service managed cluster, see Operational best practices for Amazon OpenSearch Service.
OpenSearch supports both exact k-NN and approximate k-NN. Use exact k-NN if the number of documents or vectors in your corpus is less than 50,000 for the best recall. For use cases where the number of vectors is greater than 50,000, exact k-NN will still provide the best recall but might not provide sub-100 millisecond query performance. Use approximate k-NN in use cases above 50,000 vectors for the best performance.
OpenSearch uses algorithms from the NMSLIB, Faiss, and Lucene libraries to power approximate k-NN search. There are pros and cons to each k-NN engine, but we find that most customers choose Faiss due to its overall performance in both indexing and search as well as the variety of different quantization and algorithm options that are supported and the broad community support.
Within the Faiss engine, OpenSearch supports both Hierarchical Navigable Small World (HNSW) and Inverted File (IVF) algorithms. Most customers find HNSW to have better recall than IVF and choose it for their RAG use cases. To learn more about the differences between these engine algorithms, see Vector search.
To reduce the memory footprint to lower the cost of the vector store while keeping the recall high, you can start with Faiss HNSW 16-bit scalar quantization. This can also reduce search latencies and improve indexing throughput when used with SIMD optimization.
If using an OpenSearch Service managed cluster, refer to Performance tuning for additional recommendations.
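As a rough sketch of how these recommendations translate into request bodies, the following builds an index definition using Faiss HNSW with 16-bit scalar quantization, and picks between an exact (script score) and an approximate k-NN query based on corpus size. The field name `vector_field`, the `l2` space type, and the helper names are assumptions for illustration; the dimension of 1,024 matches the BGE large embedding model used later in this post. Treat the bodies as a starting point and verify them against your OpenSearch version.

```python
import json
from typing import Any, Dict, Sequence

# Corpus size below which exact k-NN is suggested in the guidance above.
EXACT_KNN_MAX_VECTORS = 50_000


def build_index_body(field: str = "vector_field", dimension: int = 1024) -> Dict[str, Any]:
    """Index definition: Faiss HNSW with fp16 scalar quantization."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {
                            # 16-bit scalar quantization to reduce memory use.
                            "encoder": {"name": "sq", "parameters": {"type": "fp16"}},
                        },
                    },
                }
            }
        },
    }


def build_knn_query(vector: Sequence[float], k: int, corpus_size: int,
                    field: str = "vector_field") -> Dict[str, Any]:
    """Exact k-NN for small corpora, approximate k-NN otherwise."""
    if corpus_size < EXACT_KNN_MAX_VECTORS:
        return {
            "size": k,
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "knn_score",
                        "lang": "knn",
                        "params": {
                            "field": field,
                            "query_value": list(vector),
                            "space_type": "l2",
                        },
                    },
                }
            },
        }
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}


print(json.dumps(build_index_body(), indent=2))
print(json.dumps(build_knn_query([0.1, 0.2], k=3, corpus_size=10_000)))
```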

Prerequisites

Make sure your account has access to one ml.g5.4xlarge instance and one ml.g5.2xlarge instance. The secret must be created in the same AWS Region where the stack is deployed. Complete the following prerequisite steps to create a secret using AWS Secrets Manager:

On the Secrets Manager console, choose Secrets in the navigation pane.
Choose Store a new secret.

For Secret type, select Other type of secret.
For Key/value pairs, on the Plaintext tab, enter a complete password.
Choose Next.

For Secret name, enter a name for your secret.
Choose Next.

Under Configure rotation, keep the settings as default and choose Next.

Choose Store to save your secret.

On the secret details page, note the secret Amazon Resource Name (ARN) to use in the next step.
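If you prefer to script this step, the console flow above roughly corresponds to a single Secrets Manager `create_secret` call. The sketch below assumes the secret is a plaintext password string; the helper name, secret name, and Region are placeholders, and the client is passed in so the function can be tried without AWS access.

```python
from typing import Any


def create_password_secret(client: Any, name: str, password: str) -> str:
    """Store a plaintext password as a secret and return the secret's ARN."""
    response = client.create_secret(Name=name, SecretString=password)
    return response["ARN"]


# Usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   client = boto3.client("secretsmanager", region_name="us-east-1")
#   arn = create_password_secret(client, "rag-opensearch-password", "<your-password>")
#   print(arn)
```

Create the client in the same Region where you plan to deploy the stack, and note the returned ARN for the next step.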

Create an OpenSearch Service cluster and SageMaker notebook

We use AWS CloudFormation to deploy our OpenSearch Service cluster, SageMaker notebook, and other resources. Complete the following steps:

Launch the following CloudFormation template.
Provide the ARN of the secret you created as a prerequisite and keep the other parameters as default.

Choose Create to create your stack, and wait for the stack to complete (about 20 minutes).
When the status of the stack is CREATE_COMPLETE, note the value of OpenSearchDomainEndpoint on the stack Outputs tab.
Locate SageMakerNotebookURL in the outputs and choose the link to open the SageMaker notebook.
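The same output values can also be read programmatically. The following is a small sketch that turns the CloudFormation `describe_stacks` response into a dictionary; the helper name is made up, and the stack name `rag-opensearch` is an assumption taken from the cleanup command later in this post.

```python
from typing import Any, Dict


def get_stack_outputs(cfn_client: Any, stack_name: str) -> Dict[str, str]:
    """Return the stack outputs as {OutputKey: OutputValue} once creation is done."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    if stack["StackStatus"] != "CREATE_COMPLETE":
        raise RuntimeError(f"Stack is not ready yet: {stack['StackStatus']}")
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   outputs = get_stack_outputs(boto3.client("cloudformation"), "rag-opensearch")
#   print(outputs["OpenSearchDomainEndpoint"])
#   print(outputs["SageMakerNotebookURL"])
```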

Run the SageMaker notebook

After you have launched the notebook in JupyterLab, complete the following steps:

Go to genai-recipes/RAG-recipes/llama3-RAG-Opensearch-langchain-SMJS.ipynb.

You can also clone the notebook from the GitHub repo.

Update the value of OPENSEARCH_URL in the notebook with the value copied from OpenSearchDomainEndpoint in the previous step (look for os.environ['OPENSEARCH_URL'] = ""). The port needs to be 443.
Run the cells in the notebook.
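Because the endpoint value may be copied with or without a scheme or port, a small helper can normalize it before it is assigned to OPENSEARCH_URL. This is a sketch, not part of the notebook; the function name and the example host are hypothetical.

```python
import os
from urllib.parse import urlsplit


def normalize_opensearch_url(endpoint: str) -> str:
    """Return the endpoint as an HTTPS URL with an explicit port of 443."""
    endpoint = endpoint.strip()
    if "://" not in endpoint:
        # A bare host name was pasted; add a scheme so urlsplit can parse it.
        endpoint = "https://" + endpoint
    host = urlsplit(endpoint).hostname
    if not host:
        raise ValueError(f"Could not find a host name in {endpoint!r}")
    return f"https://{host}:443"


# Placeholder domain; substitute the OpenSearchDomainEndpoint value from your stack.
os.environ["OPENSEARCH_URL"] = normalize_opensearch_url(
    "search-my-domain.us-east-1.es.amazonaws.com"
)
print(os.environ["OPENSEARCH_URL"])
```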

The notebook provides a detailed explanation of all the steps. We explain some of the key cells in the notebook in this section.

For the RAG workflow, we deploy the huggingface-sentencesimilarity-bge-large-en-v1-5 embedding model and meta-textgeneration-llama-3-8b-instruct LLM from Hugging Face. SageMaker JumpStart simplifies this process because the model artifacts, data, and container specifications are all prepackaged for optimal inference. These are then exposed using the SageMaker Python SDK high-level API calls, which let you specify the model ID for deployment to a SageMaker real-time endpoint:


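from sagemaker.jumpstart.model import JumpStartModel

# Deploy the LLM to a SageMaker real-time endpoint
model_id = "meta-textgeneration-llama-3-8b-instruct"
accept_eula = True  # Meta Llama models require explicitly accepting the EULA before deployment
model = JumpStartModel(model_id=model_id)
llm_predictor = model.deploy(accept_eula=accept_eula)

# Deploy the embedding model to a second endpoint
model_id = "huggingface-sentencesimilarity-bge-large-en-v1-5"
text_embedding_model = JumpStartModel(model_id=model_id)
embedding_predictor = text_embedding_model.deploy()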

Content handlers are crucial for formatting data for SageMaker endpoints. They transform inputs into the format expected by the model and handle model-specific parameters like temperature and token limits. These parameters can be tuned to control the creativity and consistency of the model’s responses.

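import json

from langchain_community.llms.sagemaker_endpoint import LLMContentHandler


class Llama38BContentHandler(LLMContentHandler):
    # MIME types sent to and accepted from the endpoint
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Build the JSON request body expected by the Llama 3 endpoint
        payload = {
            "inputs": prompt,
            "parameters": {
                "max_new_tokens": 1000,  # upper bound on generated tokens
                "top_p": 0.9,
                "temperature": 0.6,  # lower values give more consistent output
                "stop": ["<|eot_id|>"],  # stop at the end-of-turn token
            },
        }
        input_str = json.dumps(payload)
        return input_str.encode("utf-8")

    # LLMContentHandler also requires a transform_output method (not shown here)
    # that parses the endpoint response.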
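The listing above covers only the request side. A content handler also needs to turn the endpoint's response back into text, which might look like the sketch below. The response shape is an assumption (a JSON object, or a one-element list of objects, with a `generated_text` key); check the actual payload returned by your endpoint before relying on it. The function name is hypothetical.

```python
import json


def parse_generated_text(raw_body: bytes) -> str:
    """Extract the generated text from a JSON response body (assumed shape)."""
    payload = json.loads(raw_body.decode("utf-8"))
    if isinstance(payload, list):
        # Some text generation containers wrap the result in a list.
        payload = payload[0]
    return payload["generated_text"].strip()


print(parse_generated_text(b'{"generated_text": " AWS revenue grew in 2021. "}'))
```

Inside a LangChain content handler, the same logic would go in `transform_output`, where the raw bytes are obtained by calling `read()` on the response body.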

We use PyPDFLoader from LangChain to load PDF files, attach metadata to each document fragment, and then use RecursiveCharacterTextSplitter to break the documents into smaller, manageable chunks. The text splitter is configured with a chunk size of 1,000 characters and an overlap of 100 characters, which helps maintain context between chunks. This preprocessing step is crucial for effective document retrieval and embedding generation, because it makes sure the text segments are appropriately sized for the embedding model and the language model used in the RAG system.

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# filenames, data_root, and metadata are defined earlier in the notebook
documents = []
for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    # Attach the per-file metadata to every page loaded from that file
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]
    documents += document

# In our testing, character-based splitting works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # maximum characters per chunk
    chunk_overlap=100,  # characters shared between consecutive chunks
)
docs = text_splitter.split_documents(documents)
print(docs[100])
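To see what the chunk size and overlap settings mean in practice, here is a simplified fixed-window splitter in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that class also tries to break on separators such as paragraphs and sentences); it only illustrates how 1,000-character chunks with a 100-character overlap relate to each other.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the last."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A 2,500-character synthetic document.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_with_overlap(text)
print([len(chunk) for chunk in chunks])
print(chunks[0][-100:] == chunks[1][:100])
```

The last 100 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still fully present in at least one chunk.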

The following block initializes a vector store using OpenSearch Service for the RAG system. It converts preprocessed document chunks into vector embeddings using a SageMaker model and stores them in OpenSearch Service. The process is configured with security measures like SSL and authentication to provide secure data handling. The bulk insertion is optimized for performance with a sizeable batch size. Finally, the vector store is wrapped with VectorStoreIndexWrapper, providing a simplified interface for operations like querying and retrieval. This setup creates a searchable database of document embeddings, enabling quick and relevant context retrieval for user queries in the RAG pipeline.

from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_community.vectorstores import OpenSearchVectorSearch
from opensearchpy import RequestsHttpConnection

# Initialize OpenSearchVectorSearch; the domain URL is read from the
# OPENSEARCH_URL environment variable set earlier in the notebook
vectorstore_opensearch = OpenSearchVectorSearch.from_documents(
    docs,
    sagemaker_embeddings,
    http_auth=awsauth,  # Auth will use the IAM role
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    bulk_size=2000  # Increase this to accommodate the number of documents you have
)
# Wrap the OpenSearch vector store with the VectorStoreIndexWrapper
wrapper_store_opensearch = VectorStoreIndexWrapper(vectorstore=vectorstore_opensearch)

Next, we use the wrapper from the previous step along with the prompt template. We define the prompt template for interacting with the Meta Llama 3 8B Instruct model in the RAG system. The template uses specific tokens to structure the input in a way that the model expects. It sets up a conversation format with system instructions, user query, and a placeholder for the assistant’s response. The PromptTemplate class from LangChain is used to create a reusable prompt with a variable for the user’s query. This structured approach to prompt engineering helps maintain consistency in the model’s responses and guides it to act as a helpful assistant.

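from langchain_core.prompts import PromptTemplate

# Llama 3 chat format: system instructions, the user query, then an open assistant turn
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.
<|eot_id|><|start_header_id|>user<|end_header_id|>
{query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["query"]
)
query = "How did AWS perform in 2021?"

# llm is the LangChain LLM object for the SageMaker endpoint, defined earlier in the notebook
answer = wrapper_store_opensearch.query(question=PROMPT.format(query=query), llm=llm)
print(answer)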

Similarly, the notebook also shows how to use Retrieval QA, where you can customize how the fetched documents are added to the prompt using the chain_type parameter.
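As an illustration of what a chain type controls, the sketch below mimics the "stuff" strategy, in which all retrieved documents are concatenated into a single prompt. It is plain Python rather than the LangChain implementation; the template wording, function name, and sample passages are assumptions made for this example.

```python
from typing import List

# Llama 3 chat layout with a slot for retrieved context (illustrative wording).
LLAMA3_RAG_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful assistant. Answer using only the context provided.\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
    "Context:\n{context}\n\nQuestion: {question}\n"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
)


def stuff_documents(question: str, documents: List[str]) -> str:
    """'Stuff' strategy: place every retrieved document into one prompt."""
    context = "\n\n".join(documents)
    return LLAMA3_RAG_TEMPLATE.format(context=context, question=question)


# Placeholder passages standing in for chunks returned by the retriever.
retrieved = [
    "Passage one returned by the retriever.",
    "Passage two returned by the retriever.",
]
prompt = stuff_documents("How did AWS perform in 2021?", retrieved)
print(prompt)
```

This strategy is the simplest option and works well when the retrieved chunks fit comfortably in the model's context window; other chain types process documents in several passes instead.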

Clean up

Delete your SageMaker endpoints from the notebook to avoid incurring costs:

# Delete resources
llm_predictor.delete_model()
llm_predictor.delete_endpoint()
embedding_predictor.delete_model()
embedding_predictor.delete_endpoint()

Next, delete your OpenSearch cluster to stop incurring additional charges:

aws cloudformation delete-stack --stack-name rag-opensearch

Conclusion

RAG has revolutionized how businesses use AI by enabling general-purpose language models to work seamlessly with company-specific data. The key benefit is the ability to create AI systems that combine broad knowledge with up-to-date, proprietary information without expensive model retraining. This approach transforms customer engagement and internal operations by delivering personalized, accurate, and timely responses based on the latest company data. The RAG workflow—comprising input prompt, document retrieval, contextual generation, and output—allows businesses to tap into their vast repositories of internal documents, policies, and data, making this information readily accessible and actionable. For businesses, this means enhanced decision-making, improved customer service, and increased operational efficiency. Employees can quickly access relevant information, while customers receive more accurate and personalized responses. Moreover, RAG’s cost-efficiency and ability to rapidly iterate make it an attractive solution for businesses looking to stay competitive in the AI era without constant, expensive updates to their AI systems. By making general-purpose LLMs work effectively on proprietary data, RAG empowers businesses to create dynamic, knowledge-rich AI applications that evolve with their data, potentially transforming how companies operate, innovate, and engage with both employees and customers.

SageMaker JumpStart has streamlined the process of developing and deploying generative AI applications. It offers pre-trained models, user-friendly interfaces, and seamless scalability within the AWS ecosystem, making it straightforward for businesses to harness the power of RAG.

Furthermore, using OpenSearch Service as a vector store facilitates swift retrieval from vast information repositories. This approach not only enhances the speed and relevance of responses, but also helps manage costs and operational complexity effectively.

By combining these technologies, you can create robust, scalable, and efficient RAG systems that provide up-to-date, context-aware responses to customer queries, ultimately enhancing user experience and satisfaction.

To get started with implementing this Retrieval Augmented Generation (RAG) solution using Amazon SageMaker JumpStart and Amazon OpenSearch Service, check out the example notebook on GitHub. You can also learn more about Amazon OpenSearch Service in the developer guide.

About the authors

Vivek Gangasani is a Lead Specialist Solutions Architect for Inference at AWS. He helps emerging generative AI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Harish Rao is a Senior Solutions Architect at AWS, specializing in large-scale distributed AI training and inference. He empowers customers to harness the power of AI to drive innovation and solve complex challenges. Outside of work, Harish embraces an active lifestyle, enjoying the tranquility of hiking, the intensity of racquetball, and the mental clarity of mindfulness practices.

Raghu Ramesha is an ML Solutions Architect. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.

Sohaib Katariwala is a Sr. Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. His interests are in all things data and analytics. More specifically he loves to help customers use AI in their data strategy to solve modern day challenges.

Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.


Teaching Developers to Think with AI – O’Reilly

Published July 3, 2025

Andrew Stellman

Developers are doing incredible things with AI. Tools like Copilot, ChatGPT, and Claude have rapidly become indispensable for developers, offering unprecedented speed and efficiency in tasks like writing code, debugging tricky behavior, generating tests, and exploring unfamiliar libraries and frameworks. When it works, it’s effective, and it feels incredibly satisfying.

But if you’ve spent any real time coding with AI, you’ve probably hit a point where things stall. You keep refining your prompt and adjusting your approach, but the model keeps generating the same kind of answer, just phrased a little differently each time, and returning slight variations on the same incomplete solution. It feels close, but it’s not getting there. And worse, it’s not clear how to get back on track.

That moment is familiar to a lot of people trying to apply AI in real work. It’s what my recent talk at O’Reilly’s AI Codecon event was all about.

Over the last two years, while working on the latest edition of Head First C#, I’ve been developing a new kind of learning path, one that helps developers get better at both coding and using AI. I call it Sens-AI, and it came out of something I kept seeing:

There’s a learning gap with AI that’s creating real challenges for people who are still building their development skills.

My recent O’Reilly Radar article “Bridging the AI Learning Gap” looked at what happens when developers try to learn AI and coding at the same time. It’s not just a tooling problem—it’s a thinking problem. A lot of developers are figuring things out by trial and error, and it became clear to me that they needed a better way to move from improvising to actually solving problems.

From Vibe Coding to Problem Solving

Ask developers how they use AI, and many will describe a kind of improvisational prompting strategy: Give the model a task, see what it returns, and nudge it toward something better. It can be an effective approach because it’s fast, fluid, and almost effortless when it works.

That pattern is common enough to have a name: vibe coding. It’s a great starting point, and it works because it draws on real prompt engineering fundamentals—iterating, reacting to output, and refining based on feedback. But when something breaks, the code doesn’t behave as expected, or the AI keeps rehashing the same unhelpful answers, it’s not always clear what to try next. That’s when vibe coding starts to fall apart.

Senior developers tend to pick up AI more quickly than junior ones, but that’s not a hard-and-fast rule. I’ve seen brand-new developers pick it up quickly, and I’ve seen experienced ones get stuck. The difference is in what they do next. The people who succeed with AI tend to stop and rethink: They figure out what’s going wrong, step back to look at the problem, and reframe their prompt to give the model something better to work with.

When developers think critically, AI works better. (slide from my May 8, 2025, talk at O’Reilly AI Codecon)

The Sens-AI Framework

As I worked more closely with developers who were using AI tools, looking for ways to help them ramp up more easily, I paid attention to where they were getting stuck. The pattern of an AI rehashing the same “almost there” suggestions kept coming up in training sessions and real projects. I saw it happen in my own work too. At first it felt like a weird quirk in the model’s behavior, but over time I realized it was a signal: The AI had used up the context I’d given it. The signal tells us that we need a better understanding of the problem, so we can give the model the information it’s missing. That realization was a turning point. Once I started paying attention to those breakdown moments, I began to see the same root cause across many developers’ experiences: not a flaw in the tools but a lack of framing, context, or understanding that the AI couldn’t supply on its own.

Over time—and after a lot of testing, iteration, and feedback from developers—I distilled the core of the Sens-AI learning path into five specific habits. They came directly from watching where learners got stuck, what kinds of questions they asked, and what helped them move forward. These habits form a framework that’s the intellectual foundation behind how Head First C# teaches developers to work with AI:

Context: Paying attention to what information you supply to the model, trying to figure out what else it needs to know, and supplying it clearly. This includes code, comments, structure, intent, and anything else that helps the model understand what you’re trying to do.
Research: Actively using AI and external sources to deepen your own understanding of the problem. This means running examples, consulting documentation, and checking references to verify what’s really going on.
Problem framing: Using the information you’ve gathered to define the problem more clearly so the model can respond more usefully. This involves digging deeper into the problem you’re trying to solve, recognizing what the AI still needs to know about it, and shaping your prompt to steer it in a more productive direction—and going back to do more research when you realize that it needs more context.
Refining: Iterating your prompts deliberately. This isn’t about random tweaks; it’s about making targeted changes based on what the model got right and what it missed, and using those results to guide the next step.
Critical thinking: Judging the quality of AI output rather than simply accepting it. Does the suggestion make sense? Is it correct, relevant, plausible? This habit is especially important because it helps developers avoid the trap of trusting confident-sounding answers that don’t actually work.

These habits let developers get more out of AI while keeping control over the direction of their work.

From Stuck to Solved: Getting Better Results from AI

I’ve watched a lot of developers use tools like Copilot and ChatGPT—during training sessions, in hands-on exercises, and when they’ve asked me directly for help. What stood out to me was how often they assumed the AI had done a bad job. In reality, the prompt just didn’t include the information the model needed to solve the problem. No one had shown them how to supply the right context. That’s what the five Sens-AI habits are designed to address: not by handing developers a checklist but by helping them build a mental model for how to work with AI more effectively.

In my AI Codecon talk, I shared a story about my colleague Luis, a developer with over three decades of coding experience. He’s a seasoned engineer and an advanced AI user who builds content for training other developers, works with large language models directly, uses sophisticated prompting techniques, and has built AI-based analysis tools.

Luis was building a desktop wrapper for a React app using Tauri, a Rust-based toolkit. He pulled in both Copilot and ChatGPT, cross-checking output, exploring alternatives, and trying different approaches. But the code still wasn’t working.

Each AI suggestion seemed to fix part of the problem but break another part. The model kept offering slightly different versions of the same incomplete solution, never quite resolving the issue. For a while, he vibe-coded through it, adjusting the prompt and trying again to see if a small nudge would help, but the answers kept circling the same spot. Eventually, he realized the AI had run out of context and changed his approach. He stepped back, did some focused research to better understand what the AI was trying (and failing) to do, and applied the same habits I emphasize in the Sens-AI framework.

That shift changed the outcome. Once he understood the pattern the AI was trying to use, he could guide it. He reframed his prompt, added more context, and finally started getting suggestions that worked. The suggestions only started working once Luis gave the model the missing pieces it needed to make sense of the problem.

Applying the Sens-AI Framework: A Real-World Example

Before I developed the Sens-AI framework, I ran into a problem that later became a textbook case for it. I was curious whether COBOL, a decades-old language developed for mainframes that I had never used before but wanted to learn more about, could handle the basic mechanics of an interactive game. So I did some experimental vibe coding to build a simple terminal app that would let the user move an asterisk around the screen using the W/A/S/D keys. It was a weird little side project—I just wanted to see if I could make COBOL do something it was never really meant for, and learn something about it along the way.

The initial AI-generated code compiled and ran just fine, and at first I made some progress. I was able to get it to clear the screen, draw the asterisk in the right place, handle raw keyboard input that didn’t require the user to press Enter, and get past some initial bugs that caused a lot of flickering.

But once I hit a more subtle bug—where ANSI escape codes like ";10H" were printing literally instead of controlling the cursor—ChatGPT got stuck. I’d describe the problem, and it would generate a slightly different version of the same answer each time. One suggestion used different variable names. Another changed the order of operations. A few attempted to reformat the STRING statement. But none of them addressed the root cause.

*The COBOL app with a bug, printing a raw escape sequence instead of moving the asterisk.*

The pattern was always the same: slight code rewrites that looked plausible but didn’t actually change the behavior. That’s what a rehash loop looks like. The AI wasn’t giving me worse answers—it was just circling, stuck on the same conceptual idea. So I did what many developers do: I assumed the AI just couldn’t answer my question and moved on to another problem.

At the time, I didn’t recognize the rehash loop for what it was. But revisiting the project after developing the Sens-AI framework, I saw the whole exchange in a new light. The rehash loop was a signal that the AI needed more context. It got stuck because I hadn’t told it what it needed to know.

When I started working on the framework, I remembered this old failure and thought it’d be a perfect test case. Now I had a set of steps that I could follow:

First, I recognized that the AI had run out of context. The model wasn’t failing randomly—it was repeating itself because it didn’t understand what I was asking it to do.
Next, I did some targeted research. I brushed up on ANSI escape codes and started reading the AI’s earlier explanations more carefully. That’s when I noticed a detail I’d skimmed past the first time while vibe coding: in its explanation of the code it had generated, the AI noted that the PIC ZZ COBOL syntax defines a numeric-edited field. I suspected that such a field could introduce leading spaces into strings, and I wondered if that could break an escape sequence.
Then I reframed the problem. I opened a new chat and explained what I was trying to build, what I was seeing, and what I suspected. I told the AI I’d noticed it was circling the same solution and treated that as a signal that we were missing something fundamental. I also told it that I’d done some research and had three leads I suspected were related: how COBOL displays multiple items in sequence, how terminal escape codes need to be formatted, and how spacing in numeric fields might be corrupting the output. The prompt didn’t provide answers; it just gave some potential research areas for the AI to investigate. That was enough for it to find the additional context it needed to break out of the rehash loop.
Once the model was unstuck, I refined my prompt. I asked follow-up questions to clarify exactly what the output should look like and how to construct the strings more reliably. I wasn’t just looking for a fix—I was guiding the model toward a better approach.
And most of all, I used critical thinking. I read the answers closely, compared them to what I already knew, and decided what to try based on what actually made sense. The explanation checked out. I implemented the fix, and the program worked.
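The suspicion from the research step can be illustrated outside COBOL. What follows is a minimal Python sketch, not the COBOL fix itself: it assumes a two-character zero-suppressed field behaves like space-padded formatting for values from 1 to 99, and the helper names (`zz_field`, `cursor_to`, `is_well_formed`) are hypothetical. It only checks whether the assembled string still has the digits-only shape of a cursor-position sequence (ESC [ row ; col H); how a particular terminal reacts to the malformed version is not modeled.

```python
import re

ESC = "\x1b"

# Shape of a well-formed cursor-position sequence: ESC [ <digits> ; <digits> H
CUP_PATTERN = re.compile(r"\x1b\[\d+;\d+H")


def zz_field(value: int) -> str:
    """Approximate a two-character zero-suppressed field (assumption:
    values 1-99), where a leading zero is shown as a space."""
    return f"{value:2d}"


def cursor_to(row_text: str, col_text: str) -> str:
    """Concatenate the pieces of the sequence exactly as they are given."""
    return f"{ESC}[{row_text};{col_text}H"


def is_well_formed(sequence: str) -> bool:
    """True only if the string is ESC [ digits ; digits H with nothing else."""
    return CUP_PATTERN.fullmatch(sequence) is not None


# Row 5 formatted as a padded field carries a leading space into the sequence.
padded = cursor_to(zz_field(5), zz_field(10))

# Stripping the padding before concatenation restores the expected shape.
trimmed = cursor_to(zz_field(5).strip(), zz_field(10).strip())

print(repr(padded), is_well_formed(padded))
print(repr(trimmed), is_well_formed(trimmed))
```

The only difference between the two strings is the single space after the bracket, which is the kind of detail that is easy to miss when reading generated code quickly.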

*My prompt that broke ChatGPT out of its rehash loop*

Once I took the time to understand the problem, and did just enough research to give the AI a few hints about what context it was missing, I was able to write a prompt that broke ChatGPT out of the rehash loop, and it generated code that did exactly what I needed. The generated code for the working COBOL app is available in this GitHub gist.

*The working COBOL app that moves an asterisk around the screen*

Why These Habits Matter for New Developers

I built the Sens-AI learning path in Head First C# around the five habits in the framework. These habits aren’t checklists, scripts, or hard-and-fast rules. They’re ways of thinking that help people use AI more productively—and they don’t require years of experience. I’ve seen new developers pick them up quickly, sometimes faster than seasoned developers who didn’t realize they were stuck in shallow prompting loops.

The key insight into these habits came to me when I was updating the coding exercises in the most recent edition of Head First C#. I test the exercises using AI by pasting the instructions and starter code into tools like ChatGPT and Copilot. If they produce the correct solution, that means I’ve given the model enough information to solve it—which means I’ve given readers enough information too. But if it fails to solve the problem, something’s missing from the exercise instructions.

The process of using AI to test the exercises in the book reminded me of a problem I ran into in the first edition, back in 2007. One exercise kept tripping people up, and after reading a lot of feedback, I realized the problem: I hadn’t given readers all the information they needed to solve it. That helped connect the dots for me. The AI struggles with some coding problems for the same reason the learners were struggling with that exercise—because the context wasn’t there. Writing a good coding exercise and writing a good prompt both depend on understanding what the other side needs to make sense of the problem.

That experience helped me realize that to make developers successful with AI, we need to do more than just teach the basics of prompt engineering. We need to explicitly instill these thinking habits and give developers a way to build them alongside their core coding skills. If we want developers to succeed, we can’t just tell them to “prompt better.” We need to show them how to think with AI.

Where We Go from Here

If AI really is changing how we write software—and I believe it is—then we need to change how we teach it. We’ve made it easy to give people access to the tools. The harder part is helping them develop the habits and judgment to use them well, especially when things go wrong. That’s not just an education problem; it’s also a design problem, a documentation problem, and a tooling problem. Sens-AI is one answer, but it’s just the beginning. We still need clearer examples and better ways to guide, debug, and refine the model’s output. If we teach developers how to think with AI, we can help them become not just code generators but thoughtful engineers who understand what their code is doing and why it matters.

Honoring service, empowering futures: Coursera’s partnership with United Services Organization

Published July 3, 2025

Laura Michetti

By Isa Rivera, Military ERG Lead, Coursera

On Independence Day, we celebrate the freedoms we enjoy — and we’re proud to support U.S. service members and military spouses with job-relevant skills to help build their futures.

As a leader of Coursera’s Military Employee Resource Group (ERG) and a member of a military family, I’m especially proud of our work supporting military organizations through our social impact program, which provides free learning to over 100 nonprofits.

Today, I want to highlight our collaboration with the United Service Organizations (USO), a nonprofit dedicated to strengthening service members. The collaboration offers free access to Coursera certificate programs alongside the USO’s wraparound services. Together, we’ve helped over 10,000 military learners build the skills and credentials they need to advance their careers.

“This partnership has always been such a strong pairing since we started back in April 2021 and it has been incredible to watch the collective impact Coursera and the USO have made to amplify career opportunities for our military community,” said Lisa Elswick, USO Vice President of Programs.

Creating pathways from service to civilian careers

In 2024, the USO Transition Program created over 10,000 personalized Action Plans to help service members and military spouses advance their careers, which include career counseling and access to the job-relevant catalog on Coursera. More than 75% of the enrollments came from active service members, with the Army and Navy being the most active military branches.

One learner, U.S. Army Capt. Philip H., had a degree in mechanical engineering before spending nine years in the Army, with four of those in special forces. He said, “I was interested in broadening my skill set as much as possible to make myself a more marketable candidate in the civilian workforce.”

Philip achieved his ultimate goal of becoming a software engineer after completing the IBM Data Analyst Certificate on Coursera.

Top certificate programs among USO members

Coursera offers more than 90 Entry-level Professional Certificates that build job-ready skills in high-demand fields, with no degree or previous experience required. In 2024, certificate completions through the USO Program rose 97% over the previous year. Industry-relevant skills stood out among the most popular certificates.

We’re humbled to partner with the USO and other nonprofits to help military learners build the skills they need to shape their future.

Large Public Libraries Give Young Adults Across U.S. Access to Banned Books

Published July 3, 2025

Claire Woodcock

Young adults are finding it harder to borrow books reflective of their lived experiences in their schools and public libraries. It isn’t because these stories don’t exist — they do — but because they’ve been challenged and removed, restricted, or were never purchased at all.

This is especially true in parts of the country where state legislatures have enacted laws criminalizing what educators can and can’t say about politically, religiously, or morally divisive topics, as well as regions where public services are underfunded and access to books is already scarce.

But in recent years, a handful of urban library systems have stepped up to offer readers who are at least 13 years old a chance to read the books that might be unavailable in their home areas.

Since 2022, thousands of eligible young adults have registered for a little-known program called Books Unbanned, which Brooklyn Public Library in New York created that year to counter efforts to restrict access to certain books.

Books Unbanned’s popularity among young readers (more than 8,000 have signed up) comes amid record-breaking book censorship efforts, according to data compiled by the American Library Association. The ALA’s Office for Intellectual Freedom has tracked a more-than-400-percent increase in the number of reported book challenges in the U.S. between 2020 and 2024. The challenges reported to the ALA in 2024 alone targeted 2,452 titles.

The Supreme Court’s recent ruling to allow parents to pull their children out of classroom discussions around books covering LGBTQ+ and other themes that may conflict with their religious beliefs could embolden efforts to restrict more titles.

Brooklyn’s program gives readers between 13 and 21 anywhere in the country the ability to opt in. As it turns out, the “banned book” label on its digital library cards is a bit of a misnomer, because the cards also provide access to materials unaffected by bans.

“It’s our entire book collection,” said Amy Mikel, director of customer experience and librarian at Brooklyn Public Library. “Half a million items. You can read whatever you want” that’s in a digital format.

The Brooklyn library’s records show Books Unbanned cardholders are collectively borrowing more than 100,000 unique titles a year, many of which have nothing to do with the most frequently challenged subjects for youth, such as race, sex, gender, or lived experiences that are decidedly difficult or hard to read.

“Obviously there are people who write to us and say, ‘thank you so much — now I can access the books that have been taken away from me,’” said Mikel. “But the fact is that these young people are accessing books that are not controversial at all.”

Other libraries have since launched their own programs, though not every library can afford to provide the level of access Brooklyn’s program does.

Private Funding

Each program is based on different parameters that are largely determined by the level of private funding libraries receive and the subsequent licensing agreements they’re able to secure.

Because most libraries with foundations are based in major cities, so far all of the programs come from urban libraries receiving robust support from their respective foundations, which raise money in addition to the funding they’ve historically received from the federal government to cover operational costs.

Many public libraries have “Friends of the Library” groups that raise money and advocate for their libraries by organizing community events such as used-book sales. Some foundations for larger library systems attract large philanthropic gifts that can pay for specific licenses negotiated with publishers. These negotiations often determine what type of digital book access libraries can afford to provide patrons.

The breadth of access differs among libraries. While Seattle Public Library’s Books Unbanned e-card gives young adults up to age 26 access to its entire OverDrive collection and is open to readers throughout the U.S., the LA County Library Books Unbanned program is limited to teens 13 to 18, and is available only to residents of California.

Boston Public Library and San Diego Public Library took a more refined approach to their Books Unbanned programs. Both offer access to young adults who register throughout the U.S., but their collections are limited to frequently challenged or banned titles.

Each of the participating libraries encourages young adults to apply for as many banned book e-cards as they’re eligible for to make use of as many collections as possible.

Empty Shelves

What Brooklyn Public Library did wasn’t novel in terms of what librarians routinely do. But it was innovative in the sense that it re-envisioned big ideas, such as what a service area is in the post-digital age. Books Unbanned responded to a perceived threat to young adults’ First Amendment rights to receive information. The perceived threat has escalated.

Since the program launched, a patchwork of legislation across several states has criminalized teachers to varying degrees for what they say about sexual orientation, gender identity or racial ideology in an educational context. Moms for Liberty has targeted young adult books with LGBTQ+ and BIPOC characters. The group’s website cites passages about sexual content from young adult books out of context and then rates them according to its own proprietary system. This website has equipped adults with the quotes they need to challenge books on school library shelves, leading to record bans nearly every year since 2021.

In rural areas, the problem is less likely to be book challenges than chronic underfunding of library services.

“This program wouldn’t need to exist if everybody just had access to a robust digital collection where they live,” said Mikel at Brooklyn Public Library.

Participating libraries invite cardholders to share their experiences with book censorship when they sign up or renew a banned book card. Last year, Brooklyn Public Library and Seattle Public Library issued a report documenting how teenagers and young adults are encountering censorship in their communities.

Teens reported witnessing the obvious shrinking of collections, with gaps on shelves where certain books used to be. They also said that if they did have access to a library, its collection was dated or limited. And some reported intentional self-censorship: Jennifer Jenkins, deputy director of customer experience with the San Diego Public Library, heard from several young adults who said they could check out a frequently challenged book from their local library, but they chose not to in order to protect their teachers and librarians from retaliation.

Cardholders also cite state-specific legislation that alters what their teachers can teach and their libraries can shelve, and librarians who draw unwanted attention to the age-appropriateness of the titles they check out. This aligns with other restrictive policies some libraries have introduced, including age limitations, parental permissions, content warning labels, and removing tags from online catalogs, which makes certain books harder to find in the system.

Mikel in Brooklyn says restrictions can be hard to measure but can significantly impact a young adult’s ability to access information.

“When people say things like, ‘It’s not a book ban, we just removed it from the school library,’” Mikel said. “In some cases, removing that book from that one place of access is effectively erasing the book altogether from that young person’s life.”

Tacit censorship resulting from restrictive lending policies is harder for researchers to track.

“Most librarians work really hard to give their students what they need, but there are certainly a group of librarians who just aren’t comfortable with these trends of LGBTQ+ and BIPOC literature,” said Tasslyn Magnussun, an independent consultant for PEN America and other groups tracking the rise of book censorship. “So there’s what was purchased and what wasn’t purchased: Self-censorship before the rise of big censorship.”

Limits of Privacy

The kinds of censorship librarians are experiencing also affect teachers. A 2024 RAND Corporation report found that while roughly half of K-12 public school teachers face some sort of state or district policies that limit what they can say about political and social issues, some teachers are still more likely to avoid certain topics even with supportive administrators and parents. Jenkins says digital cardholder comments give library workers in urban systems more insight into how the cards are affecting librarians outside major metropolitan areas.

“There is a chilling effect happening, self-censorship, where it’s affecting the decision-making ability of educated, trained, [and] skilled librarians and educators, in terms of selecting materials that are age-appropriate and appropriate for various readers,” Jenkins said. “It’s inadvertently causing people to make more conservative choices just by default.”

Part of the appeal for Books Unbanned e-card holders is some semblance of a private reading life. And while the librarians involved in the program through their institutions are committed to connecting readers with the titles they want to read, access doesn’t necessarily come easily to everyone because it’s not safe to assume every young adult has a device with e-reader capabilities, reliable internet access or working headphones. Or privacy, for that matter.

In the case of digital books, librarians work closely with vendors to secure licenses to circulate ebook and audiobook copies of titles. These professional partnerships are sometimes fraught. Part of that has to do with librarians having to relinquish control over infrastructure and access to the vendors’ applications, which take users from the library’s website to platforms like Libby. This is different from how physical book vendors work with libraries. Once physical books are ordered from a distributor, they belong to the library. Libraries don’t have to keep paying as patrons borrow them; the digital rules don’t apply.

One criticism librarians have of vendor software is that it’s designed to support the licensing model for publishers but not the end-users facing challenges to their First Amendment rights. Vendors are facing pressure to comply with legislation in states where the right to receive information through school curriculums and library collections is vulnerable.

Take, for instance, Destiny, a widely used book checkout system in school libraries across the country. In 2022, its parent company announced that it was considering a parental control module for Destiny to address requests to opt out of LGBTQ+ tagged books. The company quickly walked the feature back after librarians pointed out how it could be abused by releasing students’ library checkout history and placing borrowing restrictions on accounts, in violation of both the American Library Association’s Library Bill of Rights and student privacy rights under the Family Educational Rights and Privacy Act (FERPA).

Melissa Andrews, Boston Public Library’s chief of collection management, says it’s important for libraries to retain the ability to opt out of contractual clauses. Without it, digital contracts could result in a book being removed from circulation for everyone, including young adults living in areas without book bans.

“Once it’s coded into that software, it makes it easier for other libraries to do that without the law in place,” said Andrews. “And it also doesn’t necessarily go away if our culture changes in three to four years.”

InterLibrary Loan Threatened

In certain parts of the country, searching for the nearest copy of a frequently banned, challenged, or restricted book through the WorldCat catalog might show one that is 200 miles away, creating an ersatz banned-book desert akin to a news desert.

What’s more, libraries are vulnerable to the whims of political spending. The Trump Administration’s budget, if passed, is expected to result in the elimination of InterLibrary Loan for most institutions, unless they have the money in their budgets to opt in.

“The amount [for] my library to buy into the InterLibrary Loan system, if it’s not [federally] funded, is like the size of our entire budget,” Magnussun said. “There’s just no way our tiny little one-room library would be able to participate. So then those kids are definitely not getting those books.”

If InterLibrary Loan became too expensive for most libraries, it would put more pressure on the resources belonging to libraries participating in Books Unbanned. Such an outcome raises important questions about young readers in rural America accessing digital books from just a handful of well-resourced urban libraries hundreds of miles away. But Magnussun says the cost of not making the books accessible for queer and Brown youth, especially, is worse.

“There’s a question of a balance between, what’s the ideal situation — certainly not having [only] three libraries in the country fund the only LGBTQ+ literature that will be available to young people, but that’s where we are at this moment in time,” said Magnussun of PEN. “What I don’t want to see people doing, especially the library organizations, is [saying], ‘Oh, problem solved. We’re going to have Brooklyn Public Library or San Diego carry the rest of the country.’

“Because,” Magnussun adds, “that’s not right.”

Mikel said Brooklyn and other participating libraries are looking for new participant libraries. She remains confident in the program’s private funding even amid interference from groups and lawmakers in favor of bans. But despite the interest in Books Unbanned, most knowledge workers agree that it’s far from ideal. The program should be regarded as a stop-gap while communities wrestle with the tougher question of censorship.

“We’re proud of this initiative — it’s really important, but this is not the solution to anything,” said Andrews at Boston Public Library. Yet for the young readers putting their banned book e-library cards to use, “[H]opefully it helps right now.”
