How PerformLine uses prompt engineering on Amazon Bedrock to detect compliance violations

This post is co-written with Bogdan Arsenie and Nick Mattei from PerformLine.
PerformLine operates within the marketing compliance industry, a specialized subset of the broader compliance software market, which includes various compliance solutions like anti-money laundering (AML), know your customer (KYC), and others. Specifically, marketing compliance refers to adhering to regulations and guidelines set by government agencies that make sure a company’s marketing, advertising, and sales content and communications are truthful, accurate, and not misleading for consumers. PerformLine is the leading service providing comprehensive compliance oversight across marketing, sales, and partner channels. As pioneers of the marketing compliance industry, PerformLine has conducted over 1.1 billion compliance observations over the past 10+ years, automating the entire compliance process—from pre-publication review of materials to continuous monitoring of consumer-facing channels such as websites, emails, and social media. Trusted by consumer finance brands and global organizations, PerformLine uses AI-driven solutions to protect brands and their consumers, transforming compliance efforts into a competitive advantage.
“Discover. Monitor. Act. This isn’t just our tagline—it’s the foundation of our innovation at PerformLine,” says PerformLine’s CTO Bogdan Arsenie. PerformLine’s engineering team brings these principles to life by developing AI-powered technology solutions. In this post, PerformLine and AWS explore how PerformLine used Amazon Bedrock to accelerate compliance processes, generate actionable insights, and provide contextual data—delivering the speed and accuracy essential for large-scale oversight.
The problem
One of PerformLine’s enterprise customers needed a more efficient process for running compliance checks on newly launched product pages, particularly those that integrate multiple products within the same visual and textual framework. These complex pages often feature overlapping content that can apply to one product, several products, or even all of them at once, necessitating a context-aware interpretation that mirrors how a typical consumer would view and interact with the content.

By adopting AWS and the architecture discussed in this post, PerformLine can retrieve and analyze these intricate pages through AI-driven processing, generating detailed insights and contextual data that capture the nuanced interplay between various product elements. After the relevant information is extracted and structured, it’s fed directly into their rules engine, enabling robust compliance checks. The result is a seamless flow from data ingestion to rules-based analysis that preserves the depth of each product’s presentation while delivering the speed and accuracy critical to large-scale oversight.

Monitoring millions of webpages daily for compliance demands a system that can intelligently parse, extract, and analyze content at scale—much like the approach PerformLine has developed for their enterprise customers. In this dynamic landscape, the ever-evolving nature of web content challenges traditional static parsing, requiring a context-aware and adaptive solution. The architecture not only processes bulk data offline but also delivers near real-time performance for one-time requests, dynamically scaling to manage the diverse complexity of each page. By using AI-powered inference, PerformLine provides comprehensive coverage of every product and marketing element across the web, while striking a careful balance between accuracy, performance, and cost.
Solution overview
With this flexible, adaptable solution, PerformLine can tackle even the most challenging webpages, providing comprehensive coverage when extracting and analyzing web content with multiple products. At the same time, by combining consistency with the adaptability of foundation models (FMs), PerformLine can maintain reliable performance across the diverse range of products and websites their customers monitor. This dual focus on agility and operational consistency makes sure their customers benefit from robust compliance checks and data integrity, without sacrificing the speed or scale needed to remain competitive.
PerformLine’s upstream ingestion pipeline efficiently collects millions of web pages and their associated metadata in a batch process. Downstream, assets are submitted to PerformLine’s rules engine and compliance review processes. It was imperative that this solution not disrupt those processes or introduce cascading changes.
PerformLine decided to use generative AI and Amazon Bedrock to address their core challenges. Amazon Bedrock offers a broad selection of models, including Amazon Nova, and is continuously expanding its feature set for using FMs at scale. This provides a reliable foundation on which to build a highly available and efficient content processing system.
PerformLine’s solution incorporates several key components, described in this section.

PerformLine implemented a scalable, serverless, event-driven architecture (shown in the following diagram) that seamlessly integrates with their existing system and required less than a day to develop and deploy. This made it possible to focus on prompt optimization, evaluation, and cost management rather than on infrastructure overhead. The architecture allows PerformLine to dynamically parse, extract, and analyze web content with high reliability, flexibility, and cost-efficiency.

The system implements multiple queue types (incoming, dead-letter, and results) and includes error handling mechanisms. Data flows through the following AWS services:
- Amazon RDS for initial data storage
- Amazon MQ (RabbitMQ) for message handling
- Amazon S3 for asset storage
- Amazon EventBridge for event management
- Amazon SQS for queue management
- AWS Lambda for serverless processing
- Amazon DynamoDB for NoSQL data storage
PerformLine’s process consists of several steps, including processing (Step 1), event trigger and storage (Steps 2–6), structured output and storage (Step 7), and downstream processing and compliance checks (Steps 8–9):
1. Millions of pages are processed by an upstream extract, transform, and load (ETL) process from PerformLine’s core systems running on the AWS Cloud.
2. When a page is retrieved, it triggers an event in the compliance check system.
3. The data from the page is stored in Amazon S3 according to its metadata.
4. EventBridge uses event-driven processing to route Amazon S3 events to Amazon SQS.
5. Amazon SQS queues messages for processing and enables messages to be retried on failure.
6. A Lambda function consumes SQS messages and scales dynamically to handle even unpredictable workloads:
   - The function uses Amazon Bedrock to perform extraction and generative AI analysis of the content from Amazon SQS (a sketch follows this list). Amazon Bedrock offers the flexibility to choose the right model for the job: Amazon Nova Pro was best suited for complex requests that require a powerful model while keeping a high performance-to-cost ratio, and Anthropic’s Claude Haiku is optimized for quick calls where a fast response is paramount. Amazon Bedrock features, including Amazon Bedrock Prompt Management and inference profiles, are used to vary inputs without affecting outputs and to reduce the complexity of working with FMs through Amazon Bedrock.
   - The function stores customer-defined product schemas in Amazon DynamoDB, enabling dynamic large language model (LLM) targeting and schema-driven output generation.
7. Amazon S3 stores the extracted data, which is formatted as structured JSON adhering to the target schema.
8. EventBridge forwards Amazon S3 events to Amazon SQS, making extracted data available for downstream processing.
9. Compliance checks and business rules, running on other PerformLine systems, are applied to validate and enforce regulatory requirements.
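The following is a minimal sketch of what such a Lambda handler could look like, assuming Python with boto3, an S3 object event delivered through EventBridge and SQS, and a placeholder Amazon Nova Pro model ID; PerformLine’s actual prompts and schemas are managed in Amazon Bedrock Prompt Management and DynamoDB rather than hard-coded.

    import json

    import boto3

    # Placeholder model ID; in practice this would come from configuration
    # (an inference profile or prompt ARN), not from code.
    MODEL_ID = "amazon.nova-pro-v1:0"

    bedrock = boto3.client("bedrock-runtime")
    s3 = boto3.client("s3")

    def handler(event, context):
        # Each SQS record wraps an S3 object event routed through EventBridge
        for record in event["Records"]:
            detail = json.loads(record["body"])["detail"]
            bucket = detail["bucket"]["name"]
            key = detail["object"]["key"]

            page = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

            # Extraction and generative AI analysis with the Converse API
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=[{
                    "role": "user",
                    "content": [{"text": f"Extract the products on this page as JSON:\n{page}"}],
                }],
                inferenceConfig={"maxTokens": 2048, "temperature": 0},
            )
            extracted = response["output"]["message"]["content"][0]["text"]

            # Structured JSON goes back to S3 for downstream rules processing
            s3.put_object(Bucket=bucket, Key=f"extracted/{key}.json", Body=extracted)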
Cost optimizations
The solution offers several cost optimizations, including change data capture (CDC) on web content and strategic multi-pass inference. After a page’s content has been analyzed and formatted, it’s written back to a partition that includes a metadata hash of the asset. This enables upstream processes to determine whether a page has already been processed and whether its content has changed (a sketch of this check follows the list below). The key benefits of this approach include:
- Alleviating redundant processing of the same pages, which contributed to a 15% reduction in PerformLine’s human evaluation workload. This frees time for human evaluators and allows them to focus on critical pages rather than every page.
- Avoiding reprocessing of unchanged pages, reducing PerformLine’s analysts’ workload by over 50% in addition to the deduplication gains.
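The following sketch illustrates how such a check could work; the hashing scheme, bucket layout, and key names are hypothetical stand-ins, not PerformLine’s actual implementation.

    import hashlib
    import json

    import boto3

    s3 = boto3.client("s3")

    def content_hash(page_text, metadata):
        # Deterministic hash over the page content plus its metadata
        payload = json.dumps({"content": page_text, "meta": metadata}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def needs_processing(bucket, page_key, new_hash):
        # Results are written to a partition that embeds the content hash, so
        # an existing object at this key means the page was already processed
        # and its content hasn't changed.
        result_key = f"extracted/{new_hash}/{page_key}.json"
        try:
            s3.head_object(Bucket=bucket, Key=result_key)
            return False  # unchanged page, skip it
        except s3.exceptions.ClientError:
            return True   # new or changed page, process it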
LLM inference costs can escalate at scale, but context and carefully structured prompts are critical for accuracy. To optimize costs while maintaining precision, PerformLine implemented the following multi-pass approach using Amazon Bedrock (a code sketch follows the list):
- Initial filtering with Amazon Nova Micro – This lightweight model efficiently identifies relevant products with minimal cost.
- Targeted extraction with Amazon Nova Lite – Identified products are batched into smaller groups and passed to Amazon Nova Lite for deeper analysis. This keeps PerformLine within token limits while improving extraction accuracy.
- Increased accuracy through context-aware processing – By first identifying the target content and then processing it in smaller batches, PerformLine significantly improved accuracy while minimizing token consumption.
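A minimal sketch of this multi-pass pattern follows; the Amazon Nova model IDs are real Amazon Bedrock identifiers, but the prompts and batch size are illustrative assumptions.

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def ask(model_id, prompt):
        response = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"temperature": 0},
        )
        return response["output"]["message"]["content"][0]["text"]

    def extract_products(page_text, batch_size=5):
        # Pass 1: cheap relevance filtering with Amazon Nova Micro
        names = ask(
            "amazon.nova-micro-v1:0",
            f"List the product names on this page, one per line:\n{page_text}",
        ).splitlines()

        # Pass 2: deeper extraction with Amazon Nova Lite, batched to stay
        # within token limits and keep each prompt narrowly scoped
        results = []
        for i in range(0, len(names), batch_size):
            batch = ", ".join(names[i:i + batch_size])
            results.append(ask(
                "amazon.nova-lite-v1:0",
                f"For these products: {batch}, extract pricing and marketing "
                f"claims as JSON from this page:\n{page_text}",
            ))
        return results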
Use of Amazon Bedrock
During initial testing, PerformLine quickly realized the need for a more scalable approach to prompt management. Manually tracking multiple prompt versions and templates became inefficient as PerformLine iterated and collaborated.
Amazon Bedrock Prompt Management provided a centralized solution, enabling them to version, manage, and seamlessly deploy prompts to production. After the prompts are deployed, they can be dynamically referenced in AWS Lambda, allowing for flexible configuration. Additionally, by using Amazon Bedrock application inference profiles, PerformLine can dynamically adjust the models the Lambda function invokes, track cost per invocation, and attribute costs to specific application instances by setting up cost tags.
To streamline model interactions, PerformLine chose the Amazon Bedrock Converse API, which provides a developer-friendly, standardized interface for model invocation. When combined with inference profiles and prompt management, a Lambda function using the Amazon Bedrock Converse API becomes highly configurable—PerformLine developers can rapidly test new models and prompts, evaluate results, and iterate without needing to rebuild or redeploy. The simplification of prompt management and the ability to deploy various models through Amazon Bedrock is shown in the following diagram.

The diagram shows three main components: an inference system (model ID integration, profile configuration, content management, and inference settings), prompt management (version control across draft and published versions, publish ID tracking, model specifications, and store configurations), and environment control (separate PROD and DEV paths, environment-specific parameter stores, invoke ID management, and engineering iteration tracking).
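As a sketch of how these pieces fit together, the following Lambda-style snippet reads a Prompt Management prompt ARN from configuration and passes it to the Converse API; the environment variable name and the page_content prompt variable are hypothetical.

    import os

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    # The prompt version ARN lives in configuration rather than code, so a
    # new prompt version (or the model it references) can be rolled out
    # without rebuilding or redeploying the function.
    PROMPT_ARN = os.environ["PROMPT_ARN"]

    def analyze(page_text):
        # The Converse API accepts a Prompt Management ARN as the modelId;
        # promptVariables fills the prompt template's placeholders.
        response = bedrock.converse(
            modelId=PROMPT_ARN,
            promptVariables={"page_content": {"text": page_text}},
        )
        return response["output"]["message"]["content"][0]["text"]

Changing the function’s PROMPT_ARN configuration is then enough to switch prompts or target models.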
Future plans and enhancements
PerformLine is excited to dive into additional Amazon Bedrock features, including prompt caching and Amazon Bedrock Flows.
With prompt caching, users can checkpoint prompt tokens, effectively caching context for reuse in subsequent API calls. Prompt caching on Amazon Bedrock offers up to 85% latency improvements and 90% cost reduction in comparison to calls without prompt caching. PerformLine sees prompt caching as a feature that will become the standard moving forward. They have a number of use cases for their data, and having the ability to apply further analysis on the same content at a lower cost creates new opportunities for feature expansion and development.
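For example, a cache checkpoint could be set after a page’s full text so that several analysis passes reuse the same cached context. The following is a minimal sketch, assuming a model that supports prompt caching on Amazon Bedrock and placeholder prompts.

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    # Everything above the cachePoint marker is cached on the first call and
    # reused by later calls that share the same prefix.
    system = [
        {"text": "You are a compliance analyst. The page content follows."},
        {"text": "<long page content here>"},
        {"cachePoint": {"type": "default"}},
    ]

    for question in ["List the products on the page.", "List all pricing claims."]:
        response = bedrock.converse(
            modelId="amazon.nova-pro-v1:0",
            system=system,
            messages=[{"role": "user", "content": [{"text": question}]}],
        )
        print(response["output"]["message"]["content"][0]["text"])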
Amazon Bedrock Flows is a visual workflow builder that enables users to orchestrate multi-step generative AI tasks by connecting FMs and APIs without extensive coding. Amazon Bedrock Flows is a next step in simplifying PerformLine’s orchestration of knowledge bases, prompt caching, and even Amazon Bedrock agents in the future. Creating flows can help reduce time to feature deployment and maintenance.
Summary
PerformLine has implemented a highly scalable, serverless, AI-driven architecture that enhances efficiency, cost-effectiveness, and compliance in the web content processing pipeline. By using Amazon Bedrock, EventBridge, Amazon SQS, Lambda, and DynamoDB, they have built a solution that can dynamically scale, optimize AI inference costs, and reduce redundant processing—all while maintaining operational flexibility and compliance integrity. Based on their current volume and workflow, PerformLine is projected to process between 1.5 and 2 million pages daily, from which they expect to extract approximately 400,000 to 500,000 products. Additionally, PerformLine anticipates applying rules to each asset, resulting in about 500,000 rule observations that will require review each day.

Throughout the design process, PerformLine made sure their solution remains as simple as possible while still delivering operational flexibility and integrity. This approach minimizes complexity, enhances maintainability, and accelerates deployment, empowering them to adapt quickly to evolving business needs without unnecessary overhead.
By using a serverless AI-driven architecture built on Amazon Bedrock, PerformLine helps their customers tackle even the most complex, multi-product webpages with unparalleled accuracy and efficiency. This holistic approach interprets visual and textual elements as a typical consumer would, verifying that every product variant is accurately assessed for compliance. The resulting insights are then fed directly into a rules engine, enabling rapid, data-driven decisions. For PerformLine’s customers, this means less redundant processing, lower operational costs, and a dramatically simplified compliance workflow, all without compromising on speed or accuracy. By reducing the overhead of large-scale data analysis and streamlining compliance checks, PerformLine’s solution ultimately frees teams to focus on driving innovation and delivering value.
About the authors
Bogdan Arsenie is the Chief Technology Officer at PerformLine, with over two decades of experience leading technological innovation across digital advertising, big data, mobile gaming, and social engagement. Bogdan began programming at age 13, customizing bulletin board software to fund his passion for Star Trek memorabilia. He served as PerformLine’s founding CTO from 2007–2009, pioneering their initial compliance platform. Later, as CTO at the Rumie Initiative, he helped scale a global education initiative recognized by Google’s Impact Challenge.
Nick Mattei is a Senior Software Engineer at PerformLine. He is focused on solutions architecture and distributed application development in AWS. Outside of work, Nick is an avid cyclist and skier, always looking for the next great climb or powder day.
Shervin Suresh is a Generative AI Solutions Architect at AWS. He supports generative AI adoption both internally at AWS and externally with fast-growing startup customers. He is passionate about using technology to help improve the lives of people in all aspects. Outside of work, Shervin loves to cook, build LEGO, and collaborate with people on things they are passionate about.
Medha Aiyah is a Solutions Architect at AWS. She graduated from the University of Texas at Dallas with an MS in Computer Science, with a focus on AI/ML. She supports ISV customers in a wide variety of industries, by empowering customers to use AWS optimally to achieve their business goals. She is especially interested in guiding customers on ways to implement AI/ML solutions and use generative AI. Outside of work, Medha enjoys hiking, traveling, and dancing.
Michael Zhang is a generalist Solutions Architect at AWS working with small to medium businesses. He has been with Amazon for over 3 years and uses his background in computer science and machine learning to support customers on AWS. In his free time, Michael loves to hike and explore other cultures.
Teachers Turn Toward Virtual Schools for Better Work-Life Balance
As Molly Hamill explains the origin of the Declaration of Independence to her students, she dons a white wig fashioned into a ponytail, appearing as John Adams, before sporting a bald cap in homage to Benjamin Franklin, then wearing a red wig to imitate Thomas Jefferson. But instead of looking out to an enraptured sea of 28 fifth graders leaning forward in their desks, she is speaking directly into a camera.
Hamill is one of a growing number of educators who forwent brick-and-mortar schools post-pandemic. She now teaches fully virtually through the public, online school California Virtual Academies, having swapped desks for desktops.
After the abrupt shift to virtual schooling during the COVID-19 health crisis — and the stress for many educators because of it — voluntarily choosing the format may seem unthinkable.
“You hear people say, ‘I would never want to go back to virtual,’ and I get it, it was super stressful because we were building the plane as we were flying it, deciding if we were going to have live video or recordings, and adapt all the teaching materials to virtual,” Hamill says. “But my school is a pretty well-oiled machine … there’s a structure already in place. And kids are adaptable, they already like being on a computer.”
And for Hamill, and thousands of other teachers, instructing through a virtual school is a way to attempt striking a rare work-life balance in the education world.
More Flexibility for Teaching Students
The number of virtual schools has grown, as has the number of U.S. children enrolled in them. In the 2022-2023 school year, about 2.5 percent of K-12 students were enrolled in full-time virtual education (1.8 percent through public or private online schools, and 0.7 percent as homeschoolers), according to data published in 2024 by the National Center for Education Statistics. And parents reported that 7 percent of students who learned at home that year took at least one virtual course.
There’s been an accompanying rise in the number of teachers instructing remotely via virtual schools.
The number of teachers employed by K12, which is under the parent company Stride Inc. and one of the largest and longest-running providers of virtual schools, has jumped from 6,500 to 8,000 over the last three or four years, says Niyoka McCoy, chief learning officer at the company.
McCoy credits the growth in part to teachers wanting to homeschool their own children, and therefore needing to do their own work from home, but she also thinks it is a sign of a shifting preference for technology-based offerings.
“They think this is the future, that more online programs will open up,” McCoy says.
Connections Academy, which is the parent company of Pearson Online Academy and a similarly long-standing online learning provider, employs 3,500 teachers. Nik Osborne, senior vice president of partnerships and customer success at Pearson, says it’s been easy to both recruit and keep teachers: roughly 91 percent of teachers in the 2024-2025 school year returned this academic year.
“Teaching in a virtual space is very different than brick-and-mortar; even the type of role teachers play appeals to some teachers,” Osborne says. “They become more of a guide to help the kids understand content.”
Courtney Entsminger, a middle school math teacher at the public, online school Virginia Connections Academy, teaches asynchronously and likes the ability to record her own lesson plans in addition to teaching them live, which she says helps a wider variety of learners. Hamill, who teaches synchronously, similarly likes that the virtual format can be leveraged to build more creative lesson plans, like her Declaration of Independence video, or a fake livestream of George Washington during the Battle of Trenton, both of which are on her YouTube channel.
Whether a school is asynchronous largely depends on the provider’s standards. Pearson, which runs the Virtual Academies where Entsminger teaches, is asynchronous. For other standalone public school districts, such as Georgia Cyber Academy, the decision comes down to what students need: if they are performing at or above grade level, they get more flexibility, but if they come to the school below grade level — reading at a second grade level, for example, but placed in a fourth grade classroom — they need more structure.
“I do feel like a TikTok star where I record myself teaching through different aspects of that curriculum because students work in different ways,” says Entsminger, who has 348 online students across three grades. “In person you’re able to realize ‘this student works this way,’ and I’ll do a song and dance in front of you. Online, I can do it in different mediums.”
Karen Bacon, a transition liaison at Ohio Virtual Academy who works with middle and high school students in special education, was initially drawn to virtual teaching because of its flexibility for supporting students through a path that works best for them.
“I always like a good challenge and thought this was interesting to dive into how this works and different ways to help students,” says Bacon, who was a high school French teacher before making the switch to virtual in 2017. “There’s obviously a lot to learn and understand, but once you dive in and see all the options, there really are a lot of different possibilities out there.”
Bacon says there are “definitely less distractions,” than in a brick-and-mortar environment, allowing her to get more creative. For example, she had noticed stories crop up across the nation showcasing special education students in physical environments working to serve coffee to teachers and students as a way to learn workplace skills. She, adapting to the virtual environment, created the “Cardinal Cafe,” where students can accomplish the same goals, albeit with a virtual cup of joe.
“I don’t really consider myself super tech-y, but I have that curiosity and love going outside the box and looking at ways to really help my students,” she says.
A Way to Curb Teacher Burnout?
The flexibility that comes with teaching in a virtual environment is not just appealing for what it offers students. Teachers say it can also help cushion the consistently lower wages and lack of benefits most educators grapple with, conditions that drive many to leave the field.
“So many of us have said, ‘I felt so burned out, I wasn’t sure I could keep teaching,’” Hamill says, adding she felt similarly at the start of her career as a first grade teacher. “But doing it this way helps it feel sustainable. We’re still underpaid and not appreciated enough as a whole profession, but at least virtually some of the big glaring issues aren’t there in terms of how we’re treated.”
Entsminger was initially drawn to teaching in part because she hoped it would allow her to have more time with her future children than other careers might offer. But as she became a mother while teaching for a decade in a brick-and-mortar environment — both at the elementary school and the high school level — she found she was unable to pick up or drop her daughter off at school, despite working in the same district her daughter attended.
In contrast, while teaching online, “in this environment I’m able to take her to school, make her breakfast,” she says. “I’m able to do life and my job. On the daily, I’m able to be ‘Mom’ and ‘Ms. Entsminger’ with less fighting for my time.”
Because of the more-flexible schedule for students enrolled in virtual learning programs, teachers do not have to be “on” for eight straight hours. And they do not necessarily have to participate in the sorts of shared systems that keep physical schools running. In a brick-and-mortar school, even if Bacon, Hamill or Entsminger were not slated to teach a class, they might be assigned to spend their time walking their students to their next class or the bus stop, or tasked with supervising the cafeteria during a lunch period. But in the virtual environment, they have the ability to close their laptop, and to quietly plan lessons or grade papers.
However, that is not to say these teachers operate as islands. Hamill says one of the largest perks of teaching virtual school is working with other fifth grade teachers across the nation, who often share PowerPoints or other lesson plans, whereas, she says, “I think sometimes in person, people can be a little precious about that.”
The workload varies for teachers in virtual programs. Entsminger’s 300-plus students are enrolled in three grades. Some live as close as her own city, others as far away as Europe, where they play soccer. Hamill currently has 28 students and expects to reach 30 as the school continuously admits more. According to the National Education Policy Center, the average student-teacher ratio in the nation’s public schools was 14.8 students per teacher in 2023, with virtual schools reporting 24.4 students per teacher.
Hamill also believes that virtual environments keep both teachers and students safer. She says she was sick for nine months of her first year teaching, getting strep throat twice. She also points to the seemingly endless onslaught of school shootings and the worsening of behavior issues among children.
“The trade-off for not having to do classroom management of behavioral issues is huge,” she says. “If the kid is mean in the chat, I turn off the chat. If kids aren’t listening, I can mute everyone and say, ‘I’ll let you talk one at a time.’ Versus, in my last classroom, the kids threw chairs at me.”
There are still adjustments to managing kids remotely, the teachers acknowledge. Hamill coaches her kids through internet safety and online decorum, like learning that typing in all-caps, for example, can come across rudely.
And while the virtual teachers were initially concerned about bonding with their students, they have found those worries largely unfounded. During online office hours, Hamill plays Pictionary with her students and has met most of their pets over a screen. Meanwhile, Entsminger offers online tutoring and daily opportunities to meet, where she has “learned more than I ever thought about K-pop this year.”
There are also opportunities for in-person gatherings with students. Hamill does once-a-month meetups, often in a park. Bacon attended an in-person picnic earlier this month to meet the students who live near her. And both K12 and Connections Academy hold multiple in-person events for students, including field trips and extracurriculars, like sewing or bowling clubs.
“Of course I wish I could see them more in person, and do arts and crafts time — that’s a big thing I miss,” Hamill says. “But we have drawing programs or ways they can post their artwork; we find ways to adapt to it.”
And that adaptation is largely worth it to virtual teachers.
“Teaching is teaching; even if I’m behind a computer screen, kids are still going to be kids,” Entsminger says. “The hurdles are still there. We’re still working hard, but it’s really nice to work with my students, and then walk to my kitchen to get coffee, then come back to connect to my students again.”
Schedule topology-aware workloads using Amazon SageMaker HyperPod task governance

Today, we are excited to announce a new capability of Amazon SageMaker HyperPod task governance to help you optimize training efficiency and network latency of your AI workloads. SageMaker HyperPod task governance streamlines resource allocation and facilitates efficient compute resource utilization across teams and projects on Amazon Elastic Kubernetes Service (Amazon EKS) clusters. Administrators can govern accelerated compute allocation and enforce task priority policies, improving resource utilization. This helps organizations focus on accelerating generative AI innovation and reducing time to market, rather than coordinating resource allocation and replanning tasks. Refer to Best practices for Amazon SageMaker HyperPod task governance for more information.
Generative AI workloads typically demand extensive network communication across Amazon Elastic Compute Cloud (Amazon EC2) instances, where network bandwidth impacts both workload runtime and processing latency. The network latency of these communications depends on the physical placement of instances within a data center’s hierarchical infrastructure. Data centers can be organized into nested organizational units such as network nodes and node sets, with multiple instances per network node and multiple network nodes per node set. For example, instances within the same organizational unit experience faster processing times than those in different units. In short, fewer network hops between instances result in lower communication latency.
To optimize the placement of your generative AI workloads in your SageMaker HyperPod clusters by considering the physical and logical arrangement of resources, you can use EC2 network topology information during your job submissions. An EC2 instance’s topology is described by a set of nodes, with one node in each layer of the network. Refer to How Amazon EC2 instance topology works for details on how EC2 topology is arranged. Network topology labels offer the following key benefits:
- Reduced latency by minimizing network hops and routing traffic to nearby instances
- Improved training efficiency by optimizing workload placement across network resources
With topology-aware scheduling for SageMaker HyperPod task governance, you can use topology network labels to schedule your jobs with optimized network communication, thereby improving task efficiency and resource utilization for your AI workloads.
In this post, we introduce topology-aware scheduling with SageMaker HyperPod task governance by submitting jobs that represent hierarchical network information. We provide details about how to use SageMaker HyperPod task governance to optimize your job efficiency.
Solution overview
Data scientists interact with SageMaker HyperPod clusters. Data scientists are responsible for the training, fine-tuning, and deployment of models on accelerated compute instances. It’s important to make sure data scientists have the necessary capacity and permissions when interacting with clusters of GPUs.
To implement topology-aware scheduling, you first confirm the topology information for all nodes in your cluster, then run a script that tells you which instances are on the same network nodes, and finally schedule a topology-aware training task on your cluster. This workflow facilitates higher visibility and control over the placement of your training instances.
In this post, we walk through viewing node topology information and submitting topology-aware tasks to your cluster. For reference, NetworkNodes describes the network node set of an instance. In each network node set, three layers comprise the hierarchical view of the topology for each instance. Instances that are closest to each other share the same layer 3 network node. If there are no common network nodes in the bottom layer (layer 3), check whether there is commonality at layer 2.
Prerequisites
To get started with topology-aware scheduling, you must have the following prerequisites:
- An EKS cluster
- A SageMaker HyperPod cluster with instances enabled for topology information
- The SageMaker HyperPod task governance add-on installed (version 1.2.2 or later)
- Kubectl installed
- (Optional) The SageMaker HyperPod CLI installed
Get node topology information
Run a command like the following (an illustrative invocation; the -L flag adds a column for each topology label) to show node labels in your cluster. This command provides network topology information for each instance:
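    # List each node with its EC2 network topology labels
    kubectl get nodes \
        -L topology.k8s.aws/network-node-layer-1 \
        -L topology.k8s.aws/network-node-layer-2 \
        -L topology.k8s.aws/network-node-layer-3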
Instances that share the same layer 3 network node are as close together as possible in the EC2 topology hierarchy. You should see node labels that look like the following:

    topology.k8s.aws/network-node-layer-3: nn-33333example
Run a script like the following to show which nodes in your cluster are on the same layer 1, 2, and 3 network nodes. This is a minimal Python sketch that shells out to kubectl; your version may group or format the output differently:
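    # Group nodes by their EC2 network topology labels and print a Mermaid
    # flowchart of layer 1 -> layer 2 -> layer 3 -> instance.
    import json
    import subprocess

    nodes = json.loads(subprocess.check_output(
        ["kubectl", "get", "nodes", "-o", "json"]))["items"]

    edges = set()
    for node in nodes:
        labels = node["metadata"]["labels"]
        layer1 = labels.get("topology.k8s.aws/network-node-layer-1", "layer1-unknown")
        layer2 = labels.get("topology.k8s.aws/network-node-layer-2", "layer2-unknown")
        layer3 = labels.get("topology.k8s.aws/network-node-layer-3", "layer3-unknown")
        name = node["metadata"]["name"]
        edges.update({(layer1, layer2), (layer2, layer3), (layer3, name)})

    print("flowchart TD")
    for parent, child in sorted(edges):
        print(f"    {parent} --> {child}")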
The output of this script will print a flow chart that you can use in a flow diagram editor such as Mermaid.js.org to visualize the node topology of your cluster. The following figure is an example of the cluster topology for a seven-instance cluster.
Submit tasks
SageMaker HyperPod task governance offers two ways to submit tasks using topology awareness. In this section, we discuss these two options and a third alternative option to task governance.
Modify your Kubernetes manifest file
First, you can modify your existing Kubernetes manifest file to include one of two annotation options:
- kueue.x-k8s.io/podset-required-topology – Use this option if you must have all pods scheduled on nodes on the same network node layer in order to begin the job
- kueue.x-k8s.io/podset-preferred-topology – Use this option if you ideally want all pods scheduled on nodes in the same network node layer, but you have flexibility
The following is an example of a job manifest that uses the kueue.x-k8s.io/podset-required-topology setting to schedule pods that share the same layer 3 network node. The manifest is an illustrative sketch; the job name, queue label, image, and resource requests are placeholders to adapt to your cluster:
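    # Illustrative manifest; adapt the queue label, image, and resources
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: topology-aware-training
      namespace: hyperpod-ns-team-a
      labels:
        kueue.x-k8s.io/queue-name: hyperpod-ns-team-a-localqueue
    spec:
      parallelism: 4
      completions: 4
      suspend: true
      template:
        metadata:
          annotations:
            kueue.x-k8s.io/podset-required-topology: "topology.k8s.aws/network-node-layer-3"
        spec:
          containers:
            - name: trainer
              image: public.ecr.aws/docker/library/python:3.12
              command: ["python", "-c", "print('training step')"]
              resources:
                requests:
                  nvidia.com/gpu: 1
                limits:
                  nvidia.com/gpu: 1
          restartPolicy: Never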
To verify which nodes your pods are running on, use the following command to view node IDs per pod:

    kubectl get pods -n hyperpod-ns-team-a -o wide
Use the SageMaker HyperPod CLI
The second way to submit a job is through the SageMaker HyperPod CLI. Be sure to install the latest version (version pending) to use topology-aware scheduling. To use topology-aware scheduling with the SageMaker HyperPod CLI, include either the --preferred-topology parameter or the --required-topology parameter in your create job command.
The following is an example command to start a topology-aware MNIST training job using the SageMaker HyperPod CLI. The topology parameter comes from this post, but the remaining flags are assumptions to verify against your CLI version; replace XXXXXXXXXXXX with your AWS account ID:
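    # Illustrative invocation; verify flag names against your CLI version
    hyperpod create job \
        --job-name mnist-topology-test \
        --image XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/mnist:latest \
        --namespace hyperpod-ns-team-a \
        --node-count 4 \
        --required-topology topology.k8s.aws/network-node-layer-3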
Clean up
If you deployed new resources while following this post, refer to the Clean Up section in the SageMaker HyperPod EKS workshop to make sure you don’t accrue unwanted charges.
Conclusion
During large language model (LLM) training, the model is distributed across multiple instances, requiring frequent pod-to-pod communication and data exchange between those instances. In this post, we discussed how SageMaker HyperPod task governance helps schedule workloads to improve job efficiency by optimizing throughput and latency. We also walked through how to schedule jobs using SageMaker HyperPod topology network information to optimize network communication latency for your AI tasks.
We encourage you to try out this solution and share your feedback in the comments section.
About the authors
Nisha Nadkarni is a Senior GenAI Specialist Solutions Architect at AWS, where she guides companies through best practices when deploying large scale distributed training and inference on AWS. Prior to her current role, she spent several years at AWS focused on helping emerging GenAI startups develop models from ideation to production.
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.
Zican Li is a Senior Software Engineer at Amazon Web Services (AWS), where he leads software development for Task Governance on SageMaker HyperPod. In his role, he focuses on empowering customers with advanced AI capabilities while fostering an environment that maximizes engineering team efficiency and productivity.
Anoop Saha is a Sr GTM Specialist at Amazon Web Services (AWS) focusing on generative AI model training and inference. He partners with top frontier model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large corporations, primarily focusing on silicon and system architecture of AI infrastructure.
How msg enhanced HR workforce transformation with Amazon Bedrock and msg.ProfileMap

This post is co-written with Stefan Walter from msg.
With more than 10,000 experts in 34 countries, msg is both an independent software vendor and a system integrator operating in highly regulated industries, with over 40 years of domain-specific expertise. msg.ProfileMap is a software as a service (SaaS) solution for skill and competency management. It’s AWS Partner-qualified software, available on AWS Marketplace and currently serving more than 7,500 users. HR and strategy departments use msg.ProfileMap for project staffing and workforce transformation initiatives. By offering a centralized view of skills and competencies, msg.ProfileMap helps organizations map their workforce’s capabilities, identify skill gaps, and implement targeted development strategies. This supports more effective project execution, better alignment of talent to roles, and long-term workforce planning.
In this post, we share how msg automated data harmonization for msg.ProfileMap, using Amazon Bedrock to power its large language model (LLM)-driven data enrichment workflows, resulting in higher accuracy in HR concept matching, reduced manual workload, and improved alignment with compliance requirements under the EU AI Act and GDPR.
The importance of AI-based data harmonization
HR departments face increasing pressure to operate as data-driven organizations, but are often constrained by the inconsistent, fragmented nature of their data. Critical HR documents are unstructured, and legacy systems use mismatched formats and data models. This not only impairs data quality but also leads to inefficiencies and decision-making blind spots.

Accurate and harmonized HR data is foundational for key activities such as matching candidates to roles, identifying internal mobility opportunities, conducting skills gap analysis, and planning workforce development. msg identified that without automated, scalable methods to process and unify this data, organizations would continue to struggle with manual overhead and inconsistent results.
Solution overview
HR data is typically scattered across diverse sources and formats, ranging from relational databases to Excel files, Word documents, and PDFs. Additionally, entities such as personnel numbers or competencies can have different unique identifiers and different text descriptions even when they share the same semantics. msg addressed this challenge with a modular architecture tailored for IT workforce scenarios. As illustrated in the following diagram, at the core of msg.ProfileMap is a robust text extraction layer, which transforms heterogeneous inputs into structured data. This is then passed to an AI-powered harmonization engine that provides consistency across data sources by avoiding duplication and aligning disparate concepts.
The harmonization process uses a hybrid retrieval approach that combines vector-based semantic similarity and string-based matching techniques. These methods align incoming data with existing entities in the system. Amazon Bedrock is used to semantically enrich data, improving cross-source compatibility and matching precision. Extracted and enriched data is indexed and stored using Amazon OpenSearch Service and Amazon DynamoDB, facilitating fast and accurate retrieval, as shown in the following diagram.
The framework is designed to be unsupervised and domain independent. Although it’s optimized for IT workforce use cases, it has demonstrated strong generalization capabilities in other domains as well.
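To illustrate the hybrid retrieval idea, the following is a minimal sketch of a combined match score, assuming an Amazon Titan text embeddings model on Amazon Bedrock and a simple string-similarity baseline; the weighting and function names are illustrative, not msg’s actual implementation.

    import json
    from difflib import SequenceMatcher

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def embed(text):
        # Semantic embedding via Amazon Bedrock (model choice is an assumption)
        response = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            body=json.dumps({"inputText": text}),
        )
        return json.loads(response["body"].read())["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm

    def hybrid_score(incoming, existing, vector_weight=0.7):
        # Blend vector-based semantic similarity with string-based matching
        semantic = cosine(embed(incoming), embed(existing))
        lexical = SequenceMatcher(None, incoming.lower(), existing.lower()).ratio()
        return vector_weight * semantic + (1 - vector_weight) * lexical

    # Example: does an incoming skill match an existing concept?
    print(hybrid_score("K8s administration", "Kubernetes administration"))

A score above a tuned threshold would surface the pair as a merge suggestion for human review.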
msg.ProfileMap is a cloud-based application that uses several AWS services, notably Amazon Neptune, Amazon DynamoDB, and Amazon Bedrock. The following diagram illustrates the full solution architecture.
Results and technical validation
msg evaluated the effectiveness of the data harmonization framework through internal testing on IT workforce concepts and through external benchmarking in the Bio-ML Track of the Ontology Alignment Evaluation Initiative (OAEI), an international, EU-funded research initiative that has evaluated ontology matching technologies since 2004.
During internal testing, the system processed 2,248 concepts across multiple suggestion types. High-probability merge recommendations reached 95.5% accuracy, covering nearly 60% of all inputs. This helped msg reduce manual validation workload by over 70%, significantly improving time-to-value for HR teams.
During OAEI 2024, msg.ProfileMap ranked at the top of the 2024 Bio-ML benchmark, outperforming other systems across multiple biomedical datasets. On NCIT-DOID, it achieved a 0.918 F1 score, with Hits@1 exceeding 92%, validating the engine’s generalizability beyond the HR domain. Additional details are available in the official test results.
Why Amazon Bedrock
msg relies on LLMs to semantically enrich data in near real time. These workloads require low-latency inference, flexible scaling, and operational simplicity. Amazon Bedrock met these needs by providing a fully managed, serverless interface to leading foundation models—without the need to manage infrastructure or deploy custom machine learning stacks.
Unlike hosting models on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon SageMaker, Amazon Bedrock abstracts away provisioning, versioning, scaling, and model selection. Its consumption-based pricing aligns directly with msg’s SaaS delivery model—resources are used (and billed) only when needed. This simplified integration reduced overhead and helped msg scale elastically as customer demand grew.
Amazon Bedrock also helped msg meet compliance goals under the EU AI Act and GDPR by enabling tightly scoped, auditable interactions with model APIs—critical for HR use cases that handle sensitive workforce data.
Conclusion
msg’s successful integration of Amazon Bedrock into msg.ProfileMap demonstrates that large-scale AI adoption doesn’t require complex infrastructure or specialized model training. By combining modular design, ontology-based harmonization, and the fully managed LLM capabilities of Amazon Bedrock, msg delivered an AI-powered workforce intelligence platform that is accurate, scalable, and compliant.

This solution improved concept match precision and achieved top marks in international AI benchmarks, demonstrating what’s possible when generative AI is paired with the right cloud-based service. With Amazon Bedrock, msg has built a platform that’s ready for today’s HR challenges—and tomorrow’s.
msg.ProfileMap is available as a SaaS offering on AWS Marketplace. If you are interested in knowing more, you can reach out to msg.hcm.backoffice@msg.group.
The content and opinions in this blog post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
About the authors
Stefan Walter is Senior Vice President of AI SaaS Solutions at msg. With over 25 years of experience in IT software development, architecture, and consulting, Stefan Walter leads with a vision for scalable SaaS innovation and operational excellence. As a BU lead at msg, Stefan has spearheaded transformative initiatives that bridge business strategy with technology execution, especially in complex, multi-entity environments.
Gianluca Vegetti is a Senior Enterprise Architect in the AWS Partner Organization, aligned to Strategic Partnership Collaboration and Governance (SPCG) engagements. In his role, he supports the definition and execution of Strategic Collaboration Agreements with selected AWS partners.
Yuriy Bezsonov is a Senior Partner Solution Architect at AWS. With over 25 years in the tech, Yuriy has progressed from a software developer to an engineering manager and Solutions Architect. Now, as a Senior Solutions Architect at AWS, he assists partners and customers in developing cloud solutions, focusing on container technologies, Kubernetes, Java, application modernization, SaaS, developer experience, and GenAI. Yuriy holds AWS and Kubernetes certifications, and he is a recipient of the AWS Golden Jacket and the CNCF Kubestronaut Blue Jacket.