Events & Conferences

The science behind an Amazon Echo feature that helped save a puppy

Published

May 17, 2021

Jonathan and Kathy, an Orlando-based couple, were out visiting a neighbor a few days before Christmas 2020 when Jonathan got an unusual alert from their Amazon Echo. Using the Alexa App, they dropped in on their Echo device, which allowed them to hear what was happening in their home in real time.

“You could hear things crackling and popping, and the smoke alarm was going off like crazy,” Jonathan told About Amazon. He then rushed home. “Upon rolling into the neighborhood, it was very smoky,” Jonathan said. “I pulled up into the driveway, opened the garage, and smoke just started billowing out. I went into our house, and more black smoke poured out. It was so thick you couldn’t see six inches in front of your face. The only thing I could think of was Cooper.”

How an Amazon Echo alert helped save Cooper the dog

Jonathan managed to get Cooper, the couple’s French bulldog, from his pen as smoke billowed from the house. The fire department was also able to extinguish the fire and minimize damage. However, neither outcome might have occurred had it not been for a Smart Alert mobile notification from Alexa.

The feature that alerted Jonathan is called Alexa Guard, a smart-home capability that relies on acoustic event detection (AED). AED is an emerging field that focuses on training models to detect and process sounds.

“The technology behind Alexa Guard was developed in an effort to augment the utility of Echo devices,” said Angel Calvo, director of software for the Alexa Smart Home team.

How Guard works

When set to away mode, Guard is trained to identify sounds related to home security and safety events, like a smoke alarm sounding, and to distinguish those sounds from something more prosaic, like a microwave beeping.

I am so glad this couple and their pet are OK – We built #AlexaGuard with this customer use case in mind, so learning that we helped with Guard to save this puppy from a fire, emphasis why I love my job… kudos to the Alexa Guard team! https://t.co/rX48tbCNko

— Angel Calvo (@ANGELCALVOS) January 6, 2021

The detection service relies on two models applied in a two-step system, one on the device, another in the cloud.

The first step runs on the Echo device itself. The on-device detection works by converting the audio input into features that feed into a recurrent neural network (RNN), a type of deep learning model that learns from sequential or time-series data.

The device uses long short-term memory (LSTM) — a type of recurrent neural network that has shown a significant improvement in speech recognition and has high accuracy, “particularly when it’s applied to sequential data,” said Ming Sun, applied science manager for AED. This is particularly important for determining when a specific sound occurred.

The Echo must also occasionally be able to distinguish between multiple sounds at once. Layered over the RNN is a multi-task learning framework that is trained to detect multiple events. These multiple output layers work like branches off the base neural network, each trained to recognize a different event in the captured audio.

This helps Echo devices detect multiple concurrent incidents, such as footsteps and glass breaking, among those that customers have selected for detection.

Layering multiple output layers over a single neural network also makes the detection system in Echo devices very scalable; the device can be trained to recognize new sounds with minimal additions.

“Without this design, we would need to update the whole model every time we update one existing sound event or add a new sound event,” Sun said. “Now, we only have to update the output layer for a target existing event, or add a new output layer for a new event.”
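The shared-base, per-event-head design Sun describes can be sketched as follows. This is a minimal illustration under stated assumptions, not Guard's model: the `SharedEncoder` below is a plain recurrent cell rather than an LSTM, its weights are random rather than trained, and every class, method, and event name is hypothetical.

```python
import math
import random


class SharedEncoder:
    """Stand-in for the shared recurrent base (a simple recurrence, not an LSTM)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights: enough to show the structure only.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                     for _ in range(n_hidden)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_hidden)]

    def encode(self, frames):
        """Fold a sequence of feature frames into one hidden-state vector."""
        h = [0.0] * len(self.w_in)
        for x in frames:
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(w_in_row, x))
                    + sum(w * hj for w, hj in zip(w_rec_row, h))
                )
                for w_in_row, w_rec_row in zip(self.w_in, self.w_rec)
            ]
        return h


class MultiEventDetector:
    """One shared encoder with an independent output head per sound event."""

    def __init__(self, encoder):
        self.encoder = encoder
        self.heads = {}  # event name -> (weights, bias)

    def add_head(self, event, weights, bias=0.0):
        # Adding or replacing a head leaves the encoder and other heads untouched.
        self.heads[event] = (list(weights), bias)

    def predict(self, frames):
        """Return a probability per event, computed from the same embedding."""
        h = self.encoder.encode(frames)
        return {
            event: 1.0 / (1.0 + math.exp(-(sum(w * hj for w, hj in zip(ws, h)) + b)))
            for event, (ws, b) in self.heads.items()
        }
```

Because each head only reads the shared embedding, registering a new event with `add_head` does not change the scores that existing heads produce, which mirrors the scalability point in the quote above.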

When one of the sounds a customer has selected for detection triggers Guard on the Echo device, that audio is then sent to the cloud for the second verification step to confirm the on-device detection. The cloud runs a much more powerful recognition system to filter out false triggers that might be linked to ambient noise around the home, Sun said.

If the validation process confirms the sound is the one that the device is actively monitoring for, the customer gets a notification in their Alexa app along with an audio clip of the detection.
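The two-step flow described above (a lightweight on-device check, then cloud verification only for potential triggers) might look roughly like this. It is a sketch under assumptions: the function name, the scoring callables, and both threshold values are hypothetical and not taken from Guard.

```python
from typing import Callable, Optional, Sequence

Clip = Sequence[float]
Scorer = Callable[[Clip], float]


def two_stage_detect(
    clip: Clip,
    device_model: Scorer,
    cloud_model: Scorer,
    device_threshold: float = 0.5,  # hypothetical value
    cloud_threshold: float = 0.8,   # hypothetical value
) -> Optional[dict]:
    """Return a notification payload only if both stages agree, else None."""
    device_score = device_model(clip)
    if device_score < device_threshold:
        # No potential trigger: the cloud model is never called for this clip.
        return None

    cloud_score = cloud_model(clip)
    if cloud_score < cloud_threshold:
        # The stronger cloud model rejects the on-device trigger as a false alarm.
        return None

    return {
        "notify": True,
        "device_score": device_score,
        "cloud_score": cloud_score,
        "clip": clip,
    }
```

The early return in the first stage is the point of the design: clips that the device model scores below its threshold never reach the cloud model at all.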

Getting creative to teach Guard sounds

Because home security events are relatively rare — and the data sets for these audio events are quite meager — semi-supervised learning and self-supervised learning have been critical as Sun’s team expands and refines Guard’s capabilities.

“Semi-supervised learning relies on small sets of annotated training data to leverage larger sets of unannotated data,” Sun said. “While self-supervised learning utilizes larger sets of unannotated data with training targets derived from data itself in an unsupervised way — no human annotations.

“Another technique is to detect for a longer time and aggregate events to be more accurate,” Sun said. To improve the accuracy of sounds with repeating patterns, the detectors look for shorter repeating patterns, such as an appliance beeping. This allows Guard to distinguish between that type of repetitive beeping and an alarm, which can run for 30 seconds or longer. Guard can also detect the difference between a smoke alarm and a carbon monoxide alarm, and notify customers of a specific risk.
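One way to picture the "detect for a longer time and aggregate" idea is to turn per-frame detections into a duration and compare it with a minimum. The sketch below is an assumption-laden illustration, not Guard's logic: the function names, the gap-bridging rule, and the default values are hypothetical. Only the 30-second figure comes from the text above.

```python
from typing import Sequence


def longest_active_span(flags: Sequence[bool], max_gap: int = 1) -> int:
    """Length in frames of the longest span of positive frames,
    bridging runs of up to `max_gap` negative frames inside the span."""
    best = 0
    start = None
    last_positive = None
    for i, flag in enumerate(flags):
        if not flag:
            continue
        # Start a new span if this is the first hit or the gap is too long.
        if start is None or i - last_positive - 1 > max_gap:
            start = i
        last_positive = i
        best = max(best, last_positive - start + 1)
    return best


def is_sustained_alarm(
    flags: Sequence[bool],
    frame_seconds: float = 1.0,
    min_seconds: float = 30.0,
    max_gap: int = 1,
) -> bool:
    """True if the detections add up to a pattern lasting at least `min_seconds`."""
    return longest_active_span(flags, max_gap) * frame_seconds >= min_seconds
```

With one-second frames, five appliance beeps (`[True, False] * 5`) span about 9 seconds and are rejected, while a repeating alarm pattern such as `[True, True, True, False] * 10` spans 39 seconds and is accepted.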

Guard Plus, a subscription service launched in January, detects sounds that could indicate an intruder (like footsteps, a door closing, or glass breaking) and can send a Smart Alert mobile notification or play a siren on the Echo device. Alexa can also notify customers about the sound of smoke alarms or carbon monoxide alarms. Because the ambient sounds in places like dense urban environments or apartment complexes can make this tricky, the team added a feature allowing customers to adjust the sensitivity to accommodate the noise in their home environments.

The limited annotated data the Guard team had access to has also required them to get creative. Glass breaking, for example, is a rare sound, it’s over in two to three seconds, and it varies based on the type of glass. To bolster their data set, the Guard team rented a warehouse and contracted a construction crew to break hundreds of windows: single pane, double pane, different compositions. This allowed the team to assemble an authentic data set for training the initial model (also called a seed model) before deploying Guard to beta testers.

All of the strategies Sun’s team employed to optimize the recognition system on Echo devices have minimized the error rate.

This is where the powerful AED models in the cloud — Guard’s second validation step — are so essential. The chances of false alarm are much smaller when audio is processed through both local and cloud systems, Sun said. And, he emphasized, audio is sent to the cloud only after running it through a device-side model to protect privacy.

“Since the very beginning, it’s been critical to build accurate models that consume less resources,” Sun said. “We apply lots of optimization so that this system can be as small and efficient as possible.”

Edge devices like Echo only send data to the cloud when it’s essential. In the case of Guard, that means the majority of the audio data is processed and discarded by the neural network on the device. Only potential triggers make it to the cloud. For those events, customers are able to view, listen, and delete the audio that Guard detects directly from their Guard History in the Alexa app, or from the Alexa Privacy Settings page.

A New Ranking Framework for Better Notification Quality on Instagram

Published

September 2, 2025

Xian Sun

We’re sharing how Meta is applying machine learning (ML) and diversity algorithms to improve notification quality and user experience.
We’ve introduced a diversity-aware notification ranking framework to reduce uniformity and deliver a more varied and engaging mix of notifications.
This new framework reduces the volume of notifications and drives higher engagement rates through more diverse outreach.

Notifications are one of the most powerful tools for bringing people back to Instagram and enhancing engagement. Whether it’s a friend liking your photo, another close friend posting a story, or a suggestion for a reel you might enjoy, notifications help surface moments that matter in real time.

Instagram leverages machine learning (ML) models to decide who should get a notification, when to send it, and what content to include. These models are trained to optimize for positive user engagement such as click-through rate (CTR) – the probability of a user clicking a notification – as well as other metrics like time spent.

However, while engagement-optimized models are effective at driving interactions, there’s a risk that they might overprioritize the product types and authors someone has previously engaged with. This can lead to overexposure to the same creators or the same product types while overlooking other valuable and diverse experiences.

This means people could miss out on content that would give them a more balanced, satisfying, and enriched experience. Over time, this can make notifications feel spammy and increase the likelihood that people will disable them altogether.

The real challenge lies in finding the right balance: How can we introduce meaningful diversity into the notification experience without sacrificing the personalization and relevance people on Instagram have come to expect?

To tackle this, we’ve introduced a diversity-aware notification ranking framework that helps deliver more diverse, better curated, and less repetitive notifications. This framework has significantly reduced daily notification volume while improving CTR. It also introduces several benefits:

Extensibility: customized soft-penalty (demotion) logic can be incorporated for each dimension, enabling more adaptive and sophisticated diversity strategies.
Flexibility: demotion strength can be tuned across dimensions like content, author, and product type via adjustable weights.
Balance: personalization and diversity are weighed together, ensuring notifications remain both relevant and varied.

The Risks of Notifications without Diversity

The issue of overexposure in notifications often shows up in two major ways:

Overexposure to the same author: People might receive notifications that are mostly about the same friend. For example, if someone often interacts with content from a particular friend, the system may continue surfacing notifications from that person alone – ignoring other friends they also engage with. This can feel repetitive and one-dimensional, reducing the overall value of notifications.

Overexposure to the same product surface: People might mostly receive notifications from the same product surface such as Stories, even when Feed or Reels could provide value. For example, someone may be interested in both reel and story notifications but has recently interacted more often with stories. Because the system heavily prioritizes past engagement, it sends only story notifications, overlooking the person’s broader interests.

Introducing Instagram’s Diversity-Aware Notification Ranking Framework

Instagram’s diversity-aware notification ranking framework is designed to enhance the notification experience by balancing the predicted potential for user engagement with the need for content diversity. This framework introduces a diversity layer on top of the existing engagement ML models, applying multiplicative penalties to the candidate scores generated by these models, as Figure 1, below, shows.

The diversity layer evaluates each notification candidate’s similarity to recently sent notifications across multiple dimensions such as content, author, notification type, and product surface. It then applies carefully calibrated penalties—expressed as multiplicative demotion factors—to downrank candidates that are too similar or repetitive. The adjusted scores are used to re-rank the candidates, enabling the system to select notifications that maintain high engagement potential while introducing meaningful diversity. In the end, the quality bar selects the top-ranked candidate that passes both the ranking and diversity criteria.

Figure 1: Instagram’s diversity-aware ranking framework where the diversity layer sits on top of the existing modeling layer and penalizes notifications that are too similar to recently sent ones.

Mathematical Formulation

Within the diversity layer, we apply a multiplicative demotion factor to the base relevance score of each candidate. Given a notification candidate 𝑐, we compute its final score as the product of its base ranking score and a diversity demotion multiplier:

$\text{Score}(c) = R(c) \times D(c)$

where R(c) represents the candidate’s base relevance score, and D(c) ∈ [0,1] is a penalty factor that reduces the score based on similarity to recently sent notifications. We define a set of semantic dimensions (e.g., author, product type) along which we want to promote diversity. For each dimension i, we compute a similarity signal p_i(c) between candidate c and the set of historical notifications H, using a maximal marginal relevance (MMR) approach:

$p_i(c) = \mathrm{max}_{h \in H}\mathrm{sim}_i(c, h)$

where sim_i(·,·) is a predefined similarity function for dimension i. In our baseline implementation, p_i(c) is binary: it equals 1 if the similarity exceeds a threshold 𝜏_i and 0 otherwise.

The final demotion multiplier is defined as:

$D(c) = \prod_{i=1}^{m} \left( 1 - w_i \cdot p_i(c) \right)$

where each w_i ∈ [0,1] controls the strength of demotion for its respective dimension. This formulation ensures that candidates similar to previously delivered notifications along one or more dimensions are proportionally down-weighted, reducing redundancy and promoting content variation. The use of a multiplicative penalty allows for flexible control across multiple dimensions, while still preserving high-relevance candidates.
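The formulation above can be sketched in a few lines of Python. This is a minimal illustration, not Instagram's implementation: the `Notification` fields, the exact-match similarity functions in `SIMS`, and the weights and thresholds used in the example are all assumptions chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Notification:
    author: str
    product: str
    relevance: float  # R(c), the base score from the engagement model


SimFn = Callable[[Notification, Notification], float]

# Hypothetical exact-match similarities, one per diversity dimension i.
SIMS: Dict[str, SimFn] = {
    "author": lambda c, h: 1.0 if c.author == h.author else 0.0,
    "product": lambda c, h: 1.0 if c.product == h.product else 0.0,
}


def demotion(
    c: Notification,
    history: Sequence[Notification],
    weights: Dict[str, float],
    thresholds: Dict[str, float],
) -> float:
    """D(c) = prod_i (1 - w_i * p_i(c)), with binary p_i(c)."""
    d = 1.0
    for dim, sim in SIMS.items():
        # Maximum similarity to any recently sent notification on this dimension.
        max_sim = max((sim(c, h) for h in history), default=0.0)
        # Baseline rule from the text: 1 if the similarity exceeds tau_i, else 0.
        p = 1.0 if max_sim > thresholds[dim] else 0.0
        d *= 1.0 - weights[dim] * p
    return d


def rerank(
    candidates: Sequence[Notification],
    history: Sequence[Notification],
    weights: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[Tuple[float, Notification]]:
    """Score(c) = R(c) * D(c), sorted from highest to lowest."""
    scored = [
        (c.relevance * demotion(c, history, weights, thresholds), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

As a worked example, suppose the history holds one Stories notification from "alice", with w_author = 0.5, w_product = 0.2, and both thresholds at 0.5. A new alice Stories candidate with R = 0.9 gets D = (1 - 0.5)(1 - 0.2) = 0.4 and a final score of about 0.36, so it ranks below a Reels candidate from "bob" with R = 0.6, which matches nothing in the history and keeps D = 1.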

The Future of Diversity-Aware Ranking

As we continue evolving our diversity-aware notification ranking system, a next step is to introduce more adaptive, dynamic demotion strategies. Instead of relying on static rules, we plan to make demotion strength responsive to notification volume and delivery timing. For example, as a user receives more notifications, especially of a similar type or in rapid succession, the system progressively applies stronger penalties to new notification candidates, effectively mitigating overwhelming experiences caused by high notification volume or tightly spaced deliveries.
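The text describes adaptive demotion only as a plan and gives no formula, so the following is purely a hypothetical sketch of what "responsive to volume and timing" could mean. The function name, the linear volume term, the half-life decay, and every default value are assumptions made for illustration.

```python
def adaptive_weight(
    base_weight: float,
    recent_count: int,
    seconds_since_last: float,
    volume_gain: float = 0.1,     # hypothetical: extra strength per recent notification
    half_life_s: float = 3600.0,  # hypothetical: how quickly the timing effect fades
) -> float:
    """Scale a static demotion weight w_i by recent volume and delivery recency.

    The result is clipped to 1.0 so that (1 - w_i * p_i) stays within [0, 1].
    """
    # More recently sent notifications means a stronger penalty.
    volume_boost = 1.0 + volume_gain * recent_count
    # Equals 2.0 right after a send and decays toward 1.0 as time passes.
    recency_boost = 1.0 + 0.5 ** (seconds_since_last / half_life_s)
    return min(1.0, base_weight * volume_boost * recency_boost)
```

Under these assumptions, the weight rises as `recent_count` grows or as deliveries get closer together, and it relaxes back toward `base_weight` once a user has not been notified for several half-lives.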

Longer term, we see an opportunity to bring large language models (LLMs) into the diversity pipeline. LLMs can help us go beyond surface-level rules by understanding semantic similarity between messages and rephrasing content in more varied, user-friendly ways. This would allow us to personalize notification experiences with richer language and improved relevance while maintaining diversity across topics, tone, and timing.

Simplifying book discovery with ML-powered visual autocomplete suggestions

Published

September 2, 2025

Mao Sheng Liu

Every day, millions of customers search for books in various formats (audiobooks, e-books, and physical books) across Amazon and Audible. Traditional keyword autocomplete suggestions, while helpful, usually require several steps before customers find their desired content. Audible took on the challenge of making book discovery more intuitive and personalized while reducing the number of steps to purchase.

We developed an instant visual autocomplete system that enhances the search experience across Amazon and Audible. As the user begins typing a query, our solution provides visual previews with book covers, enabling direct navigation to relevant landing pages instead of the search result page. It also delivers real-time personalized format recommendations and incorporates multiple searchable entities, such as book pages, author pages, and series pages.

Audible’s visual-autocomplete experience.

Amazon’s visual-autocomplete experience.

Our system needed to understand user intent from just a few keystrokes and determine the most relevant books to display, all while maintaining low latency for millions of queries. Using historical search data, we match keystrokes to products, transforming partial inputs into meaningful search suggestions. To ensure quality, we implemented confidence-based filtering mechanisms, which are particularly important for distinguishing between general queries like “mystery” and specific title searches. To reflect customers’ most recent interests, the system applies time-decay functions to long-term historical user-interaction data.
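A rough sketch of how keystroke-to-product matching, time decay, and confidence filtering could fit together is shown below. It is not Audible's system: the event tuple layout, the exponential half-life decay, the "share of decayed weight" confidence measure, and all names and default values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (query the user typed, product they went on to pick, day the event occurred)
Event = Tuple[str, str, float]


def build_prefix_index(
    events: Iterable[Event],
    now_day: float,
    half_life_days: float = 30.0,  # hypothetical decay rate
) -> Dict[str, Dict[str, float]]:
    """Map every prefix of each historical query to time-decayed product weights."""
    index: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for query, product, day in events:
        # Older interactions count for less: the weight halves every half-life.
        weight = 0.5 ** ((now_day - day) / half_life_days)
        q = query.lower()
        for end in range(1, len(q) + 1):
            index[q[:end]][product] += weight
    return index


def suggest(
    index: Dict[str, Dict[str, float]],
    prefix: str,
    min_confidence: float = 0.5,  # hypothetical cutoff
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k products whose share of the prefix's weight is high enough."""
    scores = index.get(prefix.lower())
    if not scores:
        return []
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(p, s / total) for p, s in ranked[:k] if s / total >= min_confidence]
```

In this sketch a prefix like "du" that overwhelmingly led to one title yields a single confident suggestion, while a broad prefix like "mys" whose clicks are spread evenly across many books yields none, leaving a keyword suggestion as the natural fallback.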

Revolutionizing warehouse automation with scientific simulation

Published

August 26, 2025

Deniz Akyildiz

Modern warehouses rely on complex networks of sensors to enable safe and efficient operations. These sensors must detect everything from packages and containers to robots and vehicles, often in changing environments with varying lighting conditions. More importantly for Amazon, we need to detect barcodes efficiently.