Events & Conferences

Paper on graph database schemata wins best-industry-paper award

Published

June 20, 2023

Where a standard relational database stores data in linked tables, graph databases store data in graphs, where the edges represent relationships between data items. Graph databases are popular with customers for use cases like single-customer view, fraud detection, recommendations, and security, where you need to create relationships between data and quickly navigate these connections. Amazon Neptune is AWS’s graph database service, which is designed for scalability and availability and allows our customers to query billions of relationships in milliseconds.

Labeled-property graphs

The labeled-property-graph (LPG) data model is a prominent choice for building graph applications. LPGs build upon three primitives to model graph-shaped data: nodes, edges, and properties. The figure below represents an excerpt from a labeled property graph in a financial-fraud scenario. Nodes are represented as green circles, edges are represented as directed arrows connecting nodes, and properties are enclosed in orange boxes.

The node with identifier 1, for instance, is labeled Customer and carries two properties, specifying the name with string value “Jane Doe” and a customerId. Both node 1 and node 2 are connected to node 3, which represents a shared account with a fixed iban number; the two edges are marked with the label Owns, which specifies the nature of the relationship. Just like nodes, edges can carry properties. In this example, the property since specifies 2021-03-05 as the start date of ownership.

Sample graph representing two customers that own a shared account.
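To make the three primitives concrete, the sample graph can be sketched as plain Python data. This is only an illustration of the LPG model, not how Amazon Neptune or any graph database stores data. The class names, the helper owners_of, the customerId and iban values, and the since date on the second edge are placeholders that do not come from the figure.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: int
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int  # id of the node the arrow starts from
    target: int  # id of the node the arrow points to
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


# Placeholder values: the article names these properties but does not give
# concrete customerId or iban values.
nodes = {
    1: Node(1, {"Customer"}, {"name": "Jane Doe", "customerId": "c-001"}),
    2: Node(2, {"Customer"}, {"name": "John Doe", "customerId": "c-002"}),
    3: Node(3, {"Account"}, {"iban": "XX00 0000 0000 0000"}),
}
edges = [
    Edge(1, 3, "Owns", {"since": "2021-03-05"}),
    Edge(2, 3, "Owns", {"since": "2021-03-05"}),  # assumed date for the second owner
]


def owners_of(account_id: int) -> list[str]:
    """Follow incoming Owns edges and return the owners' names."""
    return [
        nodes[e.source].properties["name"]
        for e in edges
        if e.label == "Owns" and e.target == account_id
    ]


print(owners_of(3))
```

Navigating a relationship here is a scan over the edge list; a real graph database would use indexes, but the shape of the data is the same.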

Relational vs. graph schema

One property that differentiates graph databases from, for instance, relational databases — where the schema needs to be defined upfront and is often hard to change — is that graph databases do not require explicit schema definitions. To illustrate the difference, compare the graph data model from the figure above to a comparable relational-database schema, shown below, with the primary-key attributes underlined.

A possible relational-database model for the scenario above.

Schema-level information of the relational model (table and attribute names) is represented as part of the data itself in graphs. Put differently, by inserting or changing graph elements such as node labels, edge labels, and property names, one can extend or change the schema implicitly, without having to run (oftentimes tedious) schema manipulations such as ALTER TABLE commands.

As an example, in a graph database one can simply add an edge with the previously unseen label Knows to connect the two nodes representing Jane Doe and John Doe or introduce nodes with new labels (such as FinancialTransaction) at any time. Such extensions would require table manipulations in our relational sample schema.

The absence of an explicit schema is a key differentiator that lowers the burden of getting started with data modeling and application building in graphs: following a pay-as-you-go paradigm, graph application developers who build new applications can start out with a small portion of the data and insert new node types, properties, and interconnecting edges as their applications evolve, without having to maintain explicit schemata.
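The idea that the schema lives in the data can be illustrated with a small sketch. Assuming a toy in-memory graph (the tuple layout, the function implicit_schema, and the amount property are all hypothetical), the set of labels and property names is simply whatever the data currently contains, so inserting a Knows edge or a FinancialTransaction node extends it without any separate schema change.

```python
# Node id -> (label, properties); each edge is (source, label, target).
nodes = {
    1: ("Customer", {"name": "Jane Doe"}),
    2: ("Customer", {"name": "John Doe"}),
    3: ("Account", {}),
}
edges = [(1, "Owns", 3), (2, "Owns", 3)]


def implicit_schema(nodes, edges):
    """Derive the labels and property names currently present in the data."""
    return {
        "node_labels": sorted({label for label, _ in nodes.values()}),
        "edge_labels": sorted({label for _, label, _ in edges}),
        "property_names": sorted({k for _, props in nodes.values() for k in props}),
    }


before = implicit_schema(nodes, edges)

# Extending the "schema" is just inserting data; nothing like ALTER TABLE is run.
edges.append((1, "Knows", 2))
nodes[4] = ("FinancialTransaction", {"amount": 100})  # hypothetical property

after = implicit_schema(nodes, edges)
print(before["edge_labels"], "->", after["edge_labels"])
```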

Schemata evolution

While this contributes to the initial velocity of building graph applications, what we often see is that — throughout the life cycle of graph applications — it becomes desirable to shift from implicit to explicit schemata. Once the database has been seeded with an initial (and typically yet-to-be-refined) version of the graph data, there is a demand for what we call flexible-schema support.

Evolution of schema requirements throughout the graph application life cycle.

In that stage, the schema primarily plays a descriptive role: knowing the most important node/edge labels and their properties tells application developers what to expect in the data and guides them in writing queries. As the application life cycle progresses, the graph data model stabilizes, and developers may benefit from a more rigorous, prescriptive schema approach that strongly asserts shapes and logical invariants in the graph.

PG-Schema

Motivated by these requirements, our SIGMOD publication proposes a data definition language (DDL) called PG-Schema, which aims to expose the full breadth of schema flexibility to users. The figure below shows a visual representation of such a graph schema, as well as the corresponding syntactical representation, as it could be provided by a data architect or application developer to formally define the schema of our fraud graph example.

Schema for the graph data from the graph database above (left: graphical representation; right: corresponding data definition language).

In this example, the overall schema is composed of the six elements enclosed in the top-level GRAPH TYPE definition:

The first three lines of the GRAPH TYPE definition introduce so-called node types: person, customer, and account; they describe structural constraints on the nodes in the graph data. The customer node type, for instance, tells us that there can be nodes with label Customer, which carry a property customerId and are derived from a more general person node type. Concretely, this means that nodes with the label Customer inherit the properties name and birthDate defined in node type person. Note that properties also specify a data type (such as string, date, or numerical values) and may be marked as optional.

Edge types build upon node types and specify the type and structure of edges that connect nodes. Our example defines a single edge type connecting nodes of node type customer with nodes of type account. Informally speaking, this tells us that Customer-labeled nodes in our data graph can be connected to Account-labeled nodes via an edge labeled Owns, which is annotated with a property since, pointing to a date value.

The last two lines specify additional constraints that go beyond the mere structure of our graph. The KEY constraint demands that the value of the iban property uniquely identifies an account, i.e., no two Account-labeled nodes can share the same IBAN number. This can be thought of as the equivalent of primary keys in relational databases, which enforce the uniqueness of one or more attributes within the scope of a given table. The second constraint enforces that every account has at least one owner, which is reminiscent of a foreign-key constraint in relational databases.

Also note the keyword STRICT in the graph type definition: it enforces that all elements in the graph obey one of the types defined in the graph type body, and that all constraints are satisfied. Concretely, it implies that our graph can contain only Person-, Customer-, and Account-labeled nodes with the respective sets of properties, that the only possible edge type is between customers and accounts with label Owns, and that the key and foreign-key-style constraints must be satisfied. Hence, the STRICT keyword can be understood as a mechanism to implement the schema-first paradigm, as it is maximally prescriptive and strongly constrains the graph structure.
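The two constraints described above can be read as checks over the graph data. The following sketch is neither PG-Schema syntax nor any engine's implementation; it only spells out what the key constraint and the at-least-one-owner constraint would reject on a toy graph. The function names, data layout, and iban values are assumptions made for illustration.

```python
from collections import Counter

# Toy graph: node id -> (label, properties); edges are (source, label, target).
nodes = {
    1: ("Customer", {"name": "Jane Doe", "customerId": "c-001"}),
    2: ("Customer", {"name": "John Doe", "customerId": "c-002"}),
    3: ("Account", {"iban": "XX00-A"}),
    4: ("Account", {"iban": "XX00-A"}),  # duplicate iban, and no owner
}
edges = [(1, "Owns", 3), (2, "Owns", 3)]


def key_violations(nodes, label, prop):
    """Return property values shared by more than one node with this label."""
    counts = Counter(
        props[prop] for lbl, props in nodes.values() if lbl == label and prop in props
    )
    return sorted(value for value, n in counts.items() if n > 1)


def accounts_without_owner(nodes, edges):
    """Return ids of Account nodes with no incoming Owns edge from a Customer."""
    owned = {
        target
        for source, label, target in edges
        if label == "Owns" and nodes[source][0] == "Customer"
    }
    return sorted(
        nid for nid, (lbl, _) in nodes.items() if lbl == "Account" and nid not in owned
    )


print(key_violations(nodes, "Account", "iban"))
print(accounts_without_owner(nodes, edges))
```

A prescriptive schema would reject the insertion of node 4 on both counts, whereas a purely descriptive schema would at most report it.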

To account for flexible- and partial-schema use cases, PG-Schema offers a LOOSE keyword as an alternative to STRICT, which comes with a more relaxed interpretation: graph types that are defined as LOOSE allow for node and edge types that are not explicitly listed in the graph type definition. Mechanisms similar to STRICT vs. LOOSE keywords at graph type level can be found at different levels of the language.

For instance, keywords such as OPEN (vs. the implicit default, CLOSED) can be used to either partially or fully specify the set of properties that can be carried by vertices with a given vertex label (e.g., expressing that a Person-labeled node must have a name but may have an arbitrary set of other (unknown) properties, without requiring enumeration of the entire set). The flexibility arising from these mechanisms makes it easy to define partial schemata that can be adjusted and refined incrementally, to capture the schema evolution requirements sketched above.
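A rough way to picture these switches is a conformance check with two flags. The sketch below is a simplification under stated assumptions: NODE_TYPES and node_conforms are hypothetical names, optional properties and inheritance are left out, and the loose reading here only tolerates unknown labels while still checking declared ones, which may not match the paper's exact semantics.

```python
# Hypothetical in-memory stand-in for a graph type; this is not PG-Schema syntax.
NODE_TYPES = {
    "Person": {"required": {"name"}, "open": True},    # extra properties allowed
    "Account": {"required": {"iban"}, "open": False},  # closed: exactly these
}


def node_conforms(label, properties, strict=True):
    """Check one node against NODE_TYPES under a STRICT-like or LOOSE-like reading."""
    node_type = NODE_TYPES.get(label)
    if node_type is None:
        # Unknown labels are rejected when strict and tolerated when loose.
        return not strict
    keys = set(properties)
    if not node_type["required"] <= keys:
        return False  # a required property is missing
    # Open types accept extra properties; closed types accept none.
    return node_type["open"] or keys == node_type["required"]


print(node_conforms("Person", {"name": "Jane Doe", "nickname": "JD"}))
print(node_conforms("Account", {"iban": "XX00-A", "note": "joint"}))
print(node_conforms("Device", {}, strict=False))
```

Starting with open types and a loose graph type, then tightening both over time, mirrors the move from a descriptive to a prescriptive schema described earlier.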

Not only does PG-Schema provide a concrete proposal for a graph schema and constraint language, but it also aims to raise awareness of the importance of a standardized approach to graph schemata. The concepts and ideas in the paper were codeveloped by major companies and academics in the graph space, and there are ongoing initiatives within the LDBC that aim toward a standardization of these concepts.

In particular, the LDBC has close ties with the ISO committee that is currently in the process of standardizing a new graph query language (GQL). As some GQL ISO committee members are coauthors of the PG-Schema paper, there has been a continuous bilateral exchange, and it is anticipated that future versions of the GQL standard will include a rich DDL, which may pick up concepts and ideas presented in the paper.

Read Meta’s 2025 Sustainability Report

Published

September 12, 2025

Meta Sustainability

The post Read Meta’s 2025 Sustainability Report appeared first on Engineering at Meta.

Scientific frontiers of agentic AI

Published

September 11, 2025

Michael Kearns

It feels as though we’ve barely absorbed the rapid development and adoption of generative AI technologies such as large language models (LLMs) before the next phenomenon is already upon us, namely agentic AI. Standalone LLMs can be thought of as “chatbots in a sandbox”, the sandbox being a metaphor for a safe and contained play space with limited interaction with the world beyond. In contrast, the vision of agentic AI is a near (or already here?) future in which LLMs are the underlying engines for complex systems that have access to rich external resources such as consumer apps and services, social media, banking and payment systems — in principle, anything you can reach on the Internet. A dream of the AI industry for decades, the “agent” of agentic AI is an intelligent personal assistant that knows your goals and preferences and that you trust to act on your behalf in the real world, much as you might a human assistant.

What language will agents speak?

The history of computing technology features a steady march toward systems and devices that are ever more friendly, accessible, and intuitive to human users. Examples include the gradual displacement of clunky teletype monitors and obscure command-line incantations by graphical user interfaces with desktop and folder metaphors, and the evolution from low-level networked file transfer protocols to the seamless ease of the web. And generative AI itself has also made previously specialized tasks like coding accessible to a much broader base of users. In other words, modern technology is human-centric, designed for use and consumption by ordinary people with little or no specialized training.

But now these same technologies and systems will also need to be navigated by agentic AI, and as adept as LLMs are with human language, it may not be their most natural mode of communication and understanding. Thus, a parallel migration to the native language of generative AI may be coming.

What is that native language? When generative AI consumes a piece of content — whether it be a user prompt, a document, or an image — it translates it into an internal representation that is more convenient for subsequent processing and manipulation. There are many examples in biology of such internal representations. For instance, in our own visual systems, it has been known for some time that certain types of inputs (such as facial images) cause specific cells in our brains to respond (a phenomenon known as neuronal selectivity). Thus, an entire category of important images elicits similar neural behaviors.

In a similar vein, the neural networks underlying modern AI typically translate any input into what is known as an embedding space, which can be thought of as a physical map in which items with similar meanings are placed near each other, and those with unrelated meanings are placed far apart. For example, in an image-embedding space, two photos of different families would be nearer to each other than either would be to a landscape. In a language-embedding space, two romance novels would be nearer to each other than to a car owner’s manual. And hybrid or multimodal embedding spaces would place images of cars near their owner manuals.
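The notion of "near" and "far" in an embedding space is often measured with cosine similarity. As a minimal sketch, the vectors below are hand-picked toy values, not the output of any real model, and the item names are only illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-picked toy vectors standing in for learned embeddings.
embeddings = {
    "romance_novel_a": [0.9, 0.1, 0.0],
    "romance_novel_b": [0.8, 0.2, 0.1],
    "car_owner_manual": [0.1, 0.1, 0.9],
}

novels = cosine_similarity(embeddings["romance_novel_a"], embeddings["romance_novel_b"])
novel_vs_manual = cosine_similarity(
    embeddings["romance_novel_a"], embeddings["car_owner_manual"]
)
print(novels > novel_vs_manual)
```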

Embeddings are an abstraction that provides great power and generality, in the form of the ability to represent not the literal original content (like a long sequence of words) but something closer to its underlying meaning. The price for this abstraction is loss of detail and information. For instance, the embedding of this entire article would place it in close proximity to similar content (for instance, general-audience science prose) but would not contain enough information to re-create the article verbatim. The lossy nature of embeddings has implications we shall return to shortly.

Embeddings are learned from the massive amount of information on the Internet and elsewhere about implicit correspondences. Even aliens landing on earth who could read English but knew nothing else about the world would quickly realize that “doctor” and “hospital” are closely related because of their frequent proximity in text, even if they had no idea what these words actually signified. Furthermore, not only do embeddings permit generative AI to understand existing content, but they allow it to generate new content. When we ask for a picture of a squirrel on a snowboard in the style of Andy Warhol, it is the embedding that lets the technology explore novel images that interpolate between those of actual Warhols, squirrels, and snowboards.

Thus, the inherent language of generative (and therefore agentic) AI is not the sentences and images we are so familiar with but their embeddings. Let us now reconsider a world in which agents interact with humans, content, and other agents. Obviously, we will continue to expect agentic AI to communicate with humans in ordinary language and images. But there is no reason for agent-to-agent communication to take place in human languages; per the discussion above, it would be more natural for it to occur in the native embedding language of the underlying neural networks.

My personal agent, working on a vacation itinerary, might ingest materials such as my previous flights, hotels, and vacation photos to understand my interests and preferences. But to communicate those preferences to another agent — say, an agent aggregating hotel details, prices, and availability — it will not provide the raw source materials; in addition to being massively inefficient and redundant, that could present privacy concerns (more on this below). Rather, my agent will summarize my preferences as a point, or perhaps many points, in an embedding space.

In this example, the red, green, and blue points are three-dimensional embeddings of restaurants at which three people (Alice, Bob, and Chris) have eaten. (A real-world embedding, by contrast, might have hundreds of dimensions.) Each glowing point represents the center of one of the clusters, and its values summarize the restaurant preferences of the corresponding person. AI agents could use such vector representations, rather than text, to share information with each other.
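One simple way to produce such a summary point is to average the embeddings of the places a person has visited. The sketch below assumes made-up three-dimensional vectors and uses a plain mean as the cluster center; the names centroid, squared_distance, and candidate are illustrative, and a real system might summarize preferences quite differently.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Made-up 3-D restaurant embeddings; real ones might have hundreds of dimensions.
visited = {
    "Alice": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [1.0, 0.0, 0.3]],
    "Bob": [[0.1, 0.9, 0.8], [0.2, 0.8, 0.9]],
}
preferences = {person: centroid(points) for person, points in visited.items()}

# Another agent could compare a candidate restaurant with each summary vector.
candidate = [0.85, 0.15, 0.2]
closest = min(preferences, key=lambda p: squared_distance(preferences[p], candidate))
print(closest)
```

Only the summary vectors leave the personal agent in this sketch; the individual visits do not, although the article notes that what can be reverse-engineered from an embedding is still an open question.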

By similar reasoning, we might also expect the gradual development of an “agentic Web” meant for navigation by AI, in which the text and images on websites are pre-translated into embeddings that are illegible to humans but are massively more efficient than requiring agents to perform these translations themselves with every visit. In the same way that many websites today have options for English, Spanish, Chinese, and many other languages, there would be an option for Agentic.

All the above presupposes that embedding spaces are shared and standardized across generative and agentic AI systems. This is not true today: embeddings differ from model to model and are often considered proprietary. It’s as if all generative AI systems speak slightly different dialects of some underlying lingua franca. But these observations about agentic language and communication may foreshadow the need for AI scientists to work toward standardization, at least in some form. Each agent can have some special and proprietary details to its embeddings — for instance, a financial-services agent might want to use more of its embedding space for financial terminology than an agentic travel assistant would — but the benefits of a common base embedding are compelling.

Keeping things in context

Even casual users of LLMs may be aware of the notion of “context”, which is informally what and how much the LLM remembers and understands about its recent interactions and is typically measured (at least cosmetically) by the number of words or tokens (word parts) recalled. There is again an apt metaphor with human cognition, in the sense that context can be thought of as the “working memory” of the LLM. And like our own working memory, it can be selective and imperfect.

If we participate in an experiment to test how many random digits or words we can memorize at different time scales, we will of course eventually make mistakes if asked to remember too many things for too long. But we will not forget what the task itself is; our short-term memory may be fallible, but we generally grasp the bigger picture.

These same properties broadly hold for LLM context — which is sometimes surprising to users, since we expect computers to be perfect at memorization but highly fallible on more abstract tasks. But when we remember that LLMs do not operate directly on the sequence of words or tokens in the context but on the lossy embedding of that sequence, these properties become less mysterious (though perhaps not less frustrating when an LLM can’t remember something it did just a few steps ago).

Some of the principal advances in LLM technology have been around improvements in context: LLMs can now remember and understand more context and leverage that context to tailor their responses with greater accuracy and sophistication. This greater window of working memory is crucial for many tasks to which we would like to apply agentic AI, such as having an LLM read and understand the entire code base of a large software development project, or all the documents relevant to a complex legal case, and then be able to reason about the contents.

How will context and its limitations affect agentic AI? If embeddings are the language of LLMs, and context is the expression of an LLM’s working memory in that language, a crucial design decision in agent-agent interactions will be how much context to share. Sharing too little will handicap the functionality and efficiency of agentic dialogues; sharing too much will result in unnecessary complexity and potential privacy concerns (just as in human-to-human interactions).

Let us illustrate by returning to my personal agent, which, having found and booked my hotel, is working with an external airline flight aggregation agent. It would be natural for my agent to communicate lots of context about my travel preferences, perhaps including conditions under which I might be willing to pay or use miles for an upgrade to business class (such as an overnight international flight). But my agent should not communicate context about my broader financial status (savings, debt, investment portfolio), even though in theory these details might correlate with my willingness to pay for an upgrade. When we consider that context is not my verbatim history with my travel agent, but an abstract summary in embedding space, decisions about contextual boundaries and how to enforce them become difficult.

Indeed, this is a relatively untouched scientific topic, and researchers are only just beginning to consider questions such as what can be reverse-engineered about raw data given only its embedding. While human or system prompts to shape inter-agent dealings might be a stopgap (“be sure not to tell the flight agent any unnecessary financial information”), a principled understanding of embedding privacy vulnerabilities and how to mitigate them (perhaps via techniques such as differential privacy) is likely to be an important research area going forward.

Agentic bargains

So far, we’ve talked a fair amount about interagent dialogues but have treated these conversations rather generally, much as if we were speaking about two humans in a collaborative setting. But there will be important categories of interaction that will need to be more structured and formal, with identifiable outcomes that all parties commit to. Negotiation, bargaining, and other strategic interactions are a prime example.

I obviously want my personal agent, when booking hotels and flights for my trips, to get the best possible prices and other conditions (room type and view, flight seat location, and so on). The agents aggregating hotels and flights would similarly prefer that I pay more rather than less, on behalf of their own clients and users.

For my agent to act in my interests in these settings, I’ll need to specify at least some broad constraints on my preferences and willingness to pay for them, and not in fuzzy terms: I can’t expect my agent to simply “know a bargain when it sees one” the way I might if I were handling all the arrangements myself, especially because my notion of a bargain might be highly subjective and dependent on many factors. Again, a near-term makeshift approach might address this via prompt shaping — “be sure to get the best deal possible, as long as the flight is nonstop and leaves in the morning, and I have an aisle seat” — but longer-term solutions will have to be more sophisticated and granular.

Of course, the mathematical and scientific foundations of negotiating and bargaining have been well studied for decades by game theorists, microeconomists, and related research communities. Their analyses typically begin by presuming the articulation of utility functions for all the parties involved — an abstraction capturing (for example) my travel preferences and willingness to pay for them. The literature also considers settings in which I can’t quantitatively express my own utilities but “know bargains when I see them”, in the sense that given two options (a middle seat on a long flight for $200 vs. a first-class seat for $2,000), I will make the choice consistent with my unknown utilities. (This is the domain of the aptly named utility elicitation.)

Much of the science in such areas is devoted to the question of what “should” happen when fully rational parties with precisely specified utilities, perfect memory, and unlimited computational power come to the proverbial bargaining table; equilibrium analysis in game theory is just one example of this kind of research. But given our observations about the human-like cognitive abilities and shortcomings of LLMs, perhaps a more relevant starting point for agentic negotiation is the field of behavioral economics. Instead of asking what should happen when perfectly rational agents interact, behavioral economics asks what does happen when actual human agents interact strategically. And this is often quite different, in interesting ways, from what fully rational agents would do.

For instance, consider the canonical example of behavioral game theory known as the Ultimatum Game. In this game, there is $10 to potentially divide between two players, Alice and Bob. Alice first proposes any split she likes. Bob then either accepts Alice’s proposal, in which case both parties get their proposed shares, or rejects Alice’s proposal, in which case each party receives nothing. The equilibrium analysis is straightforward: Alice, being fully rational and knowing that Bob is also, proposes the smallest nonzero amount to Bob, which is a penny. Bob, being fully rational, would rather receive a penny than nothing, so he accepts.

Game theory (left) supposes that the recipient in the ultimatum game will accept a low offer, since something is better than nothing, but behavioral economics (right) reveals that, in fact, offers tend to concentrate in the range of $3 to $5, and lower offers are frequently rejected.
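The equilibrium reasoning for fully rational players can be written as a short backward-induction sketch. It works in whole cents and follows the article's reading that Bob accepts only strictly positive offers; the function names are illustrative, and the code models the idealized analysis, not how people actually play.

```python
TOTAL_CENTS = 1000  # the $10 pie, in cents


def bob_accepts(offer_cents):
    """A fully rational Bob accepts any offer that beats the zero he gets by rejecting."""
    return offer_cents > 0


def alice_best_offer(total_cents=TOTAL_CENTS):
    """Backward induction: Alice keeps the most she can among offers Bob accepts."""
    best_offer, best_payoff = None, 0
    for offer in range(total_cents + 1):
        # Alice anticipates Bob's response before choosing her proposal.
        payoff = total_cents - offer if bob_accepts(offer) else 0
        if payoff > best_payoff:
            best_offer, best_payoff = offer, payoff
    return best_offer, best_payoff


print(alice_best_offer())  # (offer to Bob, Alice's share), both in cents
```

Replacing bob_accepts with a rule that rejects low offers is one way to explore the behavior observed in experiments.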

Nothing remotely like this happens when humans play. Across hundreds of experiments varying myriad conditions — social, cultural, gender, wealth, etc. — a remarkably consistent aggregate behavior emerges. Alice almost always proposes a share to Bob of between $3 and $5 (the fact that Alice gets to move first seems to prime both players for Bob to potentially get less than half the pie). And conditioned on Alice’s proposal being in this range, Bob almost always accepts her offer. But on those rare occasions in which Alice is more aggressive and offers Bob an amount much less than $3, Bob’s rejection rate skyrockets. It’s as if pairs of people — who have never heard of or played the Ultimatum Game before — have an evolutionarily hardwired sense of what’s “fair” in this setting.

The way in which the ultimatum game is played — the frequency of particular offers and the rate of rejection — varies across cultures, but this graph illustrates general trends in the data. Offers tend to concentrate between $3 and $5, with a steep falloff above $5, and the rejection rate is high for low offers.

Now back to LLMs and agentic AI. There is already a small but growing literature on what we might call LLM behavioral game theory and economics, in which experiments like the one above are replicated — except human participants are replaced by AI. One early work showed that LLMs almost exactly replicated human behavior in the Ultimatum Game, as well as other classical behavioral-economics findings.

Note that it is possible to simulate the demographic variability of human subjects in such experiments via LLM prompting, e.g., “You are Alice, a 37-year-old Hispanic medical technician living in Boston, Massachusetts”. Other studies have again shown human-like behavior of LLMs in trading games, price negotiations, and other settings. A very recent study claims that LLMs can even engage in collusive price-fixing behaviors and discusses potential regulatory implications for AI agents.

Once we have a grasp on the behaviors of agentic AI in strategic settings, we can turn to shaping that behavior in desired ways. The field of mechanism design in economics complements areas like game theory by asking questions like “given that this is how agents generally negotiate, how can we structure those negotiations to make them fair and beneficial?” A classic example is the so-called second-price auction, where the highest bidder wins the item but pays only the second-highest bid. This design is more truthful than a standard first-price auction, in the sense that everyone’s optimal strategy is to simply bid the price at which they are indifferent to winning or losing (their subjective valuation of the item); nobody needs to think about other agents’ behaviors or valuations.
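The truthfulness property can be checked numerically on small cases. The sketch below assumes a sealed-bid auction with integer bids and resolves ties against the bidder in question; the valuation of 70 and the rival bids are arbitrary illustrative numbers.

```python
def second_price_payoff(my_bid, my_value, other_bids):
    """Payoff to one bidder in a sealed-bid second-price auction.

    Ties are resolved against this bidder, an arbitrary choice for the sketch.
    """
    highest_other = max(other_bids)
    if my_bid > highest_other:
        return my_value - highest_other  # winner pays the second-highest bid
    return 0


my_value = 70
other_bids_scenarios = [[50, 30], [65, 10], [80, 20], [70, 69]]

for others in other_bids_scenarios:
    truthful = second_price_payoff(my_value, my_value, others)
    # No alternative bid from 0 to 100 does better than bidding the true value.
    assert all(
        second_price_payoff(bid, my_value, others) <= truthful for bid in range(101)
    )
    print(others, truthful)
```

Overbidding can only win auctions that cost more than the item is worth to the bidder, and underbidding can only lose auctions that would have been profitable, which is why the truthful bid is never beaten in these scenarios.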

We anticipate a proliferation of research on topics like these, as agentic bargaining becomes commonplace and an important component of what we delegate to our AI assistants.

The enduring challenge of common sense

I’ll close with some thoughts on a topic that has bedeviled AI from its earliest days and will continue to do so in the agentic era, albeit in new and more personalized ways. It’s a topic that is as fundamental as it is hard to define: common sense.

By common sense, we mean things that are “obvious”, that any human with enough experience in the world would know without explicitly being told. For example, imagine a glass full of water sitting on a table. We would all agree that if we move the glass to the left or right on the table, it’s still a glass of water. But if we turn it upside down, it’s still a glass on the table, but no longer a glass of water (and is also a mess to be cleaned up). It’s quite unlikely any of us were ever sat down and run through this narrative, and it’s also a good bet that you’ve never deliberately considered such facts before. But we all know and agree on them.

Figuring out how to imbue AI models and systems with common sense has been a priority of AI research for decades. Before the advent of modern large-scale machine learning, there were efforts like the Cyc project (for “encyclopedia”), part of which was devoted to manually constructing a database of commonsense facts like the ones above about glasses, tables, and water. Eventually the consumer Internet generated enough language and visual data that many such general commonsense facts could be learned or inferred: show a neural network millions of pictures of glasses, tables and water and it will figure things out. Very early research also demonstrated that it was possible to directly encode certain invariances (similar to shifting a glass of water on a table) into the network architecture, and LLM architectures are similarly carefully designed in the modern era.

But in agentic AI, we expect our proxies to understand not only generic commonsense facts of the type we’ve been discussing but also “common sense” particular to our own preferences — things that would make sense to most people if only they understood our contexts and perspectives. Here a pure machine learning approach will likely not suffice. There just won’t be enough data to learn from scratch my subjective version of common sense.

For example, consider your own behavior or “policy” around leaving doors open or closed, locked or unlocked. If you’re like me, these policies can be surprisingly nuanced, even though I follow them without thought all the time. Often, I will close and lock doors behind me — for instance, when I leave my car or my house (unless I’m just stepping right outside to water the plants). Other times I will leave a door unlocked and open, such as when I’m in my office and want to signal I am available to chat with colleagues or students. I might close but leave unlocked that same door when I need to focus on something or take a call. And sometimes I’ll leave my office door unlocked and open even when I’m not in it, despite there being valuables present, because I trust the people on my floor and I’m going to be nearby.

We might call behaviors like these subjective common sense, because to me they are natural and obvious and have good reasons behind them, even though I follow them almost instinctually, the same way I know not to turn a glass of water upside down on the table. But you of course might have very different behaviors or policies in the same or similar situations, with your own good reasons.

The point is that even an apparently simple matter like my behavior regarding doors and locks can be difficult to articulate. But agentic AI will need specifications like this: simply replace doors with online accounts and services and locks with passwords and other authentication credentials. Sometimes we might share passwords with family or friends for less-critical privacy-sensitive resources like Netflix or Spotify, but we would not do the same for bank accounts and medical records. I might be less rigorous about restricting access to, or even encrypting, the files on my laptop than I would be about files I store in the cloud.

The circumstances under which I trust my own or other agents with resources that need to be private and secure will be at least as complex as those regarding door closing and locking. The primary difficulty is not in having the right language or formalisms to specify such policies: there are good proposals for such specification frameworks and even for proving the correctness of their behaviors. The problem is in helping people articulate and translate their subjective common sense into these frameworks in the first place.

Conclusion

The agentic-AI era is in its infancy, but we should not take that to mean we have a long and slow development and adoption period before us. We need only look at the trajectory of the underlying generative AI technology — from being almost entirely unknown outside of research circles as recently as early 2022 to now being arguably the single most important scientific innovation of the century so far. And indeed, there is already widespread use of what we might consider early agentic systems, such as the latest coding agents.

Far beyond the initial “autocomplete for Python” tools of a few years ago, such agents now do so much more — writing working code from natural-language prompts and descriptions, accessing external resources and datasets, proactively designing experiments and visualizing the results, and most importantly (especially for a novice programmer like me), seamlessly handling the endless complexity of environment settings, software package installs and dependencies, and the like. My Amazon Scholar and University of Pennsylvania colleague Aaron Roth and I recently wrote a machine learning paper of almost 50 pages — complete with detailed definitions, theorem statements and proofs, code, and experiments — using nothing except (sometimes detailed) English prompts to such a tool, along with expository text we wrote directly. This would have been unthinkable just a year ago.

Despite the speed with which generative AI has permeated industry and society at large, its scientific underpinnings go back many decades, arguably to the birth of AI but certainly no later than the development of neural-network theory and practice in the 1980s. Agentic AI — built on top of these generative foundations, but quite distinct in its ambitions and challenges — has no such deep scientific substrate on which to systematically build. It’s all quite fresh territory. I’ve tried to anticipate some of the more fundamental challenges here, and I’ve probably got half of them wrong. To paraphrase the Philadelphia department store magnate John Wanamaker, I just don’t know which half — yet.

Events & Conferences

A New Ranking Framework for Better Notification Quality on Instagram

Published

2 weeks ago

September 2, 2025

Xian Sun

We’re sharing how Meta is applying machine learning (ML) and diversity algorithms to improve notification quality and user experience.
We’ve introduced a diversity-aware notification ranking framework to reduce uniformity and deliver a more varied and engaging mix of notifications.
This new framework reduces the volume of notifications and drives higher engagement rates through more diverse outreach.

Notifications are one of the most powerful tools for bringing people back to Instagram and enhancing engagement. Whether it’s a friend liking your photo, another close friend posting a story, or a suggestion for a reel you might enjoy, notifications help surface moments that matter in real time.

Instagram leverages machine learning (ML) models to decide who should get a notification, when to send it, and what content to include. These models are trained to optimize for positive user engagement, such as click-through rate (CTR, the probability that a user clicks a notification), as well as other metrics like time spent.

However, while engagement-optimized models are effective at driving interactions, there’s a risk that they might overprioritize the product types and authors someone has previously engaged with. This can lead to overexposure to the same creators or the same product types while overlooking other valuable and diverse experiences.

This means people could miss out on content that would give them a more balanced, satisfying, and enriched experience. Over time, this can make notifications feel spammy and increase the likelihood that people will disable them altogether.

The real challenge lies in finding the right balance: How can we introduce meaningful diversity into the notification experience without sacrificing the personalization and relevance people on Instagram have come to expect?

To tackle this, we’ve introduced a diversity-aware notification ranking framework that helps deliver more diverse, better curated, and less repetitive notifications. This framework has significantly reduced daily notification volume while improving CTR. It also introduces several benefits:

Extensibility: customized soft-penalty (demotion) logic can be incorporated for each dimension, enabling more adaptive and sophisticated diversity strategies.
Flexibility: demotion strength can be tuned across dimensions like content, author, and product type via adjustable weights.
Balance: personalization and diversity are weighed together, ensuring notifications remain both relevant and varied.

The Risks of Notifications without Diversity

The issue of overexposure in notifications often shows up in two major ways:

Overexposure to the same author: People might receive notifications that are mostly about the same friend. For example, if someone often interacts with content from a particular friend, the system may continue surfacing notifications from that person alone – ignoring other friends they also engage with. This can feel repetitive and one-dimensional, reducing the overall value of notifications.

Overexposure to the same product surface: People might mostly receive notifications from the same product surface such as Stories, even when Feed or Reels could provide value. For example, someone may be interested in both reel and story notifications but has recently interacted more often with stories. Because the system heavily prioritizes past engagement, it sends only story notifications, overlooking the person’s broader interests.

Introducing Instagram’s Diversity-Aware Notification Ranking Framework

Instagram’s diversity-aware notification ranking framework is designed to enhance the notification experience by balancing the predicted potential for user engagement with the need for content diversity. This framework introduces a diversity layer on top of the existing engagement ML models, applying multiplicative penalties to the candidate scores generated by these models, as Figure 1, below, shows.

The diversity layer evaluates each notification candidate’s similarity to recently sent notifications across multiple dimensions such as content, author, notification type, and product surface. It then applies carefully calibrated penalties—expressed as multiplicative demotion factors—to downrank candidates that are too similar or repetitive. The adjusted scores are used to re-rank the candidates, enabling the system to select notifications that maintain high engagement potential while introducing meaningful diversity. In the end, the quality bar selects the top-ranked candidate that passes both the ranking and diversity criteria.

Figure 1: Instagram’s diversity-aware ranking framework, in which the diversity layer sits on top of the existing modeling layer and penalizes notifications that are too similar to recently sent ones.

Mathematical Formulation

Within the diversity layer, we apply a multiplicative demotion factor to the base relevance score of each candidate. Given a notification candidate c, we compute its final score as the product of its base ranking score and a diversity demotion multiplier:

$\text{Score}(c) = R(c) \times D(c)$

where R(c) represents the candidate’s base relevance score, and D(c) ∈ [0,1] is a penalty factor that reduces the score based on similarity to recently sent notifications. We define a set of semantic dimensions (e.g., author, product type) along which we want to promote diversity. For each dimension i, we compute a similarity signal p_i(c) between candidate c and the set of historical notifications H, using a maximal marginal relevance (MMR) approach:

$p_i(c) = \max_{h \in H} \mathrm{sim}_i(c, h)$

where sim_i(·,·) is a predefined similarity function for dimension i. In our baseline implementation, p_i(c) is binary: it equals 1 if the similarity exceeds a threshold τ_i and 0 otherwise.
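The binary similarity signal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Instagram's implementation: notifications are modeled as plain dictionaries, and the helper names `exact_match` and `similarity_signal`, the field names, and the threshold value 0.5 are all hypothetical.

```python
from typing import Callable, Iterable, Mapping

# Assumption: a notification is a flat mapping of dimension name -> value.
Notification = Mapping[str, str]
SimFn = Callable[[Notification, Notification], float]


def exact_match(field: str) -> SimFn:
    """Hypothetical sim_i: 1.0 if both notifications share `field`, else 0.0."""
    def sim(c: Notification, h: Notification) -> float:
        return 1.0 if c.get(field) == h.get(field) else 0.0
    return sim


def similarity_signal(
    candidate: Notification,
    history: Iterable[Notification],
    sim: SimFn,
    threshold: float,
) -> int:
    """Binary p_i(c): 1 if max over h in H of sim_i(c, h) exceeds the threshold."""
    # An empty history yields a maximum of 0.0, so nothing is demoted.
    max_sim = max((sim(candidate, h) for h in history), default=0.0)
    return 1 if max_sim > threshold else 0


# Illustrative data: two recently sent story notifications.
history = [
    {"author": "alice", "surface": "stories"},
    {"author": "bob", "surface": "stories"},
]
candidate = {"author": "carol", "surface": "stories"}

p_author = similarity_signal(candidate, history, exact_match("author"), 0.5)
p_surface = similarity_signal(candidate, history, exact_match("surface"), 0.5)
```

Here the candidate comes from a new author but repeats the product surface of recent notifications, so only the surface dimension fires. A production similarity function for a dimension like content would presumably be graded rather than an exact match, which is where the threshold becomes meaningful.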

The final demotion multiplier is defined as:

$D(c) = \prod_{i=1}^{m} \left( 1 - w_i \cdot p_i(c) \right)$

where each w_i ∈ [0,1] controls the strength of demotion for its respective dimension. This formulation ensures that candidates similar to previously delivered notifications along one or more dimensions are proportionally down-weighted, reducing redundancy and promoting content variation. The use of a multiplicative penalty allows for flexible control across multiple dimensions, while still preserving high-relevance candidates.
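To make the formulas concrete, here is a minimal Python sketch of the demotion multiplier and the re-ranking step. The function names, the candidate names, the relevance scores, and the weights are illustrative assumptions rather than values from the production system; the binary signals p_i(c) are taken as given.

```python
from math import prod
from typing import Sequence


def demotion_multiplier(signals: Sequence[int], weights: Sequence[float]) -> float:
    """D(c) = product over dimensions i of (1 - w_i * p_i(c))."""
    if len(signals) != len(weights):
        raise ValueError("need exactly one weight per dimension")
    return prod(1.0 - w * p for w, p in zip(weights, signals))


def final_score(relevance: float, signals: Sequence[int], weights: Sequence[float]) -> float:
    """Score(c) = R(c) * D(c)."""
    return relevance * demotion_multiplier(signals, weights)


# Hypothetical weights for two dimensions: (author, product type).
weights = [0.5, 0.2]

# Hypothetical candidates: name -> (base relevance R(c), binary signals p_i(c)).
candidates = {
    "story_from_alice": (0.90, [1, 1]),  # repeats both author and product type
    "reel_from_bob": (0.60, [0, 0]),     # new along both dimensions
    "story_from_carol": (0.70, [0, 1]),  # new author, repeated product type
}

# Re-rank by the demoted score, highest first.
ranked = sorted(
    candidates,
    key=lambda name: final_score(*candidates[name], weights),
    reverse=True,
)
```

With these made-up numbers, the candidate with the highest base relevance (`story_from_alice`, 0.90) is demoted to 0.90 × 0.5 × 0.8 = 0.36 and falls to last place, behind `reel_from_bob` (0.60) and `story_from_carol` (0.70 × 0.8 = 0.56). A candidate with no overlap keeps D(c) = 1, so its relevance score passes through unchanged.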

The Future of Diversity-Aware Ranking

As we continue evolving our diversity-aware notification ranking system, a next step is to introduce more adaptive, dynamic demotion strategies. Instead of relying on static rules, we plan to make demotion strength responsive to notification volume and delivery timing. For example, as a user receives more notifications, especially ones of a similar type or in rapid succession, the system would progressively apply stronger penalties to new notification candidates, mitigating the overwhelming experience caused by high notification volume or tightly spaced deliveries.

Longer term, we see an opportunity to bring large language models (LLMs) into the diversity pipeline. LLMs can help us go beyond surface-level rules by understanding semantic similarity between messages and rephrasing content in more varied, user-friendly ways. This would allow us to personalize notification experiences with richer language and improved relevance while maintaining diversity across topics, tone, and timing.
