AI Research

Breaking the networking wall in AI infrastructure – Microsoft Research

Memory and network bottlenecks are increasingly limiting AI system performance by reducing GPU utilization and overall efficiency, ultimately preventing infrastructure from reaching its full potential despite enormous investments. At the core of this challenge is a fundamental trade-off in the communication technologies used for memory and network interconnects.

Datacenters typically deploy two types of physical cables for communication between GPUs. Traditional copper links are power-efficient and reliable, but limited to very short distances (< 2 meters) that restrict their use to within a single GPU rack. Optical fiber links can reach tens of meters, but they consume far more power and fail up to 100 times as often as copper. A team working across Microsoft aims to resolve this trade-off with MOSAIC, a novel optical link technology that simultaneously provides low power and cost, high reliability, and long reach (up to 50 meters). The approach rests on hardware-system co-design and adopts a wide-and-slow architecture: hundreds of parallel low-speed channels built on microLEDs.

The fundamental trade-off among power, reliability, and reach stems from the narrow-and-fast architecture of today's copper and optical links, which comprise a few channels operating at very high data rates. For example, an 800 Gbps link consists of eight 100 Gbps channels. With copper links, higher channel speeds create greater signal integrity challenges, which limits reach. With optical links, high-speed transmission is inherently inefficient, requiring power-hungry laser drivers and complex electronics to compensate for transmission impairments. These challenges grow as speeds increase with every generation of networks. Transmitting at high speeds also pushes the limits of optical components, reducing system margins and increasing failure rates.
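As a rough illustration of the two architectures, the channel counts implied by a given aggregate rate can be computed directly. This is our own sketch, not from the paper; the function name and the wide-and-slow per-channel rate are illustrative:

```python
import math

def channels_needed(aggregate_gbps: float, per_channel_gbps: float) -> int:
    """Parallel channels required to reach the aggregate link rate."""
    return math.ceil(aggregate_gbps / per_channel_gbps)

# Narrow-and-fast: an 800 Gbps link built from 100 Gbps channels.
print(channels_needed(800, 100))  # -> 8
# Wide-and-slow: the same 800 Gbps from 2 Gbps microLED-class channels.
print(channels_needed(800, 2))    # -> 400
```

The same aggregate rate is reached either way; what changes is whether the complexity lives in a few very fast channels or in the parallelism of many slow ones.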


These limitations force system designers into unpleasant trade-offs that limit the scalability of AI infrastructure. For example, scale-up networks connecting AI accelerators at multi-Tbps bandwidth typically must rely on copper links to meet the power budget, requiring ultra-dense racks that consume hundreds of kilowatts each. This creates significant cooling and mechanical design challenges, which constrain the practical scale of these networks and their end-to-end performance. The imbalance ultimately erects a networking wall akin to the memory wall, in which CPU speeds outstripped memory speeds and created performance bottlenecks.

A technology offering copper-like power efficiency and reliability over long distances can overcome this networking wall, enabling multi-rack scale-up domains and unlocking new architectures. This is a highly active R&D area, with many candidate technologies under development across the industry. In our recent paper presented at ACM SIGCOMM, "MOSAIC: Breaking the Optics versus Copper Trade-off with a Wide-and-Slow Architecture and MicroLEDs," we present one such promising approach, the result of a multi-year collaboration between Microsoft Research, Azure, and M365. The work centers on an optical wide-and-slow architecture, shifting from a small number of high-speed serial channels to hundreds of parallel low-speed channels. Such a design would be impractical with today's copper and optical technologies because of (i) electromagnetic interference in high-density copper cables and (ii) the high cost and power consumption of lasers in optical links, along with increased packaging complexity. MOSAIC overcomes these issues by leveraging directly modulated microLEDs, a technology originally developed for screen displays.

MicroLEDs are significantly smaller than traditional LEDs (ranging from a few to tens of microns) and, due to their small size, can be modulated at several Gbps. They are manufactured in large arrays, with over half a million fitting in the small physical footprint of high-resolution displays such as head-mounted devices or smartwatches. For example, assuming 2 Gbps per microLED channel, an 800 Gbps MOSAIC link can be realized with a 20×20 microLED array, which fits on a silicon die smaller than 1 mm × 1 mm.
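The array sizing in the example above can be checked with a quick back-of-the-envelope calculation. The figures come from the text; the helper name is ours:

```python
import math

def min_square_array(aggregate_gbps: float, per_led_gbps: float) -> int:
    """Smallest N such that an N x N microLED array meets the aggregate rate."""
    leds = math.ceil(aggregate_gbps / per_led_gbps)
    return math.ceil(math.sqrt(leds))

n = min_square_array(800, 2)
print(f"{n}x{n} array -> {n * n * 2} Gbps")  # 20x20 array -> 800 Gbps
```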

MOSAIC’s wide-and-slow design provides four core benefits.

  • Operating at low speed improves power efficiency by eliminating the need for complex electronics and reducing optical power requirements.
  • By leveraging optical transmission (via microLEDs), MOSAIC sidesteps copper’s reach issues, supporting distances up to 50 meters, or > 10x further than copper.
  • MicroLEDs’ simpler structure and temperature insensitivity make them more reliable than lasers. The parallel nature of wide-and-slow also makes it easy to add redundant channels, further increasing reliability, up to two orders of magnitude higher than optical links. 
  • The approach is also scalable, as higher aggregate speeds (e.g., 1.6 Tbps or 3.2 Tbps) can be achieved by increasing the number of channels and/or raising per-channel speed (e.g., to 4-8 Gbps). 
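The scalability point in the list above can be made concrete with a short sketch. The configurations are hypothetical combinations consistent with the rates mentioned in the text, not published MOSAIC specifications:

```python
# Hypothetical scaling options (illustrative numbers, not published specs):
# (label, parallel channels, per-channel speed in Gbps)
configs = [
    ("800 Gbps baseline", 400, 2),  # e.g., a 20x20 array at 2 Gbps/channel
    ("1.6 Tbps, wider",   800, 2),  # double the channel count
    ("1.6 Tbps, faster",  400, 4),  # double the per-channel speed
    ("3.2 Tbps, both",    800, 4),  # scale both dimensions
]

for label, channels, gbps in configs:
    print(f"{label}: {channels} channels x {gbps} Gbps = {channels * gbps} Gbps")
```

Because aggregate bandwidth is just channels × per-channel rate, the design can grow along either axis without pushing any single channel to the speeds that cause copper and laser-based optics trouble.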

Further, MOSAIC is fully compatible with today’s pluggable transceivers’ form factor and it provides a drop-in replacement for today’s copper and optical cables, without requiring any changes to existing server and network infrastructure. MOSAIC is protocol-agnostic, as it simply relays bits from one endpoint to another without terminating or inspecting the connection and, hence, it’s fully compatible with today’s protocols (e.g., Ethernet, PCIe, CXL). We are currently working with our suppliers to productize this technology and scale to mass production. 

While conceptually simple, realizing this architecture posed a few key challenges across the stack, which required a multi-disciplinary team with expertise spanning integrated photonics, lens design, optical transmission, and analog and digital design. For example, using an individual fiber per channel would be prohibitively complex and costly given the large number of channels. We addressed this by employing imaging fibers, typically used in medical applications (e.g., endoscopy), which support thousands of cores per fiber and enable multiplexing of many channels within a single fiber. MicroLEDs are also a less pure light source than lasers, with a larger beam shape (which complicates fiber coupling) and a broader spectrum (which degrades fiber transmission through chromatic dispersion). We tackled these issues with a novel microLED and optical lens design, and a power-efficient analog-only electronic back end that requires no expensive digital signal processing.

Based on our current estimates, this approach can save up to 68% of link power, i.e., more than 10 W per cable, while reducing failure rates by up to 100x. With global annual shipments of optical cables in the tens of millions, this translates to over 100 MW of power savings per year, enough to power more than 300,000 homes. While these immediate gains are already significant, the unique combination of low power consumption, reduced cost, high reliability, and long reach opens up exciting opportunities to rethink AI infrastructure, from network and cluster architectures to compute and memory designs.
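The savings estimate above can be reproduced with simple arithmetic. The shipment count and the per-home power draw below are our assumptions, chosen only to be consistent with the figures in the text:

```python
# Back-of-the-envelope check of the savings estimate (assumed inputs).
watts_saved_per_cable = 10            # "more than 10W per cable" from the estimate
cables_shipped_per_year = 10_000_000  # "tens of millions" taken conservatively

total_mw = watts_saved_per_cable * cables_shipped_per_year / 1e6
homes_powered = total_mw * 1e6 / 333  # assumes ~333 W average draw per home

print(f"~{total_mw:.0f} MW saved, enough for ~{homes_powered:,.0f} homes")
```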

For example, by supporting low-power, high-bandwidth connectivity at long reach, MOSAIC removes the need for ultra-dense racks and enables novel network topologies that would be impractical today. The resulting redesign could reduce resource fragmentation and simplify collective optimization. Similarly, on the compute front, the ability to connect silicon dies at low power over long distances could enable resource disaggregation, shifting from today's large, multi-die packages to smaller, more cost-effective ones. Bypassing packaging area constraints would also make it possible to drastically increase GPU memory capacity and bandwidth, while facilitating the adoption of novel memory technologies.

Historically, step changes in network technology have unlocked entirely new classes of applications and workloads. While our SIGCOMM paper provides possible future directions, we hope this work sparks broader discussion and collaboration across the research and industry communities.






AI Research

Mira Murati’s Thinking Machines Lab Publishes First Research on Deterministic AI Models

Thinking Machines Lab, the AI research company founded by former OpenAI CTO Mira Murati, has released its first public research under a new blog series titled Connectionism. Backed by $2 billion in seed funding and a team of former OpenAI researchers, the lab is focused on solving fundamental challenges in AI.

The inaugural post, authored by Horace He, explores how randomness in large language model inference arises from GPU kernel orchestration. The research outlines techniques to create deterministic responses, a breakthrough with potential applications in enterprise reliability, scientific research, and reinforcement learning. The publication marks a rare glimpse into one of Silicon Valley’s most closely watched AI startups as it prepares its first product launch.




AI Research

When you call Donatos, you might be talking to AI

Published

on


If you call Donatos Pizza to place an order, you might be speaking with artificial intelligence.

The Columbus-based pizza chain announced that it has completed a systemwide rollout of voice-ordering technology powered by Revmo AI. The company says the system is now live at all 174 Donatos locations and has already handled more than 301,000 calls since June.

Donatos Reports Higher Order Accuracy, More Efficient Operations

According to Donatos, the AI system has converted 71% of calls into orders, up from 58% before the rollout, and has achieved 99.9% order accuracy. The company also says the switch freed up nearly 5,000 hours of staff time in August alone, allowing employees to focus more on preparing food and serving in-store customers.

“Our focus was simple: deliver a better guest experience on the phone and increase order conversions,” Kevin King, President of Donatos Pizza, said in a statement.

Ben Smith, Donatos’ Director of Operations Development, said the change provided immediate relief on the phones, allowing staff to redirect time to order accuracy and hospitality.

Donatos said it plans to expand the system to handle more types of calls and to make greater use of its centralized answering center. The company did not say whether it plans to reduce call center staffing or rely more heavily on automation in the future.

Other chains report trouble with AI ordering systems

Taco Bell recently began re-evaluating its use of AI to take orders in the drive-thru after viral videos exposed its flaws. In one well-known video, a man crashed the system by ordering 18,000 cups of water. The company is now looking at how AI can help during busy times and when it is appropriate for a human employee to step in and take the order.

Last year, McDonald’s ended its AI test in 100 restaurants after similar problems surfaced. In one case, AI added bacon to a customer’s ice cream. A McDonald’s executive told the BBC that artificial intelligence will still be part of the chain’s future.




AI Research

Why Ibex Stock Surged 41% to All-Time Highs Today (Hint: It’s Artificial Intelligence)

Key Points

  • Ibex reported record revenue for its fourth quarter and full year of 2025.

  • Ibex is expanding its AI tools and targeting new verticals.

  • The stock hit an all-time high on Sept. 12.

  • 10 stocks we like better than Ibex ›

Shares of little-known company Ibex (NASDAQ: IBEX) went parabolic today, shooting 41.1% higher in early-morning trading. The stock was still up about 33% at 1:15 p.m. ET Friday.

Ibex is a business process outsourcing company, providing a wide array of services such as customer and technical support, lead generation, surveys, and business intelligence and analytics.


Turns out, Ibex’s efforts to build a digital business have already started to pay off, and that is drawing attention to the stock today. The keyword here is artificial intelligence (AI).


AI-driven growth

Ibex reported numbers for its 2025 fourth quarter and fiscal year (ended June 30) after the Sept. 11 market close. Ibex’s Q4 revenue jumped 18% year over year to $147 million, driven by strong growth in its top three markets: retail and e-commerce; healthcare; and travel, transportation, and logistics.

The real deal, however, is what Ibex’s full earnings report looked like:

  • Record fourth-quarter and full-year revenue
  • Highest revenue growth in 11 quarters
  • Fastest revenue growth in three years for the full year
  • Record free cash flow

These are big milestones, but they’re not really why Ibex stock is going to the moon. It’s these words from CEO Bob Dechant: “Importantly, this quarter marked the shift from proof of concept for our AI solutions to full-scale deployments, setting the table for future growth.”

Ibex is “transforming into a digital-first business” by leveraging AI through its Wave iX platform, which uses generative AI to improve customer experiences. Earlier this month, Ibex said it is now targeting the government sector.

What’s next for Ibex stock?

The company’s capital expenditures more than doubled to $18.4 million in 2025, driven by capacity expansion. Ibex generated record free cash flow of $27.3 million in the year and repurchased nearly 3.9 million shares, almost 23% of its outstanding shares.

Following Ibex’s strong earnings report, analysts at RBC Capital were quick to raise their price target on the stock to $39 per share from $31 a share. Ibex stock already hit an all-time high of $42.99 per share today.

With Ibex projecting 7.5% revenue growth at the midpoint for FY 2026 and capital expenditure of $20 million to $25 million on further expansions, this is one stock you should have on your radar.


Neha Chamaria has no position in any of the stocks mentioned. The Motley Fool has no position in any of the stocks mentioned. The Motley Fool has a disclosure policy.
