Connect with us

AI Research

IBM AI Research Releases Two English Granite Embedding Models, Both Based on the ModernBERT Architecture

Published

on


IBM has quietly built a strong presence in the open-source AI ecosystem, and its latest release shows why it shouldn’t be overlooked. The company has introduced two new embedding models—granite-embedding-english-r2 and granite-embedding-small-english-r2—designed specifically for high-performance retrieval and RAG (retrieval-augmented generation) systems. These models are not only compact and efficient but also licensed under Apache 2.0, making them ready for commercial deployment.

What Models Did IBM Release?

The two models target different compute budgets. The larger granite-embedding-english-r2 has 149 million parameters with an embedding size of 768, built on a 22-layer ModernBERT encoder. Its smaller counterpart, granite-embedding-small-english-r2, comes in at just 47 million parameters with an embedding size of 384, using a 12-layer ModernBERT encoder.

Despite their differences in size, both support a maximum context length of 8192 tokens, a major upgrade from the first-generation Granite embeddings. This long-context capability makes them highly suitable for enterprise workloads involving long documents and complex retrieval tasks.

https://arxiv.org/abs/2508.21085

What’s Inside the Architecture?

Both models are built on the ModernBERT backbone, which introduces several optimizations:

  • Alternating global and local attention to balance efficiency with long-range dependencies.
  • Rotary positional embeddings (RoPE) tuned for positional interpolation, enabling longer context windows.
  • FlashAttention 2 to improve memory usage and throughput at inference time.

IBM also trained these models with a multi-stage pipeline. The process started with masked language pretraining on a two-trillion-token dataset sourced from web, Wikipedia, PubMed, BookCorpus, and internal IBM technical documents. This was followed by context extension from 1k to 8k tokens, contrastive learning with distillation from Mistral-7B, and domain-specific tuning for conversational, tabular, and code retrieval tasks.

How Do They Perform on Benchmarks?

The Granite R2 models deliver strong results across widely used retrieval benchmarks. On MTEB-v2 and BEIR, the larger granite-embedding-english-r2 outperforms similarly sized models like BGE Base, E5, and Arctic Embed. The smaller model, granite-embedding-small-english-r2, achieves accuracy close to models two to three times larger, making it particularly attractive for latency-sensitive workloads.

https://arxiv.org/abs/2508.21085

Both models also perform well in specialized domains:

  • Long-document retrieval (MLDR, LongEmbed) where 8k context support is critical.
  • Table retrieval tasks (OTT-QA, FinQA, OpenWikiTables) where structured reasoning is required.
  • Code retrieval (CoIR), handling both text-to-code and code-to-text queries.

Are They Fast Enough for Large-Scale Use?

Efficiency is one of the standout aspects of these models. On an Nvidia H100 GPU, the granite-embedding-small-english-r2 encodes nearly 200 documents per second, which is significantly faster than BGE Small and E5 Small. The larger granite-embedding-english-r2 also reaches 144 documents per second, outperforming many ModernBERT-based alternatives.

Crucially, these models remain practical even on CPUs, allowing enterprises to run them in less GPU-intensive environments. This balance of speed, compact size, and retrieval accuracy makes them highly adaptable for real-world deployment.

What Does This Mean for Retrieval in Practice?

IBM’s Granite Embedding R2 models demonstrate that embedding systems don’t need massive parameter counts to be effective. They combine long-context support, benchmark-leading accuracy, and high throughput in compact architectures. For companies building retrieval pipelines, knowledge management systems, or RAG workflows, Granite R2 provides a production-ready, commercially viable alternative to existing open-source options.

https://arxiv.org/abs/2508.21085

Summary

In short, IBM’s Granite Embedding R2 models strike an effective balance between compact design, long-context capability, and strong retrieval performance. With throughput optimized for both GPU and CPU environments, and an Apache 2.0 license that enables unrestricted commercial use, they present a practical alternative to bulkier open-source embeddings. For enterprises deploying RAG, search, or large-scale knowledge systems, Granite R2 stands out as an efficient and production-ready option.


Check out the Paper, granite-embedding-small-english-r2 and granite-embedding-english-r2. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source link

Continue Reading
Click to comment

Leave a Reply

Your email address will not be published. Required fields are marked *

AI Research

Pentagon research official wants to have AI on every desktop in 6 to 9 months

Published

on


The Pentagon is angling to introduce artificial intelligence across its workforce within nine months following the reorganization of its key AI office.

Emil Michael, under secretary of defense for research and engineering at the Department of Defense, talked about the agency’s plans for introducing AI to its operations as it continues its modernization journey. 

“We want to have an AI capability on every desktop — 3 million desktops — in six or nine months,” Michael said during a Politico event on Tuesday. “We want to have it focus on applications for corporate use cases like efficiency, like you would use in your own company … for intelligence and for warfighting.”

This announcement follows the recent shakeups and restructuring of the Pentagon’s main artificial intelligence office. A senior defense official said the Chief Digital and Artificial Intelligence Office will serve as a new addition to the department’s research portfolio.

Michael also said he is “excited” about the restructured CDAO, adding that its new role will pivot to a focus on research that is similar to the Defense Advanced Research Projects Agency and Missile Defense Agency. This change is intended to enhance research and engineering priorities that will help advance AI for use by the armed forces and not take agency focus away from AI deployment and innovation.

“To add AI to that portfolio means it gets a lot of muscle to it,” he said. “So I’m spending at least a third of my time –– maybe half –– rethinking how the AI deployment strategy is going to be at DOD.”

Applications coming out of the CDAO and related agencies will then be tailored to corporate workloads, such as efficiency-related work, according to Michael, along with intelligence and warfighting needs.

The Pentagon first stood up the CDAO and brought on its first chief digital and artificial intelligence officer in 2022 to advance the agency’s AI efforts.

The restructuring of the CDAO this year garnered attention due to its pivotal role in investigating the defense applications of emerging technologies and defense acquisition activities. Job cuts within the office added another layer of concern, with reports estimating a 60% reduction in the CDAO workforce.





Source link

Continue Reading

AI Research

Pentagon CTO wants AI on every desktop in 6 to 9 months

Published

on


The Pentagon aims to get AI tools to its entire workforce next year, the department’s chief technical officer said one month after being given control of its main AI office.

“We want to have an AI capability on every desktop — 3 million desktops — in six or nine months,” Emil Michael, defense undersecretary for research and engineering, said at a Politico event on Tuesday. “We want to have it focus on applications for corporate use cases like efficiency, like you would use in your own company…for intelligence and for warfighting.”

Four weeks ago, the Chief Digital and Artificial Intelligence Office was demoted from reporting to Deputy Defense Secretary Stephen Feinberg to Michael, a subordinate.

Michael said CDAO will become a research body like the Defense Advanced Research Projects Agency and Missile Defense Agency. He said the change is meant to boost research and engineering into AI for the military, but not reduce its efforts to deploy AI and make innovations.

“To add AI to that portfolio means it gets a lot of muscle to it,” he said. “So I’m spending at least a third of my time—maybe half—rethinking how the AI-deployment strategy is going to be at DOD.”

He said applications would emerge from the CDAO and related agencies that will be tailored to corporate workloads.

The Pentagon created the CDAO in 2022 to advance the agency’s AI efforts and look into defense applications for emerging technologies. The office’s restructuring earlier this year garnered attention. Job cuts within the office added another layer of concern, with reports estimating a 60% reduction in the CDAO workforce.





Source link

Continue Reading

AI Research

Panelists Will Question Who Controls AI | ACS CC News

Published

on


Artificial intelligence (AI) has become one of the fastest-growing technologies in the world today. In many industries, individuals and organizations are racing to better understand AI and incorporate it into their work. Surgery is no exception, and that is why Clinical Congress 2025 has made AI one of the six themes of its Opening Day Thematic Sessions.

The first full day of the conference, Sunday, October 5, will include two back-to-back Panel Sessions on AI. The first session, “Using ChatGPT and AI for Beginners” (PS104), offers a foundation for surgeons not yet well versed in AI. The second, “AI: Who Is In Control?” (PS 110), will offer insights into the potential upsides and drawbacks of AI use, as well as its limitations and possible future applications, so that surgeons can involve this technology in their clinical care safely and effectively.

“AI: Who Is In Control?” will be moderated by Anna N. Miller, MD, FACS, an orthopaedic surgeon at Dartmouth Hitchcock Medical Center in Lebanon, New Hampshire, and Gabriel Brat, MD, MPH, MSc, FACS, a trauma and acute care surgeon at Beth Israel Deaconess Medical Center and an assistant professor at Harvard Medical School, both in Boston, Massachusetts.

In an interview, Dr. Brat shared his view that the use of AI is not likely to replace surgeons or decrease the need for surgical skills or decision-making. “It’s not an algorithm that’s going to be throwing the stitch. It’s still the surgeon.”

Nonetheless, he said that the starting presumption of the session is that AI is likely to be highly transformative to the profession over time.  

“Once it has significant uptake, it’ll really change elements of how we think about surgery,” he said, including creating meaningful opportunities for improvements.

The key question of the session, therefore, is not whether to engage with AI, but to do so in ways that ensure the best outcomes: “We as surgeons need to have a role in defining how to do so safely and effectively. Otherwise, people will start to use these tools, and we will be swept along with a movement as opposed to controlling it.”

To that end, Dr. Brat explained that the session will offer “a really strong translational focus by people who have been in the trenches working with these technologies.” He and Dr. Miller have specifically chosen an “all-star panel” designed to represent academia, healthcare associations, and industry. 

The panelists include Rachael A. Callcut, MD, MSPH, FACS, who is the division chief of trauma, acute care surgery and surgical critical care as well as associate dean of data science and innovation at the University of California-Davis Health in Sacramento, California. She will share the perspective on AI from academic surgery.

Genevieve Melton-Meaux, MD, PhD, FACS, FACMI, the inaugural ACS Chief Health Informatics Officer, will present on AI usage in healthcare associations. She also is a colorectal surgeon and the senior associate dean for health informatics and data science at the University of Minnesota and chief health informatics and AI officer for Fairview Health Services, both in Minneapolis.

Finally, Khan Siddiqui, MD, a radiologist and serial entrepreneur who is the cofounder, chairman, and CEO of a company called HOPPR AI, will present the view from industry. HOPPR AI is a for-profit company focused on building AI apps for medical imaging. As a radiologist, Dr. Siddiqui represents a medical specialty that is thought to likely undergo sweeping change as AI is incorporated into image-reading and diagnosis. His comments will focus on professional insights relevant to surgeons.

Their presentations will provide insights on general usage of AI at present, as well as predictions on what the landscape for AI in healthcare will look like in approximately 5 years. The session will include advice on what approaches to AI may be most effective for surgeons interested in ensuring positive outcomes and avoiding negative ones.

Additional information on AI usage pervades Clinical Congress 2025. In addition to various sessions that will comment on AI throughout the 4 days of the conference, various researchers will present studies that involve AI in their methods, starting presumptions, and/or potential applications to practice.

Access the Interactive Program Planner for more details about Clinical Congress 2025 sessions.



Source link

Continue Reading

Trending