
NPU core improves inference performance by over 60%

Oaken’s quantization algorithm consisting of three components: (a) threshold-based online-offline hybrid quantization, (b) group-shift quantization, and (c) fused dense-and-sparse encoding. Credit: Proceedings of the 52nd Annual International Symposium on Computer Architecture (2025). DOI: 10.1145/3695053.3731019

The latest generative AI models, such as OpenAI’s ChatGPT-4 and Google’s Gemini 2.5, require not only high memory bandwidth but also large memory capacity. This is why companies that operate generative AI clouds, such as Microsoft and Google, purchase hundreds of thousands of NVIDIA GPUs.

To address the core challenges of building such high-performance AI infrastructure, Korean researchers have developed an NPU (neural processing unit) core technology that improves the inference performance of generative AI models by more than 60% on average while consuming approximately 44% less power than the latest GPUs.
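As a rough illustration of what these two figures imply together, the sketch below computes the resulting performance-per-watt ratio. It assumes that the 60% performance gain and the 44% power saving apply to the same workload and baseline, which the article does not state explicitly; the function name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(perf_gain: float, power_reduction: float) -> float:
    """Ratio of NPU to GPU performance-per-watt.

    perf_gain: fractional throughput improvement (0.60 means 60% faster).
    power_reduction: fractional power saving (0.44 means 44% less power).
    Assumption: both fractions are measured against the same GPU baseline.
    """
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    relative_perf = 1.0 + perf_gain          # NPU throughput, with GPU = 1.0
    relative_power = 1.0 - power_reduction   # NPU power draw, with GPU = 1.0
    return relative_perf / relative_power


if __name__ == "__main__":
    gain = perf_per_watt_gain(0.60, 0.44)
    print(f"Implied performance-per-watt gain: {gain:.2f}x")
```

Under that assumption, the reported numbers would correspond to a bit under three times the performance per watt of the GPU baseline.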

Professor Jongse Park’s research team from KAIST School of Computing, in collaboration with HyperAccel Inc., developed a high-performance, low-power NPU core technology specialized for generative AI clouds like ChatGPT.

The technology proposed by the research team was presented by Ph.D. student Minsu Kim and Dr. Seongmin Hong from HyperAccel Inc. as co-first authors at the 2025 International Symposium on Computer Architecture (ISCA 2025), held in Tokyo, June 21–25.

The key objective of this research is to improve the performance of large-scale generative AI services by making the inference process more lightweight, while minimizing accuracy loss and solving memory bottleneck issues. The research is highly regarded for its integrated design of AI semiconductors and AI system software, which are key components of AI infrastructure.

While existing GPU-based AI infrastructure requires multiple GPU devices to meet high bandwidth and capacity demands, this technology enables the same level of AI infrastructure to be configured with fewer NPU devices through KV cache quantization. Because the KV cache accounts for most of the memory usage, quantizing it significantly reduces the cost of building generative AI clouds.
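To make the memory argument concrete, the following back-of-the-envelope sketch estimates the size of a KV cache at two bit widths. The model dimensions (32 layers, 32 KV heads, head dimension 128), the batch size, the sequence length, and the 4-bit width are illustrative assumptions rather than figures from the research, and the estimate ignores any metadata a real quantization scheme would store (scales, outlier indices).

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bits: int) -> int:
    """Approximate KV cache size in bytes.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. Quantization metadata is deliberately left out.
    """
    elements = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elements * bits // 8


if __name__ == "__main__":
    # Hypothetical configuration, chosen only for illustration.
    cfg = dict(layers=32, kv_heads=32, head_dim=128, seq_len=4096, batch=16)
    gib = 2 ** 30
    fp16 = kv_cache_bytes(bits=16, **cfg)
    int4 = kv_cache_bytes(bits=4, **cfg)
    print(f"16-bit KV cache: {fp16 / gib:.1f} GiB")
    print(f" 4-bit KV cache: {int4 / gib:.1f} GiB")
```

With these assumed numbers the cache shrinks by a factor of four, which is the kind of saving that would let the same workload fit on fewer devices.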

Overall Oaken accelerator architecture. Credit: Proceedings of the 52nd Annual International Symposium on Computer Architecture (2025). DOI: 10.1145/3695053.3731019

The research team designed the technology to integrate with memory interfaces without changing the operational logic of existing NPU architectures. The hardware architecture not only implements the proposed quantization algorithm but also adopts page-level memory management techniques to make efficient use of limited memory bandwidth and capacity, and introduces new encoding techniques optimized for the quantized KV cache.
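The general idea of combining an offline-chosen threshold, online scaling, and a dense-plus-sparse layout can be sketched as follows. This is a simplified illustration and not the Oaken algorithm itself: it omits group-shift quantization entirely, the percentile-style calibration is an assumed stand-in for the offline step, and the function names (`calibrate_threshold`, `quantize`, `dequantize`) are hypothetical.

```python
import math
from typing import List, Tuple


def calibrate_threshold(samples: List[float], keep_fraction: float = 0.9) -> float:
    """Offline step (assumed): choose a magnitude threshold so that roughly
    keep_fraction of the calibration samples count as inliers."""
    if not samples:
        raise ValueError("samples must not be empty")
    magnitudes = sorted(abs(x) for x in samples)
    idx = max(0, math.ceil(keep_fraction * len(magnitudes)) - 1)
    return magnitudes[idx]


def quantize(vec: List[float], threshold: float, bits: int = 4
             ) -> Tuple[List[int], float, float, List[Tuple[int, float]]]:
    """Online step: inliers become low-bit codes (dense part), while values
    above the threshold are kept as (index, value) pairs (sparse part)."""
    levels = (1 << bits) - 1
    inliers = [x for x in vec if abs(x) <= threshold]
    lo = min(inliers, default=0.0)
    hi = max(inliers, default=0.0)
    scale = (hi - lo) / levels if hi > lo else 1.0  # computed per vector
    codes: List[int] = []
    outliers: List[Tuple[int, float]] = []
    for i, x in enumerate(vec):
        if abs(x) > threshold:
            outliers.append((i, x))   # stored at full precision
            codes.append(0)           # placeholder in the dense part
        else:
            code = round((x - lo) / scale)
            codes.append(min(levels, max(0, code)))
    return codes, scale, lo, outliers


def dequantize(codes: List[int], scale: float, lo: float,
               outliers: List[Tuple[int, float]]) -> List[float]:
    """Rebuild the vector from the dense codes, then patch in the outliers."""
    restored = [lo + c * scale for c in codes]
    for i, x in outliers:
        restored[i] = x
    return restored


if __name__ == "__main__":
    vec = [0.1, -0.2, 0.05, 8.0, 0.3, -0.15, -7.5, 0.0]
    codes, scale, lo, outliers = quantize(vec, threshold=1.0)
    restored = dequantize(codes, scale, lo, outliers)
    print("outliers kept sparsely:", outliers)
    print("max inlier error:", max(abs(a - b) for a, b in zip(vec, restored)))
```

Keeping the few large values out of the dense part means the low-bit scale only has to cover the narrow inlier range, which is why this style of split tends to preserve accuracy at small bit widths.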

Furthermore, because NPUs offer superior cost and power efficiency compared to the latest GPUs, an AI cloud built on these high-performance, low-power devices is expected to have significantly lower operating costs.

Professor Jongse Park said, “This research, through joint work with HyperAccel Inc., found a solution in generative AI inference light-weighting algorithms and succeeded in developing a core NPU technology that can solve the memory problem. Through this technology, we implemented an NPU with over 60% improved performance compared to the latest GPUs by combining quantization techniques that reduce requirements while maintaining inference accuracy, and hardware designs optimized for this.

“This technology has demonstrated the possibility of implementing high-performance, low-power infrastructure specialized for generative AI, and is expected to play a key role not only in AI cloud data centers but also in the AI transformation (AX) environment represented by dynamic, executable AI such as agentic AI.”

More information:
Minsu Kim et al, Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization, Proceedings of the 52nd Annual International Symposium on Computer Architecture (2025). DOI: 10.1145/3695053.3731019

Citation:
AI cloud infrastructure gets faster and greener: NPU core improves inference performance by over 60% (2025, July 7)
retrieved 7 July 2025
from https://techxplore.com/news/2025-07-ai-cloud-infrastructure-faster-greener.html







How can we create a sustainable AI future?

With innovation comes impact. The social media revolution changed how we share content, how we buy, sell and learn, but also raised questions around technology misuse, censorship and protection. Every time we take a step forward, we also need to tackle challenges, and AI is no different.

One of the major challenges for AI is its energy consumption. Together, datacenters and AI currently use between 1% and 2% of the world’s electricity, but this figure is rising fast.




Apple Silently Acquires Two AI Startups To Enhance Vision Pro Realism And Strengthen Apple Intelligence With Smarter, Safer, And More Privacy-Focused Technology

Apple seems focused not only on advancing its work on the Vision Pro headset but also on escalating its AI ambitions through its Apple Intelligence initiatives. To drive these efforts, it appears to be repeatedly acquiring smaller firms that specialize in the relevant technology. It shows no sign of slowing down: it has recently acquired two more companies to strengthen its talent pool and to expand its innovation through the technology stacks they bring.

Apple has now bought two companies to strengthen its next wave of innovation and advance Apple Intelligence

MacGeneration uncovered that Apple recently took over two additional companies, continuing its low-profile strategy of growing Apple Intelligence by gradually building its talent and technology. One of the acquired companies is TrueMeeting, a startup with expertise in AI avatars and facial scanning. Users need only an iPhone to scan their faces and can then see a hyper-realistic version of themselves being created. Although the official website has been taken down, the company’s technology seems to align with Apple’s ambitions for the Vision Pro and its attempts at an immersive experience.

TrueMeeting’s main expertise lies in its CommonGround Human AI, which is meant to make virtual interactions feel more natural and human and can be integrated seamlessly with a wide range of applications. Although neither party has officially commented on the acquisition, it looks like Apple has gone ahead with it to further its development of Personas, the lifelike digital avatars in the Apple Vision Pro headset, and to refine its technology to improve the spatial computing experience.

Apple has also acquired WhyLabs, a firm focused on improving the reliability of large language models (LLMs). It excels at dealing with issues such as bugs and AI hallucinations by helping developers maintain consistency and accuracy in their AI systems. By taking over this company, Apple wants not only to advance Apple Intelligence but also to ensure that its tools are reliable and safe, which are core values of the company and much needed for integrating the models across varied platforms and ensuring a consistent experience.

WhyLabs is not only focused on monitoring the performance of these models and ensuring reliability but also has expertise in providing safeguards that help combat misuse arising from security vulnerabilities. It can block harmful output from AI models, which again aligns with Apple’s stance on privacy and user trust. This acquisition is especially vital given the growing expansion of Apple Intelligence capabilities across the ecosystem.

Apple seems to be doubling down on its AI efforts and ensuring a more immersive experience without compromising on the technology remaining safe and the systems acting responsibly.




IIT Delhi announces 6-month online executive programme focused on AI in Healthcare: Check details here

The Indian Institute of Technology (IIT) Delhi, in partnership with TeamLease EdTech, has introduced a comprehensive online executive programme in Artificial Intelligence (AI) in Healthcare, specially designed for working professionals across diverse domains. Scheduled to begin on November 1, 2025, this programme seeks to bridge the gap between healthcare and technology by imparting industry-relevant AI skills to professionals, including doctors, engineers, data scientists, and med-tech entrepreneurs.

Applications for the programme are currently open and will remain so until July 31, 2025. Interested professionals are encouraged to submit their applications through the official IIT Delhi CEP portal.

This initiative is a part of IIT Delhi’s eVIDYA platform, developed under the Continuing Education Programme (CEP), and aims to foster applied learning through a blend of theoretical instruction and hands-on experience using real clinical datasets.

This course offers a unique opportunity to upskill with one of India’s premier institutes and contribute meaningfully to the rapidly evolving field of AI-powered healthcare.

Programme overview

To help prospective applicants plan better, here is a quick summary of the programme’s key details:

Category               Details
Course duration        November 1, 2025 – May 2, 2026
Class schedule         Online and conducted over weekends
Programme fee          ₹1,20,000 + 18% GST (Payable in two easy installments)
Application deadline   July 31, 2025
Learning platform      IIT Delhi Continuing Education Programme (CEP) portal

Who can benefit from this course?

The programme is tailored for a wide spectrum of professionals who are either involved in healthcare or aspire to work at the intersection of health and technology. You are an ideal candidate if you are:
• A healthcare practitioner or clinician with limited or no background in coding or artificial intelligence, but curious to explore AI’s applications in medicine.
• An engineer, data analyst, or academic researcher engaged in health-tech innovations or biomedical computing.
• A med-tech entrepreneur or healthcare startup founder looking to incorporate AI-driven solutions into your business or products.

Curriculum overview

Participants will engage with a carefully curated curriculum that balances core concepts with real-world applications. Key modules include:
• Introduction to AI, Machine Learning (ML), and Deep Learning (DL) concepts.
• How AI is used to predict disease outcomes and assist in clinical decision-making.
• Leveraging AI in population health management and epidemiology.
• Application of AI for hospital automation and familiarity with global healthcare data standards like FHIR and DICOM.
• Over 10 detailed case studies showcasing successful AI applications in hospitals and clinics.
• A hands-on project with expert mentorship from faculty at IIT Delhi and clinicians from AIIMS, enabling learners to apply their knowledge to real clinical challenges.

Learning outcomes you can expect

By the end of this programme, participants will be equipped with the ability to:
• Leverage AI technologies to enhance clinical workflows, automate processes, and support evidence-based decision making in healthcare.
• Work effectively with diverse data sources such as Electronic Medical Records (EMRs), radiology images, genomics data, and Internet of Things (IoT)-based health devices.
• Develop and deploy functional AI models tailored for practical use in hospitals, diagnostics, and public health infrastructure.
• Earn a prestigious certification from IIT Delhi, enhancing your professional credentials in the health-tech domain.




