
The latest generative AI models, such as OpenAI's GPT-4 and Google's Gemini 2.5, require not only high memory bandwidth but also large memory capacity. This is why companies operating generative AI clouds, such as Microsoft and Google, purchase hundreds of thousands of NVIDIA GPUs. As a solution to the core challenges of building such high-performance AI infrastructure, Korean researchers have succeeded in developing an NPU (Neural Processing Unit)* core technology that improves the inference performance of generative AI models by an average of more than 60% while consuming approximately 44% less power than the latest GPUs.
*NPU (Neural Processing Unit): An AI-specific semiconductor chip designed to rapidly process artificial neural networks.
On the 4th, Professor Jongse Park's research team from KAIST School of Computing, in collaboration with HyperAccel Inc. (a startup founded by Professor Joo-Young Kim from the School of Electrical Engineering), announced that they have developed a high-performance, low-power NPU (Neural Processing Unit) core technology specialized for generative AI clouds like ChatGPT.

The technology proposed by the research team was accepted to the 2025 International Symposium on Computer Architecture (ISCA 2025), a top-tier international conference in the field of computer architecture.
The key objective of this research is to improve the performance of large-scale generative AI services by making the inference process more lightweight, while minimizing accuracy loss and resolving memory bottlenecks. The work is highly regarded for its integrated design of AI semiconductors and AI system software, the key components of AI infrastructure.
While existing GPU-based AI infrastructure requires multiple GPU devices to meet high bandwidth and capacity demands, this technology enables the same level of AI infrastructure to be configured with fewer NPU devices through KV cache quantization*. Because the KV cache accounts for most of the memory usage, its quantization significantly reduces the cost of building generative AI clouds.
*KV Cache (Key-Value Cache) Quantization: Reducing the size of data held in a type of temporary storage used to improve performance when operating generative AI models (e.g., converting a 16-bit number to a 4-bit number shrinks the data to one quarter of its size).
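To illustrate the idea behind the footnote above, the following is a minimal sketch of symmetric 4-bit quantization of a KV cache tile in NumPy. This is a generic textbook scheme for illustration only, not the team's actual Oaken algorithm; all function names here are hypothetical.

```python
import numpy as np

def quantize_4bit(kv: np.ndarray):
    """Symmetric per-tensor 4-bit quantization (illustrative sketch)."""
    scale = float(np.abs(kv).max()) / 7.0   # map values into the int4 range
    q = np.clip(np.round(kv / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate 16-bit values from 4-bit codes."""
    return (q.astype(np.float32) * scale).astype(np.float16)

# Toy KV cache tile: 16-bit values quantized to 4-bit codes.
# Each value shrinks to one quarter of its size (two 4-bit codes
# can be packed into one byte when actually stored).
kv = np.random.randn(4, 64).astype(np.float16)
q, scale = quantize_4bit(kv)
kv_hat = dequantize_4bit(q, scale)
```

In practice the quantization error is bounded by roughly half the scale per value; the research addresses keeping this loss small enough that inference accuracy is preserved.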
The research team designed the technology to integrate with memory interfaces without changing the operational logic of existing NPU architectures. The hardware architecture not only implements the proposed quantization algorithm but also adopts page-level memory management techniques* for efficient use of limited memory bandwidth and capacity, and introduces a new encoding technique optimized for the quantized KV cache.
*Page-level memory management technique: Virtualizes memory addresses, as a CPU does, to allow consistent memory access within the NPU.
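As a software analogy of the page-level idea described above (a sketch only, not the team's hardware design), virtual addresses can be split into a page number and an offset, with the page number translated through a table that allocates physical pages on first touch:

```python
PAGE_SIZE = 4096  # bytes; an illustrative page size

class PageTable:
    """Toy software analogy of page-level address translation."""
    def __init__(self):
        self.table = {}      # virtual page number -> physical page number
        self.next_free = 0   # next unused physical page

    def map_page(self, vpn: int) -> int:
        if vpn not in self.table:       # allocate a physical page on first touch
            self.table[vpn] = self.next_free
            self.next_free += 1
        return self.table[vpn]

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.map_page(vpn) * PAGE_SIZE + offset

pt = PageTable()
phys = pt.translate(2 * PAGE_SIZE + 10)  # virtual page 2, offset 10
# virtual page 2 is the first page touched, so it gets physical page 0
```

Allocating pages on demand rather than reserving contiguous regions is what lets limited memory capacity be used efficiently, which is the motivation the article gives for adopting such a technique.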
Furthermore, the high-performance, low-power nature of NPUs is expected to significantly reduce operating costs when building NPU-based AI clouds, offering superior cost and power efficiency compared to the latest GPUs.
Professor Jongse Park stated, "This research, through joint work with HyperAccel Inc., found a solution in algorithms that make generative AI inference lightweight, and succeeded in developing a core NPU technology that can solve the 'memory problem.' Through this technology, we implemented an NPU with over 60% improved performance compared to the latest GPUs by combining quantization techniques that reduce memory requirements while maintaining inference accuracy, and hardware designs optimized for this."

He further emphasized, "This technology has demonstrated the possibility of implementing high-performance, low-power infrastructure specialized for generative AI, and is expected to play a key role not only in AI cloud data centers but also in the AI transformation (AX) environment represented by dynamic, executable AI such as 'Agentic AI'."

This research was presented by Ph.D. student Minsu Kim and Dr. Seongmin Hong from HyperAccel Inc. as co-first authors at the '2025 International Symposium on Computer Architecture (ISCA)' held in Tokyo, Japan, from June 21 to June 25. ISCA, a globally renowned academic conference, received 570 paper submissions this year, with only 127 papers accepted (an acceptance rate of 22.7%).
※Paper Title: Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
※DOI: https://doi.org/10.1145/3695053.3731019
Meanwhile, this research was supported by the National Research Foundation of Korea's Excellent Young Researcher Program, the Institute for Information & Communications Technology Planning & Evaluation (IITP), and the AI Semiconductor Graduate School Support Project.