
< (From left) KAIST School of Electrical Engineering: Dr. Jinwoo Park, M.S. candidate Seunggeun Cho, and Professor Dongsu Han >
Until now, AI services based on Large Language Models (LLMs) have mostly relied on expensive data center GPUs. This has resulted in high operational costs and created a significant barrier to entry for utilizing AI technology. A research team at KAIST has developed a technology that reduces reliance on expensive data center GPUs by utilizing affordable, everyday GPUs to provide AI services at a much lower cost.
On December 28th, KAIST announced that a research team led by Professor Dongsu Han from the School of Electrical Engineering developed 'SpecEdge,' a new technology that significantly lowers LLM infrastructure costs by utilizing affordable, consumer-grade GPUs widely available outside of data centers.
SpecEdge is a system where data center GPUs and "edge GPUs"—found in personal PCs or small servers—collaborate to form an LLM inference infrastructure. By applying this technology, the team successfully reduced the cost per token (the smallest unit of text generated by AI) by approximately 67.6% compared to methods using only data center GPUs.
To achieve this, the research team used a method called 'Speculative Decoding.' A small language model placed on the edge GPU quickly generates a high-probability token sequence (a series of words or word fragments), and the large-scale language model in the data center then verifies this sequence in batches. Meanwhile, the edge GPU continues to generate words without waiting for the server's response, which increases both LLM inference speed and infrastructure efficiency.
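The draft-then-verify loop described above can be illustrated with a minimal sketch. It is a simplified greedy variant with toy stand-in models, not the SpecEdge implementation: `target_model`, `draft_model`, and the draft length `k` are hypothetical, the verification loop stands in for what would be a single batched forward pass on the server, and the overlap between edge drafting and server verification is not modeled.

```python
from typing import List, Tuple

def target_model(ctx: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next token from the context."""
    return (ctx[-1] * 3 + len(ctx)) % 7

def draft_model(ctx: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except at every 4th position."""
    guess = target_model(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 7

def draft(ctx: List[int], k: int) -> List[int]:
    """Edge side: propose k tokens autoregressively with the small model."""
    out: List[int] = []
    for _ in range(k):
        out.append(draft_model(ctx + out))
    return out

def verify(ctx: List[int], proposed: List[int]) -> List[int]:
    """Server side: keep the longest prefix the target agrees with,
    then append one token chosen by the target itself."""
    accepted: List[int] = []
    for tok in proposed:
        expected = target_model(ctx + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first wrong draft token
            return accepted
        accepted.append(tok)
    accepted.append(target_model(ctx + accepted))  # all drafts accepted: extra token
    return accepted

def speculative_decode(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_new tokens; also return the number of verification rounds used."""
    ctx = list(prompt)
    rounds = 0
    while len(ctx) - len(prompt) < n_new:
        ctx += verify(ctx, draft(ctx, k))
        rounds += 1
    return ctx[: len(prompt) + n_new], rounds

def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: the target model generates every token itself."""
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(target_model(ctx))
    return ctx
```

In this sketch the output is identical to what the large model would produce on its own, but the large model is consulted once per round instead of once per token, which is the source of the speedup when most draft tokens are accepted.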

< Figure 1. Language data flow diagram of the developed SpecEdge >

< Figure 2. Detailed computation time reduction method of SpecEdge >

< Figure 3. Illustration of efficient batching of verification requests from multiple edge GPUs on the server GPU within SpecEdge >
Compared to performing speculative decoding solely on data center GPUs, SpecEdge improved cost efficiency by 1.91 times and server throughput by 2.22 times. Notably, the technology was confirmed to work seamlessly even at standard internet speeds, meaning it can be applied to real-world services immediately without requiring a specialized network environment.
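The article reports cost results in two forms, a percentage reduction (67.6%) and an improvement factor (1.91 times), each against a different baseline. The short sketch below shows how the two forms convert into each other, assuming "cost efficiency" means tokens generated per unit cost; that interpretation and the function names are assumptions, not definitions from the research team.

```python
def reduction_from_factor(factor: float) -> float:
    """Fractional drop in cost per token when tokens-per-cost improves by `factor`."""
    return 1.0 - 1.0 / factor

def factor_from_reduction(reduction: float) -> float:
    """Improvement factor in tokens-per-cost implied by a fractional cost reduction."""
    return 1.0 / (1.0 - reduction)

# 1.91x versus server-only speculative decoding: cost per token falls by about 47.6%
print(f"{reduction_from_factor(1.91):.1%}")

# 67.6% lower cost per token versus data-center-only serving: about 3.09x
print(f"{factor_from_reduction(0.676):.2f}x")
```

Under that assumption, the two figures are consistent with each other only because they are measured against different baselines.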
Furthermore, the server is designed to efficiently process verification requests from multiple edge GPUs, allowing it to handle more simultaneous requests without GPU idle time. The result is an LLM serving infrastructure that uses data center resources more effectively.
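One way to picture how a server might group verification requests from several edge GPUs is the sketch below. It is an illustrative assumption rather than the scheduler used in SpecEdge: the `VerifyRequest` type, the `form_batches` function, and the `max_batch` limit are all hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List

@dataclass
class VerifyRequest:
    edge_id: str             # which edge GPU sent the draft (hypothetical field)
    draft_tokens: List[int]  # tokens proposed by the edge-side small model

def form_batches(pending: Deque[VerifyRequest], max_batch: int) -> List[List[VerifyRequest]]:
    """Drain the queue in arrival order, grouping up to max_batch requests
    so the server GPU can verify each group in one pass."""
    batches: List[List[VerifyRequest]] = []
    while pending:
        size = min(max_batch, len(pending))
        batches.append([pending.popleft() for _ in range(size)])
    return batches

# Five edge GPUs submit drafts at roughly the same time.
queue: Deque[VerifyRequest] = deque(
    VerifyRequest(edge_id=f"edge-{i}", draft_tokens=[i, i + 1, i + 2]) for i in range(5)
)
batches = form_batches(queue, max_batch=2)
print([[r.edge_id for r in batch] for batch in batches])
```

Grouping requests this way keeps the server GPU busy with full batches instead of verifying one edge device's draft at a time.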
This research presents a new possibility for distributing LLM computations—which were previously concentrated in data centers—to the edge, thereby reducing infrastructure costs and increasing accessibility. In the future, as this expands to various edge devices such as smartphones, personal computers, and Neural Processing Units (NPUs), high-quality AI services are expected to become available to a broader range of users.

< Figure 4. Conceptual comparison of the developed SpecEdge vs. conventional methods >
Professor Dongsu Han, who led the research, stated, "Our goal is to utilize edge resources around the user, beyond the data center, as part of the LLM infrastructure. Through this, we aim to lower AI service costs and create an environment where anyone can utilize high-quality AI."
Dr. Jinwoo Park and M.S. candidate Seunggeun Cho from KAIST participated in this study. The research results were presented as a 'Spotlight' (top 3.2% of papers; the conference acceptance rate was 24.52%) at the NeurIPS (Neural Information Processing Systems) conference, the world's most prestigious academic conference in the field of AI, held in San Diego from December 2nd to 7th.
This research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the project 'Development of 6G System Technology to Support AI-Native Application Services.'