
< (From bottom left) KAIST Ph.D. Candidate Yoonho Lee, Integrated M.S./Ph.D. Candidate Sein Kim, Ph.D. Candidate Sungwon Kim, Ph.D. Candidate Junseok Lee, Ph.D. Candidate Yunhak Oh, (From top right) Ph.D. Candidate Namkyeong Lee, UNC Chapel Hill Ph.D. Candidate Sukwon Yun, Emory University Professor Carl Yang, KAIST Professor Chanyoung Park >
Federated Learning was devised to solve the difficulty of gathering personal data, such as patient medical records or financial records, in one place. However, a limitation emerged: as each institution adapts the collaboratively trained AI to its own environment, the model becomes overly fitted to that institution's data and vulnerable to new data. A KAIST research team has presented a solution to this problem and confirmed its stable performance not only in security-critical fields such as hospitals and banks but also in rapidly changing environments such as social media and online shopping.
KAIST announced on October 15th that the research team led by Professor Chanyoung Park of the Department of Industrial and Systems Engineering has developed a new learning method that fundamentally resolves the chronic performance-degradation problem of Federated Learning, significantly enhancing the generalization performance of AI models.
Federated Learning is a method that allows multiple institutions to jointly train an AI model without directly exchanging data. A problem arises, however, when each institution fine-tunes the resulting joint model to its local setting: the broad knowledge acquired during collaboration is diluted, and the AI becomes excessively adapted to the data characteristics of that one institution, a problem known as Local Overfitting.
For example, if several banks jointly build a 'Collaborative Loan Review AI,' and one specific bank performs fine-tuning focusing on corporate customer data, that bank's AI becomes strong in corporate reviews but suffers from local overfitting, leading to degraded performance in reviewing individual or startup customers.
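The collaborative training step described above can be sketched with the classic federated averaging (FedAvg) scheme, in which a central server combines each institution's locally trained model without ever seeing raw data. This is a minimal illustration, not the research team's exact protocol; the plain lists of floats stand in for real model parameters.

```python
# Minimal sketch of federated averaging (FedAvg): the server aggregates
# each institution's model parameters, weighted by local dataset size.
# Only trained parameters travel to the server, never the raw data.

def fedavg(local_weights, num_samples):
    """Return the size-weighted average of the institutions' parameters."""
    total = sum(num_samples)
    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for w, n in zip(local_weights, num_samples):
        for i in range(dim):
            global_weights[i] += (n / total) * w[i]
    return global_weights

# Three institutions' locally trained parameters and their data counts
locals_ = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [100, 100, 200]
print(fedavg(locals_, counts))  # [3.5, 4.5]
```

The institution with 200 samples pulls the joint model toward its parameters twice as strongly as the others, which is exactly why a later local fine-tuning step can still drift the model away from this shared starting point.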
Professor Park's team introduced the Synthetic Data method to solve this. They extracted only the core and representative features from each institution's data to generate virtual data that does not contain personal information and applied this during the fine-tuning process. As a result, each institution's AI can strengthen its expertise according to its own data without sharing personal information, while maintaining the broad perspective (generalization performance) gained through collaborative learning.
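The mixing idea, training on local data and global synthetic data at the same time, can be sketched as a weighted two-term loss. The toy one-parameter linear model, the `alpha` mixing weight, and the example data below are illustrative assumptions, not the authors' actual algorithm.

```python
# Hedged sketch: fine-tuning on local data plus "global synthetic data".
# Gradient descent minimizes (1 - alpha) * local squared error
# plus alpha * squared error on the synthetic data, so the synthetic
# term anchors the model to the global distribution (the "vaccine"
# against local overfitting). The 1-D model y = w * x is a toy stand-in.

def finetune(w, local_data, synthetic_data, alpha=0.5, lr=0.01, steps=500):
    """Fine-tune scalar weight w on a mix of local and synthetic points."""
    for _ in range(steps):
        grad = 0.0
        for x, y in local_data:  # pull toward the institution's own data
            grad += (1 - alpha) * 2 * (w * x - y) * x / len(local_data)
        for x, y in synthetic_data:  # pull toward the global distribution
            grad += alpha * 2 * (w * x - y) * x / len(synthetic_data)
        w -= lr * grad
    return w

local = [(1.0, 3.0)]      # local-only data would fit w = 3
synthetic = [(1.0, 1.0)]  # synthetic summary of other institutions (w = 1)
print(round(finetune(0.0, local, synthetic), 2))  # 2.0
```

With `alpha = 0.5` the fine-tuned weight settles between the local-only fit (3.0) and the synthetic-only fit (1.0), gaining local expertise without fully discarding the shared knowledge; setting `alpha = 0` recovers plain local fine-tuning and its overfitting risk.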

<Figure 1. Federated Learning is a distributed learning method where multiple institutions collaboratively train a joint Artificial Intelligence model without directly sharing their data. Each institution trains its individual AI model using its local data (Institution 1, 2, 3 Data). Afterward, only the trained model information, not the original data, is securely aggregated to a central server to construct a high-performing 'Joint AI Model.' This method allows for the effect of training with diverse data while protecting the privacy of sensitive information>

< Figure 2. The Local Overfitting problem occurs during the process of fine-tuning the 'Joint AI Model' built through Federated Learning with each institution's data. For example, Institution 3 can fine-tune the joint AI with its own data (Type 0, 2) to create an expert AI for those types, but in the process, it forgets the knowledge about data (Type 1) that other institutions had (Information Loss). In this way, each institution's AI becomes optimized only for its own data, gradually losing the ability (generalization performance) to solve other types of problems that were obtained through collaboration. >
The research results showed that the method is particularly effective in fields where data security is crucial, such as healthcare and finance, and that it also performs stably in environments where new users and products are continuously added, such as social media and e-commerce. The AI maintained stable performance even when a new institution joined the collaboration or data characteristics changed rapidly.

< Figure 3. The technology proposed by the research team solves the local overfitting problem by utilizing Synthetic Data. When each institution fine-tunes its AI with its own data, it simultaneously trains with 'Global Synthetic Data' created from the data of other institutions. This synthetic data acts as a kind of 'Vaccine' to prevent the AI from forgetting information not present in the local data (e.g., Type 2 in the image), helping the AI to gain expertise on specific data while retaining a broad view (generalization performance) to handle other types of data. >
Professor Chanyoung Park of the Department of Industrial and Systems Engineering said, "This research opens a new path to simultaneously ensure both expertise and versatility for each institution's AI while protecting data privacy," and "It will be a great help in fields where data collaboration is essential but security is important, such as medical AI and financial fraud detection AI."
The research was first-authored by Sungwon Kim of the Graduate School of Data Science, with Professor Chanyoung Park as the corresponding author. It was recognized for its excellence by being selected for an Oral Presentation, reserved for the top 1.8% of submissions, at the International Conference on Learning Representations (ICLR) 2025, a top-tier academic conference in the field of Artificial Intelligence held in Singapore last April.
※ Paper Title: Subgraph Federated Learning for Local Generalization, https://doi.org/10.48550/arXiv.2503.03995
Meanwhile, this research is a result of projects supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) — the 'Robust, Fair, and Scalable Data-Centric Continual Learning' project, the National Research Foundation of Korea (NRF) — the 'Graph Foundation Model: Graph-based Machine Learning Applicable to Various Modalities and Domains' project, and the 'Data Science Convergence Talent Fostering Program.'