
<(From left) M.S. candidate Soyoung Choi, Ph.D. candidate Seong-Hyeon Hwang, Professor Steven Euijong Whang>
Just as human eyes tend to focus on pictures before reading accompanying text, multimodal artificial intelligence (AI), which processes several types of data at once, also tends to depend more heavily on certain types of data. KAIST researchers have now developed a multimodal AI training technique that leads models to draw on text and images evenly, allowing far more accurate predictions.
KAIST (President Kwang Hyung Lee) announced on the 14th that a research team led by Professor Steven Euijong Whang of the School of Electrical Engineering has developed a novel data augmentation method that enables multimodal AI systems, which must process multiple data types simultaneously, to make balanced use of all input data.
Multimodal AI combines various forms of information, such as text and video, to make judgments. However, AI models often show a tendency to rely excessively on one particular type of data, resulting in degraded prediction performance.
To solve this problem, the research team deliberately trained AI models using mismatched or incongruent data pairs. By doing so, the model learned to rely on all modalities—text, images, and even audio—in a balanced way, regardless of context.
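The idea of training on mismatched pairs can be illustrated with a minimal sketch. It is not the team's implementation: the function name `make_misaligned_pairs`, the sample layout, and the choice to keep both source labels on each mismatched pair are assumptions made here for illustration, and the paper should be consulted for how such pairs are actually labeled and used.

```python
import random


def make_misaligned_pairs(samples, seed=0):
    """Pair each text with an image taken from a sample of a different class.

    `samples` is a list of (text, image, label) tuples. This layout is a
    hypothetical one chosen for the sketch, not taken from the paper.
    """
    rng = random.Random(seed)
    mismatched = []
    for text, _, text_label in samples:
        # Candidate images must come from a sample with a different label,
        # so that the two modalities genuinely disagree.
        candidates = [(img, lbl) for _, img, lbl in samples if lbl != text_label]
        if not candidates:
            continue  # no other class available, skip this sample
        image, image_label = rng.choice(candidates)
        mismatched.append(
            {
                "text": text,
                "image": image,
                # Assumption: keep both labels so a training loop can decide
                # how to supervise the mismatched pair.
                "text_label": text_label,
                "image_label": image_label,
            }
        )
    return mismatched


samples = [
    ("a dog barking", "dog.jpg", "dog"),
    ("a cat sleeping", "cat.jpg", "cat"),
    ("a bird singing", "bird.jpg", "bird"),
]
pairs = make_misaligned_pairs(samples)
for pair in pairs:
    print(pair["text"], "+", pair["image"])
```

Because the text and the image in each generated pair point to different classes, a model trained on such pairs cannot answer correctly by looking at only one modality.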
The team further improved performance stability by incorporating a training strategy that compensates for low-quality data while emphasizing more challenging examples. The method is not tied to any specific model architecture and can be easily applied to various data types, making it highly scalable and practical.
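One common way to emphasize challenging examples is to weight each sample's loss by how poorly the model currently predicts it. The sketch below uses a focal-style weight, which is a stand-in technique chosen here and not necessarily the weighting used in the study; the function names and the `gamma` parameter are assumptions for illustration.

```python
import math


def hard_example_weights(p_correct, gamma=2.0):
    """Return normalized weights that grow as the model's confidence drops.

    `p_correct` holds the predicted probability of the true class for each
    sample. The focal-style form (1 - p) ** gamma is an assumed stand-in.
    """
    raw = [(1.0 - p) ** gamma for p in p_correct]
    total = sum(raw)
    if total == 0.0:
        # Every sample is predicted perfectly, so fall back to uniform weights.
        return [1.0 / len(p_correct)] * len(p_correct)
    return [r / total for r in raw]


def weighted_nll(p_correct, gamma=2.0, eps=1e-12):
    """Negative log-likelihood in which hard samples contribute more."""
    weights = hard_example_weights(p_correct, gamma)
    return sum(w * -math.log(max(p, eps)) for w, p in zip(weights, p_correct))


# Three samples: easy, medium, and hard for the current model.
probabilities = [0.9, 0.5, 0.1]
print(hard_example_weights(probabilities))
print(weighted_nll(probabilities))
```

With these weights the hardest sample dominates the loss, so gradient updates concentrate on the cases the model still gets wrong.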

<Model Prediction Changes with a Data-Centric Multimodal AI Training Framework>

Professor Steven Euijong Whang explained, “Improving AI performance is not just about changing model architectures or algorithms—it’s much more important how we design and use the data for training.” He continued, “This research demonstrates that designing and refining the data itself can be an effective approach to help multimodal AI utilize information more evenly, without becoming biased toward a specific modality such as images or text.”
The study was co-led by doctoral student Seong-Hyeon Hwang and master’s student Soyoung Choi, with Professor Steven Euijong Whang serving as the corresponding author. The results will be presented at NeurIPS 2025 (Conference on Neural Information Processing Systems), the world’s premier conference in the field of AI, which will be held this December in San Diego, USA, and Mexico City, Mexico.
※ Paper title: “MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning,” Original paper: https://arxiv.org/pdf/2509.25831
The research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under the projects “Robust, Fair, and Scalable Data-Centric Continual Learning” (RS-2022-II220157) and “AI Technology for Non-Invasive Near-Infrared-Based Diagnosis and Treatment of Brain Disorders” (RS-2024-00444862).