Catching AI's 'Temporal Errors': Enhancing Reliability in Medical and Legal Fields
<Ph.D candidate Soyeon Kim, (From Left) Jindong Wang (Microsoft; currently at the College of William & Mary), Xing Xie (Microsoft), and Steven Euijong Whang (Professor at KAIST)>
What if ChatGPT answered with the name of a minister from a year ago when asked, "Who was the minister inaugurated last month?" This is a prime example of the limitations of AI that fails to properly reflect the latest information. Our university’s research team has developed a new evaluation technology that automatically reflects changing real-world information while catching "temporal errors" that may appear correct on the surface. This is expected to drastically improve AI reliability.
KAIST announced on April 14th that a research team led by Professor Steven Euijong Whang from the School of Electrical Engineering, in joint research with Microsoft Research, has developed a system that automatically evaluates and diagnoses the temporal reasoning capabilities of Large Language Models (LLMs) using temporal database technology.
For AI to earn user trust, the ability to accurately understand real-world information that changes moment by moment is essential. However, existing evaluation methods either checked only whether the final answer matched or failed to sufficiently reflect complex temporal relationships, making it difficult to properly evaluate the variety of question scenarios that occur in real environments.
To solve this, the research team introduced "Temporal Database" design theory, which has been verified over the past 40 years, into AI evaluation for the first time. The core of the technology is that it uses the temporal flow and relational structure of the data to automatically generate 13 types of complex time-based problems from the database itself, without the need for humans to manually write evaluation questions.
<Schematic Diagram of the Evaluation Framework Proposed in This Study>
In particular, this technology is regarded as a major innovation because it shifts from the traditional method, in which humans manually created problems, to one in which evaluation questions are generated automatically from data. Furthermore, because the entire process from problem generation to answer derivation and verification is automated on top of the database, questions no longer need to be modified by hand, which drastically reduces the maintenance burden.
When real-world information changes, the evaluation questions, answers, and verification criteria are automatically updated simply by updating the corresponding content in the database. While the input of the latest information itself is handled by external data or administrators, this technology is structured to perform the overall evaluation automatically after such data is updated.
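The idea of deriving questions and answers from time-stamped records can be illustrated with a minimal Python sketch. It is not the research team's implementation: the `Tenure` record, the `holder_on` and `generate_questions` helpers, the placeholder names "Person A" to "Person C", and the two question templates are all assumptions made for illustration, whereas the actual system generates 13 question types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Tenure:
    """One row of a hypothetical temporal table: who held a role, and when."""
    person: str
    role: str
    start: date
    end: Optional[date]  # None means the row is still valid today


def holder_on(rows, role, when):
    """Return the person whose validity interval [start, end) covers `when`."""
    for r in rows:
        if r.role == role and r.start <= when and (r.end is None or when < r.end):
            return r.person
    return None


def generate_questions(rows, role, when):
    """Derive (question, answer) pairs directly from the rows.

    Only two illustrative templates are used here: a point-in-time
    question and a "who came before" question.
    """
    ordered = sorted((r for r in rows if r.role == role), key=lambda r: r.start)
    qa = [(f"Who was the {role} on {when.isoformat()}?", holder_on(rows, role, when))]
    for prev, nxt in zip(ordered, ordered[1:]):
        qa.append((f"Who was the {role} immediately before {nxt.person}?", prev.person))
    return qa


# Hypothetical data, not real office holders.
rows = [
    Tenure("Person A", "minister", date(2022, 5, 10), date(2024, 3, 1)),
    Tenure("Person B", "minister", date(2024, 3, 1), None),
]
questions = generate_questions(rows, "minister", date(2024, 6, 1))

# A real-world change is recorded only in the data: close the open row
# and add the successor. Questions and answers are then regenerated.
rows[1].end = date(2025, 1, 15)
rows.append(Tenure("Person C", "minister", date(2025, 1, 15), None))
updated = generate_questions(rows, "minister", date(2025, 2, 1))
```

In this sketch no question text is edited by hand after the update; the new answer and the additional "who came before" question follow from the changed rows alone.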
Additionally, moving beyond the existing method of simply judging whether the final answer is correct or incorrect, the research team introduced a new metric that verifies the logical validity of the dates or periods presented during the answering process. With this metric, they detected "temporal hallucinations," in which an answer appears correct but rests on the wrong temporal basis, an average of 21.7% more accurately than before.
Applying this technology can significantly reduce evaluation maintenance costs, since only the database needs to be updated when information changes. It also reduced the amount of input data by an average of 51% compared to previous methods.
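What it means to check the temporal basis of an answer, rather than only the answer itself, can be shown with a small Python sketch. The `grade` function and its rule (the period cited by the model must lie inside the validity period stored in the database) are assumptions for illustration; the article does not describe how the team's metric is actually computed.

```python
from datetime import date


def grade(answer, cited_start, cited_end, gold_answer, gold_start, gold_end):
    """Classify one response using both the final answer and its cited period.

    Assumption: the cited period counts as valid only if it lies entirely
    inside the gold validity interval taken from the database.
    """
    answer_ok = answer.strip().lower() == gold_answer.strip().lower()
    period_ok = gold_start <= cited_start and cited_end <= gold_end
    if answer_ok and period_ok:
        return "correct"
    if answer_ok:
        # Right name, wrong dates: looks correct on the surface only.
        return "temporal_hallucination"
    return "incorrect"


# Hypothetical gold record: "Person B" held the role during this period.
gold = ("Person B", date(2024, 3, 1), date(2025, 1, 15))

well_grounded = grade("Person B", date(2024, 3, 1), date(2024, 12, 31), *gold)
wrong_basis = grade("person b", date(2022, 5, 10), date(2023, 1, 1), *gold)
wrong_answer = grade("Person A", date(2024, 3, 1), date(2024, 12, 31), *gold)
```

A check that compares only the final answer would treat the first two responses identically; looking at the cited period is what separates them in this sketch.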
<Future AI Evaluation System (AI-Generated Image)>
Professor Steven Euijong Whang stated, "This research is an example showing that classical database design theory can play a crucial role in solving the reliability issues of the latest AI. By converting vast amounts of professional data into evaluation resources, we expect this to become a practical foundation for verifying AI performance in various fields such as medicine and law in the future."
Soyeon Kim, a PhD student at KAIST, participated as the lead author of this study, and Jindong Wang (Microsoft Research, currently at William & Mary) and Xing Xie (Microsoft Research) participated as co-authors. The research results will be presented this April at ICLR 2026, the most prestigious academic conference in the field of artificial intelligence.
Paper Title: Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in Large Language Models
Paper Link: https://arxiv.org/abs/2508.02045
This research was conducted with support from Microsoft Research, the National Research Foundation of Korea, and the Institute for Information & Communications Technology Planning & Evaluation (IITP) Global AI Frontier Lab projects (RS-2024-00469482, RS-2024-00509258).
KAIST Develops Multimodal AI That Understands Text and Images Like Humans
<(From Left) M.S candidate Soyoung Choi, Ph.D candidate Seong-Hyeon Hwang, Professor Steven Euijong Whang>
Just as human eyes tend to focus on pictures before reading accompanying text, multimodal artificial intelligence (AI)—which processes multiple types of sensory data at once—also tends to depend more heavily on certain types of data. KAIST researchers have now developed a new multimodal AI training technology that enables models to recognize both text and images evenly, enabling far more accurate predictions.
KAIST (President Kwang Hyung Lee) announced on the 14th that a research team led by Professor Steven Euijong Whang from the School of Electrical Engineering has developed a novel data augmentation method that enables multimodal AI systems—those that must process multiple data types simultaneously—to make balanced use of all input data.
Multimodal AI combines various forms of information, such as text and video, to make judgments. However, AI models often show a tendency to rely excessively on one particular type of data, resulting in degraded prediction performance.
To solve this problem, the research team deliberately trained AI models using mismatched or incongruent data pairs. By doing so, the model learned to rely on all modalities—text, images, and even audio—in a balanced way, regardless of context.
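As a rough illustration of training on deliberately mismatched pairs, the Python sketch below builds such pairs from a toy dataset. It is not the MIDAS implementation: the `make_mismatched_pairs` helper, the toy samples, and especially the labeling rule are assumptions. The article does not say how mismatched pairs are labeled, so the even 0.5/0.5 split between the two source labels is purely a placeholder.

```python
import random


def make_mismatched_pairs(samples, num_classes, seed=0):
    """Pair each text with an image taken from a sample of a different class.

    `samples` is a list of (text, image, label) triples. Each returned item
    is (text, image, soft_label), where soft_label splits the probability
    mass evenly between the text's class and the image's class (an assumed
    labeling rule, used here only for illustration).
    """
    rng = random.Random(seed)
    augmented = []
    for text, _, text_label in samples:
        candidates = [s for s in samples if s[2] != text_label]
        if not candidates:
            continue  # no sample of another class to borrow an image from
        _, image, image_label = rng.choice(candidates)
        soft = [0.0] * num_classes
        soft[text_label] += 0.5
        soft[image_label] += 0.5
        augmented.append((text, image, soft))
    return augmented


# Toy data: strings stand in for real text and image features.
samples = [
    ("a caption about a cat", "img_cat", 0),
    ("a caption about a dog", "img_dog", 1),
    ("a caption about a bird", "img_bird", 2),
]
augmented = make_mismatched_pairs(samples, num_classes=3)
```

Because the text and image in each augmented pair point to different classes, a model that reads only one modality cannot fit the assumed soft label, which is the intuition behind using incongruent pairs to balance modalities.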
The team further improved performance stability by incorporating a training strategy that compensates for low-quality data while emphasizing more challenging examples. The method is not tied to any specific model architecture and can be easily applied to various data types, making it highly scalable and practical.
<Model Prediction Changes with a Data-Centric Multimodal AI Training Framework>
Professor Steven Euijong Whang explained, “Improving AI performance is not just about changing model architectures or algorithms—it’s much more important how we design and use the data for training.” He continued, “This research demonstrates that designing and refining the data itself can be an effective approach to help multimodal AI utilize information more evenly, without becoming biased toward a specific modality such as images or text.”
The study was co-led by doctoral student Seong-Hyeon Hwang and master’s student Soyoung Choi, with Professor Steven Euijong Whang serving as the corresponding author. The results will be presented at NeurIPS 2025 (Conference on Neural Information Processing Systems), the world’s premier conference in the field of AI, which will be held this December in San Diego, USA, and Mexico City, Mexico.
※ Paper title: “MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning,” Original paper: https://arxiv.org/pdf/2509.25831
The research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under the projects “Robust, Fair, and Scalable Data-Centric Continual Learning” (RS-2022-II220157) and “AI Technology for Non-Invasive Near-Infrared-Based Diagnosis and Treatment of Brain Disorders” (RS-2024-00444862).
Yuji Roh Awarded 2022 Microsoft Research PhD Fellowship
KAIST PhD candidate Yuji Roh of the School of Electrical Engineering (advisor: Prof. Steven Euijong Whang) was selected as a recipient of the 2022 Microsoft Research PhD Fellowship.
< KAIST PhD candidate Yuji Roh (advisor: Prof. Steven Euijong Whang) >
The Microsoft Research PhD Fellowship is a scholarship program that recognizes outstanding graduate students for their exceptional and innovative research in areas relevant to computer science and related fields. This year, 36 people from around the world received the fellowship, and Yuji Roh from KAIST EE is the only recipient from universities in Korea. Each selected fellow will receive a $10,000 scholarship and an opportunity to intern at Microsoft under the guidance of an experienced researcher.
Yuji Roh was named a fellow in the field of “Machine Learning” for her outstanding achievements in Trustworthy AI. Her research highlights include designing a state-of-the-art fair training framework using batch selection and developing novel algorithms for both fair and robust training. Her work has been presented at top machine learning conferences, including ICML, ICLR, and NeurIPS. She also co-presented a tutorial on Trustworthy AI at the top data mining conference ACM SIGKDD. She is currently interning at the NVIDIA Research AI Algorithms Group, developing large-scale real-world fair AI frameworks.
The list of fellowship recipients and the interview videos are available on the Microsoft webpage and YouTube.
The list of recipients: https://www.microsoft.com/en-us/research/academic-program/phd-fellowship/2022-recipients/
Interview (Global): https://www.youtube.com/watch?v=T4Q-XwOOoJc
Interview (Asia): https://www.youtube.com/watch?v=qwq3R1XU8UE
[Highlighted research achievements by Yuji Roh: Fair batch selection framework]
[Highlighted research achievements by Yuji Roh: Fair and robust training framework]