< (From left) Assistant Professor Joon Hyuk Noh (Department of Artificial Intelligence, Ewha Womans University), Seojin Hwan, Yoonki Cho (Ph.D. candidate), and Professor Sung-Eui Yoon (School of Computing, KAIST) >
When faced with a complex question such as 'What object disappeared while the camera was pointing elsewhere?', AI often falls back on language patterns and guesses a 'plausible answer' instead of actually observing what happens in the video. To overcome this limitation, our university's research team developed a technology that lets the AI autonomously identify the exact critical moment (trigger moment) within the video, and proved its strength by winning an international AI competition with this technology.

The university announced on the 28th that the research team led by Professor Sung-Eui Yoon of the School of Computing, in collaboration with Professor Joon Hyuk Noh's team at Ewha Womans University, took 1st place in the Grounded Video Question Answering track of the Perception Test Challenge held at ICCV 2025, a world-renowned computer vision conference.

The Perception Test Challenge, organized by Google DeepMind with a total prize pool of 50,000 euros (approximately 83 million KRW), assesses the cognitive and reasoning abilities of multimodal AI, which must comprehensively understand diverse data including video, audio, and text. Crucially, the core evaluation criterion is the ability to make judgments based on actual video evidence, moving beyond language-centric bias.

Unlike conventional methods that analyze the entire video indiscriminately, the research team developed a new technology that has the AI first locate the core scene (trigger moment) essential for finding the correct answer. Simply put, the AI is designed to decide on its own: "This scene is decisive for answering this question!" The research team calls this framework CORTEX (Chain-of-Reasoning for Trigger Moment Extraction).

The system is a three-stage pipeline in which three models with different roles operate sequentially (a simplified sketch of this flow appears after the pipeline figure below). First, a reasoning AI (Gemini 2.5 Pro) reasons about which moment is required to answer the question and proposes candidate trigger moments. Next, a grounding model (Molmo-7B) identifies the exact on-screen location (coordinates) of people, cars, and objects at the selected moment. Finally, a tracking model (SAM2) uses that scene as a reference and precisely tracks the objects' movement through the frames before and after it, thereby reducing errors. In short, pinpointing a key scene and tracking the evidence for the answer around that scene significantly reduced problems such as initial misjudgment and occlusion in the video.

In the Grounded Video Question Answering (Grounded VideoQA) track, which drew 23 participating teams, the KAIST team SGVR Lab (Scalable Graphics, Vision & Robotics Lab) scored 0.4968 on the HOTA (Higher Order Tracking Accuracy) metric, decisively surpassing the 2nd-place score of 0.4304 from Columbia University (USA) to secure 1st place. This result is nearly double the previous year's winning score of 0.2704.

This technology has wide-ranging real-world applications: autonomous vehicles can pinpoint moments of potential accident risk, robots can understand their surroundings more intelligently, security and surveillance systems can rapidly locate critical scenes, and media analysis can precisely track the actions of people or objects over time.
This is a core technology that enables AI to make judgments based on "actual evidence in the video." The ability to accurately pinpoint how objects behave over time in a video is expected to greatly expand the range of real-world scenarios in which AI can be applied.
< Pipeline image of the grounding framework for video question answering proposed by the research team >
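For readers who want a concrete picture of the three-stage flow shown above, here is a minimal Python sketch of a CORTEX-style pipeline. All class and method names are hypothetical stand-ins (the article does not publish the team's code); only the data flow, namely reasoning about the trigger moment, grounding the object there, and then tracking in both temporal directions, reflects the description in the article.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # object box as (x1, y1, x2, y2)

# Hypothetical stand-ins for the three models named in the article
# (Gemini 2.5 Pro, Molmo-7B, SAM2). These stubs only illustrate the
# data flow of a CORTEX-style pipeline; they are not the team's code.

class ReasoningModel:
    """Stage 1: decide WHICH moment in the video answers the question."""
    def find_trigger_moment(self, frames: List[bytes], question: str) -> int:
        return len(frames) // 2  # placeholder; a real model returns a reasoned frame index

class GroundingModel:
    """Stage 2: localize the question-relevant object at the trigger frame."""
    def locate(self, frame: bytes, question: str) -> Box:
        return (0.25, 0.25, 0.75, 0.75)  # placeholder coordinates

class TrackingModel:
    """Stage 3: propagate the box forward and backward from the trigger frame."""
    def propagate(self, frames: List[bytes], trigger: int, box: Box) -> List[Box]:
        # Anchoring the track at a decisive middle frame, rather than at frame 0,
        # is what mitigates early misjudgment and occlusion per the article.
        return [box for _ in frames]  # placeholder: one box per frame

def answer_grounded_question(frames: List[bytes], question: str) -> List[Box]:
    reasoner, grounder, tracker = ReasoningModel(), GroundingModel(), TrackingModel()
    t = reasoner.find_trigger_moment(frames, question)  # 1. pick the trigger moment
    box = grounder.locate(frames[t], question)          # 2. ground the object there
    return tracker.propagate(frames, t, box)            # 3. track in both directions

# Example: ten dummy frames and a question about a disappearing object.
tracks = answer_grounded_question(
    [b""] * 10,
    "What object disappeared while the camera was pointing elsewhere?",
)
print(len(tracks))  # one box per frame
```

The key design choice, as the article describes it, is that the tracker is initialized at the reasoned trigger frame and run both forward and backward, rather than tracking from the first frame onward.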
This research was presented on October 19th at the 3rd Perception Test Challenge workshop, held at ICCV 2025. The achievement was supported by the Ministry of Science and ICT's Basic Research Program (Mid-Career Researcher), the SW Star Lab Project's 'Development of Perception, Action, and Interaction Algorithms for Open-World Robot Services,' and the AGI Project's 'Reality Construction and Bi-directional Capability Approach based on Cognitive Agents for Embodied AGI' tasks.