< (From left) Professor Joon Hyuk Noh (Assistant Professor, Department of Artificial Intelligence, Ewha Womans University), Seojin Hwan, Yoonki Cho (Ph.D. Candidate), Professor Sung-Eui Yoon (School of Computing, KAIST) >
When faced with a complex question such as "What object disappeared while the camera was pointing elsewhere?", AI often relies on language patterns to guess a plausible answer instead of observing what actually happens in the video. To overcome this limitation, a KAIST research team developed a technology that enables AI to identify, on its own, the exact critical moment (the "trigger moment") within the video, and won an international AI competition with it.
KAIST announced on the 28th that the research team led by Professor Sung-Eui Yoon of the School of Computing, in collaboration with Professor Joon Hyuk Noh's team at Ewha Womans University, took 1st place in the Grounded Video Question Answering track of the Perception Test Challenge held at ICCV 2025, a world-renowned computer vision conference.
The Perception Test Challenge was organized by Google DeepMind with a total prize pool of 50,000 euros (approximately 83 million KRW). It assesses the cognitive and reasoning abilities of multimodal AI, which must comprehensively understand various kinds of data, including video, audio, and text. The core evaluation criterion is the ability to make judgments based on actual video evidence rather than on language-centric bias.
Unlike conventional methods that analyze the entire video indiscriminately, the team's technology has the AI first locate the core scene (trigger moment) that is essential for finding the correct answer. Simply put, the AI is designed to determine for itself: "This scene is decisive for answering this question!" The research team calls this framework CORTEX (Chain-of-Reasoning for Trigger Moment Extraction).
The system has a three-stage structure in which three models with different functions operate sequentially.
First, the reasoning AI (Gemini 2.5 Pro) reasons about which moment is needed to answer the question and finds candidate trigger moments. Next, the object location finding model (grounding model, Molmo-7B) identifies the exact location (coordinates) of people, cars, and objects on the screen at the selected moment. Finally, the tracking model (SAM2) uses the selected scene as a reference and tracks the movement of the objects in the frames before and after it, which reduces errors. In short, pinpointing a key scene and tracking the evidence for the answer around that scene significantly reduced problems such as initial misjudgment and occlusion in the video.
In the Grounded Video Question Answering (Grounded VideoQA) track, which had 23 participating teams, the KAIST team SGVR Lab (Scalable Graphics, Vision & Robotics Lab) recorded a HOTA (Higher Order Tracking Accuracy) score of 0.4968, well ahead of the 2nd place score of 0.4304 from Columbia University, USA, to secure 1st place. This is roughly 1.8 times the previous year's winning score of 0.2704.
The technology has wide-ranging applications in real-life settings. Autonomous vehicles can identify moments of potential accident risk, robots can understand their surroundings more intelligently, security and surveillance systems can rapidly locate critical scenes, and media analysis can track the actions of people or objects in chronological order.
This is a core technology that enables AI to judge based on "actual evidence in the video." The ability to pinpoint how objects behave over time in a video is expected to greatly expand the use of AI in real-world scenarios.
< Pipeline image of the grounding framework for video question answering proposed by the research team >
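The three-stage flow described in this article (reason about the trigger moment, ground the object at that moment, then track it before and after) can be sketched roughly as follows. This is only an illustrative sketch, not the team's implementation: the names `Trigger`, `run_pipeline`, `fake_reason`, `fake_ground`, and `fake_track` are hypothetical, the stand-in functions call no real model, and representing the grounded location as a single (x, y) point is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]  # assumed (x, y) pixel coordinates


@dataclass
class Trigger:
    frame: int   # index of the trigger moment in the video
    target: str  # description of the object to locate


def run_pipeline(
    question: str,
    num_frames: int,
    reason: Callable[[str, int], Trigger],
    ground: Callable[[int, str], Point],
    track: Callable[[int, Point, int], Dict[int, Point]],
) -> Dict[int, Point]:
    """Chain the three stages in order: reason -> ground -> track."""
    trigger = reason(question, num_frames)          # stage 1: pick the trigger moment
    point = ground(trigger.frame, trigger.target)   # stage 2: locate the object there
    return track(trigger.frame, point, num_frames)  # stage 3: track before and after


# Hypothetical stand-ins so the sketch runs without any model.
def fake_reason(question: str, num_frames: int) -> Trigger:
    return Trigger(frame=num_frames // 2, target="cup")


def fake_ground(frame: int, target: str) -> Point:
    return (100.0, 50.0)


def fake_track(frame: int, point: Point, num_frames: int) -> Dict[int, Point]:
    # Pretend the object drifts one pixel per frame; positions are filled in
    # both backward and forward from the trigger frame.
    x, y = point
    return {t: (x + (t - frame), y) for t in range(num_frames)}


tracks = run_pipeline(
    "What object disappeared while the camera was pointing elsewhere?",
    5,
    fake_reason,
    fake_ground,
    fake_track,
)
```

Passing the stages in as plain callables keeps the ordering explicit: the tracker only ever starts from the frame and location that the first two stages selected, which is the idea the article attributes to the trigger-moment approach.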
This research was presented on October 19th at the 3rd Perception Test Challenge, held at ICCV 2025. The achievement was supported by the Ministry of Science and ICT's Basic Research Program (Mid-Career Researcher), the SW Star Lab Project "Development of Perception, Action, and Interaction Algorithms for Open-World Robot Services," and the AGI Project "Reality Construction and Bi-directional Capability Approach based on Cognitive Agents for Embodied AGI."