Supervisor of Master's Candidates
Dr. Long Ying is currently a Master's Supervisor in the School of Computer Science and the School of Software Engineering, Nanjing University of Information Science & Technology, China. He is a member of CCF and a member of the CCF Technical Committee on Multimedia.
His research areas include [1] MLLM-based Streaming Video Understanding (in collaboration with the National Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA); commencing November 2025); [2] Multi-modal Representation Learning (including Universal Domain Adaptation, Generalized Category Discovery, and Composed Image Retrieval); [3] Vision-Language Navigation.
He maintains a long-term research collaboration with Prof. Shengsheng Qian and Prof. Junyu Gao at the National Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA). Joint research interests include MLLM-based Streaming Video Understanding and Multi-modal Representation Learning.
Prospective students with strong self-learning capabilities are welcome to contact Dr. Ying regarding Master's programs. Outstanding students may receive recommendations for doctoral studies under renowned professors.
Dr. Long Ying graduated with a Bachelor's degree in Bioengineering from the School of Life Sciences, Beijing Institute of Technology, ranking second in the major. He graduated in July 2015 with a Ph.D. in Pattern Recognition and Intelligent Systems from the Multimedia Computing Group (MMC) of the National Laboratory of Pattern Recognition in the Institute of Automation, Chinese Academy of Sciences.
From November 2015 to August 2018, he worked as an algorithm engineer, first at Huawei Software Co., Ltd. and then at Huatai Securities Co., Ltd. In September 2018, he joined Nanjing University of Information Science & Technology.
In recent years, he has led one project supported by the National Natural Science Foundation of China and participated in several national scientific research projects. As first author, he has published multiple papers in renowned international journals and conferences.
Projects in recent years:
[1] Project supported by the National Natural Science Foundation of China: Knowledge-driven Interpretable Multimedia Social Event Association, Project No. 61902193, 2020.01-2022.12, 260,000 Yuan. (Principal Investigator)
Taught Courses:
Introduction to Artificial Intelligence, Machine Learning, and Introduction to Compilers in the School of Computer Science, Nanjing University of Information Science & Technology (NUIST)
Huatai Securities Co., Ltd. IT department Algorithm Engineer
Huawei Software Co., Ltd. Telecom Software Department Algorithm Engineer
Postal Address:
Email:
Description of Research Group:
Streaming Video Understanding is an AI technology that empowers systems to analyze and comprehend video content in real-time and continuously, mimicking human perception. It enables computers to process frames instantaneously during playback while integrating historical context to form a coherent understanding.
This technology is primarily characterized by two key capabilities:
(1) Continuous Memory and Contextual Understanding
The model utilizes a "memory buffer" to store and compress critical information from historical video frames. As new frames arrive, the model synthesizes current visual input with past memories to construct a comprehensive and coherent interpretation of events.
(2) Active Perception and Response
Rather than passively awaiting instructions, the model actively monitors the video stream. It is capable of proactively issuing alerts or providing feedback upon detecting critical events or anomalous behaviors.
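The two capabilities above can be illustrated with a minimal sketch: a fixed-size memory buffer that compresses and retains summaries of past frames, plus a simple monitor that flags anomalous frames as they arrive. All class and method names here are illustrative placeholders, not any group's actual implementation; real systems would compress learned feature embeddings, not raw averages.

```python
from collections import deque


class StreamingMemoryBuffer:
    """Toy memory buffer for streaming video understanding (illustrative only).

    Stores compressed summaries of historical frames; when full, merges the
    oldest summaries so memory stays bounded while context is preserved.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buffer = deque()

    def compress(self, frame_features):
        # Placeholder "compression": average the frame's feature vector.
        return sum(frame_features) / len(frame_features)

    def update(self, frame_features):
        self.buffer.append(self.compress(frame_features))
        if len(self.buffer) > self.capacity:
            # Merge the two oldest summaries into one to keep memory bounded.
            a = self.buffer.popleft()
            b = self.buffer.popleft()
            self.buffer.appendleft((a + b) / 2)

    def context(self):
        # Historical context the model would fuse with the current frame.
        return list(self.buffer)


def monitor_stream(frames, threshold=5.0, capacity=4):
    """Active perception sketch: flag frames whose summary deviates sharply
    from the running historical context (a stand-in for anomaly detection)."""
    buf = StreamingMemoryBuffer(capacity)
    alerts = []
    for t, frame in enumerate(frames):
        summary = buf.compress(frame)
        ctx = buf.context()
        if ctx and abs(summary - sum(ctx) / len(ctx)) > threshold:
            alerts.append(t)  # proactively report a critical event
        buf.update(frame)
    return alerts


# A synthetic stream: steady frames, then one abrupt "event" at t=5.
stream = [[1.0, 1.0]] * 5 + [[20.0, 20.0]] + [[1.0, 1.0]] * 4
print(monitor_stream(stream))  # -> [5]
```

The design point is the bounded buffer: no matter how long the stream runs, memory cost stays constant, while the merge step keeps a coarse trace of older history rather than discarding it outright.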
Research work focuses on multimodal feature representation learning via self-supervised, weakly-supervised, and semi-supervised paradigms—leveraging techniques such as self-distillation—as well as training-free approaches for multimodal pre-trained models. The aim is to address challenges in Universal Domain Adaptation (Universal DA), Generalized Category Discovery, Composed Image Retrieval, and Medical Image Segmentation.
Vision-Language Navigation enables an agent (typically a virtual robot or embodied agent) to perform navigation and related tasks in complex environments, based on human natural-language instructions and visual perception, using perception and decision methods such as Multimodal Large Language Models and Reinforcement Learning.