Supervisor of Master's Candidates
Dr. Long Ying is currently a Master's Supervisor in the School of Computer Science and the School of Software Engineering, Nanjing University of Information Science & Technology, China. He is a member of CCF and a member of the CCF Technical Committee on Multimedia.
His research areas include [1] MLLM-based Streaming Video Understanding (in collaboration with the National Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA); commencing November 2025); [2] Multi-modal Representation Learning (including Universal Domain Adaptation, Generalized Category Discovery, and Composed Image Retrieval); [3] Vision-Language Navigation.
He maintains a long-term research collaboration with Prof. Shengsheng Qian and Prof. Junyu Gao at the National Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences (CASIA). Joint research interests include MLLM-based Streaming Video Understanding and Multi-modal Representation Learning.
Prospective students with strong self-learning capabilities are welcome to contact Dr. Ying regarding Master's programs. Outstanding students may receive recommendations for doctoral studies under renowned professors.
Dr. Long Ying graduated with a Bachelor's degree in Bioengineering from the School of Life Sciences, Beijing Institute of Technology, ranking second in the major. He graduated in July 2015 with a Ph.D. in Pattern Recognition and Intelligent Systems from the Multimedia Computing Group (MMC) of the National Laboratory of Pattern Recognition in the Institute of Automation, Chinese Academy of Sciences.
From November 2015 to August 2018, he worked as an algorithm engineer, first at Huawei Software Co., Ltd. and then at Huatai Securities Co., Ltd. In September 2018, he joined Nanjing University of Information Science & Technology.
In recent years, he has led one project supported by the National Natural Science Foundation of China and participated in several national scientific research projects. As first author, he has published multiple papers in renowned international journals and conferences.
Projects in recent years:
[1] Project supported by the National Natural Science Foundation of China: Knowledge-driven Interpretable Multimedia Social Event Association, Project No. 61902193, 2020.01-2022.12, 260,000 Yuan. (Principal Investigator)
Taught Courses:
Introduction to Artificial Intelligence, Machine Learning, and Introduction to Compilers in the School of Computer Science, Nanjing University of Information Science & Technology (NUIST)
Huatai Securities Co., Ltd. IT department Algorithm Engineer
Huawei Software Co., Ltd. Telecom Software Department Algorithm Engineer
Postal Address:
Email:
Description of Research Group:
Streaming Video Understanding is an AI technology that empowers systems to analyze and comprehend video content in real-time and continuously, mimicking human perception. It enables computers to process frames instantaneously during playback while integrating historical context to form a coherent understanding.
This technology is primarily characterized by two key capabilities:
(1) Continuous Memory and Contextual Understanding
The model utilizes a "memory buffer" to store and compress critical information from historical video frames. As new frames arrive, the model synthesizes current visual input with past memories to construct a comprehensive and coherent interpretation of events.
(2) Active Perception and Response
Rather than passively awaiting instructions, the model actively monitors the video stream. It is capable of proactively issuing alerts or providing feedback upon detecting critical events or anomalous behaviors.
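The two capabilities above can be illustrated with a minimal sketch: a fixed-size memory buffer that compresses and retains summaries of past frames, plus a simple monitor that flags anomalous frames as they arrive. All class and method names here are illustrative placeholders, not any group's actual implementation; real systems would compress learned feature embeddings, not raw averages.

```python
from collections import deque


class StreamingMemoryBuffer:
    """Toy memory buffer for streaming video understanding (illustrative only).

    Stores compressed summaries of historical frames; when full, merges the
    oldest summaries so memory stays bounded while context is preserved.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buffer = deque()

    def compress(self, frame_features):
        # Placeholder "compression": average the frame's feature vector.
        return sum(frame_features) / len(frame_features)

    def update(self, frame_features):
        self.buffer.append(self.compress(frame_features))
        if len(self.buffer) > self.capacity:
            # Merge the two oldest summaries into one to keep memory bounded.
            a = self.buffer.popleft()
            b = self.buffer.popleft()
            self.buffer.appendleft((a + b) / 2)

    def context(self):
        # Historical context the model would fuse with the current frame.
        return list(self.buffer)


def monitor_stream(frames, threshold=5.0, capacity=4):
    """Active perception sketch: flag frames whose summary deviates sharply
    from the running historical context (a stand-in for anomaly detection)."""
    buf = StreamingMemoryBuffer(capacity)
    alerts = []
    for t, frame in enumerate(frames):
        summary = buf.compress(frame)
        ctx = buf.context()
        if ctx and abs(summary - sum(ctx) / len(ctx)) > threshold:
            alerts.append(t)  # proactively report a critical event
        buf.update(frame)
    return alerts


# A synthetic stream: steady frames, then one abrupt "event" at t=5.
stream = [[1.0, 1.0]] * 5 + [[20.0, 20.0]] + [[1.0, 1.0]] * 4
print(monitor_stream(stream))  # -> [5]
```

The design point is the bounded buffer: no matter how long the stream runs, memory cost stays constant, while the merge step keeps a coarse trace of older history rather than discarding it outright.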
Research work focuses on multimodal feature representation learning via self-supervised, weakly-supervised, and semi-supervised paradigms—leveraging techniques such as self-distillation—as well as training-free approaches for multimodal pre-trained models. The aim is to address challenges in Universal Domain Adaptation (Universal DA), Generalized Category Discovery, Composed Image Retrieval, and Medical Image Segmentation.
Vision-Language Navigation enables an agent (typically a virtual robot or embodied agent) to perform navigation and related tasks in complex environments, based on human natural-language instructions and visual perception, using perception and decision methods such as Multimodal Large Language Models and Reinforcement Learning.