Supervisor of Master's Candidates
Kao Zhang received the Ph.D. degree from Wuhan University, Wuhan, China, in 2020, at the Lab of Intelligent Information Processing (IIP), under the supervision of Prof. Zhenzhong Chen. He previously completed the B.Eng. and M.Eng. degrees at the Computer Vision & Remote Sensing Lab (CVRS) in 2014 and 2016, respectively, under the guidance of Prof. Jian Yao. He is now a lecturer at Nanjing University of Information Science and Technology, Nanjing, China. He was a postdoctoral fellow at Wuhan University, working on visual saliency prediction; a researcher at Tencent, Shenzhen, working on video processing; and a visiting student with the PERCEPT team at INRIA, Rennes, France, working on UAV video saliency prediction. His current research interests include visual attention, image/video processing, remote sensing, and the metaverse.
We are looking for self-motivated undergraduate/graduate students. If you are interested in joining us, please feel free to contact me with your CV!
Main research fields:
Visual attention: Video/Image/RGBD/VR/UAV saliency prediction.
Remote sensing video analysis: Object detection, tracking and recognition in satellite/UAV videos.
Metaverse: Multimodal (text, image, video, and sound) Emotion analysis, Virtual Reality technology.
Dataset and benchmark: Visual saliency dataset, Object detection and tracking dataset.
1 Selected Publications:
1.1 Visual Saliency Modeling / Image Processing / Computer Vision
Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang. CCDM: Continuous conditional diffusion models for image generation[J]. IEEE Transactions on Multimedia, 2025. (SCI Q1, IF: 9.70)
Chuangxin Cai, Kao Zhang, Zhihua Hu, Xianxuan Lin, Zhigeng Pan. Prompt-based hybrid supervised contrastive learning for emotion recognition in conversation. Neurocomputing, 2025, 130453. (SCI Q1, IF: 6.50)
Pengyuan Quan, Zihao Mao, Nenglun Chen, Yang Zhang, Kao Zhang, Zhigeng Pan. Attentive Fusion for Efficient Wrist-worn Gesture Recognition based on Dual-view Cameras[J]. IEEE Sensors Journal, 2024. (SCI Q1, IF: 4.5)
Hao Cai*, Kao Zhang*, Zhao Chen, Chenxi Jiang, Zhenzhong Chen. Video saliency prediction for first-person view UAV videos: Dataset and benchmark[J]. Neurocomputing, 2024: 127876. (SCI Q1, IF: 6.50, co-first author)
Zhao Chen*, Kao Zhang*, Hao Cai, Xiaoying Ding, Chenxi Jiang, Zhenzhong Chen. Audio-visual saliency prediction for movie viewing in immersive environments: Dataset and benchmarks[J]. Journal of Visual Communication and Image Representation, 2024: 104095. (SCI Q2, IF: 3.1, co-first author)
Kao Zhang, Yan Shang, Songnan Li, Shan Liu, Zhenzhong Chen. SalCrop: Spatio-temporal Saliency Based Video Cropping[C]. IEEE VCIP, 2022. (Demo, Oral, Poster, EI)
Kao Zhang, Zhenzhong Chen, Shan Liu. A Spatial-temporal Recurrent Neural Network for Video Saliency Prediction[J]. IEEE Transactions on Image Processing, 2021, 30: 572-587. (SCI Q1, IF: 11.041)
Di Liu, Kao Zhang, Zhenzhong Chen. Attentive Cross-Modal Fusion Network for RGB-D Saliency Detection[J]. IEEE Transactions on Multimedia, 2021, 23: 967-981. (SCI Q1, IF: 8.182)
Kao Zhang, Zhenzhong Chen. Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3544-3557. (SCI Q1, IF: 4.133)
Jing Ling, Kao Zhang, Yingxue Zhang, Daiqin Yang, Zhenzhong Chen. A saliency prediction model on 360 degree images using color dictionary based sparse representation[J]. Signal Processing: Image Communication, 2018, 69: 60-68. (SCI Q2, IF: 2.779)
1.2 Remote Sensing Information Processing
Zhihua Hu, Kao Zhang, Yuxuan Liu. Edge constrained DSM refinement based on shading from high resolution multi-view satellite images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025. (SCI Q1, IF: 5.3)
Zhihua Hu, Wanjie Lu, Kao Zhang, Helong Yang, Yaoyang Wang, Nannan Qin, Yuxuan Liu, Sisi Zlatanova. Accurate room layout estimation from multi-view panoramas with multi-label graph cut[J]. International Journal of Applied Earth Observation and Geoinformation, 2025. (SCI Q1, IF: 8.6)
Yang Li*, Kao Zhang*, Zhao Chen, Wanping Ouyang, Mingpeng Cui, Chenxi Jiang, Daiqin Yang and Zhenzhong Chen. Towards Object Tracking for Quadruped Robots[J]. Journal of Visual Communication and Image Representation, 2023, 97: 103958. (SCI Q2, IF: 2.6, co-first author)
Kao Zhang, Zhenzhong Chen, Songnan Li, Shan Liu. An Efficient Saliency Prediction Model for Unmanned Aerial Vehicle Video[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 194: 152-166. (SCI Q1, IF: 12.7)
Zhaopeng Hu, Daiqin Yang, Kao Zhang, Zhenzhong Chen. Object Tracking in Satellite Videos Based on Convolutional Regression Network with Appearance and Motion Features[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 783-793. (SCI Q1, IF: 3.827)
Ruiqian Zhang, Jian Yao, Kao Zhang, Chen Feng and Jiadong Zhang. S-CNN-Based Ship Detection from High-Resolution Remote Sensing Images[C]. ISPRS Congress, 2016. (Best Poster Award, EI).
Tong He, Jian Yao, Kao Zhang, Yaolin Hou, Shiyao Han. Accurate Multi-Scale License Plate Localization Via Image Saliency[C]. IEEE ITSC, 2014. (Oral, EI).
1.3 Metaverse/ VR
CSIG. Industry and Technology Roadmap for Metaverse. ISBN:9787523607367, 2024.11. (Responsible for the organization and compilation of the book)
2 Patent (Translated by AI)
Zhang, K., Liu, Z., Pan, Z., Hu, Z., et al. Warning and Storage Method, System, Device and Medium Based on Infrared Video Surveillance. CN202511016292.2. (Granted)
Yang, H., Hu, Z., Zhang, K., Li, M., et al. House Spatial Layout Estimation Method, Apparatus, Device and Storage Medium Based on Multi-View Panorama and Multi-Label Graph Cut. CN202510208123.2. (Granted)
Luo, Y., Zhang, K., Song, T., Hu, Z., et al. A Video Cropping Method and a Quality Evaluation Method for Cropped Video. CN202411886091.3. (Granted)
Cai, C., Pan, Z., Zhang, K., Hu, Z., et al. A Progressive Relaxation Training Method Based on Facial and Postural Behavior Perception. CN202411003151.2. (Granted)
Cai, C., Pan, Z., Zhang, K., Hu, Z., et al. Emotion Recognition Method Integrating Interactive Dialogue Context and Speaker Identity Attributes. CN202411328293.6. (Granted)
Cai, C., Pan, Z., Zou, Y., Lin, X., Xia, X., Zhang, K., et al. A Multimodal Dialogue Emotion Recognition Method. CN202411833608.2. (Granted)
Li, C., Zhang, S., Lin, X., Zhang, K., et al. Key Frame Extraction Method, Apparatus and Storage Medium. CN202411314338.4. (Granted)
Li, Y., Hu, Z., Pan, Z., Zhang, K., et al. Digital Ground 3D Reconstruction Method, Apparatus, Device and Storage Medium Based on Neural Radiance Field and Bidirectional Reflectance Distribution Function. CN202411757299.5. (Granted)
Yao, J., Zhang, K., He, T., Zhu, S. An Accurate Multi-Scale License Plate Localization Method Based on Affine Correction. CN201410077985.8. (Granted)
Zhang, K., Song, T., Hu, Z., Li, M., et al. A Fast Saliency Prediction Method for Panoramic Video. CN202510708582.7. (Under Substantive Examination)
Chen, Z., Li, Y., Zhang, K. A Siamese Network-Based Target Tracking Method for Quadruped Robots. CN202310399358.5. (Under Substantive Examination)
Chen, Z., Chen, Z., Zhang, K. A Video Saliency Prediction Method and System Based on Audio-Visual Features. CN202310247030.1. (Under Substantive Examination)
3 Funding (Translated by AI)
3.1 Principal Investigator
Industry-sponsored Project: Typical Target Detection and Early Warning Analysis in Infrared Videos (PI, ongoing), Aug 2024 – Jul 2025
Industry-sponsored Project: Typical Target Detection and Early Warning Analysis in Optical Videos (PI, ongoing), Dec 2024 – Nov 2025
NSFC Young Scientists Fund: Research on Efficient Video Saliency Prediction Methods Based on Weakly Supervised Learning (PI, ongoing), Jan 2023 – Dec 2025
NUIST "Integration of Industry and Education" Textbook Development Project: Virtual Reality Technology (PI, ongoing), Oct 2023 – Aug 2025
NUIST Scientific Research Start-up Funding Project: Multi-source UAV Video Saliency Detection (PI, ongoing), Mar 2024 – Oct 2026
Industry-sponsored Project: Research on Virtual Reality Image Target Detection and Scene Understanding Technology (PI, completed), Jun 2023 – May 2024
China Postdoctoral Science Foundation General Project: Research on Key Technologies for Remote Sensing Video Saliency Prediction (PI, completed), Jul 2021 – Feb 2023
Hubei Provincial Postdoctoral Innovation Research Position Project: Research on Key Technologies for UAV Video Saliency Prediction (PI, completed), Jul 2021 – Feb 2023
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing Exploratory Project: Salient Object Detection in Remote Sensing Videos (PI, completed), Jan 2021 – Dec 2021
3.2 Participant/Collaborator
National Social Science Foundation Youth Project: Enhancing Strategic Proactivity and Capacity in Preventing Online Ideological Risks (Participant, ongoing), Jan 2024 – Dec 2026
National Key R&D Program of China Sub-project: Key Technologies for Collaborative Three-dimensional Public Security Monitoring (Participant, completed), May 2018 – Apr 2021
National Key R&D Program of China Sub-project: Human-like Intelligent Perception Mechanisms and Methods by Integrating Multi-modal Contextual Information (Core Team Member, completed), Oct 2017 – Sep 2021
NSFC Key Program: Visually Inspired Machine Learning Methods for Remote Sensing Images (Participant), Jan 2021 – Dec 2025
NSFC General Program: Visual Perception Analysis and Video Coding Optimization Based on Visual Characteristics (Participant, completed), Jan 2018 – Dec 2021
NSFC General Program: Efficient Video Coding Based on Global Visual Redundancy Analysis (Participant, completed), Jan 2015 – Dec 2018
3.3 Student Project Supervision/Mentorship
National Undergraduate Innovation and Entrepreneurship Training Program: "AI Video Silhouette"—An Intelligent Video Clipping System Based on Saliency Methods (Faculty Advisor, completed, Project ID: 202410300256E), Jun 2024 – May 2025
National Undergraduate Innovation and Entrepreneurship Training Program: Research and Application of Spatiotemporal Feature-based Coastal Infrared Target Detection Algorithms (Faculty Advisor, ongoing, funded), Jun 2025 – May 2026
Jiangsu Provincial Graduate Student Research and Practice Innovation Program: Research on Attention Mechanism-based Audio-visual Saliency Prediction Methods (Faculty Advisor, ongoing, Project ID: KYCX25_1654), May 2025 – Apr 2026
Jiangsu Provincial Undergraduate Innovation and Entrepreneurship Training Program: Multi-scale Infrared Ship Detection Algorithm Based on YOLOv12 (Faculty Advisor, ongoing, funded), May 2025 – Apr 2026
University-Level Undergraduate Innovation and Entrepreneurship Training Program: Research on Video Saliency Prediction Methods Based on Weakly Supervised Learning (Faculty Advisor, completed), Jun 2024 – May 2025
University-Level Undergraduate Innovation and Entrepreneurship Training Program: Research on Target Detection in UAV Imagery (Faculty Advisor, ongoing, funded), May 2025 – Apr 2026
4 Selected Awards (Translated by AI)
2025, Outstanding Class Teacher Award, NUIST.
2024, Outstanding Faculty Member Award, NUIST.
2021, Second-Class Prize of Graduate Academic Innovation Award, Wuhan University.
2018, Grand Winner Prize on Images in ICME2018 Grand Challenge (GC) – Salient360!.
2018, 1st place on track: Prediction of Head Saliency for Images in ICME2018 GC–Salient360!.
2018, 1st place on track: Prediction of Head+Eye Saliency for Videos in ICME2018 GC–Salient360!.
2017, Best Head Movement Prediction Student Prize in ICME2017 GC–Salient360!.
2014, Second-Class Prize of the National Graduate Contest on Smart-City Technology and Creative Design, Video Challenge--Face Detection Section.
2014, Excellent Bachelor’s Degree Thesis of Hubei Province.
2014-2016, Excellent Graduate Students of Wuhan University.
2010-2012, Excellent Undergraduate Students of Wuhan University.
2014-2016, First Class Scholarship of Wuhan University.
5 Teaching
Neural Network and Deep Learning, Introduction to AI, Introduction to VR, Digital Image Processing
Programming Fundamentals (C), Basic Programming Language (Python), Course Project of ML
Artificial Intelligence in Daily Life, DeepSeek and Large AI Models
Wuhan University  Photogrammetry and Remote Sensing  Postgraduate (Doctoral)  Doctoral Degree in Engineering
Wuhan University  Surveying Engineering  Postgraduate (Master's Degree)  Master's Degree in Engineering
Wuhan University  Remote Sensing Science and Technology  Undergraduate (Bachelor’s degree)  Bachelor's Degree in Engineering
NUIST School of Artificial Intelligence/School of Future Technology
WHU School of Remote Sensing and Information Engineering Postdoc research fellow
Tencent Media Lab Visiting Researcher
INRIA IRISA Visiting Student
Metaverse Technology and Application Innovation Platform, CIUR, Deputy secretary-general;
15th International Conference on Graphics and Image Processing (ICGIP 2023), Publicity Co-chairs.
Journal Reviewer:
IEEE Transactions on Image Processing (TIP)
IEEE Transactions on Multimedia (TMM)
IEEE Transactions on Geoscience and Remote Sensing (TGRS)
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS)
IEEE Geoscience and Remote Sensing Letters (GRSL)
Conference Reviewer:
IEEE International Conference on Image Processing (ICIP)
IEEE International Conference on Multimedia and Expo (ICME)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Description of Research Group: https://sunwj.github.io/
Description of Research Group: http://xagx.zuel.edu.cn/2022/0429/c3560a297387/page.htm
Visual saliency, Photogrammetry and remote sensing, Image and video processing, Virtual reality