Kao Zhang

Supervisor of Master's Candidates


Alma Mater:Wuhan University
Education Level:With Certificate of Graduation for Doctorate Study
Status:On duty
School/Department:School of Artificial Intelligence/School of Future Technology
Discipline:Photogrammetry and Remote Sensing
Business Address:Room A1911, Linjiang Building, No.219, Ningliu Road, Nanjing, China
Contact Information:kaozhang@nuist.edu.cn

Profile

Kao Zhang received the Ph.D. degree from the Lab. of Intelligent Information Processing (IIP), Wuhan University, Wuhan, China, in 2020, under the supervision of Prof. Zhenzhong Chen. Before that, he completed the B.Eng. and M.Eng. degrees at the Computer Vision & Remote Sensing Lab (CVRS) in 2014 and 2016, respectively, under the guidance of Prof. Jian Yao. He is now a lecturer at Nanjing University of Information Science and Technology, Nanjing, China. He was previously a postdoctoral fellow at Wuhan University, working on visual saliency prediction; a researcher at Tencent, Shenzhen, working on video processing; and a visiting student with the PERCEPT team at INRIA, Rennes, France, working on UAV video saliency prediction. His current research interests include visual attention, image/video processing, remote sensing, and the metaverse.


Chinese homepage; English homepage; Google Scholar; GitHub


We are looking for self-motivated undergraduate/graduate students. If you are interested in joining us, please feel free to contact me with your CV!


Main research fields:

Visual attention: Video/Image/RGBD/VR/UAV saliency prediction.

Remote sensing video analysis: Object detection, tracking and recognition in satellite/UAV videos.

Metaverse: Multimodal (text, image, video, and sound) emotion analysis, virtual reality technology.

Dataset and benchmark: Visual saliency dataset, Object detection and tracking dataset.


1 Selected Publications:

1.1 Visual Saliency Modeling / Image Processing / Computer Vision

Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang. CCDM: Continuous conditional diffusion models for image generation[J]. IEEE Transactions on Multimedia, 2025. (SCI Q1, IF: 9.70)

Chuangxin Cai, Kao Zhang, Zhihua Hu, Xianxuan Lin, Zhigeng Pan. Prompt-based hybrid supervised contrastive learning for emotion recognition in conversation. Neurocomputing, 2025, 130453. (SCI Q1, IF: 6.50)

Pengyuan Quan, Zihao Mao, Nenglun Chen, Yang Zhang, Kao Zhang, Zhigeng Pan. Attentive Fusion for Efficient Wrist-worn Gesture Recognition based on Dual-view Cameras[J]. IEEE Sensors Journal, 2024. (SCI Q1, IF: 4.5)

Hao Cai*, Kao Zhang*, Zhao Chen, Chenxi Jiang, Zhenzhong Chen. Video saliency prediction for first-person view UAV videos: Dataset and benchmark[J]. Neurocomputing, 2024: 127876. (SCI Q1, IF: 6.50, co-first author)

Zhao Chen*, Kao Zhang*, Hao Cai, Xiaoying Ding, Chenxi Jiang, Zhenzhong Chen. Audio-visual saliency prediction for movie viewing in immersive environments: Dataset and benchmarks[J]. Journal of Visual Communication and Image Representation, 2024: 104095. (SCI Q2, IF: 3.1, co-first author)

Kao Zhang, Yan Shang, Songnan Li, Shan Liu, Zhenzhong Chen. SalCrop: Spatio-temporal Saliency Based Video Cropping[C]. IEEE VCIP, 2022. (Demo, Oral, Poster, EI)

Kao Zhang, Zhenzhong Chen, Shan Liu. A Spatial-temporal Recurrent Neural Network for Video Saliency Prediction[J]. IEEE Transactions on Image Processing, 2021, 30: 572-587. (SCI Q1, IF: 11.041)

Di Liu, Kao Zhang, Zhenzhong Chen. Attentive Cross-Modal Fusion Network for RGB-D Saliency Detection[J]. IEEE Transactions on Multimedia, 2021, 23: 967-981. (SCI Q1, IF: 8.182)

Kao Zhang, Zhenzhong Chen. Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(12): 3544-3557. (SCI Q1, IF: 4.133)

Jing Ling, Kao Zhang, Yingxue Zhang, Daiqin Yang, Zhenzhong Chen. A saliency prediction model on 360 degree images using color dictionary based sparse representation[J]. Signal Processing: Image Communication, 2018, 69: 60-68. (SCI Q2, IF: 2.779)


1.2 Remote Sensing Information Processing

Zhihua Hu, Kao Zhang, Yuxuan Liu. Edge constrained DSM refinement based on shading from high resolution multi-view satellite images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025. (SCI Q1, IF: 5.3)

Zhihua Hu, Wanjie Lu, Kao Zhang, Helong Yang, Yaoyang Wang, Nannan Qin, Yuxuan Liu, Sisi Zlatanova. Accurate room layout estimation from multi-view panoramas with multi-label graph cut[J]. International Journal of Applied Earth Observation and Geoinformation, 2025. (SCI Q1, IF: 8.6)

Yang Li*, Kao Zhang*, Zhao Chen, Wanping Ouyang, Mingpeng Cui, Chenxi Jiang, Daiqin Yang, Zhenzhong Chen. Towards Object Tracking for Quadruped Robots[J]. Journal of Visual Communication and Image Representation, 2023, 97: 103958. (SCI Q2, IF: 2.6, co-first author)

Kao Zhang, Zhenzhong Chen, Songnan Li, Shan Liu. An Efficient Saliency Prediction Model for Unmanned Aerial Vehicle Video[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 194: 152-166. (SCI Q1, IF: 12.7)

Zhaopeng Hu, Daiqin Yang, Kao Zhang, Zhenzhong Chen. Object Tracking in Satellite Videos Based on Convolutional Regression Network with Appearance and Motion Features[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 783-793. (SCI Q1, IF: 3.827)

Ruiqian Zhang, Jian Yao, Kao Zhang, Chen Feng and Jiadong Zhang. S-CNN-Based Ship Detection from High-Resolution Remote Sensing Images[C]. ISPRS Congress, 2016. (Best Poster Award, EI).

Tong He, Jian Yao, Kao Zhang, Yaolin Hou, Shiyao Han. Accurate Multi-Scale License Plate Localization Via Image Saliency[C]. IEEE ITSC, 2014. (Oral, EI).


1.3 Metaverse / VR

CSIG. Industry and Technology Roadmap for Metaverse. ISBN:9787523607367, 2024.11. (Responsible for the organization and compilation of the book)


2 Patents (Translated by AI)

Zhang, K., Liu, Z., Pan, Z., Hu, Z., et al. Warning and Storage Method, System, Device and Medium Based on Infrared Video Surveillance. CN202511016292.2. (Granted)

Yang, H., Hu, Z., Zhang, K., Li, M., et al. House Spatial Layout Estimation Method, Apparatus, Device and Storage Medium Based on Multi-View Panorama and Multi-Label Graph Cut. CN202510208123.2. (Granted)

Luo, Y., Zhang, K., Song, T., Hu, Z., et al. A Video Cropping Method and a Quality Evaluation Method for Cropped Video. CN202411886091.3. (Granted)

Cai, C., Pan, Z., Zhang, K., Hu, Z., et al. A Progressive Relaxation Training Method Based on Facial and Postural Behavior Perception. CN202411003151.2. (Granted)

Cai, C., Pan, Z., Zhang, K., Hu, Z., et al. Emotion Recognition Method Integrating Interactive Dialogue Context and Speaker Identity Attributes. CN202411328293.6. (Granted)

Cai, C., Pan, Z., Zou, Y., Lin, X., Xia, X., Zhang, K., et al. A Multimodal Dialogue Emotion Recognition Method. CN202411833608.2. (Granted)

Li, C., Zhang, S., Lin, X., Zhang, K., et al. Key Frame Extraction Method, Apparatus and Storage Medium. CN202411314338.4. (Granted)

Li, Y., Hu, Z., Pan, Z., Zhang, K., et al. Digital Ground 3D Reconstruction Method, Apparatus, Device and Storage Medium Based on Neural Radiance Field and Bidirectional Reflectance Distribution Function. CN202411757299.5. (Granted)

Yao, J., Zhang, K., He, T., Zhu, S. An Accurate Multi-Scale License Plate Localization Method Based on Affine Correction. CN201410077985.8. (Granted)

Zhang, K., Song, T., Hu, Z., Li, M., et al. A Fast Saliency Prediction Method for Panoramic Video. CN202510708582.7. (Under Substantive Examination)

Chen, Z., Li, Y., Zhang, K. A Siamese Network-Based Target Tracking Method for Quadruped Robots. CN202310399358.5. (Under Substantive Examination)

Chen, Z., Chen, Z., Zhang, K. A Video Saliency Prediction Method and System Based on Audio-Visual Features. CN202310247030.1. (Under Substantive Examination)



3 Funding (Translated by AI)

3.1 Principal Investigator

Industry-sponsored Project: Typical Target Detection and Early Warning Analysis in Infrared Videos (PI, ongoing), Aug 2024 – Jul 2025

Industry-sponsored Project: Typical Target Detection and Early Warning Analysis in Optical Videos (PI, ongoing), Dec 2024 – Nov 2025

NSFC Young Scientists Fund: Research on Efficient Video Saliency Prediction Methods Based on Weakly Supervised Learning (PI, ongoing), Jan 2023 – Dec 2025

NUIST "Integration of Industry and Education" Textbook Development Project: Virtual Reality Technology (PI, ongoing), Oct 2023 – Aug 2025

NUIST Scientific Research Start-up Funding Project: Multi-source UAV Video Saliency Detection (PI, ongoing), Mar 2024 – Oct 2026

Industry-sponsored Project: Research on Virtual Reality Image Target Detection and Scene Understanding Technology (PI, completed), Jun 2023 – May 2024

China Postdoctoral Science Foundation General Project: Research on Key Technologies for Remote Sensing Video Saliency Prediction (PI, completed), Jul 2021 – Feb 2023

Hubei Provincial Postdoctoral Innovation Research Position Project: Research on Key Technologies for UAV Video Saliency Prediction (PI, completed), Jul 2021 – Feb 2023

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing Exploratory Project: Salient Object Detection in Remote Sensing Videos (PI, completed), Jan 2021 – Dec 2021


3.2 Participant/Collaborator

National Social Science Foundation Youth Project: Enhancing Strategic Proactivity and Capacity in Preventing Online Ideological Risks (Participant, ongoing), Jan 2024 – Dec 2026

National Key R&D Program of China Sub-project: Key Technologies for Collaborative Three-dimensional Public Security Monitoring (Participant, completed), May 2018 – Apr 2021

National Key R&D Program of China Sub-project: Human-like Intelligent Perception Mechanisms and Methods by Integrating Multi-modal Contextual Information (Core Team Member, completed), Oct 2017 – Sep 2021

NSFC Key Program: Visually Inspired Machine Learning Methods for Remote Sensing Images (Participant), Jan 2021 – Dec 2025

NSFC General Program: Visual Perception Analysis and Video Coding Optimization Based on Visual Characteristics (Participant, completed), Jan 2018 – Dec 2021

NSFC General Program: Efficient Video Coding Based on Global Visual Redundancy Analysis (Participant, completed), Jan 2015 – Dec 2018


3.3 Student Project Supervision/Mentorship

National Undergraduate Innovation and Entrepreneurship Training Program: "AI Video Silhouette"—An Intelligent Video Clipping System Based on Saliency Methods (Faculty Advisor, completed, Project ID: 202410300256E), Jun 2024 – May 2025

National Undergraduate Innovation and Entrepreneurship Training Program: Research and Application of Spatiotemporal Feature-based Coastal Infrared Target Detection Algorithms (Faculty Advisor, ongoing, funded), Jun 2025 – May 2026

Jiangsu Provincial Graduate Student Research and Practice Innovation Program: Research on Attention Mechanism-based Audio-visual Saliency Prediction Methods (Faculty Advisor, ongoing, Project ID: KYCX25_1654), May 2025 – Apr 2026

Jiangsu Provincial Undergraduate Innovation and Entrepreneurship Training Program: Multi-scale Infrared Ship Detection Algorithm Based on YOLOv12 (Faculty Advisor, ongoing, funded), May 2025 – Apr 2026

University-Level Undergraduate Innovation and Entrepreneurship Training Program: Research on Video Saliency Prediction Methods Based on Weakly Supervised Learning (Faculty Advisor, completed), Jun 2024 – May 2025

University-Level Undergraduate Innovation and Entrepreneurship Training Program: Research on Target Detection in UAV Imagery (Faculty Advisor, ongoing, funded), May 2025 – Apr 2026


4 Selected Awards (Translated by AI)

2025, Outstanding Class Teacher Award, NUIST.

2024, Outstanding Faculty Member Award, NUIST.

2021, Second-Class Prize of Graduate Academic Innovation Award, Wuhan University.

2018, Grand Winner Prize on Images in ICME2018 Grand Challenge (GC) – Salient360!.

2018, 1st place on track: Prediction of Head Saliency for Images in ICME2018 GC–Salient360!.

2018, 1st place on track: Prediction of Head+Eye Saliency for Videos in ICME2018 GC–Salient360!.

2017, Best Head Movement Prediction Student Prize in ICME2017 GC–Salient360!.

2014, Second-Class Prize of the National Graduate Contest on Smart-City Technology and Creative Design, Video Challenge--Face Detection Section.

2014, Excellent Bachelor’s Degree Thesis of Hubei Province.

2014-2016, Excellent Graduate Students of Wuhan University.

2010-2012, Excellent Undergraduate Students of Wuhan University.

2014-2016, First Class Scholarship of Wuhan University.


5 Teaching

Neural Network and Deep Learning, Introduction to AI, Introduction to VR, Digital Image Processing

Programming Fundamentals (C), Basic Programming Language (Python), Course Project of ML

Artificial Intelligence in Daily Life, DeepSeek and Large AI Models



Educational Experience

  • 2016.9 -- 2020.6

    Wuhan University       Photogrammetry and Remote Sensing       Postgraduate (Doctoral)       Doctoral Degree in Engineering

  • 2014.9 -- 2016.6

    Wuhan University       Surveying Engineering       Postgraduate (Master's Degree)       Master's Degree in Engineering

  • 2010.9 -- 2014.6

    Wuhan University       Remote Sensing Science and Technology       Undergraduate (Bachelor’s degree)       Bachelor's Degree in Engineering

Work Experience

  • 2023.3 -- Now

    NUIST      School of Artificial Intelligence/School of Future Technology

  • 2020.12 -- 2023.2

    WHU      School of Remote Sensing and Information Engineering      Postdoc research fellow

  • 2020.7 -- 2020.12

    Tencent      Media Lab      Visiting Researcher

  • 2019.10 -- 2019.10

    INRIA      IRISA      Visiting Student

Social Affiliations

  • 2023.5 -- Now

    Metaverse Technology and Application Innovation Platform, CIUR, Deputy secretary-general;
    15th International Conference on Graphics and Image Processing (ICGIP 2023), Publicity Co-chair.

  • 2020.1 -- Now

    Journal Reviewer:
    IEEE Transactions on Image Processing (TIP)
    IEEE Transactions on Multimedia (TMM)
    IEEE Transactions on Geoscience and Remote Sensing (TGRS)
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS)
    IEEE Geoscience and Remote Sensing Letters (GRSL)

  • 2020.1 -- Now

    Conference Reviewer:
    IEEE International Conference on Image Processing (ICIP);
    IEEE International Conference on Multimedia and Expo (ICME);
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)


Research Group

Name of Research Group: Cooperation-Wuhan University-Wanjie Sun (Associate Research Fellow)

Description of Research Group: https://sunwj.github.io/

Name of Research Group: Cooperation-Zhongnan University of Economics and Law-Xiaoying Ding (Associate Professor)

Description of Research Group: http://xagx.zuel.edu.cn/2022/0429/c3560a297387/page.htm

Research Focus

  • Visual saliency, Photogrammetry and remote sensing, Image and video processing, Virtual reality