2025 "核理安邦" Joint Doctoral Student Academic Forum

Name: 2025 "核理安邦" Joint Doctoral Student Academic Forum
Start: 2025-05-17T08:00:00+08:00
End: 2025-05-18T17:00:00+08:00
Location: Institute of Nuclear and New Energy Technology, Tsinghua University

17–18 May 2025

Asia/Shanghai timezone

张宸, 岳顺禹, 徐闯, 杨洲

Large-Model-Based Text-to-Image Person Retrieval for Public Security Surveillance

Not scheduled

12m

Location: Institute of Nuclear and New Energy Technology, Tsinghua University, Y902 (Huyu Road), Changping District, Beijing

Oral presentation, AI+

Speaker: 骆炳君 (Tsinghua University)

Abstract

As smart-city development advances, text-to-image person retrieval is becoming increasingly valuable in public security surveillance. Existing person retrieval methods depend on large amounts of annotated data and generalize poorly when deployed across scenes. To address this, the study proposes a Graph-Based Cross-Domain Knowledge Distillation (GCKD) method that enables large-model-driven unsupervised cross-domain retrieval. The method combines a graph-based multi-domain propagation module with a contrastive momentum distillation module to handle cross-scene semantic transfer and the modality gap. Experiments on multiple person retrieval benchmark datasets show an average Rank-1 retrieval accuracy improvement of more than 4%. The work has been accepted at AAAI 2025, a top-tier international conference in artificial intelligence and a China Computer Federation (CCF) Class A recommended venue.

Keywords	Text-to-Image Person Retrieval; Multimodal Large Model; Smart City; Public Security
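
For readers unfamiliar with the Rank-1 retrieval accuracy quoted in the abstract, the following is a minimal sketch of how that metric is commonly computed. It is not taken from the GCKD work: the function name `rank1_accuracy`, the similarity scores, and the identity labels are all illustrative assumptions. Each text query counts as a hit when the single highest-scoring gallery image shows the same person.

```python
from typing import Sequence


def rank1_accuracy(
    similarity: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> float:
    """Fraction of text queries whose top-scoring gallery image has the same identity."""
    if not similarity:
        raise ValueError("need at least one query")
    hits = 0
    for scores, qid in zip(similarity, query_ids):
        # Index of the gallery image with the highest similarity to this query.
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(gallery_ids[best] == qid)
    return hits / len(similarity)


# Toy data (made-up numbers): 3 text queries scored against 4 gallery images.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # top match is image 0 (identity 0): hit
    [0.2, 0.4, 0.8, 0.1],  # top match is image 2 (identity 1): hit
    [0.3, 0.2, 0.7, 0.6],  # top match is image 2 (identity 1): miss
]
query_ids = [0, 1, 2]
gallery_ids = [0, 2, 1, 2]
score = rank1_accuracy(similarity, query_ids, gallery_ids)
```

In this toy case two of the three queries retrieve the correct identity first, so `score` is 2/3. In a real evaluation the similarity scores would come from the text and image encoders of the model being tested.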

Author: 骆炳君 (Tsinghua University)

