Track 1: Vision-Language Models for Practical Applications

Organizer

Assoc. Prof. Kazuya Ueki

Meisei University, Japan
Contact Email: kazuya.ueki@meisei-u.ac.jp

★ Background: Vision-Language Models (VLMs) have advanced rapidly in recent years, enabling machines to connect visual and textual modalities more effectively. These models have opened new possibilities in multimedia retrieval, analysis, and human-centered applications.

★ Subject: This track focuses on practical applications of vision-language models, particularly in areas such as video and image retrieval, anomaly detection in surveillance and industrial domains, and multimodal understanding for lifelogging and personal media.

★ Research hotspots: Efficient training and fine-tuning of large-scale VLMs; improving interpretability and explainability; applying VLMs to healthcare, education, and cultural heritage; and exploring novel multimodal applications that bridge theory and practice.


★ Topics of interest include, but are not limited to:

Video and image retrieval using vision-language models

Human anomaly detection and surveillance applications

Multimodal learning and representation methods

Lifelogging and personal media analysis

Vision-language models for healthcare applications (e.g., medical image analysis with text reports)

Education and e-learning systems enhanced by multimodal AI

Cultural heritage and digital archives analysis

Efficient training, fine-tuning, and evaluation of VLMs

Interpretability and explainability in multimodal models


Invited Speaker: To be added.