Track 1: Vision-Language Models for Practical Applications

Organizer

Assoc. Prof. Kazuya Ueki

Meisei University, Japan
Contact Email: kazuya.ueki@meisei-u.ac.jp

★ Background: Vision-Language Models (VLMs) have advanced rapidly in recent years, enabling machines to connect visual and textual modalities more effectively. These models have opened new possibilities in multimedia retrieval, analysis, and human-centered applications.

★ Subject: This track focuses on practical applications of vision-language models, particularly in areas such as video and image retrieval, anomaly detection in surveillance and industrial domains, and multimodal understanding for lifelogging and personal media.

★ Research hotspots: Efficient training and fine-tuning of large-scale VLMs; improving interpretability and explainability; applying VLMs to healthcare, education, and cultural heritage; and exploring novel multimodal applications that bridge theory and practice.


★ Topics of interest include, but are not limited to:

Video and image retrieval using vision-language models

Human anomaly detection and surveillance applications

Multimodal learning and representation methods

Lifelogging and personal media analysis

Vision-language models for healthcare applications (e.g., medical image analysis with text reports)

Education and e-learning systems enhanced by multimodal AI

Cultural heritage and digital archives analysis

Efficient training, fine-tuning, and evaluation of VLMs

Interpretability and explainability in multimodal models


Invited Speaker: To be added.