Medical VLM with Global Anomaly Detection (MEDIC-AD)


Abstract

Lesion detection, symptom tracking, and visual explainability are central to real-world medical image analysis, yet current medical vision-language models (VLMs) still lack mechanisms that translate their broad knowledge into clinically actionable outputs.

To bridge this gap, we present MEDIC-AD, a clinically oriented VLM that strengthens these capabilities through a stage-wise framework. First, learnable anomaly-aware tokens ($\langle Ano \rangle$) encourage the model to focus on abnormal regions and build more discriminative lesion-centered representations. Second, inter-image difference tokens ($\langle Diff \rangle$) explicitly encode temporal changes between studies, allowing the model to distinguish worsening, improvement, and stability in disease burden.
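The token mechanism above can be sketched schematically. The snippet below is an illustrative toy in numpy, not the paper's implementation: the token counts, embedding size, and sequence layout (prepending $\langle Ano \rangle$ tokens and interleaving $\langle Diff \rangle$ tokens between the current and prior study) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8        # embedding dimension (illustrative, not the paper's value)
N_PATCH = 16 # image patch tokens per study (assumption)
N_ANO = 4    # number of learnable <Ano>/<Diff> tokens (assumption)

# Hypothetical learnable token banks; in a real model these would be
# trained parameters, here they are just random initializations.
ano_tokens = rng.normal(size=(N_ANO, D))
diff_tokens = rng.normal(size=(N_ANO, D))

def build_sequence(curr_patches, prev_patches=None):
    """Prepend <Ano> tokens to the current study's patch embeddings;
    when a prior study is available, also insert <Diff> tokens before
    its patches so the model can attend to temporal change."""
    parts = [ano_tokens, curr_patches]
    if prev_patches is not None:
        parts += [diff_tokens, prev_patches]
    return np.concatenate(parts, axis=0)

curr = rng.normal(size=(N_PATCH, D))
prev = rng.normal(size=(N_PATCH, D))
seq = build_sequence(curr, prev)
print(seq.shape)  # (N_ANO + N_PATCH + N_ANO + N_PATCH, D) = (40, 8)
```

The point of the sketch is only the interface: anomaly and difference tokens enter the transformer as extra sequence positions, so attention can route lesion-relevant and change-relevant information through them.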

Finally, a dedicated explainability stage trains the model to generate heatmaps that highlight lesion-related regions, offering visual evidence that is consistent with the model's reasoning. Through this staged design, MEDIC-AD progressively improves performance across anomaly detection, symptom tracking, and anomaly segmentation, outperforming both closed-source and medical-specialized baselines.
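One common way such heatmaps are produced is by projecting token-to-patch attention back onto the image grid. The following is a minimal numpy sketch of that general idea, assuming hypothetical attention scores from an $\langle Ano \rangle$ token over a small patch grid; the grid size, upsampling factor, and nearest-neighbor upsampling are assumptions, not details from MEDIC-AD.

```python
import numpy as np

G = 4      # patches per image side (illustrative)
SCALE = 8  # upsampling factor from patch grid to pixel grid (assumption)

rng = np.random.default_rng(1)
# Hypothetical attention scores from an <Ano> token to each image patch.
attn = rng.random(G * G)
attn = attn / attn.sum()          # normalize into a distribution over patches
grid = attn.reshape(G, G)         # lay scores out on the patch grid
# Nearest-neighbor upsample to pixel resolution via a Kronecker product.
heatmap = np.kron(grid, np.ones((SCALE, SCALE)))
print(heatmap.shape)  # (32, 32)
```

In practice a trained model would replace the random scores with learned attention or a dedicated segmentation head, but the mapping from per-patch scores to a pixel-level heatmap follows the same shape logic.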

Evaluations on real longitudinal clinical data collected from hospital workflows further show that MEDIC-AD delivers stable predictions and clinically faithful explanations in practical patient-monitoring and decision-support settings.