DOTA: Distributional Test-Time Adaptation of Vision-Language Models


Authors: Han, Zongbo; Yang, Jialong; Li, Junfan; Hu, Qinghua; Xu, Qianli; Shou, Mike Zheng; Zhang, Changqing
Subject:
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Human-Computer Interaction
Record type:
Working Paper
Online access:
http://arxiv.org/abs/2409.19375

Additional information
- Publication year:
  2024
- Collection:
  Computer Science
- Abstract:
  Vision-language foundation models (e.g., CLIP) have shown remarkable performance across a wide range of tasks. However, deploying these models may be unreliable when significant distribution gaps exist between the training and test data. The training-free test-time dynamic adapter (TDA) is a promising approach to address this issue by storing representative test samples to guide the classification of subsequent ones. TDA, however, only naively maintains a limited number of reference samples in the cache, leading to severe test-time catastrophic forgetting when the cache is updated by dropping samples. In this paper, we propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota). Instead of naively memorizing representative test samples, Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment. The test-time posterior probabilities are then computed using the estimated distributions based on Bayes' theorem for adaptation purposes. To further enhance adaptability on uncertain samples, we introduce a new human-in-the-loop paradigm that identifies uncertain samples, collects human feedback, and incorporates it into the Dota framework. Extensive experiments validate that Dota enables CLIP to continually learn, resulting in a significant improvement compared to current state-of-the-art methods.
  Comment: In submission
- Identifier:
  edsarx.2409.19375
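
Illustrative note: the abstract describes continually estimating the distributions of test samples and then applying Bayes' theorem to obtain test-time posteriors. The sketch below shows one way such a loop could look. It is not the authors' implementation. The abstract does not state the form of the distributions, so the class-conditional Gaussians with a shared diagonal variance, the use of zero-shot class probabilities as soft update weights and as the prior, and every name here (`OnlineGaussianEstimator`, `update`, `posterior`, `ridge`) are assumptions made for illustration only.

```python
import math


class OnlineGaussianEstimator:
    """Hypothetical helper: running class-wise Gaussian statistics over test features."""

    def __init__(self, num_classes, dim, ridge=1e-2):
        self.num_classes = num_classes
        self.dim = dim
        self.ridge = ridge  # assumed variance floor so early estimates stay well-defined
        self.weight = [0.0] * num_classes                       # soft sample count per class
        self.sum_x = [[0.0] * dim for _ in range(num_classes)]  # weighted feature sums per class
        self.sum_sq = [0.0] * dim                               # weighted sums of squared features

    def update(self, x, probs):
        """Fold one test feature into the statistics, weighted by its class probabilities."""
        total_p = sum(probs)
        for k, p in enumerate(probs):
            self.weight[k] += p
            for d in range(self.dim):
                self.sum_x[k][d] += p * x[d]
        for d in range(self.dim):
            self.sum_sq[d] += total_p * x[d] * x[d]

    def posterior(self, x, prior):
        """Bayes' rule: prior times Gaussian likelihood, normalised over classes."""
        total = sum(self.weight)
        if total == 0.0:
            return list(prior)  # nothing estimated yet, so fall back to the prior
        means = [[s / max(w, 1e-12) for s in row]
                 for row, w in zip(self.sum_x, self.weight)]
        # Shared diagonal variance: pooled within-class scatter per dimension.
        var = []
        for d in range(self.dim):
            between = sum(w * m[d] ** 2 for w, m in zip(self.weight, means))
            var.append(max((self.sum_sq[d] - between) / total, 0.0) + self.ridge)
        # With a shared variance the normalising constants cancel across classes.
        log_post = []
        for k in range(self.num_classes):
            maha = sum((x[d] - means[k][d]) ** 2 / var[d] for d in range(self.dim))
            log_post.append(math.log(prior[k] + 1e-12) - 0.5 * maha)
        top = max(log_post)
        unnorm = [math.exp(v - top) for v in log_post]
        z = sum(unnorm)
        return [u / z for u in unnorm]
```

In a deployment loop under these assumptions, each incoming image feature would first be scored with `posterior` (using the zero-shot probabilities as `prior`) and then passed to `update`. No samples are stored, only running sums, which is the contrast with a fixed-size cache that the abstract draws.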
