Abstract

The paper explores the use of pseudolabels, i.e., heuristic labels for unlabeled data, to enhance the performance of vision-language models like CLIP via prompt tuning. The authors investigate different learning paradigms and prompt modalities, and find that iterative prompt-training strategies leveraging CLIP-based pseudolabels lead to significant improvements in CLIP’s image classification performance.
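
To make the core idea concrete, the sketch below shows one plausible way to generate CLIP-based pseudolabels for unlabeled images: score each image against hand-written class prompts and keep only the most confident predictions per class as pseudolabeled training data for a subsequent prompt-tuning step. This is a minimal illustration assuming the HuggingFace `transformers` CLIP API; the class names, prompt template, `top_k_per_class` filter, and `pseudolabel` helper are hypothetical placeholders, not the authors' exact procedure.

```python
# Minimal sketch of CLIP-based pseudolabeling (illustrative, not the paper's exact method).
# Assumes the HuggingFace `transformers` CLIP API and a list of PIL images as input.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

class_names = ["cat", "dog", "car"]                      # hypothetical label set
prompts = [f"a photo of a {c}" for c in class_names]     # hand-written prompt template

@torch.no_grad()
def pseudolabel(images, top_k_per_class=16):
    """Assign each unlabeled image its most likely class under CLIP's zero-shot
    scores, then keep only the top-k most confident images per class."""
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)  # (num_images, num_classes)
    conf, labels = probs.max(dim=-1)
    selected = []
    for c in range(len(class_names)):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        top = idx[conf[idx].argsort(descending=True)[:top_k_per_class]]
        selected += [(int(i), c, float(conf[i])) for i in top]
    return selected  # (image index, pseudolabel, confidence) triples for prompt tuning
```

In an iterative scheme of the kind the abstract describes, the pseudolabeled examples would be used to tune (textual or visual) prompts, and the updated prompts would then replace the hand-written templates above to re-pseudolabel the unlabeled pool in the next round.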