
Self-Supervised CLIP-Guided for Few-Shot Industrial Anomaly Detection / Y. Chen, Y. Xu, T. Wang, Y. Zhai, K. Tan, J. Zhou, P. Coscia, A. Genovese, C.L.P. Chen. - In: IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT. - ISSN 0018-9456. - (2026), pp. 1-16. [Epub ahead of print] [10.1109/tim.2026.3661696]

Self-Supervised CLIP-Guided for Few-Shot Industrial Anomaly Detection

P. Coscia; A. Genovese
2026

Abstract

Few-shot industrial anomaly detection aims to identify unseen defects using only a limited number of normal samples. However, most existing approaches still rely heavily on auxiliary industrial datasets for training. In this paper, we propose a novel self-supervised CLIP-guided framework for few-shot industrial anomaly detection, which eliminates the need for auxiliary industrial data. Specifically, we first introduce a pseudo-anomaly generation strategy that synthesizes both structural and textural anomalies. Then, leveraging the cross-modal semantic understanding capability of CLIP, we contrast multi-scale visual features with learnable textual prompts to achieve anomaly localization grounded in language semantics. Inspired by the human cognitive process of identifying anomalies through reference comparison, we introduce a support set composed of a few normal samples and perform semantic-level feature alignment with the query set via the CLIP visual encoder, thereby enhancing anomaly discrimination. Furthermore, we introduce an adapter to alleviate CLIP's semantic-offset problem between the text and image modalities in industrial scenarios, and to enhance the model's robustness to spatial-structure differences between the query and support sets. Extensive experiments conducted on the MVTec AD, VisA, BTAD, and MPDD datasets demonstrate that our method achieves competitive results under the few-shot setting. Moreover, its effectiveness and deployability are validated through a real-world application in battery spot-welding defect inspection. The code is available at https://github.com/YiKuiZhai/SCF-AD.
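As a rough illustration of the language-guided localization idea described above (not the authors' code), each patch feature can be contrasted against a "normal" and an "anomalous" text embedding via CLIP-style cosine similarity; the function name, temperature value, and array shapes below are illustrative assumptions:

```python
import numpy as np

def anomaly_map(patch_feats, normal_emb, anomalous_emb, tau=0.07):
    """Score each image patch by contrasting it with a 'normal' vs an
    'anomalous' text-prompt embedding, CLIP-style."""
    def l2(x):
        # L2-normalize so dot products become cosine similarities
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    p = l2(np.asarray(patch_feats, dtype=float))              # (N, D) patches
    t = l2(np.stack([normal_emb, anomalous_emb]).astype(float))  # (2, D) prompts
    logits = p @ t.T / tau                                    # (N, 2) scaled sims

    # Softmax over the two prompts; take probability of the 'anomalous' one
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[:, 1]                                        # (N,) scores in [0, 1]
```

Reshaping the per-patch scores back to the feature-map grid would yield a pixel-level anomaly heatmap, which is the "anomaly localization grounded in language semantics" the abstract refers to.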
CLIP; cross-modal; representation alignment; industrial anomaly detection; few-shot
Field INFO-01/A - Computer Science
2026
Feb 2026
Article (author)
Files in this record:
File: tim26_compressed.pdf
Access: restricted
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
License: No license
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1217115
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: n/a
  • OpenAlex: 0