CLIP-Vision Guided Few-Shot Metal Surface Defect Recognition / T. Wang, Z. Li, Y. Xu, Y. Zhai, X. Xing, K. Guo, P. Coscia, A. Genovese, V. Piuri, F. Scotti. - In: IEEE Transactions on Industrial Informatics. - ISSN 1551-3203. - 21:5(2025), pp. 4273-4284. [10.1109/tii.2025.3547353]
CLIP-Vision Guided Few-Shot Metal Surface Defect Recognition
P. Coscia;A. Genovese;V. Piuri;F. Scotti
2025
Abstract
Metal surface defect recognition (MSDR) based on deep learning is challenged by the scarcity of expert-labeled data, which is often available only in few-shot quantities. In this study, we propose a CLIP-vision guided self-supervised learning (CVGSSL) framework that learns representations from unlabeled data, enabling MSDR with only few-shot labeled data. The framework first uses multiple CLIP vision encoders (CLIP-Vs) to generate rich and diverse representations that ensure effective SSL pretraining, and then employs an MLP-adapter to distill this knowledge and adapt the representations to recognition tasks. In addition, we construct a self-constrained loss to address the inherent ambiguity between intraclass and interclass distances, which otherwise causes representations to fall within an equivocal decision margin. After label-free CVGSSL pretraining, the downstream model is fine-tuned for one-shot to four-shot defect recognition tasks. Experimental results on three public metal surface defect datasets show that CVGSSL outperforms state-of-the-art SSL methods, and extensive ablation experiments validate the efficacy of the approach.
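The abstract does not specify the MLP-adapter architecture or the distillation objective. Purely as an illustration, the sketch below assumes a two-layer MLP adapter (Linear, ReLU, Linear) that projects a student backbone feature into a teacher space, and a multi-teacher loss that averages 1 minus the cosine similarity against several CLIP-V embeddings. The names `MLPAdapter` and `multi_teacher_distill_loss`, the dimensions, the initialization, and the loss form are hypothetical and are not the authors' formulation. It also assumes all teacher embeddings share one dimension, and it does not model the self-constrained loss. Only the forward computation is shown; training would normally use an autodiff framework.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero bias (hypothetical initialization)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class MLPAdapter:
    """Hypothetical two-layer adapter: Linear -> ReLU -> Linear."""

    def __init__(self, d_in, d_hidden, d_out, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(d_in, d_hidden, rng)
        self.w2, self.b2 = init_layer(d_hidden, d_out, rng)

    def __call__(self, x):
        return linear(relu(linear(x, self.w1, self.b1)), self.w2, self.b2)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between a and b; eps guards against zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def multi_teacher_distill_loss(student_feat, teacher_feats, adapter):
    """Mean of (1 - cosine) between the adapted student feature and each teacher.

    Assumes every teacher embedding already lives in the adapter's output space.
    """
    projected = adapter(student_feat)
    losses = [1.0 - cosine_similarity(projected, t) for t in teacher_feats]
    return sum(losses) / len(losses)


# Toy usage with made-up vectors standing in for backbone and CLIP-V features.
adapter = MLPAdapter(d_in=8, d_hidden=16, d_out=4)
student = [0.1 * i for i in range(8)]
teachers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
loss = multi_teacher_distill_loss(student, teachers, adapter)
print(f"distillation loss: {loss:.4f}")
```

Averaging over teachers is only one simple way to combine several CLIP-Vs. Weighting the teachers or giving each teacher its own adapter head are equally plausible choices, and the abstract does not say which one CVGSSL uses.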