CLIP-Vision Guided Few-Shot Metal Surface Defect Recognition / T. Wang, Z. Li, Y. Xu, Y. Zhai, X. Xing, K. Guo, P. Coscia, A. Genovese, V. Piuri, F. Scotti. - In: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS. - ISSN 1551-3203. - 21:5(2025), pp. 4273-4284. [10.1109/tii.2025.3547353]

CLIP-Vision Guided Few-Shot Metal Surface Defect Recognition

P. Coscia;A. Genovese;V. Piuri;F. Scotti
2025

Abstract

Metal surface defect recognition (MSDR) based on deep learning faces the challenge of scarce expert-labeled data. In this study, we propose a CLIP-Vision guided self-supervised learning (CVGSSL) framework that learns representations from unlabeled data and completes MSDR with only few-shot labeled data. The framework first generates rich and diverse representations through multiple CLIP vision encoders (CLIP-Vs) to ensure effective SSL pretraining, and then uses an MLP-adapter to distill this knowledge and adapt the representations to the recognition task. In addition, we construct a self-constrained loss to address the inherent ambiguity between intraclass and interclass distances, which otherwise causes representations to fall into an equivocal decision margin. After label-free pretraining with CVGSSL, the downstream model adapts to one-shot to four-shot defect recognition tasks through fine-tuning. Experimental results demonstrate that CVGSSL outperforms state-of-the-art SSL methods on three public metal surface defect datasets, and extensive ablation experiments validate the efficacy of the approach.
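To make the abstract's components concrete, the sketch below illustrates, in broad strokes, what an MLP-adapter over multiple frozen CLIP vision encoders and a self-constrained loss could look like. It is a minimal, illustrative sketch only: the module names, feature dimensions, and the exact form of the loss are assumptions for exposition and do not reproduce the authors' implementation.

# Illustrative sketch only; dimensions, names, and the loss form are assumptions,
# not the CVGSSL implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPAdapter(nn.Module):
    """Small MLP mapping concatenated frozen CLIP-V features to a task space."""
    def __init__(self, in_dim, hidden_dim=512, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

def self_constrained_loss(z1, z2, margin=0.5):
    """Toy stand-in for a self-constrained objective: pull two views of the
    same image together (intraclass) and push different images in the batch
    apart by at least `margin` (interclass)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    intra = (1 - (z1 * z2).sum(dim=1)).mean()          # cosine distance between paired views
    sim = z1 @ z2.t()                                   # batch similarity matrix
    mask = ~torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    inter = F.relu(sim[mask] - (1 - margin)).mean()     # penalize overly similar negatives
    return intra + inter

if __name__ == "__main__":
    # Placeholder features standing in for two frozen CLIP vision encoders.
    feats_v1 = torch.randn(8, 512)
    feats_v2 = torch.randn(8, 768)
    adapter = MLPAdapter(in_dim=512 + 768)
    fused = torch.cat([feats_v1, feats_v2], dim=1)
    z1 = adapter(fused)
    z2 = adapter(fused + 0.01 * torch.randn_like(fused))  # second (perturbed) view
    print(self_constrained_loss(z1, z2).item())

In this reading, the frozen encoders supply diverse representations, the adapter is the only trainable module during label-free pretraining, and the loss keeps same-image views close while holding different images apart; the actual CVGSSL objective should be taken from the paper itself.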
Contrastive language-image pretraining (CLIP); deep learning; few-shot; self-supervised learning (SSL); surface defect recognition
Field INFO-01/A - Computer Science
Field IINF-05/A - Information Processing Systems
2025
28 Mar 2025
Article (author)
Files in this record:
tii25.pdf - Publisher's version/PDF - 7.58 MB - Adobe PDF - restricted access (copy available on request)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1157198
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 1
  • OpenAlex: 1