IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Fold2Vec: Towards a Statement Based Representation of Code for Code Comprehension / F. Bertolotti, W. Cazzola. - In: ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY. - ISSN 1049-331X. - 32:1(2023 Feb 13), pp. 6.1-6.31. [10.1145/3514232]

Fold2Vec: Towards a Statement Based Representation of Code for Code Comprehension

F. Bertolotti (first author); W. Cazzola (last author)

2023

Abstract

We introduce a novel approach to source code representation for use with neural networks. The representation is designed to produce a continuous vector for each code statement. In particular, we show how the representation is produced for Java source code. We evaluate the representation on three tasks: code summarization, statement separation, and code search, and compare against state-of-the-art non-autoregressive and end-to-end models for these tasks. We conclude that all three tasks benefit from the proposed representation, improving performance in terms of F1-score, accuracy, and MRR, respectively. Moreover, we show how models trained on code summarization and models trained on statement separation can be combined to address methods with tangled responsibilities, which means that these models can be used to detect code misconduct.

International co-authors: No
Article language: English
Keywords: Machine Learning; Neural Networks; Big Code; Learning Representations; Method Name Suggestion; Intent Identification
Scientific-disciplinary sector of the article (display only): INF/01 - Computer Science
Type: Article
Review (peer review): Anonymous experts
Classification by type of research: Applied research
Publication classification: Scientific publication
Projects:
	Project title: Typeful Language Adaptation for Dynamic, Interacting and Evolving Systems
	Acronym: T-LADIES
	Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
	Contract no.: 2020TL3X8X_001

	Project title: DSurf: Scalable Computational Methods for 3D Printing Surfaces
	Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
	Contract no.: 2015B8TRFM_003 - PE6
Publication date: 13-Feb-2023
Ahead-of-print or print date: 6-Apr-2022
Journal in ANCE: ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY
Publisher: ACM
Volume or year: 32
Issue: 1
Article number: 6
First page: 1
Last page: 31
Number of pages: 31
Publication status: Published
Journal relevance: Journal of international relevance
DOI: https://dx.doi.org/10.1145/3514232
Source database: crossref
ISI identifier: WOS:000964909700006
SCOPUS identifier: 2-s2.0-85152605859
Adherence to the University Open Access policy: Adheres
Type: info:eu-repo/semantics/article
Full text: open
Type: Research products::01 - Journal article
Number of authors: 2
Faculty site type: 262
Type: Article (author)
Impact factor: Journal with Impact Factor
All authors: F. Bertolotti, W. Cazzola
Appears in types: 01 - Journal article

Files in this product:

File	Access	Type	Size	Format
tosem22-published.pdf	Open access	Publisher's version/PDF	1.52 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/922076
