Fold2Vec: Towards a Statement Based Representation of Code for Code Comprehension / F. Bertolotti, W. Cazzola. - In: ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY. - ISSN 1049-331X. - 32:1(2023 Feb 13), pp. 6.1-6.31. [10.1145/3514232]

Fold2Vec: Towards a Statement Based Representation of Code for Code Comprehension

F. Bertolotti (first author); W. Cazzola (last author)
2023

Abstract

We introduce a novel approach to source code representation designed to be used in combination with neural networks. The representation makes it possible to produce a continuous vector for each code statement. In particular, we show how the representation is produced for Java source code. We test it on three tasks: code summarization, statement separation, and code search, and compare it against the state-of-the-art non-autoregressive and end-to-end models for these tasks. We conclude that all three tasks benefit from the proposed representation, which improves their performance in terms of F1-score, accuracy, and MRR, respectively. Moreover, we show how models trained on code summarization and models trained on statement separation can be combined to address methods with tangled responsibilities, meaning that these models can be used to detect code misconduct.
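The abstract reports code-search performance as MRR (mean reciprocal rank). The minimal Python sketch below shows how that metric is usually computed. It assumes there is exactly one correct snippet per query. The function name `mean_reciprocal_rank` and the toy queries are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_results: Sequence[Sequence[str]],
                         relevant: Sequence[str]) -> float:
    """Average of 1/rank of the correct item for each query (0 if it is missing).

    Assumption (hypothetical setup): exactly one relevant item per query.
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("one relevant item per query is required")
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        # Ranks are 1-based; only the first match counts.
        for rank, candidate in enumerate(results, start=1):
            if candidate == target:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Hypothetical code-search output: each query returns a ranked list of method names.
queries = [
    ["sortList", "reverseList", "mergeLists"],  # correct snippet at rank 1
    ["readFile", "openFile", "closeFile"],      # correct snippet at rank 2
    ["parseJson", "toJson", "fromJson"],        # correct snippet not retrieved
]
gold = ["sortList", "openFile", "parseXml"]

print(mean_reciprocal_rank(queries, gold))  # prints 0.5
```

Each query contributes the reciprocal of the rank at which its correct snippet appears. Queries whose snippet is not retrieved contribute zero. This is the standard definition; the paper's exact evaluation protocol (for example, the size of the candidate pool) is not described in this record.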
No
English
Machine Learning; Neural Networks; Big Code; Learning Representations; Method Name Suggestion; Intent Identification
Sector INF/01 - Computer Science
Article
Anonymous experts
Applied research
Scientific publication
   Typeful Language Adaptation for Dynamic, Interacting and Evolving Systems
   T-LADIES
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2020TL3X8X_001

   DSurf: Scalable Computational Methods for 3D Printing Surfaces
   MINISTERO DELL'ISTRUZIONE E DEL MERITO
   2015B8TRFM_003 - PE6
13-feb-2023
6-apr-2022
Publisher: ACM
Volume 32, Issue 1, Article 6, pp. 1-31 (31 pages)
Published
Journal of international relevance
open
Research outputs::01 - Journal article
2
262
Article (author)
Journal with Impact Factor
F. Bertolotti, W. Cazzola
Files in this item:
tosem22-published.pdf (open access)
Type: Publisher's version/PDF
Size: 1.52 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/922076
Citations
  • PubMed Central: n/a
  • Scopus: 3
  • Web of Science (ISI): 3