A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models / N. Facchinetti, F. Simonetta, S. Ntalampiras. - In: INTELLIGENT COMPUTING. - ISSN 2771-5892. - (2024), pp. 1-44. [Epub ahead of print] [10.34133/icomputing.0088]

A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models

S. Ntalampiras
Last author
2024

Abstract

Speech emotion recognition (SER) has been constantly gaining attention in recent years due to its potential applications in diverse fields and thanks to the possibilities offered by deep learning technologies. However, recent studies have shown that deep learning models can be vulnerable to adversarial attacks. In this paper, we systematically assess this problem by examining the impact of various adversarial white-box and black-box attacks on different languages and genders within the context of SER. We first propose a suitable methodology for audio data processing, feature extraction, and a convolutional neural network long short-term memory (CNN-LSTM) architecture. The observed outcomes highlighted the considerable vulnerability of CNN-LSTM models to adversarial examples (AEs). In fact, all the considered adversarial attacks are able to considerably reduce the performance of the constructed models. Furthermore, when assessing the efficacy of the attacks, minor differences were noted between the languages analyzed as well as between male and female speech. In summary, this work contributes to the understanding of the robustness of CNN-LSTM models, particularly in SER scenarios, and the impact of AEs. Interestingly, our findings serve as a baseline for a) developing more robust algorithms for SER, b) designing more effective attacks, c) investigating possible defenses, d) improving the understanding of the vocal differences between different languages and genders, and e) overall, enhancing our comprehension of the SER task.
Sector INF/01 - Computer Science
2024
Article (author)
Files in this item:
File: icomputing.0088.pdf
Access: open access
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 1.64 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1047669
Citations
  • PubMed Central: ND
  • Scopus: 5
  • ISI Web of Science: 5
  • OpenAlex: ND