IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

In this paper we study the problem-solving ability of the Large Language Model known as GPT-3 (codename DaVinci), by considering its performance in solving tasks proposed in the "Bebras International Challenge on Informatics and Computational Thinking". In our experiment, GPT-3 produced a majority of correct answers for about one third of the Bebras tasks we submitted to it. The linguistic fluency of GPT-3 is impressive and, at first reading, its explanations sound coherent, on-topic and authoritative; however, the answers it produced are in fact erratic, and the explanations are often questionable or plainly wrong. The system performs better on tasks that describe a procedure and ask for it to be executed on a specific instance of the problem. Tasks solvable with simple, one-step deductive reasoning are more likely to obtain better answers and explanations. Synthesis tasks, or tasks that require more complex logical consistency, get the most incorrect answers.

DaVinci Goes to Bebras: A Study on the Problem Solving Ability of GPT-3 / C. Bellettini, M. Lodi, V. Lonati, M. Monga, A. Morpurgo. - In: Proceedings of the 15th International Conference on Computer Supported Education. 2: CSEDU / edited by J. Jovanovic, I.-A. Chounta, J. Uhomoibhi, B. McLaren. - [s.l.] : SciTePress, 2023. - ISBN 978-989-758-641-5. - pp. 59-69. Paper presented at the 15th International Conference on Computer Supported Education, held in Prague in 2023 [10.5220/0012007500003470].

DaVinci Goes to Bebras: A Study on the Problem Solving Ability of GPT-3

C. Bellettini (first author); Michael Lodi; V. Lonati; M. Monga (penultimate author); A. Morpurgo (last author)

2023

	Keywords
				Bebras; GPT-3; Large Language Models; Computer Science Education
	Scientific-disciplinary sectors of the contribution (display only)
				Sector INF/01 - Computer Science
	Publication date
				2023
	Organizations associated with the conference
				INSTICC
	DOI
				https://dx.doi.org/10.5220/0012007500003470
	Type
				Book Part (author)
	Appears in types:
				03 - Contribution in volume

Files in this item:

File	Size	Format
csedu.pdf (open access; type: post-print, accepted manuscript, i.e. the version accepted by the publisher)	157.25 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/967037
