Machine Learning-Based Prediction of Soybean Plant Height from Agronomic Traits Across Sequential Harvests / B.R. De Oliveira, R.L. Sobrinho, F.R.T. Ferreira, F.F. Putti, M. Bodini, C.M. Saporetti, L. Goliatt. - In: AGRIENGINEERING. - ISSN 2624-7402. - 7:12(2025 Dec), pp. 408.1-408.23. [10.3390/agriengineering7120408]
Machine Learning-Based Prediction of Soybean Plant Height from Agronomic Traits Across Sequential Harvests
M. Bodini
2025
Abstract
Accurate prediction of plant height is crucial for optimizing soybean cultivar selection and improving yield estimates. In this study, we investigate the potential of machine learning (ML) algorithms to predict soybean plant height (PH) from a diverse set of agronomic parameters obtained from forty soybean cultivars evaluated across sequential harvests. Using a comprehensive dataset, five models, Elastic Net (EN), Extra Trees (ET), Gaussian Process Regressor (GPR), K-Nearest Neighbors (KNN), and XGBoost (XGB), were compared in terms of predictive accuracy, uncertainty, and robustness. Our results demonstrate that ET outperformed the other models, with an average correlation coefficient of 0.674, an R² of 0.426, the lowest RMSE (6.859 cm) and MAE (5.361 cm), and the lowest uncertainty (5.07%). The proposed ML framework includes an extensive model evaluation pipeline that incorporates the Performance Index (PI), ANOVA, and feature importance analysis, providing a multidimensional perspective on model behavior. The most influential features for PH prediction were the number of stems (NS) and the insertion of the first pod (IFP). This research highlights the viability of integrating explainable ML techniques into agricultural decision support systems, enabling data-driven strategies for cultivar evaluation and phenotypic trait forecasting.

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| agriengineering-07-00408.pdf (open access) | Version available online | Publisher's version/PDF | Creative Commons | 1.75 MB | Adobe PDF |
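The accuracy metrics named in the abstract (correlation coefficient, R², RMSE, MAE) can be illustrated with a minimal sketch. This is not the authors' evaluation pipeline: the function name `regression_metrics` and the sample heights are hypothetical, and the standard textbook definitions are assumed. Note that R² here is the coefficient of determination, which need not equal the squared correlation coefficient.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return r, R^2, RMSE and MAE for paired observed/predicted values.

    Hypothetical helper for illustration; assumes the standard definitions.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Error magnitudes, in the same unit as the target (cm for plant height).
    mae = sum(abs(e) for e in residuals) / n
    rmse = math.sqrt(sum(e * e for e in residuals) / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Pearson correlation between observed and predicted values.
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    denom = math.sqrt(ss_tot * var_pred)
    r = cov / denom if denom > 0 else float("nan")

    return {"r": r, "r2": r2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Made-up plant heights in cm, for illustration only.
    observed = [62.0, 70.5, 81.0, 75.5, 68.0]
    predicted = [65.0, 69.0, 77.5, 78.0, 66.5]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Reporting r alongside R² is informative because predictions can correlate well with observations while still being biased or mis-scaled, in which case R² falls below r².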