Automated machine learning achieves accurate water quality prediction with reduced parameter requirements / D. Campos, V. Galvão, M.L. De Rezende, A. Braga, M. Bodini, U.R.V. Aires, R. Yonaba, L. Goliatt. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - 16:(2026 Feb), pp. 4431.1-4431.25. [10.1038/s41598-025-34448-8]

Automated machine learning achieves accurate water quality prediction with reduced parameter requirements

M. Bodini
2026

Abstract

Accurate water quality assessment is critical for environmental monitoring and public health. Conventional Water Quality Index (WQI) computation methods, however, often rely on numerous parameters and labor-intensive processes, which limits their practicality for rapid assessments. While Machine Learning (ML) offers promising alternatives, developing high-performing models typically demands extensive expertise and computational resources. This study addresses this gap by leveraging Automated Machine Learning (AutoML), specifically the AutoGluon platform, to predict WQI from a reduced set of readily available water quality parameters. Our objectives were to (i) evaluate the predictive performance of AutoML with reduced inputs, (ii) assess model interpretability via feature importance, and (iii) propose an automated framework for efficient water quality monitoring. This study analyzed a 36-year dataset from Taiwan’s national river water quality monitoring network, focusing on four parameters: electrical conductivity (EC), suspended solids (SS), water temperature (WT), and pH. A fully automated pipeline handled model selection, hyperparameter tuning, and ensemble construction by systematically testing multiple algorithm families and stacking strategies to determine the optimal setup for each parameter. This eliminated manual intervention and delivered reproducible, data-driven results that matched the distinct spatiotemporal patterns in the long-term records. Among the evaluated algorithms, ensemble-based tree models (CatBoost, Random Forest, and XGBoost) demonstrated superior performance, achieving a mean R² of 0.76 and low predictive errors. Feature importance analysis identified EC as the most influential predictor. These results highlight the feasibility of reducing the number of input parameters without compromising prediction accuracy, thereby enabling faster and more cost-effective water quality assessments.
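The two evaluation ideas named in the abstract, the R² score and feature importance, can be illustrated with a minimal standard-library Python sketch. Everything in it is an assumption made for illustration: the synthetic rows, the value ranges, and the stand-in function toy_wqi_model are hypothetical and are not the study's data, model, or AutoGluon code. Permutation importance is used here as one common way to score features; the abstract does not state which importance method the authors applied.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Return, per feature, the drop in R² after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = r2_score(y_true, [predict(r) for r in rows])
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        permuted = r2_score(y_true, [predict(r) for r in shuffled])
        drops.append(baseline - permuted)
    return drops


def toy_wqi_model(row):
    """Hypothetical stand-in for a fitted model; it only uses EC and SS."""
    ec, ss, wt, ph = row
    return 90.0 - 0.03 * ec - 0.05 * ss


# Synthetic rows in the order (EC, SS, WT, pH); the ranges are illustrative only.
data_rng = random.Random(42)
rows = [
    (
        data_rng.uniform(50, 1500),  # EC
        data_rng.uniform(1, 200),    # SS
        data_rng.uniform(10, 32),    # WT
        data_rng.uniform(6, 9),      # pH
    )
    for _ in range(200)
]
# The target equals the toy model's output, so the baseline R² is exactly 1.
y = [toy_wqi_model(r) for r in rows]

drops = permutation_importance(toy_wqi_model, rows, y, n_features=4)
for name, drop in zip(("EC", "SS", "WT", "pH"), drops):
    print(f"{name}: R² drop = {drop:.3f}")
```

In this toy setup EC comes out as the most important feature only because the stand-in model was written to depend mostly on EC; the sketch shows how such a ranking is computed, not why EC ranks first in the study.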
Sector INFO-01/A - Computer Science
Sector IINF-05/A - Information Processing Systems
Sector CEAR-02/A - Sanitary and Environmental Engineering
Feb 2026
Article (author)
Files in this item:
File: s41598-025-34448-8.pdf (open access)
Description: Version available online
Type: Publisher's version/PDF
License: Creative Commons
Size: 6.39 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1215058