A comparative study of ensemble and non-ensemble machine learning methods for predicting river pollution index / L.S.R. Nogueira, M.A.S. De Carvalho, B.D.O. Santos, R. Yonaba, A. Bamal, M.G. Uddin, M. Bodini, L. Goliatt. - In: ECOLOGICAL INFORMATICS. - ISSN 1574-9541. - 94:(2026 Mar), pp. 103617.1-103617.18. [10.1016/j.ecoinf.2026.103617]
A comparative study of ensemble and non-ensemble machine learning methods for predicting river pollution index
M. Bodini; 2026
Abstract
Accurate prediction of river water quality is fundamental to environmental sustainability and public health, particularly amid increasing freshwater scarcity. This study develops a robust machine learning (ML) framework to forecast the River Pollution Index (RPI) using a comprehensive 36-year national dataset from Taiwan’s Environmental Protection Administration, covering more than 500 monitoring stations. We systematically compared ensemble methods (CatBoost, XGBoost, NGBoost) with non-ensemble benchmarks (SVM, ElasticNet, and a 1D CNN). Hyperparameters were tuned by Bayesian optimization, and model stability was assessed with a suite of complementary indicators over 30 independent experimental runs to support the statistical reliability of the comparison. The results show that ensemble models consistently outperformed their non-ensemble counterparts. Among them, CatBoost achieved the highest accuracy and stability, reducing prediction error by approximately 20% relative to SVM and ElasticNet. These findings highlight the capacity of ensemble learning techniques to capture the complex, non-linear interactions inherent in water quality data. The study makes two principal contributions: (1) the systematic implementation, optimization, and comparison of ensemble and non-ensemble ML models for river pollution prediction on a long-term national dataset; and (2) the identification of ensemble-based methods, particularly CatBoost, as robust, data-driven tools for improving RPI forecasting and supporting informed decision-making in sustainable water resource management.

| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| 1-s2.0-S1574954126000233-main.pdf | Version available online | Publisher's version/PDF | Creative Commons | 3.41 MB | Adobe PDF | Open access |
| water quality data Taiwan.zip | Dataset | Publisher's version/PDF | Creative Commons | 12.23 MB | Zip File | Open access |
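The evaluation protocol summarized in the abstract (complementary error indicators aggregated over 30 independent runs, and an error reduction expressed relative to a baseline model) can be illustrated with a minimal sketch. The abstract does not name the indicators, so RMSE, MAE, and R² are assumptions here, as are the hypothetical helpers `summarize_runs`, `relative_reduction`, and `toy_runs`. The synthetic data are random stand-ins, not the Taiwan dataset or the paper's results.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


Run = Tuple[Sequence[float], Sequence[float]]  # (observed, predicted) for one run


def summarize_runs(runs: List[Run]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each indicator across runs.

    A lower mean error suggests higher accuracy; a smaller standard deviation
    across independent runs is one way to read stability (assumed reading).
    """
    scores: Dict[str, List[float]] = {"RMSE": [], "MAE": [], "R2": []}
    for y_true, y_pred in runs:
        scores["RMSE"].append(rmse(y_true, y_pred))
        scores["MAE"].append(mae(y_true, y_pred))
        scores["R2"].append(r2(y_true, y_pred))
    return {k: (statistics.fmean(v), statistics.stdev(v)) for k, v in scores.items()}


def relative_reduction(model_error: float, baseline_error: float) -> float:
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


def toy_runs(noise_sd: float, n_runs: int = 30, n_obs: int = 200) -> List[Run]:
    """Hypothetical synthetic runs (NOT the paper's data).

    Each run draws RPI-like targets in [1, 10] and adds Gaussian noise with
    standard deviation `noise_sd` to mimic a model's predictions; the seed
    changes per run to imitate independent experimental repetitions.
    """
    runs: List[Run] = []
    for seed in range(n_runs):
        rng = random.Random(seed)
        y_true = [rng.uniform(1.0, 10.0) for _ in range(n_obs)]
        y_pred = [t + rng.gauss(0.0, noise_sd) for t in y_true]
        runs.append((y_true, y_pred))
    return runs


if __name__ == "__main__":
    model = summarize_runs(toy_runs(noise_sd=0.8))     # stand-in "better" model
    baseline = summarize_runs(toy_runs(noise_sd=1.0))  # stand-in baseline
    for name, (mean, sd) in model.items():
        print(f"{name}: {mean:.3f} +/- {sd:.3f}")
    reduction = relative_reduction(model["RMSE"][0], baseline["RMSE"][0])
    print(f"RMSE reduction vs baseline: {reduction:.1%}")
```

Because both toy models reuse the same seeds and the noise of the first is scaled by 0.8, its per-run errors are 0.8 times the baseline's, so the printed RMSE reduction is about 20%. This only illustrates the arithmetic behind a relative figure like the one reported in the abstract. In a real comparison, each run would instead retrain and score an actual model (for example, CatBoost versus SVM) on a new split.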
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.