Evaluating the performance of machine learning approaches in predicting Albanian Shkumbini River's waters using a water quality index model
Abstract
The water quality index (WQI) method is a common technique for assessing the overall water quality of surface water and groundwater systems worldwide. This study applies four machine learning classifier algorithms, Gradient Boosting (XGBoost), Naive Bayes, Random Forest, and K-Nearest Neighbour, to determine which model most effectively forecasts the WQI values and water quality classes of the Shkumbini River in Albania. The analysis was performed on data collected over a four-year period at six monitoring points for nine parameters.
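The abstract does not state which WQI formulation produced the class labels the classifiers learn. As a hedged illustration only, the sketch below assumes the widely used weighted arithmetic WQI, where each parameter gets a quality rating q_i = 100 (V_i − V_ideal) / (S_i − V_ideal) and a unit weight w_i inversely proportional to its permissible limit S_i, and the index is Σ w_i q_i / Σ w_i. The parameter names, limits, and class bands (`wqi_class`) are hypothetical placeholders, not the study's values.

```python
# Minimal sketch of a weighted arithmetic WQI and its class labels.
# ASSUMPTION: formulation, parameter names, permissible limits and class
# bands are illustrative only; they are not taken from the study.


def sub_index(value, standard, ideal=0.0):
    """Quality rating q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal)."""
    return 100.0 * (value - ideal) / (standard - ideal)


def wqi(sample, standards, ideals=None):
    """Weighted arithmetic WQI = sum(w_i * q_i) / sum(w_i), with w_i = 1 / S_i.

    `ideals` optionally maps a parameter to a non-zero ideal value
    (e.g. 7.0 for pH); all other parameters default to an ideal of 0.
    """
    ideals = ideals or {}
    numerator = denominator = 0.0
    for name, value in sample.items():
        limit = standards[name]
        weight = 1.0 / limit  # stricter limit -> larger weight
        numerator += weight * sub_index(value, limit, ideals.get(name, 0.0))
        denominator += weight
    return numerator / denominator


def wqi_class(score):
    """Hypothetical class bands often paired with the weighted arithmetic WQI."""
    if score <= 25:
        return "Excellent"
    if score <= 50:
        return "Good"
    if score <= 75:
        return "Poor"
    if score <= 100:
        return "Very poor"
    return "Unsuitable"


if __name__ == "__main__":
    # Hypothetical sample (mg/L) and hypothetical permissible limits.
    sample = {"BOD": 2.5, "TP": 0.1, "NO3": 10.0}
    limits = {"BOD": 5.0, "TP": 0.2, "NO3": 50.0}
    score = wqi(sample, limits)
    print(round(score, 2), wqi_class(score))
```

With these made-up inputs the score is about 49.9, which falls in the "Good" band. Because the weights scale with 1/S_i, a parameter with a tight limit (here the hypothetical total phosphorus limit) dominates the index, which is one reason a few parameters can drive WQI classes.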
The predictive accuracies of the XGBoost, Random Forest, K-Nearest Neighbour, and Naive Bayes models were 98.61%, 94.44%, 91.22%, and 94.45%, respectively. XGBoost performed best in terms of F1 score, sensitivity, and prediction accuracy, and it produced the lowest errors in both the learning (RMSE = 2.1, MSE = 9.8, MAE = 1.13) and evaluation (RMSE = 0.0, MSE = 0.01, MAE = 0.01) stages. The findings indicate that biochemical oxygen demand (BOD), bicarbonate (HCO3), and total phosphorus had the most positive impact on the Shkumbini River's water quality. In addition, a statistically significant, strong positive correlation (r = 0.85) was found between BOD and WQI, underscoring the key role of BOD in determining water quality in the Shkumbini River.
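For readers unfamiliar with the reported metrics, the stdlib-only sketch below shows how they are conventionally defined: accuracy, sensitivity (recall) and F1 for class predictions; MSE, RMSE = √MSE and MAE for continuous WQI predictions; and Pearson's r for a pair such as BOD and WQI. The abstract does not say how F1 and sensitivity were averaged over classes, so macro averaging here is an assumption, and the function names are hypothetical helpers rather than the authors' code. Significance testing of r is not included.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall_f1(y_true, y_pred):
    """Macro-averaged sensitivity (recall) and F1 (ASSUMPTION: macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    recalls, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)


def regression_errors(y_true, y_pred):
    """Return (MSE, RMSE, MAE); RMSE is by definition the square root of MSE."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return mse, math.sqrt(mse), mae


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical labels and WQI values, for illustration only.
    truth = ["Good", "Good", "Poor", "Poor"]
    preds = ["Good", "Poor", "Poor", "Poor"]
    print(accuracy(truth, preds), macro_recall_f1(truth, preds))
    print(regression_errors([40.0, 50.0, 60.0], [42.0, 50.0, 57.0]))
```

Macro averaging weights every WQI class equally, so a model that misclassifies a rare class is penalised even when overall accuracy stays high; a weighted average would instead track class frequencies.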
Keywords: water quality index model, Shkumbini River, machine learning classifier, model accuracy
This work is licensed under a Creative Commons Attribution 4.0 International License.