Explanatory comparative analysis of time series forecasting algorithms for air quality prediction
DOI:
https://doi.org/10.21638/spbu10.2024.206
Abstract
This study explores the effectiveness of time series forecasting models for predicting air quality, using datasets from a Purple Air Dual Laser Air Quality Sensor and from the Kaggle online platform. These datasets contain reliable, real-world sensor records and provide the rich information that environmental protection work requires. The research focuses on identifying forecasting models suitable for environmental analysis, covering popular algorithm families such as neural network models and ensemble models. The study also applies an explainable artificial intelligence (XAI) method to explain the models with the best performance indicators, improving their transparency and trustworthiness. Model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R-squared). The results indicate that both the neural network and the ensemble models are effective at forecasting air quality time series. The study contributes to the body of knowledge on time series forecasting models and provides insights for future research in air quality prediction.
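The three evaluation metrics named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration only: the function names and the sample readings are hypothetical and are not taken from the study's data or code.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: penalises large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and forecast values (illustrative numbers only).
observed = [10.0, 12.0, 15.0, 11.0]
forecast = [11.0, 12.0, 13.0, 12.0]

print(f"MAE  = {mae(observed, forecast):.4f}")
print(f"RMSE = {rmse(observed, forecast):.4f}")
print(f"R^2  = {r_squared(observed, forecast):.4f}")
```

Lower MAE and RMSE indicate a closer fit, while R-squared approaches 1 as the forecast explains more of the variance in the observed series.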
Keywords:
air quality, time series forecasting, neural networks, ensemble models, explainable AI
License
Articles in "Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes" are open access and are distributed under the terms of the License Agreement with Saint Petersburg State University, which permits authors unrestricted distribution and self-archiving free of charge.