A study of time series forecasting methods for air quality prediction: an explainable comparative analysis

Dongfang Qi; Vladimir M. Bure

doi:10.21638/spbu10.2024.206

Authors

Dongfang Qi St. Petersburg State University, 7–9, Universitetskaya nab., St. Petersburg, 199034, Russian Federation https://orcid.org/0000-0002-8017-0151
Vladimir M. Bure St. Petersburg State University, 7–9, Universitetskaya nab., St. Petersburg, 199034, Russian Federation

Abstract

This study explores the effectiveness of time series forecasting models for predicting air quality, using datasets from a Purple Air Dual Laser Air Quality Sensor and from the Kaggle online platform. Both datasets consist of real sensor records, which provide the detailed information needed for environmental protection. The research aims to identify forecasting models suited to environmental analysis, considering widely used model families such as neural networks and ensemble models. The study also applies explainable artificial intelligence (XAI) methods to explain the best-performing models, which improves their transparency and trustworthiness. Model performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R-squared). The results indicate that both the neural network and the ensemble models are effective in forecasting air quality time series. The study contributes to the body of knowledge on time series forecasting models and provides insights for future research in air quality prediction.
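For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how they are commonly computed. It is an illustration only, not the authors' evaluation code: the function names and the sample PM2.5 values are hypothetical and do not come from the study's datasets.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the forecast errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean square error: penalises large errors more than MAE does."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical observed and forecast PM2.5 values (illustrative numbers only).
observed = [12.0, 15.0, 20.0, 18.0]
predicted = [11.0, 16.0, 19.0, 20.0]

print("MAE: ", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R^2: ", r_squared(observed, predicted))
```

Lower MAE and RMSE indicate smaller forecast errors, while an R-squared closer to 1 indicates that the model accounts for more of the variance in the observed series.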

Keywords:

air quality, time series forecasting, neural networks, ensemble models, explainable AI
