New application of multiple linear regression method - A case in China air quality

Authors

  • Yang He, St Petersburg State University, 7-9, Universitetskaya nab., St Petersburg, 199034, Russian Federation, https://orcid.org/0000-0002-1066-3575
  • Dongfang Qi, St Petersburg State University, 7-9, Universitetskaya nab., St Petersburg, 199034, Russian Federation, https://orcid.org/0000-0002-8017-0151
  • Vladimir M. Bure, St Petersburg State University, 7-9, Universitetskaya nab., St Petersburg, 199034, Russian Federation; Agrophysical Research Institute, 14, Grazhdanskiy pr., St Petersburg, 195220, Russian Federation, https://orcid.org/0000-0001-7018-4667

DOI:

https://doi.org/10.21638/11701/spbu10.2022.406

Abstract

In this paper, we propose an econometric model based on the multiple linear regression method. The research aims to identify the factors that most strongly influence the dependent variable. Specifically, we examine the properties of the model: its overall quality, the significance tests of its parameters, and the behaviour of its residuals. To ensure that the prediction model is optimal, we use backward elimination stepwise regression to obtain the final model, checking these properties at each step. Finally, the results are illustrated with a real case concerning air quality in China. The resulting model was applied to predict the annual air quality index (AQI) of 31 capital cities in China during 2013-2019. All calculations and tests were carried out in RStudio. The dependent variable is China's AQI. The control variables are six pollutant factors and four meteorological factors. In summary, the model shows that the most significant factor influencing the AQI in China is PM2.5, followed by O3.
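The backward elimination procedure mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (which was done in R). It assumes the Akaike information criterion (AIC) as the elimination rule, which the abstract does not specify. It runs on synthetic data, and the variable names (pm25, o3, temp, wind) are hypothetical placeholders, not the paper's dataset.

```python
import numpy as np

def ols_aic(X, y):
    """Fit OLS with an intercept; return (coefficients, AIC up to a constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares estimate
    rss = float(np.sum((y - A @ beta) ** 2))       # residual sum of squares
    aic = n * np.log(rss / n) + 2 * A.shape[1]     # Gaussian AIC, constants dropped
    return beta, aic

def backward_eliminate(X, y, names):
    """Start from the full model; drop one predictor at a time while AIC improves."""
    keep = list(range(X.shape[1]))
    _, best = ols_aic(X[:, keep], y)
    while len(keep) > 1:
        # AIC of every model obtained by removing exactly one remaining predictor
        trials = [(ols_aic(X[:, [c for c in keep if c != j]], y)[1], j) for j in keep]
        cand_aic, drop = min(trials)
        if cand_aic >= best:                       # no removal helps: stop
            break
        keep.remove(drop)
        best = cand_aic
    return [names[c] for c in keep], best

# Synthetic example (assumption): the response depends only on the first two columns.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
names = ["pm25", "o3", "temp", "wind"]             # hypothetical labels
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

selected, final_aic = backward_eliminate(X, y, names)
print(selected, round(final_aic, 2))
```

In a real analysis, each intermediate model would also be checked for parameter significance and residual behaviour, as the abstract describes. Those diagnostics are omitted here for brevity.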

Keywords:

multiple linear regression, air pollution, AQI, hypothesis test, PM2.5, O3

Published

2023-03-02

How to Cite

He, Y., Qi, D., & Bure, V. M. (2023). New application of multiple linear regression method - A case in China air quality. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 18(4), 516–526. https://doi.org/10.21638/11701/spbu10.2022.406

Section

Applied Mathematics