(EST 2019 Forecast) Performance of Prediction Algorithms for Modeling Outdoor Air Pollution Spatial S

2021-04-19

Tags: Surfaces training EST predictors R2 Spatial model data regression


However, in the past decade, linear (stepwise) regression methods have been criticized for their lack of flexibility, their failure to account for potential interactions between predictors, and their limited ability to incorporate highly correlated predictors.



Higher training data R2 did not equate to higher test R2 for the external long-term average exposure estimates, which supports the argument that external validation data are critical for comparing model performance.



Land-use regression (LUR) modeling is an empirical technique that uses the measured concentration of a pollutant as the dependent variable and potential predictors such as road type, traffic count, elevation, and land cover as independent variables in a multiple regression model.
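
To make the multiple-regression setup concrete, here is a minimal sketch of an LUR-style fit using ordinary least squares in NumPy. The site values, the predictor choice, the coefficients, and the helper names `fit_lur` and `predict_lur` are all hypothetical and chosen for illustration only; none of them come from the paper.

```python
import numpy as np

# Hypothetical predictors at 6 monitoring sites (illustrative values only).
# Columns: traffic count (thousand vehicles/day), elevation (m), green land-cover fraction.
X = np.array([
    [12.0,  5.0, 0.10],
    [ 3.0, 20.0, 0.60],
    [25.0,  2.0, 0.05],
    [ 8.0, 15.0, 0.40],
    [18.0,  8.0, 0.20],
    [ 1.0, 30.0, 0.80],
])

# Assumed "true" relationship used to generate noise-free concentrations.
true_intercept = 20.0
true_coef = np.array([1.5, -0.2, -10.0])
y = true_intercept + X @ true_coef


def fit_lur(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend a column of ones
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def predict_lur(intercept, coef, X_new):
    """Predict concentrations at new locations from their predictor values."""
    return intercept + X_new @ coef


intercept, coef = fit_lur(X, y)
new_site = np.array([[10.0, 12.0, 0.30]])       # hypothetical unmonitored location
print(intercept, coef, predict_lur(intercept, coef, new_site))
```

Each fitted coefficient reads as the change in predicted concentration per unit change in one predictor with the others held fixed. Because the synthetic data are noise-free, the fit simply recovers the assumed coefficients; real monitoring data would not behave this cleanly.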



These models are simple, fast, and often provide interpretable coefficients of predictors.



Standard linear regression methods are prone to overfitting, especially when few training sites are used for model development along with a large number of predictor variables. Linear regression therefore may not identify the optimal model.
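
The overfitting problem can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy example, not a reproduction of the paper's analysis: only the first predictor carries signal, the other 18 are pure noise, and the "full" model has as many parameters as training sites. The sizes, seed, and helper names are hypothetical.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Least-squares fit with intercept; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]


rng = np.random.default_rng(0)
n_train, n_test, n_predictors = 20, 200, 19     # synthetic sizes (assumption)


def simulate(n):
    """Only the first predictor is informative; the rest are noise."""
    X = rng.normal(size=(n, n_predictors))
    y = 10.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)
    return X, y


X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)               # stands in for external validation sites

beta_small = fit_ols(X_train[:, :1], y_train)   # one informative predictor
beta_full = fit_ols(X_train, y_train)           # all 19 predictors, 20 parameters

r2_train_small = r2_score(y_train, predict_ols(beta_small, X_train[:, :1]))
r2_train_full = r2_score(y_train, predict_ols(beta_full, X_train))
r2_test_small = r2_score(y_test, predict_ols(beta_small, X_test[:, :1]))
r2_test_full = r2_score(y_test, predict_ols(beta_full, X_test))

print("train R2:", r2_train_small, r2_train_full)
print("test  R2:", r2_test_small, r2_test_full)
```

With 20 parameters and 20 sites, the full model reproduces the training data almost exactly, so its training R2 says nothing about how well it predicts elsewhere. Only the held-out sites reveal that.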



Machine learning techniques (such as neural networks and random forests) offer possibilities to create spatial models of air pollutants by learning the underlying relationships in a training data set, without any predefined constraints.



Kernel-based regularized least squares (KRLS) had a higher training model R2 than linear regression, but the differences decreased when external data were used to compare predictions.
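
For intuition on why a kernel method can reach a higher training R2 than a linear model, here is a minimal sketch of Gaussian-kernel regularized least squares, which as far as I understand is the core idea behind KRLS. It is not the paper's implementation: the one-dimensional toy data, the bandwidth `sigma2`, the penalty `lam`, and all function names are assumptions for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, sigma2):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / sigma2)."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / sigma2)


def fit_krls(X, y, sigma2, lam):
    """Solve (K + lam * I) c = y for the kernel weights c."""
    K = gaussian_kernel(X, X, sigma2)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict_krls(X_train, c, X_new, sigma2):
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma2) @ c


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy nonlinear relationship at 10 hypothetical sites.
X = np.arange(10.0).reshape(-1, 1)
y = np.sin(X[:, 0])

# Linear baseline fitted on the same data.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2_linear = r2_score(y, A @ beta)

# Weak regularization: the kernel model nearly interpolates the training data.
c_weak = fit_krls(X, y, sigma2=1.0, lam=1e-6)
r2_krls_weak = r2_score(y, predict_krls(X, c_weak, X, sigma2=1.0))

# Strong regularization: smoother fit, lower training R2.
c_strong = fit_krls(X, y, sigma2=1.0, lam=10.0)
r2_krls_strong = r2_score(y, predict_krls(X, c_strong, X, sigma2=1.0))

print(r2_linear, r2_krls_weak, r2_krls_strong)
```

A flexible kernel fit can drive training R2 close to 1 when the penalty is small, which is one reason training R2 alone is a poor basis for comparing it with linear regression.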



As each modeling algorithm is likely not optimal, we also explored whether a combination of models (stacking) could increase predictive performance.
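
One common way to stack models is to fit a second-level regression on the base models' held-out predictions. The sketch below shows that idea only; the paper's actual stacking procedure may differ. The simulated predictions, the names `pred_linear` and `pred_forest`, and the helpers `fit_stack` and `predict_stack` are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def fit_stack(P, y):
    """Least-squares meta-model: y ~ w0 + w1 * pred_1 + ... + wk * pred_k."""
    A = np.column_stack([np.ones(len(P)), P])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_stack(w, P):
    return w[0] + P @ w[1:]


rng = np.random.default_rng(1)

# Simulated measurements at 100 hold-out sites (not real data).
y_holdout = rng.normal(loc=30.0, scale=5.0, size=100)

# Simulated hold-out predictions from two base models with independent errors.
pred_linear = y_holdout + rng.normal(scale=3.0, size=100)
pred_forest = y_holdout + rng.normal(scale=3.0, size=100)

P = np.column_stack([pred_linear, pred_forest])
w = fit_stack(P, y_holdout)
stacked = predict_stack(w, P)

print("weights:", w)
print("MSE linear, forest, stacked:",
      mse(y_holdout, pred_linear), mse(y_holdout, pred_forest), mse(y_holdout, stacked))
```

On the data used to fit the weights, the stacked prediction can never have a higher MSE than either base model, because each base model is a special case of the weighted combination. That guarantee does not carry over to new locations, so the stacked model still needs external validation data.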



