Stepwise R-Squared forward regression
Source: R/ols-stepaic-forward-regression.R
Builds a regression model from a set of candidate predictor variables by entering predictors one at a time based on R-squared, continuing until no candidate variables are left to enter.
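The entry rule described above can be sketched in base R. This is only an illustration under stated assumptions: the helper `forward_r2()` is hypothetical and not part of olsrr, it uses the built-in `mtcars` data instead of `surgical`, and it does not claim to reproduce the package's internals.

```r
# Greedy forward selection by R-squared, base R only.
# forward_r2() is a hypothetical helper for illustration, not an olsrr function.
forward_r2 <- function(data, response) {
  candidates <- setdiff(names(data), response)
  selected <- character(0)
  r2_path <- numeric(0)
  while (length(candidates) > 0) {
    # R-squared of the current model plus each remaining candidate
    r2 <- vapply(candidates, function(v) {
      f <- reformulate(c(selected, v), response = response)
      summary(lm(f, data = data))$r.squared
    }, numeric(1))
    # enter the candidate that gives the largest R-squared
    best <- names(r2)[which.max(r2)]
    selected <- c(selected, best)
    r2_path <- c(r2_path, r2[[best]])
    candidates <- setdiff(candidates, best)
  }
  data.frame(step = seq_along(selected), variable = selected, r2 = r2_path)
}

path <- forward_r2(mtcars[, c("mpg", "wt", "hp", "qsec", "drat")], "mpg")
print(path)
```

Because R-squared cannot decrease when a predictor is added to a nested least-squares model, a loop like this only stops once the candidates run out, which is why every variable appears in the stepwise summary.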
Usage
ols_step_forward_r2(model, ...)
# Default S3 method
ols_step_forward_r2(
  model,
  include = NULL,
  exclude = NULL,
  progress = FALSE,
  details = FALSE,
  ...
)
# S3 method for class 'ols_step_forward_r2'
plot(x, print_plot = TRUE, details = TRUE, digits = 3, ...)
Arguments
- model
An object of class lm.
- ...
Other arguments.
- include
Character or numeric vector; variables to be included in selection process.
- exclude
Character or numeric vector; variables to be excluded from selection process.
- progress
Logical; if TRUE, will display variable selection progress.
- details
Logical; if TRUE, will print the regression result at each step.
- x
An object of class ols_step_forward_*.
- print_plot
Logical; if TRUE, prints the plot else returns a plot object.
- digits
Number of decimal places to display.
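Since `include` and `exclude` accept either names or numeric positions, it can help to look up a predictor's position before passing an index. The sketch below assumes that numeric indices follow the order of the predictors in the fitted model's terms; the Examples are consistent with that reading, but it is an assumption here rather than something this page states. It uses the built-in `mtcars` data so that it runs without olsrr.

```r
# Assumption: a numeric index refers to the position of a predictor
# among the term labels of the fitted model.
model <- lm(mpg ~ ., data = mtcars)

# Predictors in the order they appear in the model
preds <- attr(terms(model), "term.labels")
print(preds)

# Positions of two predictors, looked up by name
idx <- match(c("hp", "wt"), preds)
print(idx)
```

Passing names rather than positions avoids depending on column order at all, which may be the safer choice when the data frame is likely to change.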
Value
List containing the following components:
- model
final model; an object of class lm
- metrics
selection metrics
- others
list; info used for plotting and printing
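Because every candidate is eventually entered, the `metrics` component is the natural place to decide which step to keep. The sketch below is a hedged illustration: the data frame `metrics` is hard-coded from the `k$metrics` output shown in the Examples so that it runs without olsrr, and choosing by adjusted R-squared or AIC is one possible rule, not a recommendation made by this page.

```r
# Values copied from the k$metrics output in the Examples section.
# With olsrr available, k$metrics could be used in place of this data frame.
metrics <- data.frame(
  step = 1:8,
  variable = c("liver_test", "alc_heavy", "enzyme_test", "pindex",
               "bcs", "age", "gender", "alc_mod"),
  adj_r2 = c(0.4440492, 0.5497504, 0.6385400, 0.7297495,
             0.7580831, 0.7535127, 0.7485615, 0.7430544),
  aic = c(771.8753, 761.4394, 750.5089, 735.7146,
          730.6204, 732.4942, 734.4068, 736.3899)
)

# Step with the highest adjusted R-squared
best_adj <- metrics[which.max(metrics$adj_r2), ]

# Step with the lowest AIC
best_aic <- metrics[which.min(metrics$aic), ]

print(best_adj)
print(best_aic)
```

For these values both criteria point to step 5, the model that stops after `bcs` is entered.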
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
See also
Other forward selection procedures: ols_step_forward_adj_r2(), ols_step_forward_aic(), ols_step_forward_p(), ols_step_forward_sbc(), ols_step_forward_sbic()
Examples
# stepwise forward regression
model <- lm(y ~ ., data = surgical)
ols_step_forward_r2(model)
#>
#>
#>                              Stepwise Summary
#> --------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2    Adj. R2
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000
#>  1      liver_test     771.875    777.842    616.009    0.45454    0.44405
#>  2      alc_heavy      761.439    769.395    605.506    0.56674    0.54975
#>  3      enzyme_test    750.509    760.454    595.297    0.65900    0.63854
#>  4      pindex         735.715    747.649    582.943    0.75015    0.72975
#>  5      bcs            730.620    744.543    579.638    0.78091    0.75808
#>  6      age            732.494    748.406    581.938    0.78142    0.75351
#>  7      gender         734.407    752.308    584.276    0.78177    0.74856
#>  8      alc_mod        736.390    756.280    586.665    0.78184    0.74305
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 183.883
#> R-Squared 0.782 MSE 33813.069
#> Adj. R-Squared 0.743 Coef. Var 28.691
#> Pred R-Squared 0.668 AIC 736.390
#> MAE 137.058 SBC 756.280
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000
#> Residual      1825905.713        45      40575.683
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542
#> ------------------------------------------------------------------------------------------------
#>
# stepwise forward regression plot
k <- ols_step_forward_r2(model)
plot(k)
# selection metrics
k$metrics
#>   step    variable        r2    adj_r2      aic      sbc     sbic
#> 1    1  liver_test 0.4545389 0.4440492 771.8753 777.8423 616.0089
#> 2    2   alc_heavy 0.5667409 0.5497504 761.4394 769.3953 605.5062
#> 3    3 enzyme_test 0.6590000 0.6385400 750.5089 760.4538 595.2974
#> 4    4      pindex 0.7501457 0.7297495 735.7146 747.6485 582.9426
#> 5    5         bcs 0.7809054 0.7580831 730.6204 744.5433 579.6377
#> 6    6         age 0.7814169 0.7535127 732.4942 748.4061 581.9383
#> 7    7      gender 0.7817703 0.7485615 734.4068 752.3077 584.2757
#> 8    8     alc_mod 0.7818387 0.7430544 736.3899 756.2797 586.6645
# extract final model
k$model
#>
#> Call:
#> lm(formula = paste(response, "~", paste(preds, collapse = " + ")),
#>     data = l)
#>
#> Coefficients:
#> (Intercept)   liver_test    alc_heavy  enzyme_test       pindex          bcs
#>   -1148.823       50.413      320.697        9.888        8.973       62.390
#>         age       gender      alc_mod
#>      -0.951       15.874        7.713
#>
# include or exclude variables
# force variable to be included in selection process
ols_step_forward_r2(model, include = c("age"))
#>
#>
#>                              Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2     Adj. R2
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476
#>  1      liver_test     773.831    781.787    616.928    0.45498     0.43361
#>  2      alc_heavy      763.110    773.055    606.421    0.56938     0.54354
#>  3      enzyme_test    752.416    764.350    596.755    0.65959     0.63180
#>  4      pindex         737.680    751.603    585.012    0.75030     0.72429
#>  5      bcs            732.494    748.406    581.938    0.78142     0.75351
#>  6      gender         734.407    752.308    584.276    0.78177     0.74856
#>  7      alc_mod        736.390    756.280    586.665    0.78184     0.74305
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 183.883
#> R-Squared 0.782 MSE 33813.069
#> Adj. R-Squared 0.743 Coef. Var 28.691
#> Pred R-Squared 0.668 AIC 736.390
#> MAE 137.058 SBC 756.280
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000
#> Residual      1825905.713        45      40575.683
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_r2(model, include = c(5))
#>
#>
#>                              Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2     Adj. R2
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476
#>  1      liver_test     773.831    781.787    616.928    0.45498     0.43361
#>  2      alc_heavy      763.110    773.055    606.421    0.56938     0.54354
#>  3      enzyme_test    752.416    764.350    596.755    0.65959     0.63180
#>  4      pindex         737.680    751.603    585.012    0.75030     0.72429
#>  5      bcs            732.494    748.406    581.938    0.78142     0.75351
#>  6      gender         734.407    752.308    584.276    0.78177     0.74856
#>  7      alc_mod        736.390    756.280    586.665    0.78184     0.74305
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 183.883
#> R-Squared 0.782 MSE 33813.069
#> Adj. R-Squared 0.743 Coef. Var 28.691
#> Pred R-Squared 0.668 AIC 736.390
#> MAE 137.058 SBC 756.280
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000
#> Residual      1825905.713        45      40575.683
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542
#> ------------------------------------------------------------------------------------------------
#>
# force variable to be excluded from selection process
ols_step_forward_r2(model, exclude = c("liver_test"))
#>
#>
#>                              Stepwise Summary
#> --------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2    Adj. R2
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000
#>  1      enzyme_test    782.629    788.596    626.220    0.33435    0.32154
#>  2      bcs            766.271    774.226    609.940    0.52619    0.50761
#>  3      pindex         746.376    756.320    591.702    0.68413    0.66518
#>  4      alc_heavy      730.924    742.858    579.087    0.77136    0.75269
#>  5      age            732.339    746.262    580.934    0.77382    0.75026
#>  6      gender         733.921    749.833    582.951    0.77556    0.74691
#>  7      alc_mod        735.878    753.779    585.255    0.77574    0.74162
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.881 RMSE 186.434
#> R-Squared 0.776 MSE 34757.812
#> Adj. R-Squared 0.742 Coef. Var 28.771
#> Pred R-Squared 0.666 AIC 735.878
#> MAE 138.247 SBC 753.779
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000
#> Residual      1876921.869        46      40802.649
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_r2(model, exclude = c(4))
#>
#>
#>                              Stepwise Summary
#> --------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2    Adj. R2
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000
#>  1      enzyme_test    782.629    788.596    626.220    0.33435    0.32154
#>  2      bcs            766.271    774.226    609.940    0.52619    0.50761
#>  3      pindex         746.376    756.320    591.702    0.68413    0.66518
#>  4      alc_heavy      730.924    742.858    579.087    0.77136    0.75269
#>  5      age            732.339    746.262    580.934    0.77382    0.75026
#>  6      gender         733.921    749.833    582.951    0.77556    0.74691
#>  7      alc_mod        735.878    753.779    585.255    0.77574    0.74162
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.881 RMSE 186.434
#> R-Squared 0.776 MSE 34757.812
#> Adj. R-Squared 0.742 Coef. Var 28.771
#> Pred R-Squared 0.666 AIC 735.878
#> MAE 138.247 SBC 753.779
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000
#> Residual      1876921.869        46      40802.649
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326
#> ------------------------------------------------------------------------------------------------
#>
# include & exclude variables in the selection process
ols_step_forward_r2(model, include = c("age"), exclude = c("liver_test"))
#>
#>
#>                              Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2     Adj. R2
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476
#>  1      enzyme_test    783.607    791.563    626.048    0.34683     0.32121
#>  2      bcs            767.078    777.023    609.973    0.53654     0.50873
#>  3      pindex         747.171    759.105    592.354    0.69109     0.66588
#>  4      alc_heavy      732.339    746.262    580.934    0.77382     0.75026
#>  5      gender         733.921    749.833    582.951    0.77556     0.74691
#>  6      alc_mod        735.878    753.779    585.255    0.77574     0.74162
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.881 RMSE 186.434
#> R-Squared 0.776 MSE 34757.812
#> Adj. R-Squared 0.742 Coef. Var 28.771
#> Pred R-Squared 0.666 AIC 735.878
#> MAE 138.247 SBC 753.779
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000
#> Residual      1876921.869        46      40802.649
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_r2(model, include = c(5), exclude = c(4))
#>
#>
#>                              Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step    Variable           AIC        SBC       SBIC         R2     Adj. R2
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476
#>  1      enzyme_test    783.607    791.563    626.048    0.34683     0.32121
#>  2      bcs            767.078    777.023    609.973    0.53654     0.50873
#>  3      pindex         747.171    759.105    592.354    0.69109     0.66588
#>  4      alc_heavy      732.339    746.262    580.934    0.77382     0.75026
#>  5      gender         733.921    749.833    582.951    0.77556     0.74691
#>  6      alc_mod        735.878    753.779    585.255    0.77574     0.74162
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.881 RMSE 186.434
#> R-Squared 0.776 MSE 34757.812
#> Adj. R-Squared 0.742 Coef. Var 28.771
#> Pred R-Squared 0.666 AIC 735.878
#> MAE 138.247 SBC 753.779
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#>                                  ANOVA
#> -----------------------------------------------------------------------
#>                    Sum of
#>                   Squares        DF    Mean Square         F      Sig.
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000
#> Residual      1876921.869        46      40802.649
#> Total         8369520.537        53
#> -----------------------------------------------------------------------
#>
#>                                       Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta         t      Sig        lower       upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326
#> ------------------------------------------------------------------------------------------------
#>