Build a regression model from a set of candidate predictor variables by entering predictors one at a time based on the Akaike information criterion (AIC), in a stepwise manner, until none of the remaining variables qualifies for entry.
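The procedure described above can be sketched in base R. This is only an illustration of greedy forward selection by AIC, not olsrr's own code: `forward_aic_sketch()` is a hypothetical helper, `mtcars` stands in for your data, and it assumes AIC is computed as in `stats::AIC()` and that selection stops as soon as no candidate lowers the AIC. The package may differ in details such as tie handling.

```r
# Hypothetical helper (not part of olsrr): greedy forward selection by AIC.
forward_aic_sketch <- function(data, response) {
  candidates <- setdiff(names(data), response)
  selected   <- character(0)

  # Step 0: AIC of the intercept-only base model
  best_aic <- AIC(lm(reformulate("1", response = response), data = data))

  while (length(candidates) > 0) {
    # AIC of the current model plus each remaining candidate, one at a time
    trial_aic <- vapply(candidates, function(v) {
      AIC(lm(reformulate(c(selected, v), response = response), data = data))
    }, numeric(1))

    # Stop when no candidate lowers the AIC (assumed stopping rule)
    if (min(trial_aic) >= best_aic) break

    enter      <- names(which.min(trial_aic))
    selected   <- c(selected, enter)
    candidates <- setdiff(candidates, enter)
    best_aic   <- min(trial_aic)
  }

  # Refit the final model on the selected predictors (or intercept only)
  rhs   <- if (length(selected) > 0) selected else "1"
  final <- lm(reformulate(rhs, response = response), data = data)
  list(selected = selected, aic = best_aic, model = final)
}

res <- forward_aic_sketch(mtcars, "mpg")
res$selected   # predictors in the order they entered
res$aic        # AIC of the final model
```

Refitting every candidate at each step is wasteful for large predictor sets, but it keeps the logic easy to follow.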
Usage
ols_step_forward_aic(model, ...)
# S3 method for default
ols_step_forward_aic(
  model,
  include = NULL,
  exclude = NULL,
  progress = FALSE,
  details = FALSE,
  ...
)
# S3 method for ols_step_forward_aic
plot(x, print_plot = TRUE, details = TRUE, digits = 3, ...)
Arguments
- model
An object of class lm.
- ...
Other arguments.
- include
Character or numeric vector; variables to be included in selection process.
- exclude
Character or numeric vector; variables to be excluded from selection process.
- progress
Logical; if TRUE, will display variable selection progress.
- details
Logical; if TRUE, will print the regression result at each step.
- x
An object of class ols_step_forward_*.
- print_plot
Logical; if TRUE, prints the plot; otherwise returns a plot object.
- digits
Number of decimal places to display.
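In the Examples, `include = c(5)` produces the same result as `include = c("age")`, and `exclude = c(4)` matches `exclude = c("liver_test")`. This suggests that numeric values index the predictors in the order they appear in the model. A minimal base R sketch for looking up which name a position refers to, assuming that reading is correct and using `mtcars` as stand-in data:

```r
# Stand-in model; replace with your own lm object
model <- lm(mpg ~ ., data = mtcars)

# Predictor names in model order (the response is not counted)
predictors <- attr(terms(model), "term.labels")

# Position-to-name lookup, e.g. for values passed to include/exclude
data.frame(index = seq_along(predictors), variable = predictors)

predictors[c(4, 5)]
```

Passing names rather than positions is less fragile when the model formula or column order changes.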
Value
List containing the following components:
- model
final model; an object of class lm
- metrics
selection metrics
- others
list; info used for plotting and printing
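To make the AIC column of the selection metrics easier to interpret, the sketch below uses only numbers copied from the Examples output for `lm(y ~ ., data = surgical)`. It first recomputes the Base Model AIC from the total sum of squares, assuming the Gaussian log-likelihood form used by `stats::AIC()` for `lm` fits, and then derives how much each entered variable lowered the AIC. The result should come out close to the 802.606 reported at step 0.

```r
# Values copied from the Examples output
n   <- 54            # Total DF (53) + 1
tss <- 8369520.537   # Total sum of squares = RSS of the intercept-only model

# AIC = n * (log(2 * pi) + log(RSS / n) + 1) + 2 * (number of coefficients + 1)
# The intercept-only model has 1 coefficient, plus 1 for the error variance.
base_aic <- n * (log(2 * pi) + log(tss / n) + 1) + 2 * (1 + 1)
round(base_aic, 3)

# AIC after each step, as printed in the Stepwise Summary (steps 0 to 5)
aic <- c(802.606, 771.875, 761.439, 750.509, 735.715, 730.620)

# Decrease in AIC attributable to each entered variable (positive = improvement)
aic_drop <- -diff(aic)
names(aic_drop) <- c("liver_test", "alc_heavy", "enzyme_test", "pindex", "bcs")
round(aic_drop, 3)
```

The largest drop comes from the first variable entered, and the gains shrink to about 5 AIC units by the last step.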
References
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
See also
Other forward selection procedures: ols_step_forward_adj_r2(), ols_step_forward_p(), ols_step_forward_r2(), ols_step_forward_sbc(), ols_step_forward_sbic()
Examples
# stepwise forward regression
model <- lm(y ~ ., data = surgical)
ols_step_forward_aic(model)
#>
#>
#> Stepwise Summary
#> --------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> --------------------------------------------------------------------------
#> 0 Base Model 802.606 806.584 646.794 0.00000 0.00000
#> 1 liver_test 771.875 777.842 616.009 0.45454 0.44405
#> 2 alc_heavy 761.439 769.395 605.506 0.56674 0.54975
#> 3 enzyme_test 750.509 760.454 595.297 0.65900 0.63854
#> 4 pindex 735.715 747.649 582.943 0.75015 0.72975
#> 5 bcs 730.620 744.543 579.638 0.78091 0.75808
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 184.276
#> R-Squared 0.781 MSE 38202.426
#> Adj. R-Squared 0.758 Coef. Var 27.839
#> Pred R-Squared 0.700 AIC 730.620
#> MAE 137.656 SBC 744.543
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6535804.090 5 1307160.818 34.217 0.0000
#> Residual 1833716.447 48 38202.426
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1178.330 208.682 -5.647 0.000 -1597.914 -758.746
#> liver_test 58.064 40.144 0.156 1.446 0.155 -22.652 138.779
#> alc_heavy 317.848 71.634 0.314 4.437 0.000 173.818 461.878
#> enzyme_test 9.748 1.656 0.521 5.887 0.000 6.419 13.077
#> pindex 8.924 1.808 0.380 4.935 0.000 5.288 12.559
#> bcs 59.864 23.060 0.241 2.596 0.012 13.498 106.230
#> ------------------------------------------------------------------------------------------------
#>
# stepwise forward regression plot
k <- ols_step_forward_aic(model)
plot(k)
# selection metrics
k$metrics
#> step variable r2 adj_r2 aic sbc sbic
#> 1 1 liver_test 0.4545389 0.4440492 771.8753 777.8423 616.0089
#> 2 2 alc_heavy 0.5667409 0.5497504 761.4394 769.3953 605.5062
#> 3 3 enzyme_test 0.6590000 0.6385400 750.5089 760.4538 595.2974
#> 4 4 pindex 0.7501457 0.7297495 735.7146 747.6485 582.9426
#> 5 5 bcs 0.7809054 0.7580831 730.6204 744.5433 579.6377
# extract final model
k$model
#>
#> Call:
#> lm(formula = paste(response, "~", paste(preds, collapse = " + ")),
#> data = l)
#>
#> Coefficients:
#> (Intercept) liver_test alc_heavy enzyme_test pindex bcs
#> -1178.330 58.064 317.848 9.748 8.924 59.864
#>
# include or exclude variables
# force variable to be included in selection process
ols_step_forward_aic(model, include = c("age"))
#>
#>
#> Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> ---------------------------------------------------------------------------
#> 0 Base Model 803.834 809.801 646.572 0.01420 -0.00476
#> 1 liver_test 773.831 781.787 616.928 0.45498 0.43361
#> 2 alc_heavy 763.110 773.055 606.421 0.56938 0.54354
#> 3 enzyme_test 752.416 764.350 596.755 0.65959 0.63180
#> 4 pindex 737.680 751.603 585.012 0.75030 0.72429
#> 5 bcs 732.494 748.406 581.938 0.78142 0.75351
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 184.061
#> R-Squared 0.781 MSE 38924.162
#> Adj. R-Squared 0.754 Coef. Var 28.101
#> Pred R-Squared 0.692 AIC 732.494
#> MAE 138.160 SBC 748.406
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6540084.920 6 1090014.153 28.004 0.0000
#> Residual 1829435.617 47 38924.162
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1143.080 235.943 -4.845 0.000 -1617.737 -668.424
#> age -0.850 2.563 -0.024 -0.332 0.742 -6.007 4.307
#> liver_test 54.053 42.288 0.146 1.278 0.207 -31.019 139.125
#> alc_heavy 314.585 72.974 0.310 4.311 0.000 167.781 461.390
#> enzyme_test 9.852 1.700 0.527 5.794 0.000 6.431 13.273
#> pindex 8.974 1.832 0.382 4.900 0.000 5.290 12.659
#> bcs 61.424 23.748 0.248 2.586 0.013 13.649 109.199
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_aic(model, include = c(5))
#>
#>
#> Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> ---------------------------------------------------------------------------
#> 0 Base Model 803.834 809.801 646.572 0.01420 -0.00476
#> 1 liver_test 773.831 781.787 616.928 0.45498 0.43361
#> 2 alc_heavy 763.110 773.055 606.421 0.56938 0.54354
#> 3 enzyme_test 752.416 764.350 596.755 0.65959 0.63180
#> 4 pindex 737.680 751.603 585.012 0.75030 0.72429
#> 5 bcs 732.494 748.406 581.938 0.78142 0.75351
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.884 RMSE 184.061
#> R-Squared 0.781 MSE 38924.162
#> Adj. R-Squared 0.754 Coef. Var 28.101
#> Pred R-Squared 0.692 AIC 732.494
#> MAE 138.160 SBC 748.406
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6540084.920 6 1090014.153 28.004 0.0000
#> Residual 1829435.617 47 38924.162
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1143.080 235.943 -4.845 0.000 -1617.737 -668.424
#> age -0.850 2.563 -0.024 -0.332 0.742 -6.007 4.307
#> liver_test 54.053 42.288 0.146 1.278 0.207 -31.019 139.125
#> alc_heavy 314.585 72.974 0.310 4.311 0.000 167.781 461.390
#> enzyme_test 9.852 1.700 0.527 5.794 0.000 6.431 13.273
#> pindex 8.974 1.832 0.382 4.900 0.000 5.290 12.659
#> bcs 61.424 23.748 0.248 2.586 0.013 13.649 109.199
#> ------------------------------------------------------------------------------------------------
#>
# force variable to be excluded from selection process
ols_step_forward_aic(model, exclude = c("liver_test"))
#>
#>
#> Stepwise Summary
#> --------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> --------------------------------------------------------------------------
#> 0 Base Model 802.606 806.584 646.794 0.00000 0.00000
#> 1 enzyme_test 782.629 788.596 626.220 0.33435 0.32154
#> 2 bcs 766.271 774.226 609.940 0.52619 0.50761
#> 3 pindex 746.376 756.320 591.702 0.68413 0.66518
#> 4 alc_heavy 730.924 742.858 579.087 0.77136 0.75269
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.878 RMSE 188.249
#> R-Squared 0.771 MSE 39053.801
#> Adj. R-Squared 0.753 Coef. Var 28.147
#> Pred R-Squared 0.695 AIC 730.924
#> MAE 140.619 SBC 742.858
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6455884.265 4 1613971.066 41.327 0.0000
#> Residual 1913636.272 49 39053.801
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1334.424 180.589 -7.389 0.000 -1697.332 -971.516
#> enzyme_test 11.243 1.308 0.601 8.596 0.000 8.614 13.871
#> bcs 81.439 17.781 0.329 4.580 0.000 45.706 117.171
#> pindex 10.131 1.622 0.431 6.246 0.000 6.871 13.390
#> alc_heavy 312.777 72.341 0.309 4.324 0.000 167.402 458.152
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_aic(model, exclude = c(4))
#>
#>
#> Stepwise Summary
#> --------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> --------------------------------------------------------------------------
#> 0 Base Model 802.606 806.584 646.794 0.00000 0.00000
#> 1 enzyme_test 782.629 788.596 626.220 0.33435 0.32154
#> 2 bcs 766.271 774.226 609.940 0.52619 0.50761
#> 3 pindex 746.376 756.320 591.702 0.68413 0.66518
#> 4 alc_heavy 730.924 742.858 579.087 0.77136 0.75269
#> --------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.878 RMSE 188.249
#> R-Squared 0.771 MSE 39053.801
#> Adj. R-Squared 0.753 Coef. Var 28.147
#> Pred R-Squared 0.695 AIC 730.924
#> MAE 140.619 SBC 742.858
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6455884.265 4 1613971.066 41.327 0.0000
#> Residual 1913636.272 49 39053.801
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1334.424 180.589 -7.389 0.000 -1697.332 -971.516
#> enzyme_test 11.243 1.308 0.601 8.596 0.000 8.614 13.871
#> bcs 81.439 17.781 0.329 4.580 0.000 45.706 117.171
#> pindex 10.131 1.622 0.431 6.246 0.000 6.871 13.390
#> alc_heavy 312.777 72.341 0.309 4.324 0.000 167.402 458.152
#> ------------------------------------------------------------------------------------------------
#>
# include & exclude variables in the selection process
ols_step_forward_aic(model, include = c("age"), exclude = c("liver_test"))
#>
#>
#> Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> ---------------------------------------------------------------------------
#> 0 Base Model 803.834 809.801 646.572 0.01420 -0.00476
#> 1 enzyme_test 783.607 791.563 626.048 0.34683 0.32121
#> 2 bcs 767.078 777.023 609.973 0.53654 0.50873
#> 3 pindex 747.171 759.105 592.354 0.69109 0.66588
#> 4 alc_heavy 732.339 746.262 580.934 0.77382 0.75026
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.880 RMSE 187.233
#> R-Squared 0.774 MSE 39438.163
#> Adj. R-Squared 0.750 Coef. Var 28.286
#> Pred R-Squared 0.688 AIC 732.339
#> MAE 140.528 SBC 746.262
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6476488.730 5 1295297.746 32.844 0.0000
#> Residual 1893031.807 48 39438.163
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1237.653 225.517 -5.488 0.000 -1691.085 -784.221
#> age -1.787 2.473 -0.050 -0.723 0.473 -6.759 3.184
#> enzyme_test 11.244 1.314 0.601 8.555 0.000 8.601 13.887
#> bcs 81.587 17.870 0.329 4.566 0.000 45.657 117.516
#> pindex 10.062 1.633 0.428 6.163 0.000 6.779 13.344
#> alc_heavy 306.655 73.188 0.303 4.190 0.000 159.500 453.809
#> ------------------------------------------------------------------------------------------------
#>
# use index of variable instead of name
ols_step_forward_aic(model, include = c(5), exclude = c(4))
#>
#>
#> Stepwise Summary
#> ---------------------------------------------------------------------------
#> Step Variable AIC SBC SBIC R2 Adj. R2
#> ---------------------------------------------------------------------------
#> 0 Base Model 803.834 809.801 646.572 0.01420 -0.00476
#> 1 enzyme_test 783.607 791.563 626.048 0.34683 0.32121
#> 2 bcs 767.078 777.023 609.973 0.53654 0.50873
#> 3 pindex 747.171 759.105 592.354 0.69109 0.66588
#> 4 alc_heavy 732.339 746.262 580.934 0.77382 0.75026
#> ---------------------------------------------------------------------------
#>
#> Final Model Output
#> ------------------
#>
#> Model Summary
#> -------------------------------------------------------------------
#> R 0.880 RMSE 187.233
#> R-Squared 0.774 MSE 39438.163
#> Adj. R-Squared 0.750 Coef. Var 28.286
#> Pred R-Squared 0.688 AIC 732.339
#> MAE 140.528 SBC 746.262
#> -------------------------------------------------------------------
#> RMSE: Root Mean Square Error
#> MSE: Mean Square Error
#> MAE: Mean Absolute Error
#> AIC: Akaike Information Criteria
#> SBC: Schwarz Bayesian Criteria
#>
#> ANOVA
#> -----------------------------------------------------------------------
#> Sum of
#> Squares DF Mean Square F Sig.
#> -----------------------------------------------------------------------
#> Regression 6476488.730 5 1295297.746 32.844 0.0000
#> Residual 1893031.807 48 39438.163
#> Total 8369520.537 53
#> -----------------------------------------------------------------------
#>
#> Parameter Estimates
#> ------------------------------------------------------------------------------------------------
#> model Beta Std. Error Std. Beta t Sig lower upper
#> ------------------------------------------------------------------------------------------------
#> (Intercept) -1237.653 225.517 -5.488 0.000 -1691.085 -784.221
#> age -1.787 2.473 -0.050 -0.723 0.473 -6.759 3.184
#> enzyme_test 11.244 1.314 0.601 8.555 0.000 8.601 13.887
#> bcs 81.587 17.870 0.329 4.566 0.000 45.657 117.516
#> pindex 10.062 1.633 0.428 6.163 0.000 6.779 13.344
#> alc_heavy 306.655 73.188 0.303 4.190 0.000 159.500 453.809
#> ------------------------------------------------------------------------------------------------
#>