Build a regression model from a set of candidate predictor variables by entering predictors one at a time based on R-squared, in a stepwise manner, until no variables are left to enter.
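
The entry rule described above can be sketched in base R. This is an illustration of greedy forward selection by R-squared, not the olsrr implementation: `forward_r2()` is a hypothetical helper, and `mtcars` with four arbitrarily chosen predictors stands in for real data.

```r
# Illustrative sketch only: greedy forward selection by R-squared in base R.
# forward_r2() is a hypothetical helper and is not part of olsrr.
forward_r2 <- function(response, candidates, data) {
  selected <- character(0)
  r2_path  <- numeric(0)
  while (length(candidates) > 0) {
    # R-squared of the current model plus each remaining candidate
    r2 <- vapply(candidates, function(v) {
      f <- reformulate(c(selected, v), response = response)
      summary(lm(f, data = data))$r.squared
    }, numeric(1))
    # Enter the candidate that gives the largest R-squared
    best       <- names(which.max(r2))
    selected   <- c(selected, best)
    r2_path    <- c(r2_path, unname(r2[best]))
    candidates <- setdiff(candidates, best)
  }
  data.frame(step = seq_along(selected), variable = selected, r2 = r2_path,
             stringsAsFactors = FALSE)
}

# mtcars and this predictor set are stand-ins chosen for the example
path <- forward_r2("mpg", c("wt", "hp", "qsec", "am"), mtcars)
print(path)
```

Because R-squared cannot decrease when a predictor is added, a loop like this only stops once every candidate has been entered, which matches the behaviour described above.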

Usage

ols_step_forward_r2(model, ...)

# Default S3 method
ols_step_forward_r2(
  model,
  include = NULL,
  exclude = NULL,
  progress = FALSE,
  details = FALSE,
  ...
)

# S3 method for class 'ols_step_forward_r2'
plot(x, print_plot = TRUE, details = TRUE, digits = 3, ...)

Arguments

model

An object of class lm.

...

Other arguments.

include

Character or numeric vector; variables to be included in the selection process.

exclude

Character or numeric vector; variables to be excluded from the selection process.

progress

Logical; if TRUE, will display variable selection progress.

details

Logical; if TRUE, will print the regression result at each step.

x

An object of class ols_step_forward_r2.

print_plot

Logical; if TRUE, prints the plot; otherwise returns a plot object.

digits

Number of decimal places to display.
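
Since include and exclude accept numeric indices as well as names, it can help to check which predictor a position refers to before passing it. The sketch below assumes that an index counts positions among the model's predictors in formula order. That assumption is consistent with the examples on this page, where `include = c(5)` gives the same output as `include = c("age")`, but it has not been checked against the olsrr source. `mtcars` stands in for the surgical data.

```r
# Sketch: look up predictor positions in a fitted lm object.
# Assumption: indices refer to positions among the model's predictors.
fit   <- lm(mpg ~ ., data = mtcars)
preds <- attr(terms(fit), "term.labels")

# Position-to-name lookup table
lookup <- data.frame(index = seq_along(preds), variable = preds,
                     stringsAsFactors = FALSE)
print(lookup)

preds[5]            # name of the fifth predictor
match("wt", preds)  # position of a named predictor
```

Passing names rather than positions avoids the question altogether and keeps the call valid if the formula is later reordered.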

Value

List containing the following components:

model

final model; an object of class lm

metrics

selection metrics

others

list; info used for plotting and printing
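
Because predictors are entered until none are left, the final model contains every candidate, so the metrics component is the place to look when choosing a smaller model. The sketch below is one possible approach, not a feature of the package: it rebuilds part of the metrics table from the values printed in the Examples section and keeps the variables entered up to the step with the highest adjusted R-squared.

```r
# Values copied from the k$metrics output shown in the Examples section.
metrics <- data.frame(
  step     = 1:8,
  variable = c("liver_test", "alc_heavy", "enzyme_test", "pindex",
               "bcs", "age", "gender", "alc_mod"),
  adj_r2   = c(0.4440492, 0.5497504, 0.6385400, 0.7297495,
               0.7580831, 0.7535127, 0.7485615, 0.7430544),
  aic      = c(771.8753, 761.4394, 750.5089, 735.7146,
               730.6204, 732.4942, 734.4068, 736.3899),
  stringsAsFactors = FALSE
)

# Step with the highest adjusted R-squared (AIC is lowest at the same step)
best_step <- metrics$step[which.max(metrics$adj_r2)]

# Variables entered up to and including that step
keep <- metrics$variable[seq_len(best_step)]
keep
#> [1] "liver_test"  "alc_heavy"   "enzyme_test" "pindex"      "bcs"

# Formula for refitting the reduced model with lm()
reduced <- reformulate(keep, response = "y")
```

With a fitted object `k`, the same steps would apply to `k$metrics` directly, and the reduced formula could be passed to `lm()` together with the original data.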

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

# stepwise forward regression
model <- lm(y ~ ., data = surgical)
ols_step_forward_r2(model)
#> 
#> 
#>                               Stepwise Summary                              
#> --------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2 
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000 
#>  1      liver_test     771.875    777.842    616.009    0.45454    0.44405 
#>  2      alc_heavy      761.439    769.395    605.506    0.56674    0.54975 
#>  3      enzyme_test    750.509    760.454    595.297    0.65900    0.63854 
#>  4      pindex         735.715    747.649    582.943    0.75015    0.72975 
#>  5      bcs            730.620    744.543    579.638    0.78091    0.75808 
#>  6      age            732.494    748.406    581.938    0.78142    0.75351 
#>  7      gender         734.407    752.308    584.276    0.78177    0.74856 
#>  8      alc_mod        736.390    756.280    586.665    0.78184    0.74305 
#> --------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.884       RMSE                 183.883 
#> R-Squared                 0.782       MSE                33813.069 
#> Adj. R-Squared            0.743       Coef. Var             28.691 
#> Pred R-Squared            0.668       AIC                  736.390 
#> MAE                     137.058       SBC                  756.280 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000 
#> Residual      1825905.713        45      40575.683                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750 
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965 
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037 
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396 
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748 
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675 
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384 
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648 
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542 
#> ------------------------------------------------------------------------------------------------
#> 

# stepwise forward regression plot
k <- ols_step_forward_r2(model)
plot(k)


# selection metrics
k$metrics
#>   step    variable        r2    adj_r2      aic      sbc     sbic
#> 1    1  liver_test 0.4545389 0.4440492 771.8753 777.8423 616.0089
#> 2    2   alc_heavy 0.5667409 0.5497504 761.4394 769.3953 605.5062
#> 3    3 enzyme_test 0.6590000 0.6385400 750.5089 760.4538 595.2974
#> 4    4      pindex 0.7501457 0.7297495 735.7146 747.6485 582.9426
#> 5    5         bcs 0.7809054 0.7580831 730.6204 744.5433 579.6377
#> 6    6         age 0.7814169 0.7535127 732.4942 748.4061 581.9383
#> 7    7      gender 0.7817703 0.7485615 734.4068 752.3077 584.2757
#> 8    8     alc_mod 0.7818387 0.7430544 736.3899 756.2797 586.6645

# extract final model
k$model
#> 
#> Call:
#> lm(formula = paste(response, "~", paste(preds, collapse = " + ")), 
#>     data = l)
#> 
#> Coefficients:
#> (Intercept)   liver_test    alc_heavy  enzyme_test       pindex          bcs  
#>   -1148.823       50.413      320.697        9.888        8.973       62.390  
#>         age       gender      alc_mod  
#>      -0.951       15.874        7.713  
#> 

# include or exclude variables
# force variable to be included in selection process
ols_step_forward_r2(model, include = c("age"))
#> 
#> 
#>                               Stepwise Summary                               
#> ---------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2  
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476 
#>  1      liver_test     773.831    781.787    616.928    0.45498     0.43361 
#>  2      alc_heavy      763.110    773.055    606.421    0.56938     0.54354 
#>  3      enzyme_test    752.416    764.350    596.755    0.65959     0.63180 
#>  4      pindex         737.680    751.603    585.012    0.75030     0.72429 
#>  5      bcs            732.494    748.406    581.938    0.78142     0.75351 
#>  6      gender         734.407    752.308    584.276    0.78177     0.74856 
#>  7      alc_mod        736.390    756.280    586.665    0.78184     0.74305 
#> ---------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.884       RMSE                 183.883 
#> R-Squared                 0.782       MSE                33813.069 
#> Adj. R-Squared            0.743       Coef. Var             28.691 
#> Pred R-Squared            0.668       AIC                  736.390 
#> MAE                     137.058       SBC                  756.280 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000 
#> Residual      1825905.713        45      40575.683                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750 
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384 
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965 
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037 
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396 
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748 
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675 
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648 
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542 
#> ------------------------------------------------------------------------------------------------
#> 

# use index of variable instead of name
ols_step_forward_r2(model, include = c(5))
#> 
#> 
#>                               Stepwise Summary                               
#> ---------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2  
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476 
#>  1      liver_test     773.831    781.787    616.928    0.45498     0.43361 
#>  2      alc_heavy      763.110    773.055    606.421    0.56938     0.54354 
#>  3      enzyme_test    752.416    764.350    596.755    0.65959     0.63180 
#>  4      pindex         737.680    751.603    585.012    0.75030     0.72429 
#>  5      bcs            732.494    748.406    581.938    0.78142     0.75351 
#>  6      gender         734.407    752.308    584.276    0.78177     0.74856 
#>  7      alc_mod        736.390    756.280    586.665    0.78184     0.74305 
#> ---------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.884       RMSE                 183.883 
#> R-Squared                 0.782       MSE                33813.069 
#> Adj. R-Squared            0.743       Coef. Var             28.691 
#> Pred R-Squared            0.668       AIC                  736.390 
#> MAE                     137.058       SBC                  756.280 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6543614.824         8     817951.853    20.159    0.0000 
#> Residual      1825905.713        45      40575.683                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1148.823       242.328                 -4.741    0.000    -1636.896    -660.750 
#>         age       -0.951         2.649       -0.027    -0.359    0.721       -6.286       4.384 
#>  liver_test       50.413        44.959        0.136     1.121    0.268      -40.140     140.965 
#>   alc_heavy      320.697        85.070        0.316     3.770    0.000      149.357     492.037 
#> enzyme_test        9.888         1.742        0.529     5.677    0.000        6.380      13.396 
#>      pindex        8.973         1.874        0.382     4.788    0.000        5.199      12.748 
#>         bcs       62.390        24.470        0.252     2.550    0.014       13.106     111.675 
#>      gender       15.874        58.475        0.020     0.271    0.787     -101.900     133.648 
#>     alc_mod        7.713        64.956        0.010     0.119    0.906     -123.116     138.542 
#> ------------------------------------------------------------------------------------------------
#> 

# force variable to be excluded from selection process
ols_step_forward_r2(model, exclude = c("liver_test"))
#> 
#> 
#>                               Stepwise Summary                              
#> --------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2 
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000 
#>  1      enzyme_test    782.629    788.596    626.220    0.33435    0.32154 
#>  2      bcs            766.271    774.226    609.940    0.52619    0.50761 
#>  3      pindex         746.376    756.320    591.702    0.68413    0.66518 
#>  4      alc_heavy      730.924    742.858    579.087    0.77136    0.75269 
#>  5      age            732.339    746.262    580.934    0.77382    0.75026 
#>  6      gender         733.921    749.833    582.951    0.77556    0.74691 
#>  7      alc_mod        735.878    753.779    585.255    0.77574    0.74162 
#> --------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.881       RMSE                 186.434 
#> R-Squared                 0.776       MSE                34757.812 
#> Adj. R-Squared            0.742       Coef. Var             28.771 
#> Pred R-Squared            0.666       AIC                  735.878 
#> MAE                     138.247       SBC                  753.779 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000 
#> Residual      1876921.869        46      40802.649                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410 
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848 
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449 
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297 
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966 
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245 
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286 
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326 
#> ------------------------------------------------------------------------------------------------
#> 

# use index of variable instead of name
ols_step_forward_r2(model, exclude = c(4))
#> 
#> 
#>                               Stepwise Summary                              
#> --------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2 
#> --------------------------------------------------------------------------
#>  0      Base Model     802.606    806.584    646.794    0.00000    0.00000 
#>  1      enzyme_test    782.629    788.596    626.220    0.33435    0.32154 
#>  2      bcs            766.271    774.226    609.940    0.52619    0.50761 
#>  3      pindex         746.376    756.320    591.702    0.68413    0.66518 
#>  4      alc_heavy      730.924    742.858    579.087    0.77136    0.75269 
#>  5      age            732.339    746.262    580.934    0.77382    0.75026 
#>  6      gender         733.921    749.833    582.951    0.77556    0.74691 
#>  7      alc_mod        735.878    753.779    585.255    0.77574    0.74162 
#> --------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.881       RMSE                 186.434 
#> R-Squared                 0.776       MSE                34757.812 
#> Adj. R-Squared            0.742       Coef. Var             28.771 
#> Pred R-Squared            0.666       AIC                  735.878 
#> MAE                     138.247       SBC                  753.779 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000 
#> Residual      1876921.869        46      40802.649                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410 
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848 
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449 
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297 
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966 
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245 
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286 
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326 
#> ------------------------------------------------------------------------------------------------
#> 

# include & exclude variables in the selection process
ols_step_forward_r2(model, include = c("age"), exclude = c("liver_test"))
#> 
#> 
#>                               Stepwise Summary                               
#> ---------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2  
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476 
#>  1      enzyme_test    783.607    791.563    626.048    0.34683     0.32121 
#>  2      bcs            767.078    777.023    609.973    0.53654     0.50873 
#>  3      pindex         747.171    759.105    592.354    0.69109     0.66588 
#>  4      alc_heavy      732.339    746.262    580.934    0.77382     0.75026 
#>  5      gender         733.921    749.833    582.951    0.77556     0.74691 
#>  6      alc_mod        735.878    753.779    585.255    0.77574     0.74162 
#> ---------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.881       RMSE                 186.434 
#> R-Squared                 0.776       MSE                34757.812 
#> Adj. R-Squared            0.742       Coef. Var             28.771 
#> Pred R-Squared            0.666       AIC                  735.878 
#> MAE                     138.247       SBC                  753.779 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000 
#> Residual      1876921.869        46      40802.649                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410 
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245 
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848 
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449 
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297 
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966 
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286 
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326 
#> ------------------------------------------------------------------------------------------------
#> 

# use index of variable instead of name
ols_step_forward_r2(model, include = c(5), exclude = c(4))
#> 
#> 
#>                               Stepwise Summary                               
#> ---------------------------------------------------------------------------
#> Step    Variable         AIC        SBC       SBIC        R2       Adj. R2  
#> ---------------------------------------------------------------------------
#>  0      Base Model     803.834    809.801    646.572    0.01420    -0.00476 
#>  1      enzyme_test    783.607    791.563    626.048    0.34683     0.32121 
#>  2      bcs            767.078    777.023    609.973    0.53654     0.50873 
#>  3      pindex         747.171    759.105    592.354    0.69109     0.66588 
#>  4      alc_heavy      732.339    746.262    580.934    0.77382     0.75026 
#>  5      gender         733.921    749.833    582.951    0.77556     0.74691 
#>  6      alc_mod        735.878    753.779    585.255    0.77574     0.74162 
#> ---------------------------------------------------------------------------
#> 
#> Final Model Output 
#> ------------------
#> 
#>                            Model Summary                            
#> -------------------------------------------------------------------
#> R                         0.881       RMSE                 186.434 
#> R-Squared                 0.776       MSE                34757.812 
#> Adj. R-Squared            0.742       Coef. Var             28.771 
#> Pred R-Squared            0.666       AIC                  735.878 
#> MAE                     138.247       SBC                  753.779 
#> -------------------------------------------------------------------
#>  RMSE: Root Mean Square Error 
#>  MSE: Mean Square Error 
#>  MAE: Mean Absolute Error 
#>  AIC: Akaike Information Criteria 
#>  SBC: Schwarz Bayesian Criteria 
#> 
#>                                  ANOVA                                  
#> -----------------------------------------------------------------------
#>                    Sum of                                              
#>                   Squares        DF    Mean Square      F         Sig. 
#> -----------------------------------------------------------------------
#> Regression    6492598.668         7     927514.095    22.732    0.0000 
#> Residual      1876921.869        46      40802.649                     
#> Total         8369520.537        53                                    
#> -----------------------------------------------------------------------
#> 
#>                                       Parameter Estimates                                        
#> ------------------------------------------------------------------------------------------------
#>       model         Beta    Std. Error    Std. Beta      t        Sig         lower       upper 
#> ------------------------------------------------------------------------------------------------
#> (Intercept)    -1235.188       230.403                 -5.361    0.000    -1698.966    -771.410 
#>         age       -1.850         2.531       -0.052    -0.731    0.469       -6.946       3.245 
#> enzyme_test       11.120         1.355        0.595     8.205    0.000        8.392      13.848 
#>         bcs       80.761        18.226        0.326     4.431    0.000       44.073     117.449 
#>      pindex        9.914         1.680        0.422     5.900    0.000        6.531      13.297 
#>   alc_heavy      318.305        85.281        0.314     3.732    0.001      146.643     489.966 
#>      gender       33.699        56.430        0.043     0.597    0.553      -79.888     147.286 
#>     alc_mod       12.493        64.997        0.016     0.192    0.848     -118.340     143.326 
#> ------------------------------------------------------------------------------------------------
#>