pool_glm Pools and selects linear and logistic regression models across multiply imputed data, using the pooling methods RR, D1, D2, D3, D4 and MPR (in combination with the 'with' function).

pool_glm(
  object,
  method = "D1",
  p.crit = 1,
  keep.predictors = NULL,
  direction = NULL
)

Arguments

object

An object of class 'mistats' ('Multiply Imputed Statistical Analyses').

method

A character string indicating the multiparameter pooling method used to pool the total model or during model selection. This can be "RR", "D1", "D2", "D3", "D4", or "MPR". See Details for more information. Default is "D1".

p.crit

A numerical scalar. P-value selection criterion. A value of 1 returns the pooled model without selection.

keep.predictors

A single string or a vector of strings with the names of variables that are forced into the model during model selection. All types of variables are allowed.

direction

The direction for model selection, "BW" means backward selection and "FW" means forward selection.
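The arguments p.crit, keep.predictors and direction work together during selection. The sketch below assumes the lbpmilr data and variable names used in the Examples section, and switches to a logistic model with family = binomial; the cut-off of 0.15 and forcing in Age are illustrative choices, not recommendations.

```r
library(miceafter)

# Split the stacked imputed data into one dataset per imputation
dat_list <- df2milist(lbpmilr, impvar = "Impnr")

# Fit a logistic regression model in each imputed dataset
ra <- with(data = dat_list,
           expr = glm(Chronic ~ factor(Carrying) + Radiation + Age,
                      family = binomial))

# Backward selection: remove predictors whose pooled D1 p-value
# exceeds 0.15, while always keeping Age in the model
poolm_bw <- pool_glm(ra, method = "D1", p.crit = 0.15,
                     keep.predictors = "Age", direction = "BW")

poolm_bw$predictors_final   # predictors left after selection
poolm_bw$pmultiparm         # pooled p-values at the last step
```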

Value

An object of class mipool (multiply imputed pooled models) from which the following objects can be extracted:

  • pmodel pooled model (at last selection step)

  • pmultiparm pooled p-values according to multiparameter test method (at last selection step)

  • pmodel_step pooled model (at each selection step)

  • pmultiparm_step pooled p-values according to multiparameter test method (at each selection step)

  • multiparm_final pooled p-values at final step according to pooling method

  • multiparm_out (only when direction = "FW") pooled p-values of removed predictors

  • formula_final formula object at final step

  • formula_initial formula object at initial step

  • predictors_in predictors included at each selection step

  • predictors_out predictors excluded at each step

  • impvar name of variable used to distinguish imputed datasets

  • nimp number of imputed datasets

  • Outcome name of the outcome variable

  • method multiparameter pooling method used

  • p.crit p-value selection criterion

  • call function call

  • model_type type of regression model used

  • direction direction of predictor selection

  • predictors_final names of predictors in final selection step

  • predictors_initial names of predictors in start model

  • keep.predictors names of predictors that were forced in the model
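The components listed above are extracted with the $ operator. A short sketch, assuming the model from the Examples section; which components carry content depends on whether selection was performed.

```r
library(miceafter)

dat_list <- df2milist(lbpmilr, impvar = "Impnr")
ra <- with(data = dat_list,
           expr = glm(Chronic ~ factor(Carrying) + Radiation + Age))
poolm <- pool_glm(ra, method = "D1")

class(poolm)              # "mipool"
poolm$nimp                # number of imputed datasets
poolm$formula_final       # formula of the final model
poolm$predictors_final    # predictors in the final model
```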

Details

The basic pooling procedure to derive pooled coefficients, standard errors, 95% confidence intervals and p-values is Rubin's Rules (RR). However, RR can only be used when the model includes only continuous and dichotomous variables. When the model also includes categorical variables (> 2 categories), multiparameter pooling methods are available. These pooling methods are:

  • "D1" pooling of the total covariance matrix (multivariate Wald test)

  • "D2" pooling of Chi-square values

  • "D3" and "D4" pooling of likelihood ratio statistics (see Meng and Rubin)

  • "MPR" pooling of median p-values (median P rule)

For pooling restricted cubic splines using the 'rcs' function of the rms package, use the function 'glm_mi'.
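To make the RR, D1, D2 and MPR rules more concrete, the base-R sketch below implements the standard textbook formulas (see, e.g., Van Buuren 2018) on made-up numbers. It is not miceafter's internal code: the helper names pool_rr, pool_d1, pool_d2 and pool_mpr are hypothetical, and the degrees-of-freedom formulas used here may differ from those in the package, so results need not match pool_glm output.

```r
# Hypothetical helpers sketching textbook pooling rules (not miceafter code).

# RR: pool one coefficient from m estimates and their standard errors
pool_rr <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                    # pooled estimate
  ubar <- mean(se^2)                   # mean within-imputation variance
  b <- var(est)                        # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b       # total variance
  r <- (1 + 1 / m) * b / ubar          # relative increase in variance
  df <- (m - 1) * (1 + 1 / r)^2        # Rubin's original df
  p <- 2 * pt(abs(qbar / sqrt(tvar)), df, lower.tail = FALSE)
  c(estimate = qbar, std.error = sqrt(tvar), df = df, p.value = p)
}

# D1: pooled Wald test that all k coefficients of a term are zero.
# est is an m x k matrix of estimates, vcovs a list of m k x k matrices.
pool_d1 <- function(est, vcovs) {
  m <- nrow(est)
  k <- ncol(est)
  qbar <- colMeans(est)
  ubar_inv <- solve(Reduce(`+`, vcovs) / m)  # inverse mean within covariance
  b <- cov(est)                               # between-imputation covariance
  r1 <- (1 + 1 / m) * sum(diag(b %*% ubar_inv)) / k
  d1 <- drop(t(qbar) %*% ubar_inv %*% qbar) / (k * (1 + r1))
  tt <- k * (m - 1)
  df2 <- if (tt > 4) {
    4 + (tt - 4) * (1 + (1 - 2 / tt) / r1)^2
  } else {
    tt * (1 + 1 / k) * (1 + 1 / r1)^2 / 2
  }
  c(F = d1, df1 = k, df2 = df2, p.value = pf(d1, k, df2, lower.tail = FALSE))
}

# D2: combine m chi-square statistics, each on k df
pool_d2 <- function(chisq, k) {
  m <- length(chisq)
  r2 <- (1 + 1 / m) * var(sqrt(chisq))
  d2 <- (mean(chisq) / k - (m + 1) / (m - 1) * r2) / (1 + r2)
  df2 <- k^(-3 / m) * (m - 1) * (1 + 1 / r2)^2
  c(F = d2, df1 = k, df2 = df2, p.value = pf(d2, k, df2, lower.tail = FALSE))
}

# MPR: median of the m p-values
pool_mpr <- function(p) median(p)

# Made-up numbers for m = 5 imputations (illustration only)
pool_rr(est = c(0.20, 0.22, 0.19, 0.21, 0.23), se = rep(0.07, 5))
est2 <- rbind(c(0.25, 0.44), c(0.27, 0.42), c(0.24, 0.45),
              c(0.26, 0.43), c(0.25, 0.46))
vc <- replicate(5, diag(c(0.009, 0.0086)), simplify = FALSE)
pool_d1(est2, vc)
pool_d2(chisq = c(10.2, 11.5, 9.8, 12.0, 10.9), k = 2)
pool_mpr(c(0.004, 0.011, 0.002, 0.020, 0.007))
```

The D1 and D2 functions return an F statistic with its numerator (df1) and denominator (df2) degrees of freedom, the same kind of quantity reported as "F-statistic" in the pmultiparm output.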

A typical formula object has the form Outcome ~ terms. Categorical variables have to be defined as Outcome ~ factor(variable). Interaction terms can be defined as Outcome ~ variable1*variable2 or Outcome ~ variable1 + variable2 + variable1:variable2. All variables in the terms part have to be separated by a "+".
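As an illustration of these formula conventions, the sketch below (assuming the lbpmilr variables from the Examples section) declares Carrying as categorical with factor() and adds a Radiation by Age interaction. The formula is written in the model call passed to 'with'; the chosen interaction is only an example.

```r
library(miceafter)

dat_list <- df2milist(lbpmilr, impvar = "Impnr")

# factor() marks Carrying as categorical; Radiation * Age expands to
# Radiation + Age + Radiation:Age
ra_int <- with(data = dat_list,
               expr = glm(Chronic ~ factor(Carrying) + Radiation * Age,
                          family = binomial))

# Pooled multiparameter p-values, including the interaction term
pool_glm(ra_int, method = "D1")$pmultiparm
```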

Vignettes

https://mwheymans.github.io/miceafter/articles/regression_modelling.html

References

Eekhout I, van de Wiel MA, Heymans MW. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis. BMC Med Res Methodol. 2017;17(1):129.

Enders CK. Applied missing data analysis. New York: The Guilford Press; 2010.

Meng X-L, Rubin DB. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika. 1992;79:103-11.

van de Wiel MA, Berkhof J, van Wieringen WN. Testing the prediction error difference between 2 predictors. Biostatistics. 2009;10:550-60.

Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009;9:57.

Van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton: Chapman & Hall/CRC; 2018.

Author

Martijn Heymans, 2021

Examples


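  library(miceafter)
  # Split the stacked imputed data into one dataset per imputation
  dat_list <- df2milist(lbpmilr, impvar="Impnr")
  # Fit the model in each imputed dataset; glm() without a family
  # argument fits a linear (gaussian) model
  ra <- with(data=dat_list, expr = glm(Chronic ~ factor(Carrying) + Radiation + Age))
  poolm <- pool_glm(ra, method="D1")
  poolm$pmodel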
#>                term     estimate   std.error statistic       df      p.value
#> 1       (Intercept)  0.336159667 0.176976537  1.899459 131.2046 5.969920e-02
#> 2         Radiation  0.206960010 0.074646493  2.772535 141.1522 6.313437e-03
#> 3               Age -0.005613532 0.003871727 -1.449878 136.4631 1.493869e-01
#> 4 factor(Carrying)2  0.255755785 0.093849364  2.725173 117.8087 7.406930e-03
#> 5 factor(Carrying)3  0.436431952 0.092953821  4.695148 134.2307 6.496890e-06
#>         2.5 %     97.5 %
#> 1 -0.01393705 0.68625638
#> 2  0.05939038 0.35452964
#> 3 -0.01326987 0.00204281
#> 4  0.06990537 0.44160620
#> 5  0.25258837 0.62027554
  poolm$pmultiparm
#>                   p-values D1 F-statistic
#> Radiation        5.624615e-03    7.686950
#> Age              1.474129e-01    2.102147
#> factor(Carrying) 2.252051e-05   10.777357
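The example above fits a linear model. A logistic variant of the same model, pooled with several multiparameter methods for comparison, could look as follows; this is a sketch assuming the same data, and its output is not shown here.

```r
library(miceafter)

dat_list <- df2milist(lbpmilr, impvar = "Impnr")
ra_log <- with(data = dat_list,
               expr = glm(Chronic ~ factor(Carrying) + Radiation + Age,
                          family = binomial))

# Pool the same logistic model with three multiparameter methods
for (meth in c("D1", "D2", "MPR")) {
  print(pool_glm(ra_log, method = meth)$pmultiparm)
}
```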