Pooling and Predictor selection function for multilevel models in multiply imputed datasets

psfmi_mm: Pooling and backward selection for two-level (generalized) linear mixed models in multiply imputed datasets using different selection methods.

psfmi_mm(
  data,
  nimp = 5,
  impvar = NULL,
  clusvar = NULL,
  Outcome,
  predictors = NULL,
  random.eff = NULL,
  family = "linear",
  p.crit = 1,
  cat.predictors = NULL,
  spline.predictors = NULL,
  int.predictors = NULL,
  keep.predictors = NULL,
  nknots = NULL,
  method = "RR",
  print.method = FALSE
)

Arguments

data: Data frame with stacked multiply imputed datasets. The original dataset that contains missing values must be excluded from the dataset. The imputed datasets must be distinguished by an imputation variable, specified under impvar, with numbering starting at 1. The clusters must be distinguished by a cluster variable, specified under clusvar.
nimp: A numerical scalar. Number of imputed datasets. Default is 5.
impvar: A character vector. Name of the variable that distinguishes the imputed datasets.
clusvar: A character vector. Name of the variable that distinguishes the clusters.
Outcome: Character vector containing the name of the outcome variable.
predictors: Character vector with the names of the predictor variables. At least one predictor variable has to be defined.
random.eff: Character vector to specify the random effects as used by the lmer and glmer functions of the lme4 package.
family: Character vector to specify the type of model, "linear" is used to call the lmer function and "binomial" is used to call the glmer function of the lme4 package. See details for more information.
p.crit: A numerical scalar. P-value selection criterion. A value of 1 provides the pooled model without selection.
cat.predictors: A single string or a vector of strings to define the categorical variables. Default is NULL (no categorical predictors).
spline.predictors: A single string or a vector of strings to define the (restricted cubic) spline variables. Default is NULL (no spline predictors). See details.
int.predictors: A single string or a vector of strings with the names of the variables that form an interaction pair, separated by a “:” symbol.
keep.predictors: A single string or a vector of strings including the variables that are forced in the model during predictor selection. Categorical and interaction variables are allowed.
nknots: A numerical vector that defines the number of knots for each spline predictor separately.
method: A character vector to indicate the pooling method for p-values, used to pool the total model or during predictor selection. This can be "RR" (the default shown in the usage above), "D1", "D2", "D3" or "MPR". See details for more information.
print.method: Logical vector. If TRUE, the full matrix with p-values of all variables according to the chosen method (under method) is shown. If FALSE (default), p-values for categorical variables according to method are shown, and Rubin’s Rules are used for continuous and dichotomous predictors.
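
The layout required for data can be checked before calling psfmi_mm. The following is a minimal sketch in base R, not part of the package: the toy data frame, its values and the convention that .imp equal to 0 marks the original incomplete data (as in long-format output that includes the original data) are assumptions made for illustration only.

```r
# Toy stacked data (made-up values): .imp = 0 is assumed to hold the original
# data with missing values, .imp = 1 and 2 hold two imputed datasets.
stacked <- data.frame(
  .imp   = rep(0:2, each = 6),
  centre = rep(rep(c("A", "B"), each = 3), times = 3),
  dbp    = c(80, NA, 75, 90, 85, NA,
             80, 78, 75, 90, 85, 88,
             80, 82, 75, 90, 85, 86)
)

# Exclude the original (incomplete) dataset so that only imputed sets remain
imputed <- stacked[stacked$.imp != 0, ]

# Check the documented requirement that the imputation variable starts at 1,
# plus the assumption that the imputed datasets are complete
imp_ids <- sort(unique(imputed$.imp))
stopifnot(
  identical(imp_ids, seq_along(imp_ids)),  # numbered 1, 2, ..., nimp
  !any(is.na(imputed))                     # no missing values left
)

# Value to pass as nimp
nimp <- length(imp_ids)
```

In this sketch, imputed would be passed as data, ".imp" as impvar, "centre" as clusvar and nimp as nimp.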

Value

An object of class smodsmi (selected models in multiply imputed datasets) from which the following objects can be extracted: imputed datasets as data, selected pooled model as RR_model, pooled p-values according to pooling method as multiparm, random effects as random.eff, predictors included at each selection step as predictors_in, predictors excluded at each step as predictors_out, and family, impvar, clusvar, nimp, Outcome, method, p.crit, predictors, cat.predictors, keep.predictors, int.predictors, spline.predictors, knots, print.method, model_type, call, predictors_final for the names of the predictors in the final step, and fit.formula for the regression formula of the start model.

Details

The basic pooling procedure to derive pooled coefficients, standard errors, 95% confidence intervals and p-values is Rubin's Rules (RR). Specific procedures are available to derive pooled p-values for categorical (> 2 categories) and spline variables. The method argument selects among these pooling methods: D1, D2, D3 and MPR (pooling of median p-values, the MPR rule). The D1, D2 and D3 methods are called from the package mitml. For logistic multilevel models (which are estimated using the glmer function), the D3 method is not yet available. Spline regression coefficients are defined by using the rcs function for restricted cubic splines of the rms package. At least 3 knots, as defined under nknots, are required.
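
To make the pooling step concrete, the following base R sketch applies Rubin's Rules to a single coefficient and then takes the median of a set of p-values, as in the MPR rule. It is an illustration under stated assumptions, not the package's internal code: the estimates, standard errors and p-values are made-up numbers, and the degrees of freedom use the classical Rubin (1987) formula, whereas software may apply small-sample adjustments.

```r
# Hypothetical estimates and standard errors of one coefficient,
# obtained from m = 5 imputed datasets (made-up numbers)
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.11)
m   <- length(est)

# Rubin's Rules
pooled_est  <- mean(est)                       # pooled coefficient
within_var  <- mean(se^2)                      # within-imputation variance
between_var <- var(est)                        # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var
pooled_se   <- sqrt(total_var)

# Classical degrees of freedom (assumption: no small-sample adjustment)
r  <- (1 + 1 / m) * between_var / within_var   # relative increase in variance
df <- (m - 1) * (1 + 1 / r)^2

# Pooled test statistic, p-value and 95% confidence interval
t_stat  <- pooled_est / pooled_se
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- pooled_est + c(-1, 1) * qt(0.975, df) * pooled_se

# Median p-value rule: median of the p-values obtained in each imputed
# dataset (hypothetical p-values)
p_per_imp <- c(0.03, 0.08, 0.04, 0.06, 0.05)
p_mpr     <- median(p_per_imp)
```

The total variance is always at least as large as the within-imputation variance, because the between-imputation term carries the extra uncertainty caused by the missing data.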

References

Eekhout I, van de Wiel MA, Heymans MW. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis. BMC Med Res Methodol. 2017;17(1):129.

Enders CK (2010). Applied missing data analysis. New York: The Guilford Press.

Meng X-L, Rubin DB. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika. 1992;79:103-11.

van de Wiel MA, Berkhof J, van Wieringen WN. Testing the prediction error difference between 2 predictors. Biostatistics. 2009;10:550-60.

mitml package https://cran.r-project.org/web/packages/mitml/index.html

Van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd Edition. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton.

http://missingdatasolutions.rbind.io/

Examples


if (FALSE) {
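  # Pool a linear two-level model and apply backward selection (p.crit = 0.15)
  pool_mm <- psfmi_mm(data=ipdna_md, nimp=5, impvar=".imp", family="linear",
    predictors=c("gender", "afib", "sbp"), clusvar = "centre",
    random.eff="( 1 | centre)", Outcome="dbp", cat.predictors = "bmi_cat",
    p.crit=0.15, method="D1", print.method = FALSE)
  pool_mm$RR_model    # selected pooled model
  pool_mm$multiparm   # pooled p-values according to the pooling method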
}