Function to check input data for function glm_mi

check_model: Function to check input data for function glm_mi

check_model(
  data,
  formula,
  keep.predictors,
  impvar,
  p.crit,
  method,
  nimp,
  direction,
  model_type
)

Arguments

data: Data frame with stacked multiply imputed datasets. The original dataset that contains missing values must be excluded from the dataset. The imputed datasets must be distinguished by an imputation variable, specified under impvar, with numbering starting at 1.
formula: A formula object to specify the model as normally used by glm. See "Details" for how it can be specified.
keep.predictors: A single string or a vector of strings naming the variables that are forced into the model during predictor selection. All types of variables are allowed.
impvar: A character vector. Name of the variable that distinguishes the imputed datasets.
p.crit: A numerical scalar. P-value selection criterion. A value of 1 provides the pooled model without selection.
method: A character vector to indicate the method for pooling p-values, used to pool the total model or during model selection. This can be "RR", "D1", "D2", "D3", "D4", or "MPR". See "Details" for more information. Default is "RR".
nimp: A numerical scalar. Number of imputed datasets. Default is 5.
direction: The direction of model selection: "BW" for backward selection and "FW" for forward selection.
model_type: A character vector for the type of model: "binomial" for logistic regression and "linear" for linear regression models.
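
The layout expected for data can be illustrated with a small base R sketch. It builds a stacked data frame of three imputed datasets and checks that the imputation variable starts at 1 and matches nimp. The variable names (Impnr, Outcome, Age) and the checks themselves are assumptions made for illustration; they are not necessarily the checks that check_model performs.

```r
set.seed(1)

nimp   <- 3        # number of imputed datasets (hypothetical value)
impvar <- "Impnr"  # hypothetical name of the imputation variable

# Stack nimp small imputed datasets on top of each other.
# The original dataset with missing values (often coded 0) is not included.
stacked <- do.call(rbind, lapply(seq_len(nimp), function(i) {
  data.frame(
    Impnr   = i,
    Outcome = rbinom(4, 1, 0.5),
    Age     = rnorm(4, mean = 50, sd = 10)
  )
}))

# Illustrative checks: numbering starts at 1 and covers 1..nimp
imp_values <- sort(unique(stacked[[impvar]]))
stopifnot(min(imp_values) == 1)
stopifnot(all(imp_values == seq_len(nimp)))
```

If the original dataset were still present under Impnr = 0, the first check would fail, which is the situation the description of data warns against.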

Value

The outcome variable, the names of the predictors, and the names of the variables to keep, if defined. For internal use.

Details

The basic pooling procedure to derive pooled coefficients, standard errors, 95% confidence intervals and p-values is Rubin's Rules (RR). RR can be used when the model includes continuous or dichotomous variables. When the model includes categorical variables (> 2 categories) or restricted cubic spline variables, multiparameter pooling methods have to be used. These pooling methods are: "D1" (pooling of the total covariance matrix), "D2" (pooling of Chi-square values), "D3" and "D4" (pooling of likelihood ratio statistics), and "MPR" (pooling of median p-values, the MPR rule). Spline regression coefficients are defined by using the rcs function for restricted cubic splines of the rms package. A minimum number of 3 knots as defined under knots is required.
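
As a rough illustration of the pooling described above, the following base R sketch applies Rubin's Rules to one coefficient estimated in five imputed datasets, and shows the median p-value rule on a separate set of p-values. All numbers are made up, and the sketch uses the classical Rubin degrees of freedom; the package may apply a different (for example small-sample adjusted) formula.

```r
# Hypothetical estimates and standard errors from m = 5 imputed datasets
est <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se  <- c(0.10, 0.11, 0.09, 0.10, 0.12)
m   <- length(est)

q_bar <- mean(est)             # pooled coefficient
w     <- mean(se^2)            # within-imputation variance
b     <- var(est)              # between-imputation variance
t_var <- w + (1 + 1 / m) * b   # total variance
se_pooled <- sqrt(t_var)       # pooled standard error

# Classical degrees of freedom (assumption: no small-sample adjustment)
df <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2

ci <- q_bar + c(-1, 1) * qt(0.975, df) * se_pooled   # 95% confidence interval
p  <- 2 * pt(-abs(q_bar / se_pooled), df)            # two-sided p-value

# Median p-value rule (MPR): take the median of the per-dataset p-values
p_vals <- c(0.03, 0.06, 0.04, 0.08, 0.05)   # hypothetical p-values
p_mpr  <- median(p_vals)
```

The total variance is never smaller than the within-imputation variance, so the pooled standard error reflects the extra uncertainty caused by the missing data.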

A typical formula object has the form Outcome ~ terms. Categorical variables have to be defined as Outcome ~ factor(variable), restricted cubic spline variables as Outcome ~ rcs(variable, 3). Interaction terms can be defined as Outcome ~ variable1*variable2 or Outcome ~ variable1 + variable2 + variable1:variable2. All variables in the terms part have to be separated by a "+".
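
The formula forms above can be inspected with base R alone, which gives a feel for how an outcome and its predictors might be recovered from a formula. This is a sketch under assumptions: the variable names (Gender, Age, Pain) are hypothetical, and the code is not taken from check_model itself. Because terms() works symbolically on a formula without data, the rms package does not need to be loaded for rcs() to appear in it.

```r
# Formula with a categorical variable and a restricted cubic spline
f1 <- Outcome ~ factor(Gender) + rcs(Age, 3) + Pain

outcome1 <- deparse(f1[[2]])                # left-hand side as a string
terms1   <- attr(terms(f1), "term.labels")  # terms as written in the formula
vars1    <- all.vars(f1)                    # bare variable names

# Interaction: the "*" form expands to main effects plus the interaction
f2 <- Outcome ~ variable1 * variable2
f3 <- Outcome ~ variable1 + variable2 + variable1:variable2

terms2 <- attr(terms(f2), "term.labels")
terms3 <- attr(terms(f3), "term.labels")
```

Here terms2 and terms3 contain the same three terms, which shows that the two ways of writing an interaction specify the same model.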

Author

Martijn Heymans, 2020