R/psfmi_mm.R
psfmi_mm: Pooling and backward selection for 2-level (generalized) linear mixed models in multiply imputed datasets using different selection methods.
psfmi_mm(
data,
nimp = 5,
impvar = NULL,
clusvar = NULL,
Outcome,
predictors = NULL,
random.eff = NULL,
family = "linear",
p.crit = 1,
cat.predictors = NULL,
spline.predictors = NULL,
int.predictors = NULL,
keep.predictors = NULL,
nknots = NULL,
method = "RR",
print.method = FALSE
)
data: Data frame with stacked multiply imputed datasets. The original dataset that contains missing values must be excluded. The imputed datasets must be distinguished by an imputation variable, specified under impvar, that starts at 1; the clusters must be distinguished by a cluster variable, specified under clusvar.
nimp: A numerical scalar. Number of imputed datasets. Default is 5.
impvar: A character vector. Name of the variable that distinguishes the imputed datasets.
clusvar: A character vector. Name of the variable that distinguishes the clusters.
Outcome: Character vector containing the name of the outcome variable.
predictors: Character vector with the names of the predictor variables. At least one predictor variable has to be defined.
random.eff: Character vector to specify the random effects as used by the lmer and glmer functions of the lme4 package.
family: Character vector to specify the type of model: "linear" calls the lmer function and "binomial" calls the glmer function of the lme4 package. See details for more information.
p.crit: A numerical scalar. P-value selection criterion. A value of 1 provides the pooled model without selection.
cat.predictors: A single string or a vector of strings to define the categorical variables. Default is NULL (no categorical predictors).
spline.predictors: A single string or a vector of strings to define the (restricted cubic) spline variables. Default is NULL (no spline predictors). See details.
int.predictors: A single string or a vector of strings with the names of the variables that form an interaction pair, separated by a ":" symbol.
keep.predictors: A single string or a vector of strings naming the variables that are forced into the model during predictor selection. Categorical and interaction variables are allowed.
nknots: A numerical vector that defines the number of knots for each spline predictor separately.
method: A character vector to indicate the method used to pool p-values, either for the total model or during predictor selection. This can be "D1", "D2", "D3" or "MPR"; the default in the usage above is "RR" (Rubin's Rules). See details for more information.
print.method: Logical. If TRUE, the full matrix with p-values of all variables according to the chosen method (under method) is shown. If FALSE (default), p-values for categorical variables are shown according to method, and Rubin's Rules are used for continuous and dichotomous predictors.
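The stacked layout expected for data can be illustrated with a small base R sketch. Everything in it is an assumption made for illustration: the toy values, the helper impute_once (a stand-in that fills missing values with random draws, not a real imputation method), and the use of .imp = 0 to mark the original incomplete data. It only shows the shape of the input: one block of rows per imputation, numbered from 1, with the original data dropped.

```r
set.seed(1)
nimp <- 3

# Toy incomplete data: 2 clusters (centre) with 3 subjects each
base <- data.frame(
  centre = rep(c("A", "B"), each = 3),
  sbp    = c(120, NA, 135, 128, 141, NA)
)

# Hypothetical stand-in for an imputation step (illustration only)
impute_once <- function(d) {
  miss <- is.na(d$sbp)
  d$sbp[miss] <- round(rnorm(sum(miss), mean(d$sbp, na.rm = TRUE), 5))
  d
}

# Stack the original data (.imp = 0) and nimp completed copies (.imp = 1..nimp)
stacked <- do.call(rbind, lapply(0:nimp, function(i) {
  d <- if (i == 0) base else impute_once(base)
  data.frame(.imp = i, d)
}))

# Exclude the original dataset that still contains missing values
stacked_imp <- stacked[stacked$.imp != 0, ]
```

In this sketch, stacked_imp would be passed as data, with impvar = ".imp", clusvar = "centre" and nimp = 3.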
An object of class smodsmi (selected models in multiply imputed datasets) from which the following objects can be extracted: imputed datasets as data, selected pooled model as RR_model, pooled p-values according to pooling method as multiparm, random effects as random.eff, predictors included at each selection step as predictors_in, predictors excluded at each step as predictors_out, and family, impvar, clusvar, nimp, Outcome, method, p.crit, predictors, cat.predictors, keep.predictors, int.predictors, spline.predictors, knots, print.method, model_type, call, predictors_final for the names of the predictors in the final step, and fit.formula for the regression formula of the start model.
The basic pooling procedure to derive pooled coefficients, standard errors, 95% confidence intervals and p-values is Rubin's Rules (RR). Specific procedures are available to derive pooled p-values for categorical (> 2 categories) and spline variables. The method argument selects among these pooling methods: D1, D2, D3, and MPR (the median p-value rule). The D1, D2 and D3 methods are called from the package mitml.
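As a rough illustration of what pooling with Rubin's Rules and the median p-value rule involves, the following base R sketch pools one coefficient across five imputed-data fits. The estimates, standard errors and p-values are made-up numbers, and the classical large-sample degrees of freedom are used; psfmi_mm may apply different degrees-of-freedom adjustments internally, so this is a sketch of the general technique rather than the package's implementation.

```r
# Hypothetical estimates and standard errors of one coefficient,
# one value per imputed dataset (m = 5)
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
se  <- c(0.11, 0.12, 0.10, 0.11, 0.12)
m   <- length(est)

qbar <- mean(est)                # pooled coefficient
ubar <- mean(se^2)               # within-imputation variance
b    <- var(est)                 # between-imputation variance
tvar <- ubar + (1 + 1 / m) * b   # total variance
se_pooled <- sqrt(tvar)

# Classical large-sample degrees of freedom
r  <- (1 + 1 / m) * b / ubar     # relative increase in variance
df <- (m - 1) * (1 + 1 / r)^2

# Pooled test statistic, p-value and 95% confidence interval
t_stat <- qbar / se_pooled
p_val  <- 2 * pt(-abs(t_stat), df)
ci     <- qbar + c(-1, 1) * qt(0.975, df) * se_pooled

# Median p-value rule: pool hypothetical per-imputation p-values by their median
p_per_imp <- c(0.031, 0.048, 0.022, 0.040, 0.055)
p_median  <- median(p_per_imp)
```

The total variance is always at least as large as the average within-imputation variance, which is how the extra uncertainty due to missing data enters the pooled standard error.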
For logistic multilevel models (estimated using the glmer function), the D3 method is not yet available. Spline regression coefficients are defined by using the rcs function for restricted cubic splines of the rms package. A minimum of 3 knots, as defined under nknots, is required.
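To make the relationship between the arguments and a model formula more concrete, the sketch below assembles a formula string from predictors, spline terms written as rcs() calls, an interaction pair and a random-effects term. The variable names follow the example on this page, but the way the pieces are pasted together is an assumption for illustration; the formula that psfmi_mm actually builds (returned as fit.formula) may be composed differently. Building the formula does not evaluate rcs, so the rms package is not needed to run this.

```r
outcome           <- "dbp"
predictors        <- c("gender", "afib")
spline.predictors <- "sbp"
nknots            <- 3                 # one value per spline predictor
int.predictors    <- "gender:afib"     # interaction pair separated by ":"
random.eff        <- "(1 | centre)"    # random intercept per centre

# Each spline predictor needs its own knot count, with at least 3 knots
stopifnot(length(nknots) == length(spline.predictors), all(nknots >= 3))

# Write spline predictors as restricted cubic spline terms
spline_terms <- paste0("rcs(", spline.predictors, ", ", nknots, ")")

# Combine all terms into one right-hand side and build the formula
rhs <- paste(c(predictors, spline_terms, int.predictors, random.eff),
             collapse = " + ")
fit_formula <- as.formula(paste(outcome, "~", rhs))
fit_formula
```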
Eekhout I, van de Wiel MA, Heymans MW. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis. BMC Med Res Methodol. 2017;17(1):129.
Enders CK (2010). Applied missing data analysis. New York: The Guilford Press.
Meng X-L, Rubin DB. Performing likelihood ratio tests with multiply-imputed data sets. Biometrika. 1992;79:103-11.
van de Wiel MA, Berkhof J, van Wieringen WN. Testing the prediction error difference between 2 predictors. Biostatistics. 2009;10:550-60.
mitml package https://cran.r-project.org/web/packages/mitml/index.html
Van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd Edition. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton.
http://missingdatasolutions.rbind.io/
if (FALSE) {
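  # Linear mixed model pooled over 5 imputed datasets,
  # with backward selection at p.crit = 0.15 using the D1 method
  pool_mm <- psfmi_mm(data = ipdna_md, nimp = 5, impvar = ".imp", family = "linear",
    predictors = c("gender", "afib", "sbp"), clusvar = "centre",
    random.eff = "( 1 | centre)", Outcome = "dbp", cat.predictors = "bmi_cat",
    p.crit = 0.15, method = "D1", print.method = FALSE)
  pool_mm$RR_model   # selected pooled model (Rubin's Rules)
  pool_mm$multiparm  # pooled p-values according to the pooling method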
}