External Validation of logistic prediction models in multiply imputed datasets

mivalext_lr External validation of logistic prediction models

mivalext_lr(
  data.val = NULL,
  data.orig = NULL,
  nimp = 5,
  impvar = NULL,
  formula = NULL,
  lp.orig = NULL,
  cal.plot = FALSE,
  plot.indiv,
  val.check = FALSE,
  g = 10,
  groups_cal = 10,
  plot.method = "mean"
)

Arguments

data.val: Data frame with stacked multiply imputed validation datasets. The original dataset that contains missing values must be excluded from the dataset. The imputed datasets must be distinguished by an imputation variable, specified under impvar, and starting by 1.
data.orig: A single data frame containing the original dataset that was used to develop the model. Used to estimate the original regression coefficients in case lp.orig is not provided.
nimp: A numerical scalar. Number of imputed datasets. Default is 5.
impvar: A character vector. Name of the variable that distinguishes the imputed datasets.
formula: A formula object to specify the model as normally used by glm.
lp.orig: Numeric vector of the original coefficient values that are externally validated.
cal.plot: If TRUE a calibration plot is generated. Default is FALSE.
plot.indiv: This argument is deprecated; please use plot.method instead.
val.check: logical vector. If TRUE the names of the predictors of the LP are provided and can be used as information for the order of the coefficient values as input for lp.orig. If FALSE (default) validation procedure is executed with coefficient values fitted in the order as used under lp.orig.
g: A numerical scalar. Number of groups for the Hosmer and Lemeshow test. Default is 10.
groups_cal: A numerical scalar. Number of groups used on the calibration plot. Default is 10. If the range of predicted probabilities is low, less than 10 groups can be chosen.
plot.method: If "mean" one calibration plot is generated, first taking the mean of the linear predictor values across the multiply imputed datasets (default), if "individual" the calibration plot in each imputed dataset is plotted, if "overlay" calibration plots from each imputed datasets are plotted in one figure.

Value

A mivalext_lr object from which the following objects can be extracted: calibrate with information about mis-calibration in intercept and slope with and without offset procedure,

coef_pooled, coefficients pooled, ROC results as ROC, R squared results as R2, Hosmer and Lemeshow test as HL_test,

nimp, formula, impvar, val.check, g,

coef.check and groups_cal.

Details

The following information of the externally validated model is provided: calibrate with information of pooled_int and pooled_slope that is the pooled linear predictor (LP), after the LP is freely estimated in each external imputed dataset Outcome ~ a + LP (provides information about miscalibration in intercept and slope), pooled_offset_int as Outcome ~ a + offset(LP) and pooled_offset_slope as Outcome ~ a + LP + offset(LP) with information about miscalibration in intercept and slope separately by using an offset procedure (see Steyerberg, p. 300), coef_pooled with the pooled coefficients when the model is freely estimated in imputed datasets, ROC pooled ROC curve (back transformed after pooling log transformed ROC curves), R2 pooled Nagelkerke R-Square value (back transformed after pooling Fisher transformed values), HLtest pooled Hosmer and Lemeshow Test (using function pool_D2). In addition information is provided about nimp, impvar, formula, val_ckeck, g and coef_check. When the external validation is very poor, the R2 can become negative due to the poor fit of the model in the external dataset (in that case you may report a R2 of zero).

References

F. Harrell. Regression Modeling Strategies. With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd Edition. Springer, New York, NY, 2015.

EW. Steyerberg (2019). Clinical Prediction MOdels. A Practical Approach to Development, Validation, and Updating (2nd edition). Springer Nature Switzerland AG.

Van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd Edition. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton.

http://missingdatasolutions.rbind.io/

Examples


mivalext_lr(data.val=lbpmilr, nimp=5, impvar="Impnr", 
  formula = Chronic ~ Gender + factor(Carrying)  + Function + 
  Tampascale + Age, lp.orig=c(-10, -0.35, 1.00, 1.00, -0.04, 0.26, -0.01),
  cal.plot=TRUE, val.check = FALSE)
#> 
#> Pooled performance measures over m = 5 imputed external validation datasets
#>             correctly estimated
#> 

#> $calibrate
#>                       Estimate   95% CI L  95% CI U     P-val
#> pooled_int          -0.0788081 -0.4838591 0.3262428 0.7012108
#> pooled_slope         0.9818332  0.6645742 1.2990921 0.0000000
#> pooled_offset_int   -0.0768418 -0.4836964 0.3300127 0.7095419
#> pooled_offset_slope -0.0181668 -0.3354258 0.2990921 0.9098709
#> 
#> $coef_pooled
#>       (Intercept)            Gender factor(Carrying)2 factor(Carrying)3 
#>          -9.66084          -0.41365           0.92735           1.05231 
#>          Function        Tampascale               Age 
#>          -0.03579           0.25232          -0.01235 
#> 
#> $ROC
#>                     95% Low C-statistic 95% Up
#> C-statistic (logit)  0.7854      0.8551 0.9048
#> 
#> $R2
#> [1] 0.46333
#> 
#> $HLtest
#>      F_value   P(>F) df1        df2
#> [1,] 0.10684 0.99901   8 3977692743
#> 
#> $nimp
#> [1] 5
#> 
#> $impvar
#> [1] "Impnr"
#> 
#> $formula
#> Chronic ~ Gender + factor(Carrying) + Function + Tampascale + 
#>     Age
#> <environment: 0x0000020fb93c3300>
#> 
#> $val_check
#> [1] FALSE
#> 
#> $g
#> [1] 10
#> 
#> $coef_check
#> [1] "(Intercept)"       "Gender"            "factor(Carrying)2"
#> [4] "factor(Carrying)3" "Function"          "Tampascale"       
#> [7] "Age"              
#> 
#> $groups_cal
#> [1] 10
#>