psfmi_coxr Pooling and backward or forward selection of Cox regression prediction models in multiply imputed data using selection methods D1, D2 and MPR.

psfmi_coxr(
  data,
  formula = NULL,
  nimp = 5,
  impvar = NULL,
  time,
  status,
  predictors = NULL,
  cat.predictors = NULL,
  spline.predictors = NULL,
  int.predictors = NULL,
  keep.predictors = NULL,
  strata.variable = NULL,
  nknots = NULL,
  p.crit = 1,
  method = "RR",
  direction = NULL
)

Arguments

data

Data frame with stacked multiply imputed datasets. The original dataset that contains missing values must be excluded. The imputed datasets must be distinguished by an imputation variable, specified under impvar, whose values start at 1.

formula

A formula object to specify the model as normally used by coxph. See "Details" and "Examples" for how this can be specified. If a formula object is used, leave predictors, cat.predictors, spline.predictors and int.predictors at their default value of NULL.

nimp

A numerical scalar. Number of imputed datasets. Default is 5.

impvar

A character vector. Name of the variable that distinguishes the imputed datasets.

time

Survival time.

status

The status variable, normally 0=censoring, 1=event.

predictors

Character vector with the names of the predictor variables. At least one predictor variable has to be defined. Give predictors unique names and do not use predictor names that combine a name with a number, such as age2, gender10, etc.

cat.predictors

A single string or a vector of strings to define the categorical variables. Default is NULL (no categorical predictors).

spline.predictors

A single string or a vector of strings to define the (restricted cubic) spline variables. Default is NULL (no spline predictors). See "Details".

int.predictors

A single string or a vector of strings with the names of the variables that form an interaction pair, separated by a “:” symbol.

keep.predictors

A single string or a vector of strings including the variables that are forced in the model during predictor selection. Categorical and interaction variables are allowed.

strata.variable

A single string including the strata variable. See under "Details" and "Examples" how such a variable can be specified.

nknots

A numerical vector that defines the number of knots for each spline predictor separately.

p.crit

A numerical scalar. P-value selection criterion. A value of 1 provides the pooled model without selection.

method

A character vector to indicate the method used to pool p-values, either for the total model or during predictor selection. This can be "RR", "D1", "D2", or "MPR". See "Details" for more information. Default is "RR".

direction

The direction of predictor selection, "BW" means backward selection and "FW" means forward selection.
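
The stacked layout that the data argument expects can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy values and the object name imputed_list are made up, and Impnr is chosen as the imputation variable because the examples on this page pass impvar="Impnr".

```r
# Hypothetical list of completed (imputed) datasets; values are toy data.
# The original dataset with missing values is deliberately NOT included.
imputed_list <- list(
  data.frame(Time = c(5, 8, 12), Status = c(1, 0, 1), Pain = c(3, 6, 4)),
  data.frame(Time = c(5, 8, 12), Status = c(1, 0, 1), Pain = c(3, 5, 4))
)

# Add an imputation index that starts at 1 and stack the datasets row-wise
stacked <- do.call(
  rbind,
  Map(function(d, i) cbind(Impnr = i, d), imputed_list, seq_along(imputed_list))
)

# The number of distinct index values is what would be passed as nimp
nimp <- length(unique(stacked$Impnr))
```

A data frame shaped like stacked, together with nimp and impvar = "Impnr", matches the description of the data, nimp and impvar arguments above.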

Value

An object of class pmods (multiply imputed models) from which the following objects can be extracted:

  • data imputed datasets

  • RR_model pooled model at each selection step

  • RR_model_final final selected pooled model

  • multiparm pooled p-values at each step according to pooling method

  • multiparm_final pooled p-values at final step according to pooling method

  • multiparm_out (only when direction = "FW") pooled p-values of removed predictors

  • formula_step formula object at each step

  • formula_final formula object at final step

  • formula_initial formula object at initial step

  • predictors_in predictors included at each selection step

  • predictors_out predictors excluded at each step

  • impvar name of variable used to distinguish imputed datasets

  • nimp number of imputed datasets

  • status name of the status variable

  • time name of the time variable

  • method selection method

  • p.crit p-value selection criterion

  • call function call

  • model_type type of regression model used

  • direction direction of predictor selection

  • predictors_final names of predictors in final selection step

  • predictors_initial names of predictors in start model

  • keep.predictors names of predictors that were forced in the model

  • strata.variable name of the strata variable in the model

Details

The basic pooling procedure to derive pooled coefficients, standard errors, 95% confidence intervals and p-values is Rubin's Rules (RR). However, RR is only sufficient when the model includes continuous or dichotomous variables. Specific procedures are available when the model also includes categorical (> 2 categories) or restricted cubic spline variables. These pooling methods are: "D1", pooling of the total covariance matrix; "D2", pooling of Chi-square values; and "MPR", pooling of median p-values (MPR rule). Spline regression coefficients are defined by using the rcs function for restricted cubic splines of the rms package. A minimum of 3 knots, as defined under nknots, is required.
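
As a rough illustration of what pooling with Rubin's Rules and with the median p-value rule involves, the sketch below works through a single coefficient in base R. The estimates, standard errors and p-values are made-up numbers, the object names are hypothetical, and the degrees of freedom needed for confidence intervals are left out. It is not the package's internal code.

```r
# Toy per-imputation results for one coefficient (made-up numbers, m = 5)
est <- c(-0.11, -0.10, -0.12, -0.09, -0.11)   # coefficient in each imputed dataset
se  <- c(0.050, 0.052, 0.049, 0.051, 0.050)   # standard error in each imputed dataset
m   <- length(est)

# Rubin's Rules
q_bar     <- mean(est)               # pooled coefficient
w         <- mean(se^2)              # within-imputation variance
b         <- var(est)                # between-imputation variance
t_var     <- w + (1 + 1 / m) * b     # total variance
pooled_se <- sqrt(t_var)             # pooled standard error

# Median p-value rule: take the median of the per-imputation p-values
# of a (multiparameter) test; the p-values here are invented
p_vals <- c(0.03, 0.06, 0.04, 0.08, 0.05)
p_mpr  <- median(p_vals)
```

The pooled standard error is always at least as large as the square root of the average within-imputation variance, because the between-imputation term adds the uncertainty caused by the missing data.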

A typical formula object has the form Surv(time, status) ~ terms. Categorical variables have to be defined as Surv(time, status) ~ factor(variable), restricted cubic spline variables as Surv(time, status) ~ rcs(variable, 3). Interaction terms can be defined as Surv(time, status) ~ variable1*variable2 or Surv(time, status) ~ variable1 + variable2 + variable1:variable2. All variables in the terms part have to be separated by a "+". If a formula object is used, leave predictors, cat.predictors, spline.predictors and int.predictors at their default value of NULL. For Cox models, a strata variable can also be included in the formula as Surv(time, status) ~ strata(variable) + terms.
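
The formula notations above can be inspected with base R alone, without fitting a model. In the sketch below the variable names are taken from the examples on this page; treating Previous as categorical and Age as a spline variable is purely illustrative, and the object names are hypothetical.

```r
# The two ways of writing an interaction expand to the same model terms
f_star     <- Surv(Time, Status) ~ Pain * Radiation
f_explicit <- Surv(Time, Status) ~ Pain + Radiation + Pain:Radiation

labels_star     <- attr(terms(f_star), "term.labels")
labels_explicit <- attr(terms(f_explicit), "term.labels")
same_terms      <- identical(labels_star, labels_explicit)

# A formula combining a strata variable, a categorical predictor,
# a restricted cubic spline with 3 knots and a continuous predictor
f_full <- Surv(Time, Status) ~ strata(Radiation) + factor(Previous) +
  rcs(Age, 3) + Pain

# Names of the data columns that the formula refers to
vars_full <- all.vars(f_full)
```

Building a formula this way does not evaluate Surv, strata, factor or rcs, so no packages need to be loaded until the formula is passed to a model-fitting function.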

Vignettes

https://mwheymans.github.io/psfmi/articles/psfmi_CoxModels.html

References

Eekhout I, van de Wiel MA, Heymans MW. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis. BMC Med Res Methodol. 2017;17(1):129.

Enders CK (2010). Applied missing data analysis. New York: The Guilford Press.

van de Wiel MA, Berkhof J, van Wieringen WN. Testing the prediction error difference between 2 predictors. Biostatistics. 2009;10:550-60.

Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009;9:57.

Van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd Edition. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton.

Steyerberg EW (2019). Clinical Prediction Models. A Practical Approach to Development, Validation, and Updating (2nd edition). Springer Nature Switzerland AG.

http://missingdatasolutions.rbind.io/

Author

Martijn Heymans, 2020

Examples

 pool_coxr <- psfmi_coxr(formula = Surv(Time, Status) ~ Pain + Tampascale +
                       Radiation + Radiation*Pain + Age + Duration + Previous,
                     data=lbpmicox, p.crit = 0.05, direction="BW", nimp=5, impvar="Impnr",
                     keep.predictors = "Radiation*Pain", method="D1")
#> Removed at Step 1 is - Previous
#> Removed at Step 2 is - Age
#> Removed at Step 3 is - Tampascale
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
                     
 pool_coxr$RR_model_final
#> $`Step 4`
#>             term     estimate   std.error   statistic       df    p.value
#> 1           Pain -0.106402350 0.053609346 -1.98477238 114.6369 0.04955832
#> 2      Radiation  0.034402970 0.614625339  0.05597389 100.6917 0.95547353
#> 3       Duration -0.007502183 0.003758976 -1.99580506 184.2603 0.04742814
#> 4 Pain:Radiation -0.026253904 0.088245839 -0.29750869  98.0881 0.76670736
#>          HR lower.EXP upper.EXP
#> 1 0.8990628 0.8084829  0.999791
#> 2 1.0350016 0.3057787  3.503280
#> 3 0.9925259 0.9851924  0.999914
#> 4 0.9740877 0.8176075  1.160516
#> 
 
 pool_coxr <- psfmi_coxr(formula = Surv(Time, Status) ~ Pain + Tampascale +
                       Previous + strata(Radiation), data=lbpmicox, p.crit = 0.05, 
                       direction="BW", nimp=5, impvar="Impnr", method="D1")
#> Removed at Step 1 is - Previous
#> Removed at Step 2 is - Tampascale
#> 
#> Selection correctly terminated, 
#> No more variables removed from the model
                     
 pool_coxr$RR_model_final
#> $`Step 3`
#>   term   estimate  std.error statistic       df     p.value        HR lower.EXP
#> 1 Pain -0.1126181 0.04173645 -2.698315 163.6317 0.007700164 0.8934918 0.8228104
#>   upper.EXP
#> 1  0.970245
#>
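
The examples above specify the model through a formula. As a complementary sketch, the call below uses the predictors argument instead. It assumes that time and status are given as quoted column names ("Time" and "Status"), which this page does not state explicitly, and the object names preds and pool_coxr_preds are hypothetical. The call only runs when the psfmi package is installed.

```r
# Predictor names taken from the formula examples above
preds <- c("Pain", "Tampascale", "Radiation", "Age", "Duration", "Previous")

if (requireNamespace("psfmi", quietly = TRUE)) {
  library(psfmi)

  # Assumption: time and status are passed as character column names
  pool_coxr_preds <- psfmi_coxr(data = lbpmicox, nimp = 5, impvar = "Impnr",
                                time = "Time", status = "Status",
                                predictors = preds,
                                p.crit = 0.05, method = "D1", direction = "BW")

  # Components listed under "Value" can be extracted with $
  pool_coxr_preds$predictors_final
}
```

When the predictor arguments are used, formula stays at its default of NULL, mirroring the rule that the predictor arguments stay NULL when a formula is supplied.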