Estimate variable importance (VIM)

Usage

vim(
  type,
  time,
  event,
  X,
  landmark_times = stats::quantile(time[event == 1], probs = c(0.25, 0.5, 0.75)),
  restriction_time = max(time[event == 1]),
  approx_times = NULL,
  large_feature_vector,
  small_feature_vector,
  conditional_surv_preds = NULL,
  large_oracle_preds = NULL,
  small_oracle_preds = NULL,
  conditional_surv_generator = NULL,
  conditional_surv_generator_control = NULL,
  large_oracle_generator = NULL,
  large_oracle_generator_control = NULL,
  small_oracle_generator = NULL,
  small_oracle_generator_control = NULL,
  cf_folds = NULL,
  cf_fold_num = 5,
  sample_split = TRUE,
  ss_folds = NULL,
  robust = TRUE,
  scale_est = FALSE,
  alpha = 0.05,
  verbose = FALSE
)

Arguments

type

Type of VIM to compute. Options include "accuracy", "AUC", "Brier", "R-squared", "C-index", and "survival_time_MSE".

time

n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times.

event

n x 1 numeric vector of status indicators: 1 if the event was observed, 0 if the observation was censored.

X

n x p data.frame of observed covariate values.

landmark_times

Numeric vector of length J1 giving landmark times at which to estimate the VIM. Used for the landmark time VIM types ("accuracy", "AUC", "Brier", and "R-squared").

restriction_time

Maximum follow-up time for calculation of "C-index" and "survival_time_MSE".

approx_times

Numeric vector of length J2 giving times at which to approximate integrals. Defaults to a grid of 100 timepoints, evenly spaced on the quantile scale of the distribution of observed event times.
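As a rough illustration of the two time grids, the sketch below computes the default landmark_times exactly as written in the usage signature and then builds one possible approx_times grid. The grid construction (100 evenly spaced probabilities passed to stats::quantile) is an assumption about what "evenly spaced on the quantile scale" means, not the package's internal code, and the toy data are made up for the example.

```r
# Toy right-censored data (hypothetical, for illustration only)
set.seed(1)
n <- 200
event_time <- rexp(n, rate = 0.2)
cens_time <- rexp(n, rate = 0.1)
time <- pmin(event_time, cens_time)
event <- as.numeric(event_time <= cens_time)

# Default landmark times, copied from the usage signature:
# quartiles of the observed (uncensored) event times
landmark_times <- stats::quantile(time[event == 1], probs = c(0.25, 0.5, 0.75))

# One way to build a grid that is evenly spaced on the quantile scale
# (assumption: the package may construct its default grid differently)
approx_times <- unique(stats::quantile(time[event == 1],
                                       probs = seq(0, 1, length.out = 100),
                                       names = FALSE))

# J1 and J2 in the notation of the argument descriptions
length(landmark_times)
length(approx_times)
```

A grid built this way places more points where observed events are dense, which is the usual motivation for spacing on the quantile scale rather than the time scale.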

large_feature_vector

Numeric vector giving indices of features to include in the 'large' prediction model.

small_feature_vector

Numeric vector giving indices of features to include in the 'small' prediction model. Must be a subset of large_feature_vector.

conditional_surv_preds

User-provided estimates of the conditional survival functions of the event and censoring variables given the full covariate vector (if not using the vim() function to compute these nuisance estimates). Must be a named list of lists with elements S_hat, S_hat_train, G_hat, and G_hat_train. Each of these is itself a list of length K, where K is the number of cross-fitting folds. Each element of these lists is a matrix with J2 columns and number of rows equal to either the number of samples in the kth fold (for S_hat or G_hat) or the number of samples used to compute the nuisance estimator for the kth fold (for S_hat_train or G_hat_train).

large_oracle_preds

User-provided estimates of the oracle prediction function using large_feature_vector. Must be a named list of lists with elements f_hat and f_hat_train. Each of these is itself a list of length K. Each element of these lists is a matrix with J1 columns (for landmark time VIMs) or 1 column (for "C-index" and "survival_time_MSE").

small_oracle_preds

User-provided estimates of the oracle prediction function using small_feature_vector. Must be a named list of lists with elements f_hat and f_hat_train. Each of these is itself a list of length K. Each element of these lists is a matrix with J1 columns (for landmark time VIMs) or 1 column (for "C-index" and "survival_time_MSE").
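The nested list structure described for conditional_surv_preds, large_oracle_preds, and small_oracle_preds can be easier to see in code. The sketch below only builds placeholder objects with the documented names and column counts. The dimensions are hypothetical, the entries are dummy values, and two details are assumptions: that the training rows for fold k are the samples outside fold k, and that the oracle prediction matrices use the same row counts as the survival matrices.

```r
# Hypothetical dimensions, for illustration only
n <- 12   # sample size
K <- 3    # number of cross-fitting folds
J1 <- 2   # number of landmark times
J2 <- 5   # number of approximation times
cf_folds <- rep(seq_len(K), length.out = n)

# Placeholder matrix with the requested shape; real entries would be
# fitted survival probabilities or oracle predictions
placeholder <- function(n_rows, n_cols) {
  matrix(0.5, nrow = n_rows, ncol = n_cols)
}

n_holdout <- as.vector(table(cf_folds))  # samples in fold k
n_train <- n - n_holdout                 # samples outside fold k (assumption)

conditional_surv_preds <- list(
  S_hat = lapply(n_holdout, placeholder, n_cols = J2),
  S_hat_train = lapply(n_train, placeholder, n_cols = J2),
  G_hat = lapply(n_holdout, placeholder, n_cols = J2),
  G_hat_train = lapply(n_train, placeholder, n_cols = J2)
)

# For "C-index" or "survival_time_MSE", n_cols would be 1 instead of J1
large_oracle_preds <- list(
  f_hat = lapply(n_holdout, placeholder, n_cols = J1),
  f_hat_train = lapply(n_train, placeholder, n_cols = J1)
)

str(conditional_surv_preds, max.level = 2)
```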

conditional_surv_generator

A user-written function to estimate the conditional survival functions of the event and censoring variables. Must take arguments time, event, folds (cross-fitting fold identifiers), and newtimes (times at which to generate predictions).

conditional_surv_generator_control

A list of arguments to pass to conditional_surv_generator.

large_oracle_generator

A user-written function to estimate the oracle prediction function using large_feature_vector. Must take arguments time, event, and folds (cross-fitting fold identifiers).

large_oracle_generator_control

A list of arguments to pass to large_oracle_generator.

small_oracle_generator

A user-written function to estimate the oracle prediction function using small_feature_vector. Must take arguments time, event, and folds (cross-fitting fold identifiers).

small_oracle_generator_control

A list of arguments to pass to small_oracle_generator.

cf_folds

Numeric vector of length n giving the cross-fitting fold assignment of each observation.

cf_fold_num

The number of cross-fitting folds, used when cf_folds is not provided.
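If fold assignments need to be fixed in advance (for example, to reuse the same folds across several calls), a vector of the documented form can be built in base R. This is a minimal sketch with a hypothetical sample size. Labelling the folds 1 to cf_fold_num is an assumption, and the sketch does not cover any extra structure the function may require when sample splitting is also used.

```r
set.seed(2024)
n <- 100          # sample size (hypothetical)
cf_fold_num <- 5  # number of cross-fitting folds

# Balanced random assignment of each observation to one fold
cf_folds <- sample(rep(seq_len(cf_fold_num), length.out = n))

# Fold sizes
table(cf_folds)
```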

sample_split

Logical indicating whether or not to use sample splitting.

ss_folds

Numeric vector of length n giving the sample-splitting fold assignment of each observation.

robust

Logical, whether or not to use the doubly-robust debiasing approach. This option is meant for illustration purposes only and should be left as TRUE.

scale_est

Logical, whether or not to force the VIM estimate to be nonnegative.

alpha

The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05.

verbose

Whether to print progress messages.

Value

Named list with the following elements:

result

Data frame giving results. See the documentation of the individual vim_* functions for details.

folds

A named list giving the cross-fitting fold IDs (cf_folds) and sample-splitting fold IDs (ss_folds).

approx_times

A vector of times used to approximate integrals appearing in the form of the VIM estimator.

conditional_surv_preds

A named list containing the estimated conditional event and censoring survival functions.

large_oracle_preds

A named list containing the estimated large oracle prediction function.

small_oracle_preds

A named list containing the estimated small oracle prediction function.

Examples

# This is a small simulation example
set.seed(123)
n <- 100
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))

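# Event times from an exponential model whose rate depends on the covariates
# (named event_time rather than T, which would mask R's shorthand for TRUE)
event_time <- rexp(n, rate = exp(-2 + X[, 1] - X[, 2] + 0.5 * X[, 1] * X[, 2]))

# Censoring times, truncated at 15
cens_time <- rexp(n, rate = exp(-2 - 0.5 * X[, 1] - 0.25 * X[, 2] + 0.5 * X[, 1] * X[, 2]))
cens_time[cens_time > 15] <- 15

# Observed follow-up time and event indicator
time <- pmin(event_time, cens_time)
event <- as.numeric(event_time <= cens_time)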

# landmark times for AUC
landmark_times <- c(3)

output <- vim(type = "AUC",
              time = time,
              event = event,
              X = X,
              landmark_times = landmark_times,
              large_feature_vector = 1:2,
              small_feature_vector = 2,
              conditional_surv_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              large_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              small_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              cf_fold_num = 2,
              sample_split = FALSE,
              scale_est = TRUE)

print(output$result)
#>   landmark_time       est  var_est        cil       ciu cil_1sided  p
#> 1             3 0.2823303 1.407984 0.04976388 0.5148967 0.08715441 NA
#>   large_predictiveness small_predictiveness vim large_feature_vector
#> 1            0.8209323             0.538602 AUC                  1,2
#>   small_feature_vector
#> 1                    2
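The columns of the printed result can be cross-checked by hand. The sketch below copies the numbers from the output above and recomputes the interval columns, assuming a Wald-type interval with standard error sqrt(var_est / n). That formula is inferred from the printed values, not taken from the package's source, so treat it as a consistency check and not as a description of the implementation.

```r
# Values copied from the printed result above
n <- 100
alpha <- 0.05
est <- 0.2823303
var_est <- 1.407984
large_predictiveness <- 0.8209323
small_predictiveness <- 0.538602

# The estimate equals the difference in predictiveness between the
# large and small feature sets
diff_pred <- large_predictiveness - small_predictiveness

# Standard error, assuming var_est is on the scale of a single observation
se <- sqrt(var_est / n)

# Two-sided and one-sided lower confidence limits at level alpha
cil <- est - stats::qnorm(1 - alpha / 2) * se
ciu <- est + stats::qnorm(1 - alpha / 2) * se
cil_1sided <- est - stats::qnorm(1 - alpha) * se

round(c(diff = diff_pred, cil = cil, ciu = ciu, cil_1sided = cil_1sided), 4)
```

Under these assumptions the recomputed values agree with the printed est, cil, ciu, and cil_1sided to about four decimal places. The p column is NA in this run, which was made with sample_split = FALSE.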