Estimate a variable importance measure (VIM)
Usage
vim(
  type,
  time,
  event,
  X,
  landmark_times = stats::quantile(time[event == 1], probs = c(0.25, 0.5, 0.75)),
  restriction_time = max(time[event == 1]),
  approx_times = NULL,
  large_feature_vector,
  small_feature_vector,
  conditional_surv_preds = NULL,
  large_oracle_preds = NULL,
  small_oracle_preds = NULL,
  conditional_surv_generator = NULL,
  conditional_surv_generator_control = NULL,
  large_oracle_generator = NULL,
  large_oracle_generator_control = NULL,
  small_oracle_generator = NULL,
  small_oracle_generator_control = NULL,
  cf_folds = NULL,
  cf_fold_num = 5,
  sample_split = TRUE,
  ss_folds = NULL,
  robust = TRUE,
  scale_est = FALSE,
  alpha = 0.05,
  verbose = FALSE
)
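The defaults for landmark_times and restriction_time in the usage above are computed from the observed event times, and approx_times = NULL is documented in the Arguments as a grid of 100 timepoints evenly spaced on the quantile scale of the observed event times. The sketch below reproduces these defaults on toy data. The first two computations copy the usage exactly; the approx_times line is an assumption about how "evenly spaced on the quantile scale" is implemented (for example, whether the minimum and maximum are included), not the package's own code.

```r
# Toy follow-up data (hypothetical values, for illustration only)
set.seed(1)
n <- 50
time <- rexp(n, rate = 0.2)
event <- rbinom(n, size = 1, prob = 0.7)

# Defaults copied from the usage of vim()
landmark_times <- stats::quantile(time[event == 1], probs = c(0.25, 0.5, 0.75))
restriction_time <- max(time[event == 1])

# Assumed construction of the default approximation grid:
# 100 quantiles of the observed event times, from the minimum to the maximum
approx_times <- unname(stats::quantile(time[event == 1],
                                       probs = seq(0, 1, length.out = 100)))

print(landmark_times)
print(restriction_time)
print(range(approx_times))
```

Passing an explicit approx_times vector to vim() avoids relying on this assumed construction.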
Arguments
- type
Type of VIM to compute. Options include "accuracy", "AUC", "Brier", "R-squared", "C-index", and "survival_time_MSE".
- time
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times.
- event
n x 1 numeric vector of status indicators of whether an event was observed (1 if the event was observed, 0 if the observation was censored).
- X
n x p data.frame of observed covariate values.
- landmark_times
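Numeric vector of length J1 giving the landmark times at which to estimate the VIM. Used only for the landmark-time VIMs ("accuracy", "AUC", "Brier", and "R-squared"). Defaults to the first, second, and third quartiles of the observed event times.
- restriction_time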
Maximum follow-up time used in calculating "C-index" and "survival_time_MSE". Defaults to the largest observed event time.
- approx_times
Numeric vector of length J2 giving times at which to approximate integrals. Defaults to a grid of 100 timepoints, evenly spaced on the quantile scale of the distribution of observed event times.
- large_feature_vector
Numeric vector giving indices of features to include in the 'large' prediction model.
- small_feature_vector
Numeric vector giving indices of features to include in the 'small' prediction model. Must be a subset of large_feature_vector.
- conditional_surv_preds
User-provided estimates of the conditional survival functions of the event and censoring variables given the full covariate vector (if not using the vim() function to compute these nuisance estimates). Must be a named list of lists with elements S_hat, S_hat_train, G_hat, and G_hat_train. Each of these is itself a list of length K, where K is the number of cross-fitting folds. Each element of these lists is a matrix with J2 columns and number of rows equal to either the number of samples in the kth fold (for S_hat or G_hat) or the number of samples used to compute the nuisance estimator for the kth fold (for S_hat_train or G_hat_train).
- large_oracle_preds
User-provided estimates of the oracle prediction function using large_feature_vector. Must be a named list of lists with elements f_hat and f_hat_train. Each of these is itself a list of length K. Each element of these lists is a matrix with J1 columns (for landmark-time VIMs) or 1 column (for "C-index" and "survival_time_MSE").
- small_oracle_preds
User-provided estimates of the oracle prediction function using small_feature_vector. Must be a named list of lists with elements f_hat and f_hat_train. Each of these is itself a list of length K. Each element of these lists is a matrix with J1 columns (for landmark-time VIMs) or 1 column (for "C-index" and "survival_time_MSE").
- conditional_surv_generator
A user-written function to estimate the conditional survival functions of the event and censoring variables. Must take arguments time, event, folds (cross-fitting fold identifiers), and newtimes (times at which to generate predictions).
- conditional_surv_generator_control
A list of arguments to pass to conditional_surv_generator.
- large_oracle_generator
A user-written function to estimate the oracle prediction function using large_feature_vector. Must take arguments time, event, and folds (cross-fitting fold identifiers).
- large_oracle_generator_control
A list of arguments to pass to large_oracle_generator.
- small_oracle_generator
A user-written function to estimate the oracle prediction function using small_feature_vector. Must take arguments time, event, and folds (cross-fitting fold identifiers).
- small_oracle_generator_control
A list of arguments to pass to small_oracle_generator.
- cf_folds
Numeric vector of length n giving cross-fitting folds.
- cf_fold_num
The number of cross-fitting folds, if not providing cf_folds.
- sample_split
Logical indicating whether or not to use sample splitting.
- ss_folds
Numeric vector of length n giving sample-splitting folds.
- robust
Logical, whether or not to use the doubly-robust debiasing approach. This option is meant for illustration purposes only; it should be left as TRUE.
- scale_est
Logical, whether or not to force the VIM estimate to be nonnegative.
- alpha
The level at which to compute confidence intervals and hypothesis tests. Defaults to 0.05.
- verbose
Whether to print progress messages.
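The nested layout required for user-supplied conditional_surv_preds and large_oracle_preds (or small_oracle_preds) can be hard to picture from the prose alone. The sketch below builds placeholder objects with the documented shapes for K = 2 cross-fitting folds: S_hat[[k]] and G_hat[[k]] have one row per observation in fold k and J2 columns, the _train versions have one row per observation used to fit the nuisance estimator for fold k, and f_hat[[k]] has J1 columns as for a landmark-time VIM. It assumes that the training set for fold k is every observation outside fold k (standard cross-fitting), and it fills the matrices with arbitrary numbers, so it only illustrates dimensions and is not a valid set of survival estimates. The helper make_fold_list() is hypothetical.

```r
# Hypothetical example dimensions
set.seed(2)
n <- 20                                          # number of observations
K <- 2                                           # number of cross-fitting folds
approx_times <- seq(0.5, 10, length.out = 100)   # J2 = 100
landmark_times <- c(2, 5)                        # J1 = 2
J2 <- length(approx_times)
J1 <- length(landmark_times)

# Cross-fitting fold IDs, in the form expected by cf_folds (length n)
cf_folds <- sample(rep(seq_len(K), length.out = n))

# Hypothetical helper: a list with one matrix per fold.
# train = FALSE: rows are the observations in fold k.
# train = TRUE:  rows are the observations outside fold k (assumed training set).
# Entries are placeholders, not survival probabilities.
make_fold_list <- function(n_cols, train) {
  lapply(seq_len(K), function(k) {
    n_rows <- if (train) sum(cf_folds != k) else sum(cf_folds == k)
    matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
  })
}

conditional_surv_preds <- list(
  S_hat       = make_fold_list(J2, train = FALSE),
  S_hat_train = make_fold_list(J2, train = TRUE),
  G_hat       = make_fold_list(J2, train = FALSE),
  G_hat_train = make_fold_list(J2, train = TRUE)
)

large_oracle_preds <- list(
  f_hat       = make_fold_list(J1, train = FALSE),
  f_hat_train = make_fold_list(J1, train = TRUE)
)

# Dimensions of each element for the first fold
str(lapply(conditional_surv_preds, function(x) dim(x[[1]])))
str(lapply(large_oracle_preds, function(x) dim(x[[1]])))
```

For "C-index" or "survival_time_MSE", the documented oracle matrices would have a single column instead of J1.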
Value
Named list with the following elements:
- result
Data frame giving results. See the documentation of the individual vim_* functions for details.
- folds
A named list giving the cross-fitting fold IDs (cf_folds) and sample-splitting fold IDs (ss_folds).
- approx_times
A vector of times used to approximate integrals appearing in the form of the VIM estimator.
- conditional_surv_preds
A named list containing the estimated conditional event and censoring survival functions.
- large_oracle_preds
A named list containing the estimated large oracle prediction function.
- small_oracle_preds
A named list containing the estimated small oracle prediction function.
Examples
# This is a small simulation example
library(survML)
set.seed(123)
n <- 100
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))
# event and censoring times both depend on X1 and X2
T <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))
C <- rexp(n, exp(-2 -.5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
# administrative censoring at time 15
C[C > 15] <- 15
# observed follow-up time and event indicator
time <- pmin(T, C)
event <- as.numeric(T <= C)
# landmark times for AUC
landmark_times <- c(3)
# importance of X1: large model uses X1 and X2, small model uses X2 only
output <- vim(type = "AUC",
              time = time,
              event = event,
              X = X,
              landmark_times = landmark_times,
              large_feature_vector = 1:2,
              small_feature_vector = 2,
              conditional_surv_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              large_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              small_oracle_generator_control = list(SL.library = c("SL.mean", "SL.glm")),
              cf_fold_num = 2,
              sample_split = FALSE,
              scale_est = TRUE)
print(output$result)
#> landmark_time est var_est cil ciu cil_1sided p
#> 1 3 0.2823303 1.407984 0.04976388 0.5148967 0.08715441 NA
#> large_predictiveness small_predictiveness vim large_feature_vector
#> 1 0.8209323 0.538602 AUC 1,2
#> small_feature_vector
#> 1 2
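The printed row can be checked by hand. In this output, est equals large_predictiveness minus small_predictiveness, and the interval columns are consistent with a Wald-type interval est ± z * sqrt(var_est / n) with n = 100 and alpha = 0.05, where cil_1sided uses the one-sided normal quantile. This is a reading of the printed numbers, not a description of the package internals (for instance, how scale_est = TRUE would treat a negative estimate is not exercised here); p is NA in this run, which matches sample_split = FALSE. The values below are copied from the output above.

```r
# Values copied from the printed output above
n <- 100
alpha <- 0.05
large_predictiveness <- 0.8209323
small_predictiveness <- 0.538602
est <- 0.2823303
var_est <- 1.407984

# The VIM estimate is the difference in predictiveness between the two models
vim_est <- large_predictiveness - small_predictiveness

# Wald-type intervals based on the estimated variance divided by the sample size
se <- sqrt(var_est / n)
cil <- est - qnorm(1 - alpha / 2) * se
ciu <- est + qnorm(1 - alpha / 2) * se
cil_1sided <- est - qnorm(1 - alpha) * se

round(c(vim = vim_est, cil = cil, ciu = ciu, cil_1sided = cil_1sided), 4)
```

Because var_est is reported on the scale of a single observation, the standard error shrinks with n rather than being read directly from var_est.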