Estimate a conditional survival function via local survival stacking
Arguments
- time
n x 1 numeric vector of observed follow-up times. If there is censoring, these are the minimum of the event and censoring times.
- event
n x 1 numeric vector of status indicators of whether an event was observed. Defaults to a vector of 1s, i.e. no censoring.
- entry
Study entry variable, if applicable. Defaults to NULL, indicating that there is no truncation.
- X
n x p data.frame of observed covariate values on which to train the estimator.
- newX
m x p data.frame of new observed covariate values at which to obtain m predictions for the estimated algorithm. Must have the same names and structure as X.
- newtimes
k x 1 numeric vector of times at which to obtain k predicted conditional survival probabilities.
- direction
Whether the data come from a prospective or retrospective study. This determines whether the data are treated as subject to left truncation and right censoring ("prospective") or right truncation alone ("retrospective").
- bin_size
Size of bins for the discretization of time. A value between 0 and 1 indicating the size of observed event time quantiles on which to grid times (e.g. 0.02 creates a grid of 50 times evenly spaced on the quantile scale). If NULL, defaults to every observed event time.
- time_basis
How to treat time for training the binary classifier. Options are "continuous" and "dummy"; with "dummy", an indicator variable is included for each time in the time grid.
- learner
Which binary regression algorithm to use. Currently, only SuperLearner is supported, but more learners will be added. See below for algorithm-specific arguments.
- SL_control
Named list of parameters controlling the Super Learner fitting process. These parameters are passed directly to the SuperLearner function. Parameters include SL.library (library of algorithms to include in the binary classification Super Learner), V (number of cross-validation folds on which to train the Super Learner classifier, defaults to 10), method (method for estimating coefficients for the Super Learner, defaults to "method.NNLS"), stratifyCV (logical indicating whether to stratify by outcome in SuperLearner's cross-validation scheme), and obsWeights (observation weights, passed directly to prediction algorithms by SuperLearner).
- tau
The maximum time of interest in a study, used for retrospective conditional survival estimation. Rather than dealing with right truncation separately from left truncation, it is simpler to estimate the survival function of tau - time. Defaults to NULL, in which case the maximum study entry time is chosen as the reference point.
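To make the bin_size argument concrete, the sketch below builds a quantile-based time grid in base R. It only illustrates the description above: the names obs_time, obs_event and time_grid are hypothetical, and the way stackL constructs its grid internally (for example, whether it prepends 0 or handles ties differently) is an assumption that is not confirmed here.

```r
set.seed(1)

# Hypothetical toy data: follow-up times and event indicators
obs_time  <- rexp(200, rate = 0.2)
obs_event <- rbinom(200, size = 1, prob = 0.7)

bin_size <- 0.1

# Quantile levels spaced bin_size apart, ending at 1
probs <- seq(bin_size, 1, by = bin_size)

# Grid of quantiles of the observed event times (censored times excluded)
time_grid <- unique(quantile(obs_time[obs_event == 1],
                             probs = probs, names = FALSE))

# Number of grid times implied by this bin_size
length(time_grid)

# A bin_size of 0.02 gives 50 quantile levels, matching the example above
length(seq(0.02, 1, by = 0.02))
```

A smaller bin_size gives a finer grid, which enlarges the stacked dataset and therefore the cost of fitting the binary classifier.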
Value
A named list of class stackL.
- S_T_preds
An m x k matrix of estimated event time survival probabilities at the m covariate vector values and k times provided by the user in newX and newtimes, respectively.
- fit
The Super Learner fit for binary classification on the stacked dataset.
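As a rough illustration of what "the stacked dataset" means, the sketch below follows the general idea of survival stacking described by Craig et al. (2021): each subject contributes one row for every grid time at which they are still at risk, with a binary outcome marking whether their event occurred at that time. It is a simplified sketch under stated assumptions, not the code used by stackL. The toy discrete-time data, the variable names, the risk-set convention, and the use of glm in place of a Super Learner are all hypothetical choices made for illustration.

```r
set.seed(2)
n <- 300
x <- rnorm(n)

# Toy discrete-time data (assumption): hazard constant in time, depends on x
time_grid <- 1:6
true_haz  <- plogis(-1.5 + 0.5 * x)
event_t   <- rgeom(n, prob = true_haz) + 1     # event time on 1, 2, ...
obs_time  <- pmin(event_t, max(time_grid))     # administrative censoring at 6
obs_event <- as.numeric(event_t <= max(time_grid))

# Stack: one row per subject per grid time at which they are still at risk
stacked <- do.call(rbind, lapply(time_grid, function(tj) {
  at_risk <- obs_time >= tj
  data.frame(
    t = tj,
    x = x[at_risk],
    # outcome is 1 only if the event is observed at this grid time
    y = as.numeric(obs_time[at_risk] == tj & obs_event[at_risk] == 1)
  )
}))

# Binary regression for the discrete hazard, with one indicator per grid time
# (glm stands in for the Super Learner here)
haz_fit <- glm(y ~ factor(t) + x, family = binomial(), data = stacked)

# Predicted hazards for one new covariate value at every grid time
new_rows <- data.frame(t = time_grid, x = 0.5)
haz_hat  <- predict(haz_fit, newdata = new_rows, type = "response")

# Survival is the running product of (1 - hazard)
surv_hat <- cumprod(1 - haz_hat)
```

With left truncation, the at-risk condition would presumably also require the entry time to be no later than the grid time, so that subjects only contribute rows after they enter the study.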
References
Polley E.C. and van der Laan M.J. (2011). "Super Learning for Right-Censored Data" in Targeted Learning.
Craig E., Zhong C., and Tibshirani R. (2021). "Survival stacking: casting survival analysis as a classification problem."
See also
predict.stackL for the stackL prediction method.
Examples
# This is a small simulation example
set.seed(123)
n <- 500
X <- data.frame(X1 = rnorm(n), X2 = rbinom(n, size = 1, prob = 0.5))
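# True conditional survival function of the event time given covariates
S0 <- function(t, x) {
  pexp(t, rate = exp(-2 + x[,1] - x[,2] + .5 * x[,1] * x[,2]), lower.tail = FALSE)
}
# Event times (named event_time rather than T, which would mask the TRUE shorthand)
event_time <- rexp(n, rate = exp(-2 + X[,1] - X[,2] + .5 * X[,1] * X[,2]))
# Reference censoring survival function (defined for completeness; not used below)
G0 <- function(t, x) {
  as.numeric(t < 15) * .9 * pexp(t,
                                 rate = exp(-2 - .5 * x[,1] - .25 * x[,2] + .5 * x[,1] * x[,2]),
                                 lower.tail = FALSE)
}
# Censoring times, capped at 15
cens_time <- rexp(n, exp(-2 - .5 * X[,1] - .25 * X[,2] + .5 * X[,1] * X[,2]))
cens_time[cens_time > 15] <- 15
# Study entry times, which induce left truncation
entry <- runif(n, 0, 15)
time <- pmin(event_time, cens_time)
event <- as.numeric(event_time <= cens_time)
# Keep only subjects whose follow-up time is at least their entry time
sampled <- which(time >= entry)
X <- X[sampled,]
time <- time[sampled]
event <- event[sampled]
entry <- entry[sampled]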
# Note that this is a very small Super Learner library, for computational purposes.
SL.library <- c("SL.mean", "SL.glm")
fit <- stackL(time = time,
              event = event,
              entry = entry,
              X = X,
              newX = X,
              newtimes = seq(0, 15, .1),
              direction = "prospective",
              bin_size = 0.1,
              time_basis = "continuous",
              SL_control = list(SL.library = SL.library,
                                V = 5))
# Compare estimated and true survival for the first subject; points near the red line agree
plot(fit$S_T_preds[1,], S0(t = seq(0, 15, .1), X[1,]))
abline(0,1,col='red')