This is the general data transformation function provided by the
pammtools package. The following main applications must be distinguished:
Transformation of standard time-to-event data.
Transformation of left-truncated time-to-event data.
Transformation of time-to-event data with time-dependent covariates (TDC).
Transformation of competing risks data (single or stacked data sets).
Transformation of recurrent events and multi-state data.
For TDC data, the type of effect to be estimated also determines how the
data are transformed. In this case, the right-hand side of the formula
can contain the formula specials concurrent and cumulative.
as_ped(data, ...)
# S3 method for class 'data.frame'
as_ped(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)
# S3 method for class 'nested_fdf'
as_ped(data, formula, ...)
# S3 method for class 'list'
as_ped(
  data,
  formula,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  ...
)
is.ped(x)
# S3 method for class 'ped'
as_ped(data, newdata, ...)
# S3 method for class 'pamm'
as_ped(data, newdata, ...)
as_ped_multistate(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)
data
Either an object inheriting from data frame or, in the case of time-dependent covariates, a list of data frames (of length 2), where the first data frame contains the time-to-event information and static covariates, while the second (and potentially further) data frames contain information on time-dependent covariates and the times at which they were observed.
...
Further arguments passed to the data.frame method and
eventually to survSplit.
formula
A two-sided formula with a Surv object
on the left-hand side and covariate specification on the right-hand side (RHS).
The RHS can be an extended formula, which specifies how TDCs should be
transformed using the specials concurrent and cumulative. The
left-hand side can be in start-stop notation. This, however, is only used
to create left-truncated data and does not support the full functionality.
cut
Split points, used to partition the follow-up into intervals.
If unspecified, all unique event times will be used. For competing risks,
when combine = TRUE, split points are derived from all event types
combined.
max_time
If cut is unspecified, this will be the last
possible event time. All event times after max_time will be
administratively censored at max_time.
tdc_specials
A character vector of names of potential specials in
formula for concurrent and/or cumulative effects.
censor_code
Specifies the value of the status variable that indicates
censoring. Often this will be 0, which is the default.
transition
Character string. Name of the column in data that
identifies the transition type in multi-state models. When supplied,
as_ped performs the multi-state PED transformation, stacking
interval-transition rows for each subject.
timescale
Character string, either "gap" (time since
last transition) or "calendar" (time since study entry, not reset
after each transition).
x
Any R object.
newdata
A new data set (data.frame) that contains the same
variables that were used to create the PED object (data).
combine
Logical. Only relevant for competing risks data. If
TRUE (the default), cause-specific data sets are stacked into a
single data frame with an additional cause column, using split points
common to all event types. If FALSE, a list of cause-specific data
sets is returned. See the competing-risks vignette for details.
id
Character string. Name of the subject identifier variable in
data.
For standard and left-truncated data, a data frame of class
ped in piece-wise exponential data format. For competing risks data,
either a stacked data frame of class ped_cr (when
combine = TRUE) or a list of cause-specific ped data frames
of class ped_cr_list (when combine = FALSE). For multi-state data,
the result is a stacked long-format dataset with one row per subject,
interval, and transition, which can be passed directly to a Poisson
regression model.
For competing risks data, as_ped can return either:
A list of cause-specific data sets (combine = FALSE), where
each element corresponds to one event type and uses cause-specific
interval split points. This is suitable for cause-specific hazards
models without shared effects.
A single stacked data set (combine = TRUE, the default),
where all cause-specific data sets are combined with a cause
column as covariate. Common split points are derived from all event
times. This is required for models with shared covariate effects across
causes, estimated via interaction terms (e.g.,
s(tend, by = cause)).
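The difference between cause-specific and common split points can be sketched in base R. The toy data and the object names (toy, cuts_by_cause, cuts_common) are made up for illustration, and the sketch only mirrors the rule stated above (unique event times, per cause or pooled); the split points that as_ped constructs internally may differ in detail.

```r
# Toy competing risks data (made up for illustration):
# status 0 = censored, 1 and 2 = two competing event types
toy <- data.frame(
  time   = c(2, 5, 5, 7, 9, 12),
  status = c(1, 2, 0, 1, 2, 0)
)

is_event <- toy$status != 0

# Cause-specific split points: unique event times within each cause
# (the situation described for combine = FALSE)
cuts_by_cause <- lapply(
  split(toy$time[is_event], toy$status[is_event]),
  function(t) sort(unique(t))
)

# Common split points: unique event times pooled over all causes
# (the situation described for combine = TRUE)
cuts_common <- sort(unique(toy$time[is_event]))

print(cuts_by_cause)
print(cuts_common)
```

With common split points, every cause-specific data set shares the same interval grid, which is what allows the stacked data to be modelled with terms such as s(tend, by = cause).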
For multi-state data, as_ped extends the standard PED transformation
to each transition type. The follow-up of each subject is split at all
observed transition times across the entire dataset, and a row is added for
every interval-transition combination the subject is at risk for. Two key
differences arise compared to the single-event case:
Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.
Competing events are treated as censoring for all other transitions within the same interval.
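The two timescale options can be illustrated on start-stop data. The sketch below uses a single made-up subject and plain base R; the object names (toy_ms, calendar, gap) are hypothetical, and the code only illustrates the definitions of "gap" and "calendar" time given for the timescale argument, not how as_ped is implemented.

```r
# Toy start-stop data for one subject with two transitions (made up)
toy_ms <- data.frame(
  id     = c(1, 1),
  tstart = c(0, 4),
  tstop  = c(4, 10),
  trans  = c("1->2", "2->3")
)

# Calendar timescale: time since study entry, kept as recorded.
# The second row enters the risk set at time 4 (delayed entry).
calendar <- toy_ms[, c("tstart", "tstop")]

# Gap timescale: the clock is reset to 0 at entry into each state,
# so each row runs from 0 to the time spent in that state.
gap <- data.frame(tstart = 0, tstop = toy_ms$tstop - toy_ms$tstart)

print(calendar)
print(gap)
```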
In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.
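To make the standard transformation concrete, the following base-R sketch computes the PED columns by hand for the first three rows of tumor (days and status as printed in the examples) with cut = c(0, 500, 1000). The helper manual_ped is hypothetical and not part of pammtools; it assumes a censoring code of 0 and an event code of 1, and is meant only as an illustration of what the columns tstart, tend, offset, and ped_status represent.

```r
# Toy input: days and status of the first three tumor rows
dat <- data.frame(
  id     = 1:3,
  days   = c(579, 1192, 308),
  status = c(0L, 0L, 1L)
)
cut <- c(0, 500, 1000)

# Hypothetical helper (not a pammtools function): split one subject's
# follow-up at the cut points and derive the PED columns.
manual_ped <- function(id, time, status, cut) {
  max_time <- max(cut)
  # events after the last cut point are administratively censored
  status <- ifelse(time > max_time, 0L, status)
  time   <- min(time, max_time)
  tstart <- cut[-length(cut)]
  tend   <- cut[-1]
  keep   <- tstart < time                 # intervals the subject enters
  tstart <- tstart[keep]
  tend   <- tend[keep]
  at_risk <- pmin(time, tend) - tstart    # time at risk per interval
  data.frame(
    id         = id,
    tstart     = tstart,
    tend       = tend,
    offset     = log(at_risk),
    # event indicator is 1 only in the interval containing the event
    ped_status = as.integer(status == 1L & time > tstart & time <= tend)
  )
}

ped_manual <- do.call(
  rbind,
  Map(manual_ped, dat$id, dat$days, dat$status, MoreArgs = list(cut = cut))
)
print(ped_manual)
```

The offset is the logarithm of the time at risk in each interval, for example log(79) for subject 1 in the interval (500,1000], and ped_status is 1 only in the interval in which an event occurs.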
# Standard single-event transformation
tumor[1:3, ]
#> # A tibble: 3 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <int> <int> <int> <fct> <fct> <fct> <fct>
#> 1 579 0 2 58 female yes no yes
#> 2 1192 0 2 52 male no yes yes
#> 3 308 1 2 74 female yes no yes
#> # ℹ 1 more variable: resection <fct>
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
#> id tstart tend interval offset ped_status age sex
#> 1 1 0 500 (0,500] 6.214608 0 58 female
#> 2 1 500 1000 (500,1000] 4.369448 0 58 female
#> 3 2 0 500 (0,500] 6.214608 0 52 male
#> 4 2 500 1000 (500,1000] 6.214608 0 52 male
#> 5 3 0 500 (0,500] 5.730100 1 74 female
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)
#> id tstart tend interval offset ped_status age sex
#> 1 1 0 308 (0,308] 5.7301 0 58 female
#> 2 2 0 308 (0,308] 5.7301 0 52 male
#> 3 3 0 308 (0,308] 5.7301 1 74 female
# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
if (FALSE) { # \dontrun{
data("fourD", package = "etm")
ped_stacked <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)
# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])
# Multi-state: illness-death model on calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
library(dplyr) # provides filter()
data("prothr", package = "mstate")
ped_msm <- prothr %>%
  filter(Tstart != Tstop) %>% # drop zero-length intervals
  as_ped(
    formula = Surv(Tstart, Tstop, status) ~ .,
    transition = "trans",
    id = "id",
    timescale = "calendar"
  )
head(ped_msm)
} # }
if (FALSE) { # \dontrun{
# Recurrent events: cgd data restricted to event numbers (enum) 1 and 2
library(dplyr) # provides select() and filter()
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
  select(id, tstart, tstop, enum, status, age) %>%
  filter(enum %in% 1:2)
ped_re <- as_ped_multistate(
  formula = Surv(tstart, tstop, status) ~ age + enum,
  data = cgd2,
  transition = "enum",
  timescale = "calendar"
)
} # }