This is the general data transformation function provided by the pammtools package. The following main applications must be distinguished:

  1. Transformation of standard time-to-event data.

  2. Transformation of left-truncated time-to-event data.

  3. Transformation of time-to-event data with time-dependent covariates (TDC).

  4. Transformation of competing risks data (single or stacked data sets).

  5. Transformation of recurrent events and multi-state data.

For data with TDCs, the type of effect one wants to estimate also matters for the data transformation step. In this case, the right-hand side of the formula can contain the formula specials concurrent and cumulative.
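A concurrent effect means that the hazard at time t depends on the value the TDC currently has. The idea can be sketched in base R under the assumption that the covariate is a step function that keeps its last observed value until the next measurement. The helper `tdc_at()` is hypothetical and not part of pammtools, and pammtools' own handling of the concurrent special may differ:

```r
# Hypothetical helper (not part of pammtools): value of a time-dependent
# covariate at each time point in `t`, carrying the last observed value forward
tdc_at <- function(t, obs_time, obs_value) {
  # findInterval() returns, for each t, the index of the last obs_time <= t
  idx <- findInterval(t, obs_time)
  # before the first measurement the covariate value is unknown (NA)
  out <- rep(NA_real_, length(t))
  out[idx > 0] <- obs_value[idx[idx > 0]]
  out
}

# Covariate measured at days 0, 3 and 7 for one subject
obs_time  <- c(0, 3, 7)
obs_value <- c(10, 12, 9)

# Look up the covariate at the start of each of the subject's intervals
tdc_at(c(0, 2, 4, 8), obs_time, obs_value)
#> [1] 10 10 12  9
```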

as_ped(data, ...)

# S3 method for class 'data.frame'
as_ped(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)

# S3 method for class 'nested_fdf'
as_ped(data, formula, ...)

# S3 method for class 'list'
as_ped(
  data,
  formula,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  ...
)

is.ped(x)

# S3 method for class 'ped'
as_ped(data, newdata, ...)

# S3 method for class 'pamm'
as_ped(data, newdata, ...)

as_ped_multistate(
  data,
  formula,
  cut = NULL,
  max_time = NULL,
  tdc_specials = c("concurrent", "cumulative"),
  censor_code = 0L,
  transition = character(),
  timescale = c("gap", "calendar"),
  min_events = 1L,
  ...
)

Arguments

data

Either an object inheriting from data frame or, in case of time-dependent covariates, a list of data frames (typically of length 2). The first data frame contains the time-to-event information and static covariates, while the second (and any further) data frame contains information on the time-dependent covariates and the times at which they were observed.

...

Further arguments passed to the data.frame method and ultimately to survival::survSplit().

formula

A two-sided formula with a Surv object on the left-hand side (LHS) and the covariate specification on the right-hand side (RHS). The RHS can be an extended formula that specifies, via the specials concurrent and cumulative, how TDCs should be transformed. The LHS can be in start-stop notation; this, however, is only used to create left-truncated data and does not support the full functionality.

cut

Split points used to partition the follow-up into intervals. If unspecified, all unique event times will be used. For competing risks with combine = TRUE, split points are derived from the event times of all event types combined.

max_time

If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

tdc_specials

A character vector of names of potential specials in formula for concurrent and/or cumulative effects.

censor_code

Specifies the value of the status variable that indicates censoring. Often this will be 0, which is the default.

transition

Character string. Name of the column in data that identifies the transition type in multi-state models. When supplied, as_ped performs the multi-state PED transformation, stacking interval-transition rows for each subject.

timescale

Character string, either "gap" (time since last transition) or "calendar" (time since study entry, not reset after each transition).

x

Any R object.

newdata

A new data set (data.frame) that contains the same variables that were used to create the PED object (data).

combine

Logical. Only relevant for competing risks data. If TRUE (the default), cause-specific data sets are stacked into a single data frame with an additional cause column, using split points common to all event types. If FALSE, a list of cause-specific data sets is returned. See the competing-risks vignette for details.

id

Character string. Name of the subject identifier variable in data.
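The two values of timescale differ only in where the clock starts for each transition. The following base-R sketch uses hypothetical start-stop records of one subject (the data frame and its column names are illustrative assumptions) and converts calendar times into gap times by resetting the clock at each state entry:

```r
# Hypothetical calendar-time records of one subject: state 1 -> 2 at day 150,
# state 2 -> 3 at day 400 (times measured since study entry)
rec <- data.frame(from   = c(1, 2),
                  to     = c(2, 3),
                  Tstart = c(0, 150),
                  Tstop  = c(150, 400))

# Gap timescale: each stay starts at 0 and lasts Tstop - Tstart
rec_gap <- rec
rec_gap$Tstop  <- rec$Tstop - rec$Tstart
rec_gap$Tstart <- 0

rec_gap
#>   from to Tstart Tstop
#> 1    1  2      0   150
#> 2    2  3      0   250
```

On the calendar timescale the records would be used unchanged, so the hazard of leaving state 2 is modeled as a function of time since study entry rather than time since entering state 2.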

Value

For standard and left-truncated data, a data frame of class ped in piece-wise exponential data format. For competing risks data, either a stacked data frame of class ped_cr (when combine = TRUE) or a list of cause-specific ped data frames of class ped_cr_list (when combine = FALSE). For multi-state data, a stacked long-format data set with one row per subject, interval, and transition, which can be passed directly to a Poisson regression model.
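To make the PED format concrete, the following base-R sketch reproduces the tstart, tend, offset and ped_status values of the tumor example in the Examples section for single subjects. The helper `split_subject()` is hypothetical and not the pammtools implementation; it assumes status 1 marks an event, cut starts at 0, and offset is the log of the time spent at risk in each interval, which matches the values printed in that example:

```r
# Hypothetical helper (not part of pammtools): PED rows for one subject with
# follow-up `time`, event indicator `status` (1 = event) and break points `cut`
split_subject <- function(time, status, cut) {
  tstart <- head(cut, -1)
  tend   <- cut[-1]
  # keep only the intervals the subject enters while still under observation
  keep   <- tstart < time
  tstart <- tstart[keep]
  tend   <- tend[keep]
  data.frame(
    tstart     = tstart,
    tend       = tend,
    # log of the time at risk; follow-up beyond max(cut) is censored there
    offset     = log(pmin(tend, time) - tstart),
    # the event, if any, falls into the interval containing `time`
    ped_status = as.integer(status == 1 & time > tstart & time <= tend)
  )
}

# Subject 1 of the tumor example: censored at day 579
split_subject(579, 0, cut = c(0, 500, 1000))
#>   tstart tend   offset ped_status
#> 1      0  500 6.214608          0
#> 2    500 1000 4.369448          0

# Without `cut`, the break points are 0 plus all unique event times;
# among the first three tumor patients only day 308 is an event time
days        <- c(579, 1192, 308)
status      <- c(0, 0, 1)
cut_default <- c(0, sort(unique(days[status == 1])))
split_subject(308, 1, cut_default)
#>   tstart tend offset ped_status
#> 1      0  308 5.7301          1
```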

Details

For competing risks data, as_ped can return either:

  • A list of cause-specific data sets (combine = FALSE), where each element corresponds to one event type and uses cause-specific interval split points. This is suitable for cause-specific hazards models without shared effects.

  • A single stacked data set (combine = TRUE, the default), where all cause-specific data sets are combined with a cause column as covariate. Common split points are derived from all event times. This is required for models with shared covariate effects across causes, estimated via interaction terms (e.g., s(tend, by = cause)).
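The stacking step behind combine = TRUE can be sketched in base R on a toy data set. This is an illustration, not the pammtools code: it assumes status 0 denotes censoring and every other status value is an event type, and the helper `stack_causes()` is hypothetical. Each cause gets its own copy of the data in which only events of that cause count as events, and the common split points are taken from the event times of all causes:

```r
# Toy competing risks data: status 0 = censored, 1 and 2 = two event types
d <- data.frame(id = 1:4, time = c(5, 8, 3, 10), status = c(1, 2, 0, 1))

# Hypothetical helper (not part of pammtools): one copy of the data per cause;
# in the copy for cause k only events of type k count as events, all other
# observations are treated as censored
stack_causes <- function(d, causes = sort(setdiff(unique(d$status), 0))) {
  do.call(rbind, lapply(causes, function(k) {
    dk <- d
    dk$cause  <- k
    dk$status <- as.integer(d$status == k)
    dk
  }))
}

stacked <- stack_causes(d)

# Split points common to all causes: 0 plus every observed event time
common_cut <- c(0, sort(unique(d$time[d$status != 0])))
common_cut
#> [1]  0  5  8 10
```

With combine = FALSE, the cause-specific copies would instead be kept as separate list elements, each with its own split points.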

For multi-state data, as_ped extends the standard PED transformation to each transition type. The follow-up of each subject is split at all transition times observed in the entire data set, and a row is added for every interval-transition combination for which the subject is at risk. Two key differences arise compared to the single-event case:

  • Delayed entry into the risk set is handled automatically, since subjects are only at risk for transitions out of a state after they have entered it.

  • Competing events are treated as censoring: when a subject makes one transition, all other transitions it was at risk for in that interval are recorded as censored.
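The two points above can be illustrated with a base-R sketch of the rows generated for a single stay of a subject in one state. The helper `expand_sojourn()` and the transition labels are hypothetical; the sketch assumes calendar time and fixed break points `cut`, and is not how pammtools implements the transformation:

```r
# Hypothetical helper (not part of pammtools): rows for one stay of a subject
# in a state from `entry` to `exit`, at risk for every transition in
# `possible`; `observed` is the transition made at `exit` (NA if censored)
expand_sojourn <- function(entry, exit, observed, possible, cut) {
  tstart <- head(cut, -1)
  tend   <- cut[-1]
  # delayed entry: keep only intervals overlapping (entry, exit]
  keep   <- tend > entry & tstart < exit
  tstart <- tstart[keep]
  tend   <- tend[keep]
  # one row per interval and possible transition
  rows <- expand.grid(interval = seq_along(tstart), transition = possible,
                      stringsAsFactors = FALSE)
  rows$tstart <- tstart[rows$interval]
  rows$tend   <- tend[rows$interval]
  # time at risk within each interval, counted from state entry
  rows$offset <- log(pmin(rows$tend, exit) - pmax(rows$tstart, entry))
  # only the observed transition is an event (in the interval containing
  # `exit`); the competing transitions are censored
  rows$ped_status <- as.integer(!is.na(observed) & rows$transition == observed &
                                  exit > rows$tstart & exit <= rows$tend)
  rows
}

# Subject enters state 2 at t = 4 and dies (2->3) at t = 9
expand_sojourn(entry = 4, exit = 9, observed = "2->3",
               possible = c("2->1", "2->3"), cut = c(0, 3, 6, 9, 12))
```

Both transitions out of state 2 get a row for the intervals (3,6] and (6,9], but only 2->3 in (6,9] has ped_status = 1; the competing transition 2->1 is censored.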

In any case, the data transformation is specified by a two-sided formula. See the data-transformation, competing-risks, and recurrent-events vignettes for details.

Examples

# Standard single-event transformation
tumor[1:3, ]
#> # A tibble: 3 × 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <int>          <int> <int> <fct>  <fct>       <fct>         <fct>     
#> 1   579      0              2    58 female yes         no            yes       
#> 2  1192      0              2    52 male   no          yes           yes       
#> 3   308      1              2    74 female yes         no            yes       
#> # ℹ 1 more variable: resection <fct>
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex, cut = c(0, 500, 1000))
#>   id tstart tend   interval   offset ped_status age    sex
#> 1  1      0  500    (0,500] 6.214608          0  58 female
#> 2  1    500 1000 (500,1000] 4.369448          0  58 female
#> 3  2      0  500    (0,500] 6.214608          0  52   male
#> 4  2    500 1000 (500,1000] 6.214608          0  52   male
#> 5  3      0  500    (0,500] 5.730100          1  74 female
tumor[1:3, ] %>% as_ped(Surv(days, status) ~ age + sex)
#>   id tstart tend interval offset ped_status age    sex
#> 1  1      0  308  (0,308] 5.7301          0  58 female
#> 2  2      0  308  (0,308] 5.7301          0  52   male
#> 3  3      0  308  (0,308] 5.7301          1  74 female

# Competing risks: stacked data set (combine = TRUE, default)
# Suitable for cause-specific hazards models with shared effects,
# estimated via interaction terms e.g. s(tend, by = cause)
if (FALSE) { # \dontrun{
data("fourD", package = "etm")
ped_stacked <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id")
head(ped_stacked)

# Competing risks: list output (combine = FALSE)
# Suitable for cause-specific hazards models without shared effects
ped_list <- fourD %>%
  as_ped(Surv(time, status) ~ ., id = "id", combine = FALSE)
# ped_list[[1]]: data for cause 1 (cardiovascular death)
# ped_list[[2]]: data for cause 2 (death from other causes)
head(ped_list[[1]])
head(ped_list[[2]])

# Multi-state: illness-death model with recovery on the calendar timescale
# Uses the prothr data (liver cirrhosis patients, n = 488) from mstate.
# Patients can transition between normal (1) and abnormal (2) prothrombin
# levels and death (3): transitions 1->2, 1->3, 2->1, 2->3.
# Calendar timescale is used because hazards depend on overall disease
# duration, not time since last transition.
data("prothr", package = "mstate")
library(dplyr) # for filter()
ped_msm <- prothr %>%
  # drop records with zero-length follow-up (Tstart == Tstop)
  filter(Tstart != Tstop) %>%
  as_ped(
    formula    = Surv(Tstart, Tstop, status) ~ .,
    transition = "trans",
    id         = "id",
    timescale  = "calendar"
  )
head(ped_msm)
} # }
if (FALSE) { # \dontrun{
# Recurrent events: CGD data, first two events per subject
library(dplyr) # for select() and filter()
data("cgd", package = "frailtyHL")
cgd2 <- cgd %>%
  select(id, tstart, tstop, enum, status, age) %>%
  filter(enum %in% 1:2)
ped_re <- as_ped_multistate(
  formula    = Surv(tstart, tstop, status) ~ age + enum,
  data       = cgd2,
  transition = "enum",
  timescale  = "calendar"
)
} # }