This function provides a flexible interface for creating a data set that
can be passed as the newdata argument to a suitable predict
function (or similar).
The function is particularly useful in combination with one of the
add_* functions, e.g., add_term,
add_hazard, etc.
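To make the role of a newdata object concrete, here is a minimal base R sketch that builds such a data set by hand and passes it to predict. It does not use make_newdata; the model, the built-in mtcars data and the chosen covariates are assumptions made purely for illustration.

```r
# Base R only; mtcars ships with R. The model and covariates are illustrative.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hand-built newdata: vary one covariate, hold the other at its mean.
newdata <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 3),  # varied covariate
  hp = mean(mtcars$hp)                                        # held constant
)

# predict() evaluates the fitted model at the rows of newdata.
newdata$pred <- predict(fit, newdata = newdata)
newdata
```

Writing such data sets by hand becomes tedious as the number of covariates grows, which is the situation this function is intended for.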
make_newdata(x, ...)
# Default S3 method
make_newdata(x, ...)
# S3 method for class 'ped'
make_newdata(x, ...)
# S3 method for class 'fped'
make_newdata(x, ...)
x: A data frame (or object that inherits from data.frame).
...: Covariate specifications (expressions) that will be evaluated
by looking for variables in x. Must be of the form z = f(z)
where z is a variable in the data set and f a known
function that can be usefully applied to z. Note that this is also
necessary for single value specifications (e.g. age = c(50)).
For data in PED (piece-wise exponential data) format, one can also specify
the time argument, but see "Details" and "Examples" below.
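The following base R sketch illustrates how such specifications can be thought of: the specified values are combined into all combinations, and every unspecified variable is held at a typical value (mean or most frequent level, as described under "Details"). The helpers typical_value and sketch_newdata and the toy data are hypothetical and are not the package's implementation; na.rm = TRUE is a choice made for this sketch only.

```r
# Hypothetical helper: a "typical" value for one column.
typical_value <- function(col) {
  if (is.numeric(col)) {
    mean(col, na.rm = TRUE)            # numeric: mean (na.rm is this sketch's choice)
  } else {
    tab <- table(col)
    lvl <- names(tab)[which.max(tab)]  # categorical: most frequent level
    if (is.factor(col)) factor(lvl, levels = levels(col)) else lvl
  }
}

# Hypothetical helper: cross the specified values, fill in the rest.
# Assumes spec is a named list with at least one entry.
sketch_newdata <- function(x, spec) {
  grid <- expand.grid(spec, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  for (v in setdiff(names(x), names(spec))) {
    grid[[v]] <- typical_value(x[[v]])  # length-1 value is recycled over all rows
  }
  grid[names(x)]                        # restore the original column order
}

toy <- data.frame(
  days = c(1, 200, 3806),
  age  = c(48, 62, 70),
  sex  = factor(c("male", "female", "male"))
)

# Analogue of days = seq_range(days, 3), age = c(50, 55):
nd <- sketch_newdata(toy, list(
  days = with(toy, seq(min(days), max(days), length.out = 3)),
  age  = c(50, 55)
))
nd
```

Three values of days crossed with two values of age give six rows, with sex fixed at its most frequent level.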
Depending on the type of each variable in x, its mean or mode
will be used for variables not specified in ...
(see also sample_info). If x is an object
that inherits from class ped, make_newdata attempts to complete the
data set sensibly, depending on the variables specified in .... This is especially
useful when creating a data set with different time points, e.g., to
calculate survival probabilities over time (add_surv_prob)
or to calculate time-varying covariate effects (add_term).
To do so, the time variable has to be specified in ..., e.g.,
tend = seq_range(tend, 20). The problem with this specification is that
not all values produced by seq_range(tend, 20) will be actual values
of tend used at the estimation stage (and in general, it is
often tedious to specify exact tend values). make_newdata
therefore finds the interval that contains each requested value and sets
tend to the respective interval endpoint. For example, if the intervals of
the PED object are \((0,1], (1,2]\), then tend = 1.5 will be set to 2 and the
remaining time-varying information (e.g., offset) completed accordingly.
See the examples below.
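The interval lookup described above can be sketched in base R as follows. The helper snap_to_interval is hypothetical and not the package's implementation. The offset column assumes offset = log(intlen), which matches the printed output for the cut = c(0, 500, 1000) examples on this page (intlen 500, offset 6.21) but is otherwise an assumption of this sketch.

```r
# Hypothetical helper: map requested times to the interval (tstart, tend]
# that contains them, given the cut points of the PED object.
snap_to_interval <- function(times, cuts) {
  cuts <- sort(unique(cuts))
  # Requested times must lie inside the range covered by the intervals.
  stopifnot(all(times > min(cuts)), all(times <= max(cuts)))
  # left.open = TRUE yields intervals of the form (a, b].
  idx    <- findInterval(times, cuts, left.open = TRUE)
  tstart <- cuts[idx]
  tend   <- cuts[idx + 1]
  data.frame(
    requested = times,
    tstart    = tstart,
    tend      = tend,
    intlen    = tend - tstart,
    offset    = log(tend - tstart)  # assumption: offset = log(interval length)
  )
}

# Intervals (0,1] and (1,2]: a request of 1.5 is mapped to tend = 2.
snapped <- snap_to_interval(c(0.5, 1, 1.5), cuts = c(0, 1, 2))
snapped
```

Because the intervals are closed on the right, a requested time of exactly 1 stays in (0,1], while 1.5 falls into (1,2].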
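library(dplyr)      # provides %>%, group_by() and slice() used below
library(pammtools)  # provides make_newdata(), seq_range(), as_ped() and the tumor data
# General functionality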
tumor %>% make_newdata()
#> # A tibble: 1 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 62.0 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(age=c(50))
#> # A tibble: 1 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 50 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55))
#> # A tibble: 6 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1 0.483 2.78 50 male no no yes
#> 2 1904. 0.483 2.78 50 male no no yes
#> 3 3806 0.483 2.78 50 male no no yes
#> 4 1 0.483 2.78 55 male no no yes
#> 5 1904. 0.483 2.78 55 male no no yes
#> 6 3806 0.483 2.78 55 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55))
#> # A tibble: 12 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <int> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1 0 2.78 50 male no no yes
#> 2 1904. 0 2.78 50 male no no yes
#> 3 3806 0 2.78 50 male no no yes
#> 4 1 1 2.78 50 male no no yes
#> 5 1904. 1 2.78 50 male no no yes
#> 6 3806 1 2.78 50 male no no yes
#> 7 1 0 2.78 55 male no no yes
#> 8 1904. 0 2.78 55 male no no yes
#> 9 3806 0 2.78 55 male no no yes
#> 10 1 1 2.78 55 male no no yes
#> 11 1904. 1 2.78 55 male no no yes
#> 12 3806 1 2.78 55 male no no yes
#> # ℹ 1 more variable: resection <fct>
# mean/mode values of unspecified variables are calculated over the whole data set
tumor %>% make_newdata(sex=unique(sex))
#> # A tibble: 2 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 62.0 female no no yes
#> 2 1017. 0.483 2.78 62.0 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% group_by(sex) %>% make_newdata()
#> # A tibble: 2 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1060. 0.483 2.96 63.3 male no no yes
#> 2 954. 0.484 2.52 60.1 female no no yes
#> # ℹ 1 more variable: resection <fct>
# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 1.8 6.21 0 2 50
#> 2 0 500 500 (0,500] 1.8 6.21 0 2 55
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
# if time information is specified, other time variables will be specified
# accordingly and offset calculated correctly
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 500 1000 500 (500,1000] 1.8 6.21 0 2 50
#> 2 500 1000 500 (500,1000] 1.8 6.21 0 2 55
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
ped %>% make_newdata(tend = unique(tend))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 1.8 6.21 0 2 58.8
#> 2 500 1000 500 (500,1000] 1.8 6.21 0 2 58.8
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
ped %>% group_by(sex) %>% make_newdata(tend = unique(tend))
#> # A tibble: 4 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 2 6.21 0 2 52
#> 2 0 500 500 (0,500] 1.67 6.21 0 2 63.3
#> 3 500 1000 500 (500,1000] 2 6.21 0 2 52
#> 4 500 1000 500 (500,1000] 1.67 6.21 0 2 63.3
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
#> [1] 1.0 1517.5 3034.0
make_newdata(ped, tend = seq_range(tend, 3))
#> Not all requested timepoints correspond to original cut points.
#> # A tibble: 3 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1 1 (0,1] 393. 0 0 2.73 61.3
#> 2 1502 1518. 15.5 (1502,1538] 393. 2.74 0 2.73 61.3
#> 3 2808 3034 226 (2808,3034] 393. 5.42 0 2.73 61.3
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>