This function provides a flexible interface for creating a data set that
can be passed as the newdata argument to a suitable predict function (or similar).
It is particularly useful in combination with one of the add_* functions,
e.g., add_term, add_hazard, etc.
make_newdata(x, ...)
# S3 method for default
make_newdata(x, ...)
# S3 method for ped
make_newdata(x, ...)
# S3 method for fped
make_newdata(x, ...)
x
A data frame (or object that inherits from data.frame).
...
Covariate specifications (expressions) that will be evaluated
by looking for variables in x. Each must be of the form z = f(z),
where z is a variable in the data set and f is a known
function that can be usefully applied to z. Note that this is also
necessary for single-value specifications (e.g. age = c(50)).
For data in PED (piece-wise exponential data) format, one can also specify
the time argument, but see "Details" and "Examples" below.
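As a rough illustration of how specifications of the form z = f(z) can turn into rows, the following base R sketch evaluates each expression with the data as its lookup environment and then builds all combinations of the results. This is only a sketch under stated assumptions: `toy` and `specs` are made-up names, and the code is not claimed to be how make_newdata is implemented.

```r
# Illustration only: `toy` and `specs` are hypothetical names, and this is
# not the internal implementation of make_newdata.
toy <- data.frame(days = c(10, 200, 390), age = c(40, 50, 60))

# Specifications of the form z = f(z), stored unevaluated
specs <- list(
  days = quote(range(days)),  # f(z): minimum and maximum of days
  age  = quote(c(50, 55))     # fixed values, wrapped in c()
)

# Evaluate each expression, looking up variables in the data first
values <- lapply(specs, function(expr) eval(expr, envir = toy))

# All combinations of the evaluated values; the first variable varies fastest
grid <- expand.grid(values)
print(grid)
```

Two values for days combined with two values for age give four rows, in the same way that several specifications multiply the number of rows in the result.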
Depending on the type of variables in x, mean or mode values
will be used for variables not specified in the ellipsis
(see also sample_info). If x is an object
that inherits from class ped, useful data set completion will be
attempted depending on the variables specified in the ellipsis. This is especially
useful when creating a data set with different time points, e.g. to
calculate survival probabilities over time (add_surv_prob)
or to calculate time-varying covariate effects (add_term).
To do so, the time variable has to be specified in ..., e.g.,
tend = seq_range(tend, 20). The problem with this specification is that
not all values produced by seq_range(tend, 20) will be actual values
of tend used at the stage of estimation (and in general, it will
often be tedious to specify exact tend values). make_newdata
therefore finds the correct interval and sets tend to the respective
interval endpoint. For example, if the intervals of the PED object are
\((0,1], (1,2]\), then tend = 1.5 will be set to 2 and the
remaining time-varying information (e.g. offset) completed accordingly.
See examples below.
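The two behaviours described above, default values for unspecified variables and the mapping of tend to interval end points, can be approximated in base R. The helpers `stat_mode`, `summarise_defaults` and `snap_to_interval_end` are hypothetical names introduced for this sketch. They are not part of the package, and the actual implementation may differ.

```r
# Hypothetical helpers for illustration only; they are not part of the package.

# Mode (most frequent value) of a vector; ties resolve to the first entry
# in the order used by table() (sorted values or factor levels).
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

# One-row summary: mean for numeric columns, mode for everything else.
summarise_defaults <- function(df) {
  as.data.frame(
    lapply(df, function(col) if (is.numeric(col)) mean(col) else stat_mode(col)),
    stringsAsFactors = FALSE
  )
}

# Map time points to the right end point of the interval (a, b] containing
# them; `cut_points` are the sorted interval borders.
snap_to_interval_end <- function(t, cut_points) {
  idx <- findInterval(t, cut_points, left.open = TRUE)
  cut_points[idx + 1]
}

toy <- data.frame(
  age = c(40, 50, 60),
  sex = factor(c("male", "female", "male"))
)
defaults <- summarise_defaults(toy)

# Intervals (0,1] and (1,2]: 0.5 falls in the first, 1.5 and 2 in the second
snapped <- snap_to_interval_end(c(0.5, 1.5, 2), cut_points = c(0, 1, 2))
```

With the intervals \((0,1], (1,2]\) from the text, `snap_to_interval_end(1.5, c(0, 1, 2))` returns 2, matching the example given above.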
# General functionality
tumor %>% make_newdata()
#> # A tibble: 1 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 62.0 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(age=c(50))
#> # A tibble: 1 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 50 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55))
#> # A tibble: 6 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1 0.483 2.78 50 male no no yes
#> 2 1904. 0.483 2.78 50 male no no yes
#> 3 3806 0.483 2.78 50 male no no yes
#> 4 1 0.483 2.78 55 male no no yes
#> 5 1904. 0.483 2.78 55 male no no yes
#> 6 3806 0.483 2.78 55 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55))
#> # A tibble: 12 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <int> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1 0 2.78 50 male no no yes
#> 2 1904. 0 2.78 50 male no no yes
#> 3 3806 0 2.78 50 male no no yes
#> 4 1 1 2.78 50 male no no yes
#> 5 1904. 1 2.78 50 male no no yes
#> 6 3806 1 2.78 50 male no no yes
#> 7 1 0 2.78 55 male no no yes
#> 8 1904. 0 2.78 55 male no no yes
#> 9 3806 0 2.78 55 male no no yes
#> 10 1 1 2.78 55 male no no yes
#> 11 1904. 1 2.78 55 male no no yes
#> 12 3806 1 2.78 55 male no no yes
#> # ℹ 1 more variable: resection <fct>
# mean/mode values of unspecified variables are calculated over the whole data
tumor %>% make_newdata(sex=unique(sex))
#> # A tibble: 2 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1017. 0.483 2.78 62.0 female no no yes
#> 2 1017. 0.483 2.78 62.0 male no no yes
#> # ℹ 1 more variable: resection <fct>
tumor %>% group_by(sex) %>% make_newdata()
#> # A tibble: 2 × 9
#> days status charlson_score age sex transfusion complications metastases
#> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
#> 1 1060. 0.483 2.96 63.3 male no no yes
#> 2 954. 0.484 2.52 60.1 female no no yes
#> # ℹ 1 more variable: resection <fct>
# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 1.8 6.21 0 2 50
#> 2 0 500 500 (0,500] 1.8 6.21 0 2 55
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
# if time information is specified, other time variables will be specified
# accordingly and offset calculated correctly
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 500 1000 500 (500,1000] 1.8 6.21 0 2 50
#> 2 500 1000 500 (500,1000] 1.8 6.21 0 2 55
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
ped %>% make_newdata(tend = unique(tend))
#> # A tibble: 2 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 1.8 6.21 0 2 58.8
#> 2 500 1000 500 (500,1000] 1.8 6.21 0 2 58.8
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
ped %>% group_by(sex) %>% make_newdata(tend = unique(tend))
#> # A tibble: 4 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 500 500 (0,500] 2 6.21 0 2 52
#> 2 0 500 500 (0,500] 1.67 6.21 0 2 63.3
#> 3 500 1000 500 (500,1000] 2 6.21 0 2 52
#> 4 500 1000 500 (500,1000] 1.67 6.21 0 2 63.3
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>
# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
#> [1] 1.0 1517.5 3034.0
make_newdata(ped, tend = seq_range(tend, 3))
#> Some values of 'tend' have been set to the respective interval end-points
#> # A tibble: 3 × 14
#> tstart tend intlen interval id offset ped_status charlson_score age
#> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1 1 (0,1] 393. 0 0 2.73 61.3
#> 2 1502 1538 36 (1502,1538] 393. 3.58 0 2.73 61.3
#> 3 2808 3034 226 (2808,3034] 393. 5.42 0 2.73 61.3
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> # metastases <fct>, resection <fct>