This function provides a flexible interface for creating a data set that can be passed as the newdata argument to a suitable predict function (or similar). It is particularly useful in combination with one of the add_* functions, e.g., add_term, add_hazard, etc.

make_newdata(x, ...)

# S3 method for default
make_newdata(x, ...)

# S3 method for ped
make_newdata(x, ...)

# S3 method for fped
make_newdata(x, ...)

Arguments

x

A data frame (or object that inherits from data.frame).

...

Covariate specifications (expressions) that will be evaluated by looking for variables in x. Each must be of the form z = f(z), where z is a variable in the data set and f is a known function that can be usefully applied to z. Note that this form is also required for single-value specifications (e.g., age = c(50)). For data in PED (piece-wise exponential data) format, one can also specify the time argument, but see "Details" and "Examples" below.
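The z = f(z) form can be illustrated in base R. The sketch below is not how make_newdata is implemented; it only mimics the idea that each specification is evaluated with the variables of x in scope and that the resulting values are then combined. The toy data frame df and the list spec are hypothetical names introduced for illustration, and seq(min, max, length.out = 3) stands in for seq_range(days, 3).

```r
# Toy data standing in for x (hypothetical values, not the tumor data)
df <- data.frame(days = c(1, 100, 400), age = c(40, 55, 70))

# Each specification is an expression evaluated with the columns of df in scope
spec <- list(
  days = with(df, seq(min(days), max(days), length.out = 3)),
  age  = with(df, c(50, 55))
)

# All combinations of the specified values; the first variable varies fastest
combos <- expand.grid(spec)
print(combos)
```

With three values for days and two for age, combos has six rows, in the same layout as the days/age example in the "Examples" section.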

Details

Depending on the type of each variable in x, its mean or mode will be used for variables not specified in the ellipsis (see also sample_info). If x is an object that inherits from class ped, make_newdata will attempt to complete the data set sensibly, depending on the variables specified in the ellipsis. This is especially useful when creating a data set with different time points, e.g., to calculate survival probabilities over time (add_surv_prob) or time-varying covariate effects (add_term).

To do so, the time variable has to be specified in ..., e.g., tend = seq_range(tend, 20). The problem with this specification is that not all values produced by seq_range(tend, 20) will be actual values of tend used at the estimation stage (and, in general, it is often tedious to specify exact tend values). make_newdata therefore finds the interval that contains each requested value and sets tend to the end point of that interval. For example, if the intervals of the PED object are \((0,1], (1,2]\), then tend = 1.5 will be set to 2 and the remaining time-varying information (e.g., offset) will be completed accordingly. See the examples below.
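The interval lookup described above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: breaks, requested and snapped are hypothetical names, and the offset is assumed to be log(intlen), which is consistent with the offset values printed in the "Examples" section.

```r
# Interval break points from the example in the text: (0,1], (1,2]
breaks <- c(0, 1, 2)

# Requested time points, not all of which are interval end points
requested <- c(0.5, 1, 1.5, 2)

# Index of the left-open, right-closed interval containing each value
idx <- findInterval(requested, breaks, left.open = TRUE)

# Replace each requested value by the end point of its interval
snapped <- data.frame(
  requested = requested,
  tstart    = breaks[idx],
  tend      = breaks[idx + 1],
  intlen    = breaks[idx + 1] - breaks[idx]
)

# Offset on the log scale; assumed here to be log(intlen)
snapped$offset <- log(snapped$intlen)
print(snapped)
```

In this sketch a requested value of 1.5 falls in (1,2] and is therefore mapped to tend = 2, while values that already are interval end points (1 and 2) are left unchanged.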

Examples

# General functionality
tumor %>% make_newdata()
#> # A tibble: 1 × 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>     
#> 1 1017.  0.483           2.78  62.0 male  no          no            yes       
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(age=c(50))
#> # A tibble: 1 × 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>     
#> 1 1017.  0.483           2.78    50 male  no          no            yes       
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), age=c(50, 55))
#> # A tibble: 6 × 9
#>    days status charlson_score   age sex   transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>     
#> 1    1   0.483           2.78    50 male  no          no            yes       
#> 2 1904.  0.483           2.78    50 male  no          no            yes       
#> 3 3806   0.483           2.78    50 male  no          no            yes       
#> 4    1   0.483           2.78    55 male  no          no            yes       
#> 5 1904.  0.483           2.78    55 male  no          no            yes       
#> 6 3806   0.483           2.78    55 male  no          no            yes       
#> # ℹ 1 more variable: resection <fct>
tumor %>% make_newdata(days=seq_range(days, 3), status=unique(status), age=c(50, 55))
#> # A tibble: 12 × 9
#>     days status charlson_score   age sex   transfusion complications metastases
#>    <dbl>  <int>          <dbl> <dbl> <fct> <fct>       <fct>         <fct>     
#>  1    1       0           2.78    50 male  no          no            yes       
#>  2 1904.      0           2.78    50 male  no          no            yes       
#>  3 3806       0           2.78    50 male  no          no            yes       
#>  4    1       1           2.78    50 male  no          no            yes       
#>  5 1904.      1           2.78    50 male  no          no            yes       
#>  6 3806       1           2.78    50 male  no          no            yes       
#>  7    1       0           2.78    55 male  no          no            yes       
#>  8 1904.      0           2.78    55 male  no          no            yes       
#>  9 3806       0           2.78    55 male  no          no            yes       
#> 10    1       1           2.78    55 male  no          no            yes       
#> 11 1904.      1           2.78    55 male  no          no            yes       
#> 12 3806       1           2.78    55 male  no          no            yes       
#> # ℹ 1 more variable: resection <fct>
# mean/mode values of unspecified variables are calculated over the whole data set
tumor %>% make_newdata(sex=unique(sex))
#> # A tibble: 2 × 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct>  <fct>       <fct>         <fct>     
#> 1 1017.  0.483           2.78  62.0 female no          no            yes       
#> 2 1017.  0.483           2.78  62.0 male   no          no            yes       
#> # ℹ 1 more variable: resection <fct>
tumor %>% group_by(sex) %>% make_newdata()
#> # A tibble: 2 × 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <fct>  <fct>       <fct>         <fct>     
#> 1 1060.  0.483           2.96  63.3 male   no          no            yes       
#> 2  954.  0.484           2.52  60.1 female no          no            yes       
#> # ℹ 1 more variable: resection <fct>
# You can also pass part of the data set as a data frame to make_newdata
purrr::cross_df(list(days = c(0, 500, 1000), sex = c("male", "female"))) %>%
  make_newdata(x=tumor)
#> Warning: `cross_df()` was deprecated in purrr 1.0.0.
#>  Please use `tidyr::expand_grid()` instead.
#>  See <https://github.com/tidyverse/purrr/issues/768>.
#> # A tibble: 6 × 9
#>    days status charlson_score   age sex    transfusion complications metastases
#>   <dbl>  <dbl>          <dbl> <dbl> <chr>  <fct>       <fct>         <fct>     
#> 1     0  0.483           2.78  62.0 male   no          no            yes       
#> 2   500  0.483           2.78  62.0 male   no          no            yes       
#> 3  1000  0.483           2.78  62.0 male   no          no            yes       
#> 4     0  0.483           2.78  62.0 female no          no            yes       
#> 5   500  0.483           2.78  62.0 female no          no            yes       
#> 6  1000  0.483           2.78  62.0 female no          no            yes       
#> # ℹ 1 more variable: resection <fct>

# Examples for PED data
ped <- tumor %>% slice(1:3) %>% as_ped(Surv(days, status)~., cut = c(0, 500, 1000))
ped %>% make_newdata(age=c(50, 55))
#>   tstart tend intlen interval  id   offset ped_status charlson_score age    sex
#> 1      0  500    500  (0,500] 1.8 6.214608          0              2  50 female
#> 2      0  500    500  (0,500] 1.8 6.214608          0              2  55 female
#>   transfusion complications metastases resection
#> 1         yes            no        yes        no
#> 2         yes            no        yes        no

# if time information is specified, the other time variables are set
# accordingly and the offset is calculated correctly
ped %>% make_newdata(tend = c(1000), age = c(50, 55))
#>   tstart tend intlen   interval  id   offset ped_status charlson_score age
#> 1    500 1000    500 (500,1000] 1.8 6.214608          0              2  50
#> 2    500 1000    500 (500,1000] 1.8 6.214608          0              2  55
#>      sex transfusion complications metastases resection
#> 1 female         yes            no        yes        no
#> 2 female         yes            no        yes        no
ped %>% make_newdata(tend = unique(tend))
#>   tstart tend intlen   interval  id   offset ped_status charlson_score  age
#> 1      0  500    500    (0,500] 1.8 6.214608          0              2 58.8
#> 2    500 1000    500 (500,1000] 1.8 6.214608          0              2 58.8
#>      sex transfusion complications metastases resection
#> 1 female         yes            no        yes        no
#> 2 female         yes            no        yes        no
ped %>% group_by(sex) %>% make_newdata(tend = unique(tend))
#> # A tibble: 4 × 14
#>   tstart  tend intlen interval      id offset ped_status charlson_score   age
#>    <dbl> <dbl>  <dbl> <fct>      <dbl>  <dbl>      <dbl>          <dbl> <dbl>
#> 1      0   500    500 (0,500]     2      6.21          0              2  52  
#> 2      0   500    500 (0,500]     1.67   6.21          0              2  63.3
#> 3    500  1000    500 (500,1000]  2      6.21          0              2  52  
#> 4    500  1000    500 (500,1000]  1.67   6.21          0              2  63.3
#> # ℹ 5 more variables: sex <fct>, transfusion <fct>, complications <fct>,
#> #   metastases <fct>, resection <fct>

# tend is set to the end point of respective interval:
ped <- tumor %>% as_ped(Surv(days, status)~.)
seq_range(ped$tend, 3)
#> [1]    1.0 1517.5 3034.0
make_newdata(ped, tend = seq_range(tend, 3))
#> Some values of 'tend' have been set to the respective interval end-points
#>   tstart tend intlen    interval       id   offset ped_status charlson_score
#> 1      0    1      1       (0,1] 392.6801 0.000000          0        2.72929
#> 2   1502 1538     36 (1502,1538] 392.6801 3.583519          0        2.72929
#> 3   2808 3034    226 (2808,3034] 392.6801 5.420535          0        2.72929
#>        age  sex transfusion complications metastases resection
#> 1 61.31348 male          no            no        yes        no
#> 2 61.31348 male          no            no        yes        no
#> 3 61.31348 male          no            no        yes        no