This vignette demonstrates the functionality for pooling multiple surveys.
Two convenience functions make the pooling approach easy to apply:
get_surveys: Wrapper that uses scrape_wahlrecht and collapse_parties to download the most current survey results from https://www.wahlrecht.de/ and stores the prepared data inside a nested tibble (see tidyr::nest).
pool_surveys: Pools the newest surveys (obtained with get_surveys) within a specified time window, which defaults to the last 14 days, assuming a certain correlation between the party-specific vote counts of any two polling agencies, which defaults to 0.5. Per polling agency, only the newest survey in the time window is considered.
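To build intuition for the correlation assumption, the following sketch computes the variance of the summed party-specific vote counts of several agencies under a common pairwise correlation and converts it into an effective sample size. It is a simplified illustration and not necessarily the exact computation used inside pool_surveys; the helper name effective_n, the common vote share p, and the example numbers are hypothetical.

```r
# Hypothetical helper (an assumption, not part of the coalitions package):
# effective sample size for one party when pooling surveys whose
# party-specific vote counts are pairwise correlated.
# n:    vector of sample sizes, one per polling agency
# p:    assumed common vote share of the party
# corr: assumed correlation between the counts of any two agencies
effective_n <- function(n, p, corr = 0.5) {
  sds <- sqrt(n * p * (1 - p))            # binomial standard deviations
  # Sum over all pairs i < j of sd_i * sd_j
  cross <- (sum(sds)^2 - sum(sds^2)) / 2
  # Var(sum X_i) = sum Var(X_i) + 2 * corr * sum_{i<j} sd_i * sd_j
  var_sum <- sum(sds^2) + 2 * corr * cross
  # Size of one independent sample whose share estimate has the same variance:
  # var_sum / N^2 = p * (1 - p) / n_eff
  n_total <- sum(n)
  p * (1 - p) * n_total^2 / var_sum
}

n <- c(1000, 1500, 2000)
effective_n(n, p = 0.3, corr = 0)    # independent surveys: equals sum(n)
effective_n(n, p = 0.3, corr = 0.5)  # positive correlation: smaller than sum(n)
```

Under this simplified model, a higher correlation means that each additional agency contributes less new information, so the pooled sample behaves like a smaller independent sample.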
The three arguments last_date, period and period_extended define the time window used in pool_surveys. Using these arguments, one can choose between two types of pooling:
1. If period_extended equals NA: Surveys in the time window from last_date - period to last_date are considered for each polling agency.
2. If period_extended does not equal NA: Same as 1. In addition, surveys in the time window from last_date - period_extended to last_date - period are also considered for each polling agency, but only after downweighting them by halving their true sample size.
The latter option can be especially useful if opinion polls for a specific election are published only rarely. By default, pool_surveys uses a time window starting from the current date and going 14 days back, without making use of period_extended.
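The two windows can be illustrated with a small base R sketch on made-up data. It mimics the selection and downweighting rules described above but is not the package's internal code; the names polls and select_for_pooling, the inclusive window boundaries, and all numbers are assumptions made for illustration.

```r
# Hypothetical toy data: one row per survey
polls <- data.frame(
  pollster    = c("a", "a", "b", "c"),
  date        = as.Date(c("2017-09-01", "2017-08-25", "2017-08-15", "2017-07-01")),
  respondents = c(1000, 1200, 2000, 1500)
)

# Sketch of the selection rule (assumption: not the package's internal code)
select_for_pooling <- function(polls, last_date, period = 14, period_extended = NA) {
  # Survey age in days relative to last_date
  age <- as.numeric(difftime(last_date, polls$date, units = "days"))
  in_main <- age >= 0 & age <= period
  in_ext <- if (is.na(period_extended)) {
    rep(FALSE, nrow(polls))
  } else {
    age > period & age <= period_extended
  }
  keep <- in_main | in_ext
  out <- polls[keep, , drop = FALSE]
  # Halve the sample size of surveys that fall only into the extended window
  out$weighted_n <- ifelse(in_ext[keep], out$respondents / 2, out$respondents)
  # Keep only the newest survey per polling agency
  out <- out[order(out$pollster, -as.numeric(out$date)), ]
  out[!duplicated(out$pollster), ]
}

last_date <- as.Date("2017-09-02")
select_for_pooling(polls, last_date)                        # only pollster "a"
select_for_pooling(polls, last_date, period_extended = 28)  # "a" plus downweighted "b"
```

In the second call, the survey of pollster "b" is 18 days old, so it enters only through the extended window and counts with half of its 2000 respondents.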
library(coalitions)
library(dplyr)
# Scrape current surveys from the major polling agencies in Germany
# surveys <- get_surveys()
# Because the web connection is sometimes unstable, we use the sample data set of pre-scraped surveys here
surveys <- coalitions::surveys_sample
surveys
## # A tibble: 7 x 2
## pollster surveys
## <chr> <list>
## 1 allensbach <tibble[,5] [3 × 5]>
## 2 emnid <tibble[,5] [3 × 5]>
## 3 fgw <tibble[,5] [3 × 5]>
## 4 forsa <tibble[,5] [3 × 5]>
## 5 gms <tibble[,5] [3 × 5]>
## 6 infratest <tibble[,5] [3 × 5]>
## 7 insa <tibble[,5] [3 × 5]>
# Obtain the pooled sample for the most recent survey date in the data, based on the preceding 14 days
last_date <- surveys %>% tidyr::unnest(cols = surveys) %>% pull(date) %>% max()
pool <- pool_surveys(surveys, last_date = last_date)
pool %>% select(-start, -end)
## # A tibble: 7 x 6
## pollster date respondents party percent votes
## <chr> <date> <dbl> <chr> <dbl> <dbl>
## 1 pooled 2017-09-02 3055. afd 8.89 272.
## 2 pooled 2017-09-02 3055. cdu 38.0 1161.
## 3 pooled 2017-09-02 3055. fdp 8.52 260.
## 4 pooled 2017-09-02 3055. greens 7.41 226.
## 5 pooled 2017-09-02 3055. left 9.06 277.
## 6 pooled 2017-09-02 3055. others 4.51 138.
## 7 pooled 2017-09-02 3055. spd 23.6 722.
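The extended window described earlier can be tried on the same sample data. The following is a minimal sketch, assuming the coalitions and tidyr packages are installed; the value 28 for period_extended is an arbitrary choice for illustration, and no output is shown because it depends on the installed package version.

```r
# Sketch: pooling with an extended time window.
# Runs only if the coalitions package is available.
if (requireNamespace("coalitions", quietly = TRUE)) {
  library(coalitions)
  surveys <- coalitions::surveys_sample
  # Most recent survey date in the sample data
  last_date <- max(tidyr::unnest(surveys, cols = surveys)$date)
  # Surveys from the last 14 days enter with their full sample size;
  # surveys that are 15 to 28 days old enter with half their sample size
  # (28 is an illustrative choice, not a package default)
  pool_ext <- pool_surveys(surveys, last_date = last_date,
                           period = 14, period_extended = 28)
  print(pool_ext)
}
```

Comparing the respondents column of pool_ext with that of the default pool shows how much the downweighted older surveys add to the pooled sample.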