In this vignette we demonstrate the functionality offered for pooling multiple surveys.
Two convenience functions make it easy to apply the pooling approach:
- get_surveys: Wrapper that uses scrape_wahlrecht and collapse_parties to
download the most recent survey results from https://www.wahlrecht.de/ and store the prepared data
inside a nested tibble (see tidyr::nest).
- pool_surveys: Pools the newest surveys (obtained with get_surveys)
within a specified time window, which defaults to the last 14 days,
assuming a certain correlation between the party-specific vote counts
of any two polling agencies, which defaults to 0.5. Per polling agency,
only the newest survey in the time window is considered.
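The rule that only the newest survey per polling agency within the time window enters the pool can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy data frame `toy` and the helper `newest_in_window()` are hypothetical, and the window is assumed to include both of its end points.

```r
# Hypothetical toy data: several surveys per polling agency
toy <- data.frame(
  pollster    = c("emnid", "emnid", "forsa", "forsa", "infratest"),
  date        = as.Date(c("2017-08-20", "2017-08-30", "2017-08-25",
                          "2017-09-01", "2017-08-10")),
  respondents = c(1900, 2000, 2500, 2400, 1500),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of coalitions): keep, per pollster, the newest
# survey whose date lies in [last_date - period, last_date]
newest_in_window <- function(df, last_date, period = 14) {
  in_window <- df$date >= last_date - period & df$date <= last_date
  df <- df[in_window, ]
  # sort by pollster, newest date first within each pollster
  df <- df[order(df$pollster, -as.numeric(df$date)), ]
  # the first row per pollster is now its newest survey
  df[!duplicated(df$pollster), ]
}

res <- newest_in_window(toy, last_date = as.Date("2017-09-02"))
res
# emnid (2017-08-30) and forsa (2017-09-01) remain; infratest falls outside the window
```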
The three arguments last_date, period and period_extended define the
time window used in pool_surveys. With these arguments one can choose
between two types of pooling:
1. If period_extended equals NA: surveys in the time window from
last_date back to last_date - period are considered for each polling
agency.
2. If period_extended does not equal NA: same as 1. Additionally,
surveys in the time window from last_date - period back to
last_date - period_extended are also considered for each polling
agency, but they are downweighted by halving their true sample size.
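The two pooling types can be read as a weight on each survey's sample size. The sketch below covers only that weighting, not the selection of the newest survey per agency; it assumes that period and period_extended are given in days, that period_extended is larger than period, and that a survey exactly period days old still belongs to the more recent window. The helper `effective_size()` is hypothetical and not part of the package.

```r
# Hypothetical helper: effective sample size of a survey under the two pooling types
effective_size <- function(respondents, date, last_date,
                           period = 14, period_extended = NA) {
  age <- as.numeric(last_date - date)  # days before last_date
  # type 1: full weight inside [last_date - period, last_date], otherwise excluded
  weight <- ifelse(age >= 0 & age <= period, 1, 0)
  if (!is.na(period_extended)) {
    # type 2: older surveys within period_extended count with half their sample size
    weight <- ifelse(age > period & age <= period_extended, 0.5, weight)
  }
  respondents * weight
}

last_date <- as.Date("2017-09-02")
dates <- as.Date(c("2017-09-01", "2017-08-15", "2017-07-20"))  # 1, 18 and 44 days old
size_type1 <- effective_size(c(2000, 2000, 2000), dates, last_date)
# returns c(2000, 0, 0)
size_type2 <- effective_size(c(2000, 2000, 2000), dates, last_date, period_extended = 28)
# returns c(2000, 1000, 0)
```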
The latter option can be especially useful if opinion polls for a
specific election are published only rarely. By default,
pool_surveys uses a time window that starts at the current date and
goes 14 days back, without using period_extended.
library(coalitions)
library(dplyr)
# Scrape current surveys from the major polling agencies in Germany
# surveys <- get_surveys()
# As the web connection is sometimes unstable, we use the sample data set of pre-scraped surveys here
surveys <- coalitions::surveys_sample
surveys
## # A tibble: 3 × 2
## # Groups: pollster [3]
## pollster surveys
## <chr> <list>
## 1 emnid <tibble [3 × 5]>
## 2 forsa <tibble [3 × 5]>
## 3 infratest <tibble [3 × 5]>
# Obtain the pooled sample for the most recent survey date in the data, based on the preceding 14 days
last_date <- surveys %>%
  tidyr::unnest("surveys") %>%
  tidyr::unnest("survey") %>%
  pull(date) %>%
  max()
pool <- pool_surveys(surveys, last_date = last_date)
pool %>% select(-start, -end)
## # A tibble: 7 × 6
## pollster date respondents party percent votes
## <chr> <date> <dbl> <chr> <dbl> <dbl>
## 1 pooled 2017-09-02 2931. afd 9.16 269.
## 2 pooled 2017-09-02 2931. cdu 37.8 1107.
## 3 pooled 2017-09-02 2931. fdp 8 234.
## 4 pooled 2017-09-02 2931. greens 7.57 222.
## 5 pooled 2017-09-02 2931. left 9 264.
## 6 pooled 2017-09-02 2931. others 4.76 139.
## 7 pooled 2017-09-02 2931. spd 23.8 696.
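The votes column of the pooled output appears consistent with applying each party's percentage to the pooled number of respondents, i.e. votes ≈ respondents × percent / 100. The check below assumes that relationship and recomputes it from the printed values. Because the tibble rounds respondents, percent and votes for display, agreement is only expected to within a vote or two.

```r
# Values copied from the printed pooled tibble above (rounded for display)
pooled <- data.frame(
  party   = c("afd", "cdu", "fdp", "greens", "left", "others", "spd"),
  percent = c(9.16, 37.8, 8, 7.57, 9, 4.76, 23.8),
  votes   = c(269, 1107, 234, 222, 264, 139, 696)
)
respondents <- 2931

# Assumed relationship: votes = respondents * percent / 100
pooled$recomputed <- respondents * pooled$percent / 100
# Largest discrepancy, caused by display rounding (about 1.58 votes, for spd)
max_diff <- max(abs(pooled$recomputed - pooled$votes))
max_diff
```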