vignettes/handling_incidence_objects.Rmd
incidence() objects are easy to work with, and we provide helper functions for both manipulating and accessing the underlying data and attributes. As incidence() objects are subclasses of tibbles, they also integrate well with tidyverse verbs.
regroup()
Sometimes you may find you have created a grouped incidence object but now want to change the internal grouping. If the grouping you want is a subset of the groups already generated, you can use the regroup() function to get the desired aggregation:
library(outbreaks)
library(dplyr)
library(incidence2)
# load data
dat <- ebola_sim_clean$linelist
# generate the incidence object with 3 groups
inci <- incidence(dat, date_of_onset, groups = c(gender, hospital, outcome), interval = "week")
inci
#> An incidence object: 1,448 x 5
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index gender hospital outcome count
#> <yrwk> <fct> <fct> <fct> <int>
#> 1 2014-W15 f Military Hospital <NA> 1
#> 2 2014-W16 m Connaught Hospital <NA> 1
#> 3 2014-W17 f <NA> <NA> 1
#> 4 2014-W17 f <NA> Death 1
#> 5 2014-W17 f other Recover 2
#> 6 2014-W17 m other Recover 1
#> 7 2014-W18 f <NA> Recover 1
#> 8 2014-W18 f Connaught Hospital Recover 1
#> 9 2014-W18 f Princess Christian Maternity Hospital (PCMH) Death 1
#> 10 2014-W18 f Rokupa Hospital Recover 1
#> # … with 1,438 more rows
# regroup to just two groups
inci %>% regroup(c(gender, outcome))
#> An incidence object: 320 x 4
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index gender outcome count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W15 f <NA> 1
#> 2 2014-W16 m <NA> 1
#> 3 2014-W17 f <NA> 1
#> 4 2014-W17 f Death 1
#> 5 2014-W17 f Recover 2
#> 6 2014-W17 m Recover 1
#> 7 2014-W18 f Death 1
#> 8 2014-W18 f Recover 3
#> 9 2014-W19 f <NA> 4
#> 10 2014-W19 f Death 2
#> # … with 310 more rows
# drop all groups
inci %>% regroup()
#> An incidence object: 56 x 2
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index count
#> <yrwk> <int>
#> 1 2014-W15 1
#> 2 2014-W16 1
#> 3 2014-W17 5
#> 4 2014-W18 4
#> 5 2014-W19 12
#> 6 2014-W20 17
#> 7 2014-W21 15
#> 8 2014-W22 19
#> 9 2014-W23 23
#> 10 2014-W24 21
#> # … with 46 more rows
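Conceptually, the regrouping shown above appears to amount to summing counts over the grouping columns that are dropped, which is why the total number of cases stays the same in each output. The following is a minimal sketch of that idea using plain dplyr on a toy tibble. The data and column values are hypothetical, and this is not how regroup() is implemented internally; it only illustrates the aggregation.

```r
library(dplyr)

# Toy weekly counts (hypothetical data, not taken from ebola_sim_clean)
toy <- tibble(
  date_index = as.Date(c("2014-04-07", "2014-04-07", "2014-04-07", "2014-04-14")),
  gender     = c("f", "f", "m", "f"),
  hospital   = c("A", "B", "A", "A"),
  count      = c(1L, 2L, 1L, 3L)
)

# Keep only `gender` as a group: sum counts over the dropped `hospital` column
by_gender <- toy %>%
  group_by(date_index, gender) %>%
  summarise(count = sum(count), .groups = "drop")

# Drop all groups: one row per date
overall <- toy %>%
  group_by(date_index) %>%
  summarise(count = sum(count), .groups = "drop")

by_gender
overall
```

Unlike this sketch, regroup() returns an incidence object, so the interval and date attributes are carried along with the aggregated counts.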
keep_first() and keep_last()
Once your data is grouped by date, you may want to select the first or last few entries based on a particular date grouping using keep_first() and keep_last():
inci %>% keep_first(3)
#> An incidence object: 6 x 5
#> date range: [2014-W15] to [2014-W17]
#> cases: 7
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index gender hospital outcome count
#> <yrwk> <fct> <fct> <fct> <int>
#> 1 2014-W15 f Military Hospital <NA> 1
#> 2 2014-W16 m Connaught Hospital <NA> 1
#> 3 2014-W17 f <NA> <NA> 1
#> 4 2014-W17 f <NA> Death 1
#> 5 2014-W17 f other Recover 2
#> 6 2014-W17 m other Recover 1
inci %>% keep_last(3)
#> An incidence object: 63 x 5
#> date range: [2015-W16] to [2015-W18]
#> cases: 103
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index gender hospital outcome count
#> <yrwk> <fct> <fct> <fct> <int>
#> 1 2015-W16 f <NA> <NA> 1
#> 2 2015-W16 f <NA> Death 7
#> 3 2015-W16 f <NA> Recover 1
#> 4 2015-W16 f Connaught Hospital <NA> 1
#> 5 2015-W16 f Connaught Hospital Death 5
#> 6 2015-W16 f Connaught Hospital Recover 3
#> 7 2015-W16 f Military Hospital Recover 1
#> 8 2015-W16 f other <NA> 1
#> 9 2015-W16 f other Death 2
#> 10 2015-W16 f other Recover 1
#> # … with 53 more rows
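One reading consistent with the output above is that keep_first(n) and keep_last(n) keep every row belonging to the first or last n distinct date groupings, rather than the first or last n rows. The sketch below reproduces that selection with plain dplyr on a hypothetical toy tibble; it is an illustration under that assumption, not the package's implementation.

```r
library(dplyr)

# Toy weekly counts (hypothetical data)
toy <- tibble(
  date_index = as.Date(c("2014-04-07", "2014-04-07", "2014-04-14",
                         "2014-04-21", "2014-04-28", "2014-04-28")),
  gender     = c("f", "m", "f", "m", "f", "m"),
  count      = c(1L, 2L, 1L, 4L, 2L, 3L)
)

# Distinct date groupings in chronological order
dates <- sort(unique(toy$date_index))

# Rows in the first two and last two date groupings
first_two <- toy %>% filter(date_index %in% head(dates, 2))
last_two  <- toy %>% filter(date_index %in% tail(dates, 2))

first_two
last_two
```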
incidence2 has been written with tidyverse compatibility (in particular dplyr) at the forefront of our design choices. By this we mean that if a dplyr operation is applied to an incidence object, the result is still an incidence object as long as the invariants of the object (i.e. groups, interval and uniqueness of rows) are preserved. If the invariants are not preserved, a tibble is returned instead.
library(dplyr)
# create incidence object
inci <- incidence(dat, date_of_onset, interval = "week", groups = c(hospital, gender))
# filtering preserves class
inci %>% filter(gender == "f", hospital == "Rokupa Hospital")
#> An incidence object: 48 x 4
#> date range: [2014-W18] to [2015-W18]
#> cases: 210
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index hospital gender count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W18 Rokupa Hospital f 1
#> 2 2014-W20 Rokupa Hospital f 1
#> 3 2014-W22 Rokupa Hospital f 1
#> 4 2014-W23 Rokupa Hospital f 1
#> 5 2014-W25 Rokupa Hospital f 1
#> 6 2014-W27 Rokupa Hospital f 1
#> 7 2014-W28 Rokupa Hospital f 4
#> 8 2014-W29 Rokupa Hospital f 2
#> 9 2014-W30 Rokupa Hospital f 1
#> 10 2014-W31 Rokupa Hospital f 1
#> # … with 38 more rows
# slice operations preserve class
inci %>% slice_sample(n = 10)
#> An incidence object: 10 x 4
#> date range: [2014-W19] to [2015-W18]
#> cases: 95
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index hospital gender count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W43 Connaught Hospital m 42
#> 2 2014-W20 Princess Christian Maternity Hospital (PCMH) m 1
#> 3 2014-W27 Military Hospital m 1
#> 4 2014-W35 Connaught Hospital f 24
#> 5 2014-W19 Rokupa Hospital m 1
#> 6 2014-W34 <NA> f 13
#> 7 2015-W14 Princess Christian Maternity Hospital (PCMH) m 1
#> 8 2014-W32 Military Hospital m 3
#> 9 2015-W18 Rokupa Hospital f 1
#> 10 2014-W23 <NA> f 8
inci %>% slice(1, 5, 10)
#> An incidence object: 3 x 4
#> date range: [2014-W15] to [2014-W19]
#> cases: 3
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index hospital gender count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W15 Military Hospital f 1
#> 2 2014-W17 other m 1
#> 3 2014-W19 <NA> f 1
# mutate preserves class
inci %>% mutate(future = date_index + 999)
#> An incidence object: 601 x 5
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index hospital gender count future
#> <yrwk> <fct> <fct> <int> <yrwk>
#> 1 2014-W15 Military Hospital f 1 2033-W22
#> 2 2014-W16 Connaught Hospital m 1 2033-W23
#> 3 2014-W17 <NA> f 2 2033-W24
#> 4 2014-W17 other f 2 2033-W24
#> 5 2014-W17 other m 1 2033-W24
#> 6 2014-W18 <NA> f 1 2033-W25
#> 7 2014-W18 Connaught Hospital f 1 2033-W25
#> 8 2014-W18 Princess Christian Maternity Hospital (PCMH) f 1 2033-W25
#> 9 2014-W18 Rokupa Hospital f 1 2033-W25
#> 10 2014-W19 <NA> f 1 2033-W26
#> # … with 591 more rows
# rename preserves class
inci %>% rename(left_bin = date_index)
#> An incidence object: 601 x 4
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> left_bin hospital gender count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W15 Military Hospital f 1
#> 2 2014-W16 Connaught Hospital m 1
#> 3 2014-W17 <NA> f 2
#> 4 2014-W17 other f 2
#> 5 2014-W17 other m 1
#> 6 2014-W18 <NA> f 1
#> 7 2014-W18 Connaught Hospital f 1
#> 8 2014-W18 Princess Christian Maternity Hospital (PCMH) f 1
#> 9 2014-W18 Rokupa Hospital f 1
#> 10 2014-W19 <NA> f 1
#> # … with 591 more rows
# select returns a tibble unless all date, count and group variables are preserved
inci %>% select(-1)
#> # A tibble: 601 × 3
#> hospital gender count
#> <fct> <fct> <int>
#> 1 Military Hospital f 1
#> 2 Connaught Hospital m 1
#> 3 <NA> f 2
#> 4 other f 2
#> 5 other m 1
#> 6 <NA> f 1
#> 7 Connaught Hospital f 1
#> 8 Princess Christian Maternity Hospital (PCMH) f 1
#> 9 Rokupa Hospital f 1
#> 10 <NA> f 1
#> # … with 591 more rows
inci %>% select(everything())
#> An incidence object: 601 x 4
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index hospital gender count
#> <yrwk> <fct> <fct> <int>
#> 1 2014-W15 Military Hospital f 1
#> 2 2014-W16 Connaught Hospital m 1
#> 3 2014-W17 <NA> f 2
#> 4 2014-W17 other f 2
#> 5 2014-W17 other m 1
#> 6 2014-W18 <NA> f 1
#> 7 2014-W18 Connaught Hospital f 1
#> 8 2014-W18 Princess Christian Maternity Hospital (PCMH) f 1
#> 9 2014-W18 Rokupa Hospital f 1
#> 10 2014-W19 <NA> f 1
#> # … with 591 more rows
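When these verbs are used inside a larger pipeline, it can help to check programmatically whether the class survived an operation. The sketch below assumes the version of incidence2 used in this vignette. The helper same_class() is hypothetical and not part of the package; it simply compares class vectors.

```r
library(outbreaks)
library(dplyr)
library(incidence2)

dat  <- ebola_sim_clean$linelist
inci <- incidence(dat, date_of_onset, interval = "week", groups = c(hospital, gender))

# Hypothetical helper (not part of incidence2):
# TRUE when `x` has exactly the same class vector as `ref`
same_class <- function(x, ref) identical(class(x), class(ref))

filtered <- inci %>% filter(gender == "f")  # invariants preserved
dropped  <- inci %>% select(-1)             # date column removed

same_class(filtered, inci)
same_class(dropped, inci)
inherits(dropped, "tbl_df")
```

Based on the behaviour described above, the filtered result should keep the class of inci, while the result of select(-1) should fall back to a plain tibble.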
We provide multiple accessors to easily access information about an incidence() object's structure:
get_count_names(), get_dates_name(), and get_group_names() all return character vectors of the column names corresponding to the requested variables.
get_n() returns the number of observations.
get_interval() returns the interval of the object.
get_timespan() returns the number of days the object covers.
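As a brief illustration, the accessors can be called directly on an incidence object. This sketch assumes the version of incidence2 documented in this vignette (accessor names and the incidence() signature may differ in other releases) and rebuilds the two-group object used earlier. Outputs are omitted because they depend on the installed version.

```r
library(outbreaks)
library(incidence2)

dat  <- ebola_sim_clean$linelist
inci <- incidence(dat, date_of_onset, interval = "week", groups = c(hospital, gender))

# Column names for the date, count and group variables
get_dates_name(inci)
get_count_names(inci)
get_group_names(inci)

# Summary information about the object
get_n(inci)         # number of observations
get_interval(inci)  # interval of the object
get_timespan(inci)  # number of days covered
```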