Compute the incidence of events

incidence(
  x,
  date_index,
  groups = NULL,
  interval = 1L,
  na_as_group = TRUE,
  counts = NULL,
  firstdate = NULL
)

Arguments

x

A data frame representing a linelist (or potentially a pre-aggregated dataset).

date_index

The time index(es) of the given data. This should be the name(s) of the desired date column(s) in x. These columns must be of class integer, numeric, Date, POSIXct, POSIXlt or character (see Note regarding numeric and character formats). Multiple inputs only make sense when x is a linelist and, in this situation, the vector must be named to avoid ambiguity. These names will be used for the resultant count columns.

groups

An optional vector giving the names of the grouping variables (columns of x) by which incidence should be computed.

interval

An integer or character indicating the (fixed) size of the time interval used for computing the incidence; defaults to 1 day. This can also be a text string that corresponds to a valid date interval, e.g.

* (x) day(s)
* (x) week(s)
* (x) epiweek(s)
* (x) isoweek(s)
* (x) month(s)
* (x) quarter(s)
* (x) year(s)

More details can be found in the "Interval specification" and "Week intervals" sections below.

na_as_group

A logical value indicating whether missing group values (NA) should be treated as a separate category (TRUE) or removed from consideration (FALSE). Defaults to TRUE.
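The difference between the two settings can be sketched in base R with table(). This is only an analogy using a made-up vector of group values; it does not call incidence() and says nothing about how the package handles NA internally.

```r
# Made-up group values with one missing entry (illustrative only).
groups <- factor(c("f", "m", NA, "f"))

# Analogous to na_as_group = TRUE: NA is counted as its own category.
with_na <- table(groups, useNA = "ifany")

# Analogous to na_as_group = FALSE: observations with NA are dropped.
without_na <- table(groups, useNA = "no")

print(with_na)
print(without_na)
```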

counts

The count variables of the given data. If NULL (default) the data is taken to be a linelist of individual observations.

firstdate

When the interval is numeric, or is given in days or months with a numeric prefix greater than 1, you can optionally specify the date from which the intervals should be anchored. If NULL (default), the intervals start at the minimum value contained in the date_index column. Note that firstdate must be of class Date if the date_index column is Date, POSIXct, POSIXlt or character, and an integer otherwise.
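As a rough illustration of what anchoring means, the base R sketch below bins dates into fixed 7-day intervals counted from a chosen anchor date. It does not call incidence(); the helper bin_from_anchor() and the sample dates are hypothetical, and the package's own implementation may differ.

```r
# Hypothetical helper: map each date to the start of its fixed-width
# interval, counting intervals from `anchor` rather than from min(dates).
bin_from_anchor <- function(dates, anchor, width = 7L) {
  offset <- as.integer(dates - anchor)   # whole days since the anchor
  anchor + (offset %/% width) * width    # floor to the interval start
}

dates  <- as.Date(c("2021-03-03", "2021-03-09", "2021-03-10", "2021-03-17"))
anchor <- as.Date("2021-03-03")

starts <- bin_from_anchor(dates, anchor)

# Counts per anchored interval.
print(table(format(starts)))
```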

Value

An incidence2 object. This is a subclass of incidence_df holding aggregated counts of observations grouped according to the specified interval and, optionally, the given groups. By default it contains the following columns:

  • date / date_index: If the default interval of 1 day is used, this column holds the dates of the given observations and is named "date"; otherwise it holds the values obtained from the specified date grouping and is named "date_index" (see Interval specification below).

  • groups (if specified): Column(s) containing the categories of the given groups.

  • count (or name of count variables): The aggregated observation counts.

Note

Input data (date_index)

  • Decimal (numeric) dates: will be truncated.

  • Character dates should be in the unambiguous yyyy-mm-dd (ISO 8601) format. Any other format will trigger an error.
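The two rules above can be sketched in base R. The helper parse_iso_date() is hypothetical and only mimics the documented behaviour (accept yyyy-mm-dd, error otherwise); it is not the package's own parser.

```r
# Decimal dates: truncation drops the fractional part of the day.
x <- c(18000.2, 18000.9)
truncated <- trunc(x)

# Hypothetical strict parser: accept only yyyy-mm-dd (or NA), error otherwise.
parse_iso_date <- function(x) {
  ok <- is.na(x) | grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  if (!all(ok)) stop("dates must be in yyyy-mm-dd format")
  as.Date(x, format = "%Y-%m-%d")
}

good <- parse_iso_date(c("2021-03-05", NA))

# A non-ISO string is rejected rather than guessed at.
bad <- tryCatch(parse_iso_date("05/03/2021"), error = function(e) "error")
```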

Interval specification (interval)

incidence() uses the grates package to generate date groupings. The grouping used depends on the value of interval. This can be specified as either an integer value or a more standard specification such as "day", "week", "month", "quarter" or "year". The format in this situation is similar to that used by seq.Date(), where these values can optionally be preceded by a (positive or negative) integer and a space, or followed by "s". When no prefix is given, the output uses a grouped date class; for example, "week" yields the "yrwk" class shown in the Examples.

When a prefix is provided (e.g. "2 weeks") the output is an object of class "period" (see as_period()). Note that for the values "month", "quarter" and "year", intervals always start at the beginning of the corresponding calendar period. If the input is an integer, it is treated as a number of days (i.e. 2 and "2 days" produce the same output).

The only interval values that do not produce these grouped classes are 1, 1L, "day" and "days" (the latter two without a prefix). In this situation the returned object is of the standard "Date" class.
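The contrast between calendar-aligned and fixed-width groupings can be sketched in base R. This is an illustration under simple assumptions (made-up dates, bins starting at the earliest date); it does not use grates or reproduce its classes.

```r
dates <- as.Date(c("2021-01-15", "2021-01-31", "2021-02-01", "2021-03-20"))

# "month": each date maps to the first day of its calendar month,
# regardless of where the data start.
month_start <- as.Date(format(dates, "%Y-%m-01"))
print(table(format(month_start, "%Y-%m")))

# interval = 2 (treated as 2 days): fixed-width bins counted from the
# earliest date in the data.
offset <- as.integer(dates - min(dates))
two_day_start <- min(dates) + (offset %/% 2L) * 2L
print(two_day_start)
```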

Week intervals

It is possible to construct incidence objects standardized to any day of the week. The default is to use the ISO 8601 definition of weeks, which start on Monday. You can specify the day of the week an incidence object should be standardized to by using the pattern "n W weeks", where "W" represents the weekday in English or in the current locale and "n" represents the duration, which can be omitted. Below are examples of specifying weeks starting on different days, assuming data that start on 2016-09-05, which falls in ISO week 36 of 2016:

  • interval = "2 monday weeks" (Monday 2016-09-05)

  • interval = "1 tue week" (Tuesday 2016-08-30)

  • interval = "1 Wed week" (Wednesday 2016-08-31)

  • interval = "1 Thursday week" (Thursday 2016-09-01)

  • interval = "1 F week" (Friday 2016-09-02)

  • interval = "1 Saturday week" (Saturday 2016-09-03)

  • interval = "Sunday week" (Sunday 2016-09-04)

It's also possible to use something like "3 weeks: Saturday". In addition, there are keywords reserved for specific days of the week:

  • interval = "week" (Default, Monday)

  • interval = "ISOweek" (Monday)

  • interval = "EPIweek" (Sunday)

  • interval = "MMWRweek" (Sunday)
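The start dates quoted in the first list can be checked with a short base R sketch. The helper week_start() is hypothetical and only computes the first day of the week containing a date; it is not part of the package.

```r
# Hypothetical helper: start of the week containing `date`, where weeks
# begin on `firstday` (1 = Monday, ..., 7 = Sunday, as in ISO 8601).
week_start <- function(date, firstday = 1L) {
  wday <- as.integer(format(date, "%u"))   # ISO weekday, 1 = Monday
  date - (wday - firstday) %% 7L
}

d <- as.Date("2016-09-05")        # a Monday

monday  <- week_start(d, 1L)      # Monday week
tuesday <- week_start(d, 2L)      # Tuesday week
friday  <- week_start(d, 5L)      # Friday week
sunday  <- week_start(d, 7L)      # Sunday week (as for "EPIweek")

print(c(monday, tuesday, friday, sunday))
```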

Examples

if (requireNamespace("outbreaks", quietly = TRUE)) {
  withAutoprint({
    data(ebola_sim_clean, package = "outbreaks")
    dat <- ebola_sim_clean$linelist

    # daily incidence
    incidence(dat, date_of_onset)

    # weekly incidence
    incidence(dat, date_of_onset, interval = "week")

    # starting on a Monday
    incidence(dat, date_of_onset, interval = "isoweek")

    # starting on a Sunday
    incidence(dat, date_of_onset, interval = "epiweek")

    # group by gender
    incidence(dat, date_of_onset, interval = 7, groups = gender)

    # group by gender and hospital
    incidence(dat, date_of_onset, interval = "2 weeks", groups = c(gender, hospital))
  })
}
#> > data(ebola_sim_clean, package = "outbreaks")
#> > dat <- ebola_sim_clean$linelist
#> > incidence(dat, date_of_onset)
#> An incidence object: 367 x 2
#> date range: [2014-04-07] to [2015-04-30]
#> cases: 5829
#> interval: 1 day
#>
#>    date_index count
#>    <date>     <int>
#>  1 2014-04-07     1
#>  2 2014-04-15     1
#>  3 2014-04-21     2
#>  4 2014-04-25     1
#>  5 2014-04-26     1
#>  6 2014-04-27     1
#>  7 2014-05-01     2
#>  8 2014-05-03     1
#>  9 2014-05-04     1
#> 10 2014-05-05     1
#> # … with 357 more rows
#> > incidence(dat, date_of_onset, interval = "week")
#> An incidence object: 56 x 2
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#>
#>    date_index count
#>    <yrwk>     <int>
#>  1 2014-W15       1
#>  2 2014-W16       1
#>  3 2014-W17       5
#>  4 2014-W18       4
#>  5 2014-W19      12
#>  6 2014-W20      17
#>  7 2014-W21      15
#>  8 2014-W22      19
#>  9 2014-W23      23
#> 10 2014-W24      21
#> # … with 46 more rows
#> > incidence(dat, date_of_onset, interval = "isoweek")
#> An incidence object: 56 x 2
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#>
#>    date_index count
#>    <yrwk>     <int>
#>  1 2014-W15       1
#>  2 2014-W16       1
#>  3 2014-W17       5
#>  4 2014-W18       4
#>  5 2014-W19      12
#>  6 2014-W20      17
#>  7 2014-W21      15
#>  8 2014-W22      19
#>  9 2014-W23      23
#> 10 2014-W24      21
#> # … with 46 more rows
#> > incidence(dat, date_of_onset, interval = "epiweek")
#> An incidence object: 56 x 2
#> date range: [2014-W15] to [2015-W17]
#> cases: 5829
#> interval: 1 (Sunday) week
#>
#>    date_index count
#>    <yrwk>     <int>
#>  1 2014-W15       1
#>  2 2014-W16       1
#>  3 2014-W17       4
#>  4 2014-W18       4
#>  5 2014-W19      12
#>  6 2014-W20      15
#>  7 2014-W21      15
#>  8 2014-W22      22
#>  9 2014-W23      22
#> 10 2014-W24      17
#> # … with 46 more rows
#> > incidence(dat, date_of_onset, interval = 7, groups = gender)
#> An incidence object: 109 x 3
#> date range: [2014-04-07 to 2014-04-13] to [2015-04-27 to 2015-05-03]
#> cases: 5829
#> interval: 7 days
#>
#>    date_index               gender count
#>    <period>                 <fct>  <int>
#>  1 2014-04-07 to 2014-04-13 f          1
#>  2 2014-04-14 to 2014-04-20 m          1
#>  3 2014-04-21 to 2014-04-27 f          4
#>  4 2014-04-21 to 2014-04-27 m          1
#>  5 2014-04-28 to 2014-05-04 f          4
#>  6 2014-05-05 to 2014-05-11 f          9
#>  7 2014-05-05 to 2014-05-11 m          3
#>  8 2014-05-12 to 2014-05-18 f          7
#>  9 2014-05-12 to 2014-05-18 m         10
#> 10 2014-05-19 to 2014-05-25 f          8
#> # … with 99 more rows
#> > incidence(dat, date_of_onset, interval = "2 weeks", groups = c(gender,
#> +     hospital))
#> An incidence object: 316 x 4
#> date range: [2014-04-07 to 2014-04-20] to [2015-04-20 to 2015-05-03]
#> cases: 5829
#> interval: 14 days
#>
#>    date_index               gender hospital                                count
#>    <period>                 <fct>  <fct>                                   <int>
#>  1 2014-04-07 to 2014-04-20 f      Military Hospital                           1
#>  2 2014-04-07 to 2014-04-20 m      Connaught Hospital                          1
#>  3 2014-04-21 to 2014-05-04 f      NA                                          3
#>  4 2014-04-21 to 2014-05-04 f      Connaught Hospital                          1
#>  5 2014-04-21 to 2014-05-04 f      other                                       2
#>  6 2014-04-21 to 2014-05-04 f      Princess Christian Maternity Hospital …     1
#>  7 2014-04-21 to 2014-05-04 f      Rokupa Hospital                             1
#>  8 2014-04-21 to 2014-05-04 m      other                                       1
#>  9 2014-05-05 to 2014-05-18 f      NA                                          4
#> 10 2014-05-05 to 2014-05-18 f      Connaught Hospital                          3
#> # … with 306 more rows

# use of firstdate
dat <- data.frame(dates = Sys.Date() + sample(-3:10, 10, replace = TRUE))
incidence(dat, dates, interval = "week", firstdate = Sys.Date() + 1)
#> An incidence object: 2 x 2
#> date range: [2021-W34] to [2021-W35]
#> cases: 9
#> interval: 1 (Monday) week
#>
#>   date_index count
#>   <yrwk>     <int>
#> 1 2021-W34       7
#> 2 2021-W35       2