incidence()
calculates the incidence of different events across
specified time periods and groupings.
Usage
incidence(
  x,
  date_index,
  groups = NULL,
  counts = NULL,
  count_names_to = "count_variable",
  count_values_to = "count",
  date_names_to = "date_index",
  rm_na_dates = TRUE,
  interval = NULL,
  offset = NULL,
  ...
)
Arguments
- x
A data frame object representing a linelist or pre-aggregated dataset.
- date_index
[character]
The time index(es) of the given data.
This should be the name(s) corresponding to the desired date column(s) in x.
A named vector can be used for convenient relabelling of the resultant output.
Multiple indices only make sense when x is a linelist.
- groups
[character]
An optional vector giving the names of the group variables by which incidence should be grouped.
- counts
[character]
The count variables of the given data. If NULL (default) the data is taken to be a linelist of individual observations.
- count_names_to
[character]
The column to create which will store the counts column names, provided that counts is not NULL.
- count_values_to
[character]
The name of the column to store the resultant count values in.
- date_names_to
[character]
The name of the column to store the date variables in.
- rm_na_dates
[logical]
Should NA dates be removed prior to aggregation?
- interval
An optional scalar integer or string indicating the (fixed) size of the time interval to use for computing the incidence.
Defaults to NULL in which case the date_index columns are left unchanged.
Numeric values are coerced to integer and treated as a number of days to group.
Text strings can be one of:
* day or daily
* week(s) or weekly
* epiweek(s)
* isoweek(s)
* month(s) or monthly
* yearmonth(s)
* quarter(s) or quarterly
* yearquarter(s)
* year(s) or yearly
More details can be found in the "Interval specification" section.
- offset
Only applicable when interval is not NULL.
An optional scalar integer or date indicating the value you wish to start counting periods from relative to the Unix Epoch:
Default value of NULL corresponds to 0L.
For other integer values this is stored scaled by n (offset <- as.integer(offset) %% n).
For date values this is first converted to an integer offset (offset <- floor(as.numeric(offset))) and then scaled via n as above.
- ...
Not currently used.
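The offset rules described for the offset argument can be sketched in base R. This is only an illustration of the documented formulas, not the package's internal code: normalise_offset() is a hypothetical helper, and n is assumed here to be the integer interval length in days.

```r
# Hypothetical helper (not part of incidence2) that mirrors the documented
# offset formulas. `n` is assumed to be the interval length in days.
normalise_offset <- function(offset, n) {
  if (is.null(offset)) {
    return(0L)                            # NULL corresponds to 0L
  }
  if (inherits(offset, "Date")) {
    offset <- floor(as.numeric(offset))   # days since the Unix Epoch
  }
  as.integer(offset) %% as.integer(n)     # scale by n
}

normalise_offset(NULL, 7L)
normalise_offset(10L, 7L)
normalise_offset(as.Date("1970-01-05"), 7L)
```

Because %% in R returns a non-negative result for a positive n, the stored offset under this sketch always lies between 0 and n - 1.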
Details
<incidence2> objects are a sub class of data frame with some additional invariants. That is, an <incidence2> object must:
- have one column representing the date index (this does not need to be a date object but must have an inherent ordering over time);
- have one column representing the count variable (i.e. what is being counted) and one variable representing the associated count;
- have zero or more columns representing groups;
- not have duplicated rows with regards to the date and group variables.
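As a rough illustration of the last invariant, the following base R sketch checks a small, made-up table for duplicated date/group combinations. The data frame and its column names (region, cases) are hypothetical, and the check is an assumption about how one might verify the invariant by hand, not how the package enforces it.

```r
# Hypothetical pre-aggregated table in the shape described above:
# a date index, one grouping column, a count variable and its count.
x <- data.frame(
  date_index     = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02")),
  region         = c("north", "south", "north"),
  count_variable = "cases",
  count          = c(3L, 1L, 2L)
)

# Rows must be unique with regards to the date and group variables.
key <- x[, c("date_index", "region")]
no_duplicates <- anyDuplicated(key) == 0L
no_duplicates
```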
Interval specification
Where interval is specified, incidence() predominantly uses the grates package to generate appropriate date groupings. The grouping used depends on the value of interval. This can be specified as either an integer value or a string corresponding to one of the classes:
- integer values: <grates_period> object, grouped by the specified number of days.
- day, daily: <Date> objects.
- week(s), weekly, isoweek: <grates_isoweek> objects.
- epiweek(s): <grates_epiweek> objects.
- month(s), monthly, yearmonth: <grates_yearmonth> objects.
- quarter(s), quarterly, yearquarter: <grates_yearquarter> objects.
- year(s) and yearly: <grates_year> objects.
For "day" or "daily" interval, we provide a thin wrapper around as.Date()
that ensures the underlying data are whole numbers and that time zones are
respected. Note that additional arguments are not forwarded to as.Date()
so for greater flexibility users are advised to modify their input prior to
calling incidence().
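A short sketch of how the interval argument might be used is given below. It assumes the incidence2 package is installed (the calls are skipped otherwise) and uses a small, made-up linelist with a hypothetical onset column; the resulting date_index classes should follow the list above.

```r
# Hypothetical linelist: one row per case, with a Date column.
dat <- data.frame(onset = as.Date("2024-01-01") + c(0, 1, 8, 9, 40))

if (requireNamespace("incidence2", quietly = TRUE)) {
  # String interval: counts aggregated by ISO week.
  weekly <- incidence2::incidence(dat, date_index = "onset", interval = "isoweek")
  print(weekly)

  # Integer interval: counts aggregated into 14-day periods.
  fortnightly <- incidence2::incidence(dat, date_index = "onset", interval = 14L)
  print(fortnightly)
}
```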
Examples
data.table::setDTthreads(2)
if (requireNamespace("outbreaks", quietly = TRUE)) {
  withAutoprint({
    data(ebola_sim_clean, package = "outbreaks")
    dat <- ebola_sim_clean$linelist
    incidence(dat, "date_of_onset")
    incidence(dat, "date_of_onset", groups = c("gender", "hospital"))
  })
}
#> > data(ebola_sim_clean, package = "outbreaks")
#> > dat <- ebola_sim_clean$linelist
#> > incidence(dat, "date_of_onset")
#> # incidence: 367 x 3
#> # count vars: date_of_onset
#> date_index count_variable count
#> * <date> <chr> <int>
#> 1 2014-04-07 date_of_onset 1
#> 2 2014-04-15 date_of_onset 1
#> 3 2014-04-21 date_of_onset 2
#> 4 2014-04-25 date_of_onset 1
#> 5 2014-04-26 date_of_onset 1
#> 6 2014-04-27 date_of_onset 1
#> 7 2014-05-01 date_of_onset 2
#> 8 2014-05-03 date_of_onset 1
#> 9 2014-05-04 date_of_onset 1
#> 10 2014-05-05 date_of_onset 1
#> # ℹ 357 more rows
#> > incidence(dat, "date_of_onset", groups = c("gender", "hospital"))
#> # incidence: 2,535 x 5
#> # count vars: date_of_onset
#> # groups: gender, hospital
#> date_index gender hospital count_variable count
#> * <date> <fct> <fct> <chr> <int>
#> 1 2014-04-07 f Military Hospital date_of_onset 1
#> 2 2014-04-15 m Connaught Hospital date_of_onset 1
#> 3 2014-04-21 f other date_of_onset 1
#> 4 2014-04-21 m other date_of_onset 1
#> 5 2014-04-25 f NA date_of_onset 1
#> 6 2014-04-26 f other date_of_onset 1
#> 7 2014-04-27 f NA date_of_onset 1
#> 8 2014-05-01 f Princess Christian Maternity Hospital… date_of_onset 1
#> 9 2014-05-01 f Rokupa Hospital date_of_onset 1
#> 10 2014-05-03 f Connaught Hospital date_of_onset 1
#> # ℹ 2,525 more rows