Generation creates a simulated distribution from specify(). In the context of confidence intervals, this is a bootstrap distribution based on the result of specify(). In the context of hypothesis testing, this is a null distribution based on the result of specify() and hypothesize().
Learn more in vignette("infer").
generate(x, reps = 1, type = NULL, variables = !!response_expr(x), ...)
Argument | Description |
---|---|
x | A data frame that can be coerced into a tibble. |
reps | The number of resamples to generate. |
type | The method used to generate resamples of the observed data reflecting the null hypothesis. Currently one of "bootstrap", "permute", or "draw". |
variables | If |
... | Currently ignored. |
A tibble containing reps generated datasets, indicated by the replicate column.
The type argument determines the method used to create the null distribution.
- bootstrap: A bootstrap sample will be drawn for each replicate, where a sample of size equal to the input sample size is drawn (with replacement) from the input sample data.
- permute: For each replicate, each input value will be randomly reassigned (without replacement) to a new output value in the sample.
- draw: A value will be sampled from a theoretical distribution with parameters specified in hypothesize() for each replicate. This option is currently only applicable for testing point estimates. This generation type was previously called "simulate", which has been superseded.
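The three generation types can be illustrated with a minimal base R sketch. This is not infer's implementation: the data, the variable names (hours, group, p_null), and the choice of a two-outcome distribution for the draw case are all assumptions made for illustration. It only shows the resampling idea behind each type and the long format with a replicate column.

```r
# Illustrative sketch only; not how infer implements generate().
# All data and names below are made up for the example.
set.seed(1)

hours <- c(40, 35, 50, 45, 40, 60, 20, 38)  # hypothetical response values
group <- rep(c("a", "b"), each = 4)          # hypothetical explanatory values
n <- length(hours)
reps <- 3

# bootstrap: for each replicate, draw n values WITH replacement
# from the observed sample
boot <- do.call(rbind, lapply(seq_len(reps), function(i) {
  data.frame(replicate = i, hours = sample(hours, size = n, replace = TRUE))
}))

# permute: for each replicate, shuffle the response WITHOUT replacement
# while the explanatory variable stays fixed
perm <- do.call(rbind, lapply(seq_len(reps), function(i) {
  data.frame(replicate = i, hours = sample(hours), group = group)
}))

# draw: for each replicate, sample n outcomes from a theoretical
# distribution; here an assumed point null on a proportion, p = 0.5
p_null <- 0.5
drawn <- do.call(rbind, lapply(seq_len(reps), function(i) {
  data.frame(
    replicate = i,
    outcome = sample(c("yes", "no"), size = n, replace = TRUE,
                     prob = c(p_null, 1 - p_null))
  )
}))
```

Each result stacks reps resampled datasets of n rows, so it has n * reps rows in total. A permuted replicate contains exactly the observed values in a new order, whereas a bootstrap replicate may repeat some values and omit others.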
Other core functions: calculate(), hypothesize(), specify()
# generate a null distribution by taking 200 bootstrap samples
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 200, type = "bootstrap")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 100,000 × 2
#> # Groups: replicate [200]
#> replicate hours
#> <int> <dbl>
#> 1 1 48.6
#> 2 1 38.6
#> 3 1 38.6
#> 4 1 8.62
#> 5 1 38.6
#> 6 1 38.6
#> 7 1 18.6
#> 8 1 38.6
#> 9 1 38.6
#> 10 1 58.6
#> # … with 99,990 more rows
# generate a null distribution for the independence of
# two variables by permuting their values 200 times
gss %>%
  specify(partyid ~ age) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 200, type = "permute")
#> Dropping unused factor levels DK from the supplied response variable 'partyid'.
#> Response: partyid (factor)
#> Explanatory: age (numeric)
#> Null Hypothesis: independence
#> # A tibble: 100,000 × 3
#> # Groups: replicate [200]
#> partyid age replicate
#> <fct> <dbl> <int>
#> 1 rep 36 1
#> 2 ind 34 1
#> 3 dem 24 1
#> 4 dem 42 1
#> 5 ind 31 1
#> 6 dem 32 1
#> 7 ind 48 1
#> 8 rep 36 1
#> 9 ind 30 1
#> 10 ind 33 1
#> # … with 99,990 more rows
# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}