Simulate model parameters

sim() simulates model parameters (coefficients) from a multivariate normal or t distribution. sim_apply() then uses the simulated values to calculate quantities of interest.

Usage

sim(fit, n = 1000, vcov = NULL, coefs = NULL, dist = NULL)

Arguments

fit: a model fit, such as the output of a call to lm() or glm(). Can be left unspecified if coefs and vcov are supplied directly rather than as functions.
n: the number of simulations to run; default is 1000. More simulations give more precise estimates, but subsequent calculations will take longer.
vcov: either a square covariance matrix of the coefficient estimates or a function used to extract it from fit. By default, stats::vcov() is used, falling back to insight::get_varcov() if that fails.
coefs: either a vector of coefficient estimates or a function used to extract them from fit. By default, stats::coef() is used, falling back to insight::get_parameters() if that fails.
dist: a string naming the multivariate distribution from which to draw the simulated coefficients. Should be either "normal" (multivariate normal distribution) or "t({#})" (multivariate t distribution), where {#} is the desired degrees of freedom (e.g., "t(100)"). If NULL, the distribution is chosen using heuristics; see Details.
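To make the two accepted forms of dist concrete, here is a minimal sketch of how such a string could be turned into degrees of freedom, with Inf standing for the normal case. The helper parse_dist() is hypothetical and is not part of clarify; it only illustrates the "normal" / "t({#})" format described above.

```r
# Hypothetical helper (not part of clarify): convert a `dist` string
# into degrees of freedom, using Inf to represent the normal case.
parse_dist <- function(dist) {
  dist <- tolower(trimws(dist))
  if (dist == "normal") return(Inf)
  # Match "t(<number>)" and capture the number inside the parentheses
  m <- regmatches(dist, regexec("^t\\(([0-9]+(\\.[0-9]+)?)\\)$", dist))[[1]]
  if (length(m) == 0) stop("`dist` must be \"normal\" or \"t(df)\"")
  as.numeric(m[2])
}

parse_dist("t(100)")  # 100
parse_dist("normal")  # Inf
```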

Value

A clarify_sim object, which has the following components:

sim.coefs: a matrix containing the simulated coefficients, with a column for each coefficient and a row for each simulation.
coefs: the original coefficients extracted from fit or supplied to coefs.
vcov: the covariance matrix of the coefficients extracted from fit or supplied to vcov.
fit: the original model fit supplied to fit.

The "dist" attribute contains "normal" if the coefficients were sampled from a multivariate normal distribution and "t(df)" if sampled from a multivariate t distribution. The "clarify_hash" attribute contains a unique hash generated by rlang::hash().
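Because sim.coefs has one row per simulation and one column per coefficient, a quantity of interest can be computed once per row and then summarized across rows; sim_apply() handles this in clarify. The base-R sketch below only illustrates the idea: the matrix is a made-up stand-in with invented values, and the "predicted value for a treated unit" is a hypothetical quantity chosen for illustration.

```r
# Made-up stand-in for the `sim.coefs` component (illustrative values only):
# one row per simulation, one column per coefficient.
set.seed(123)
sim_coefs <- cbind(
  "(Intercept)" = rnorm(1000, mean = 5, sd = 0.5),
  treat         = rnorm(1000, mean = 1, sd = 0.3)
)

# Compute a derived quantity for each simulated coefficient vector
# (hypothetical: intercept + treatment effect, i.e. a treated unit's prediction)
est <- apply(sim_coefs, 1, function(b) b[["(Intercept)"]] + b[["treat"]])

# Summarize the simulation distribution with a 95% percentile interval
ci <- quantile(est, probs = c(0.025, 0.975))
```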

Details

When dist is NULL, sim() chooses between a multivariate normal and a multivariate t distribution based on the degrees of freedom returned by insight::get_df(., type = "wald"). If the degrees of freedom are infinite, a normal distribution is used; otherwise, a t distribution with those degrees of freedom is used. Models not supported by insight use a normal distribution.
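The rule above can be sketched as a small function. This is a hypothetical helper (choose_dist() is not part of clarify) that assumes the degrees of freedom have already been obtained; NULL stands in for a model insight does not support. For an lm() fit, the sketch uses stats::df.residual() as the degrees of freedom, on the assumption that this matches what the Wald degrees of freedom reduce to for linear models (consistent with the "t(602)" shown in the Examples).

```r
# Hypothetical helper (not part of clarify) mirroring the heuristic above.
# `df` is assumed to come from something like insight::get_df(fit, type = "wald");
# NULL represents a model that insight does not support.
choose_dist <- function(df) {
  if (is.null(df) || is.na(df) || is.infinite(df)) {
    "normal"
  } else {
    sprintf("t(%s)", format(df))
  }
}

fit <- lm(mpg ~ wt, data = mtcars)  # 32 observations, 2 coefficients
choose_dist(df.residual(fit))       # "t(30)"
choose_dist(Inf)                    # "normal"
```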

When a multivariate normal distribution is used, coefficients are drawn with mvnfast::rmvn(), with means equal to the estimated coefficients and covariance matrix equal to the parameter covariance matrix. When a multivariate t distribution is used, coefficients are drawn with mvnfast::rmvt(), with means equal to the estimated coefficients and scale matrix equal to cov*(df - 2)/df, where cov is the parameter covariance matrix and df is the residual degrees of freedom of the model. Because a multivariate t distribution with scale matrix S has covariance S*df/(df - 2), this scaling makes the covariance of the simulated coefficients equal to cov.
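The sampling scheme above can be sketched in base R. This is an illustration under stated assumptions, not clarify's implementation (which calls mvnfast::rmvn() and mvnfast::rmvt()): the helpers rmvn_sketch() and rmvt_sketch() are hypothetical, normal draws use a Cholesky factor of the covariance matrix, and t draws divide normal draws by sqrt(chi-squared/df), which requires df > 2 for the covariance to exist.

```r
# Hypothetical sketch: n draws from a multivariate normal N(mu, Sigma)
rmvn_sketch <- function(n, mu, Sigma) {
  k <- length(mu)
  R <- chol(Sigma)                      # upper triangular, Sigma = t(R) %*% R
  Z <- matrix(rnorm(n * k), nrow = n)   # iid standard normal draws
  sweep(Z %*% R, 2, mu, "+")            # each row ~ N(mu, Sigma)
}

# Hypothetical sketch: n draws from a multivariate t whose covariance equals V
rmvt_sketch <- function(n, mu, V, df) {
  stopifnot(df > 2)
  S <- V * (df - 2) / df                # scale matrix, as described above
  Z <- rmvn_sketch(n, rep(0, length(mu)), S)
  w <- sqrt(rchisq(n, df = df) / df)    # one mixing draw per simulation
  sweep(Z / w, 2, mu, "+")              # row i of Z is divided by w[i]
}

set.seed(42)
mu <- c(1, -2)
V <- matrix(c(1, 0.3, 0.3, 2), nrow = 2)
draws <- rmvt_sketch(1000, mu, V, df = 602)
cov(draws)  # should be close to V
```

Dividing by the chi-squared mixing variable inflates the covariance by df/(df - 2), which is exactly what the (df - 2)/df factor in the scale matrix cancels.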

Examples


data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat * (age + race + nodegree + re74), data = lalonde)

# Simulate coefficients
s <- sim(fit)
s
#> A `clarify_sim` object
#>  - 12 coefficients, 1000 simulated values
#>  - sampled distribution: multivariate t(602)
#>  - original fitting function call:
#> 
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)

## Could also use a robust covariance matrix, e.g.,
s <- sim(fit, vcov = "HC3")

# Simulate coefficients from a normal distribution instead;
# the default for `lm` objects is a t distribution
s <- sim(fit, dist = "normal")
s
#> A `clarify_sim` object
#>  - 12 coefficients, 1000 simulated values
#>  - sampled distribution: multivariate normal
#>  - original fitting function call:
#> 
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)
