
sim() draws simulated model coefficients from a multivariate normal or t distribution. The simulated coefficients are then used by sim_apply() to calculate quantities of interest.

Usage

sim(fit, n = 1000, vcov = NULL, coefs = NULL, dist = NULL)

Arguments

fit

a model fit, such as the output of a call to lm() or glm(). Can be left unspecified if coefs and vcov are not functions.

n

the number of simulations to run; default is 1000. More simulations reduce the simulation error in the resulting estimates, but the calculations will take longer.

vcov

either a square covariance matrix of the coefficient estimates or a function used to extract it from fit. By default, stats::vcov() is used, falling back to insight::get_varcov() if that doesn't work.

coefs

either a vector of coefficient estimates or a function used to extract it from fit. By default, stats::coef() is used, falling back to insight::get_parameters() if that doesn't work.

dist

a string containing the name of the multivariate distribution to use to draw simulated coefficients. Should be one of "normal" (multivariate normal distribution) or "t({#})" (multivariate t distribution), where {#} corresponds to the desired degrees of freedom (e.g., "t(100)"). If NULL, the right distribution to use will be determined based on heuristics; see Details.
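
As a minimal sketch of how these arguments combine, the call below supplies coefs and vcov as plain objects rather than functions, so fit is left unspecified. The mtcars model and the value n = 500 are illustrative assumptions, not part of the package's own examples.

```r
library(clarify)

# Illustrative model; any fit that yields coefficients and a covariance matrix would do
fit <- lm(mpg ~ wt + hp, data = mtcars)

b <- coef(fit)   # named vector of coefficient estimates
V <- vcov(fit)   # square covariance matrix of the estimates

# `fit` is omitted because `coefs` and `vcov` are objects, not functions;
# `dist` is set explicitly so no heuristic is needed to choose a distribution
s <- sim(n = 500, coefs = b, vcov = V, dist = "normal")
```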

Value

A clarify_sim object, which has the following components:

sim.coefs

a matrix containing the simulated coefficients, with a column for each coefficient and a row for each simulation.

coefs

the original coefficients extracted from fit or supplied to coefs.

vcov

the covariance matrix of the coefficients extracted from fit or supplied to vcov.

fit

the original model fit supplied to fit.

The "dist" attribute contains "normal" if the coefficients were sampled from a multivariate normal distribution and "t(df)" if sampled from a multivariate t distribution. The "clarify_hash" attribute contains a unique hash generated by rlang::hash().
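
The components and attributes described above can be inspected directly. The following is a minimal sketch on an illustrative mtcars model (an assumption made for the example, not taken from the package documentation), and it assumes the clarify package is installed.

```r
library(clarify)

# Illustrative model with three coefficients (intercept, wt, hp)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- sim(fit, n = 200)

dim(s$sim.coefs)        # rows = simulations, columns = coefficients
s$coefs                 # original coefficient estimates
s$vcov                  # covariance matrix used for the draws
attr(s, "dist")         # name of the distribution that was sampled
attr(s, "clarify_hash") # hash identifying this set of simulations
```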

Details

When dist is NULL, sim() samples from a multivariate normal or t distribution depending on the degrees of freedom returned by insight::get_df(., type = "wald"). If the returned value is Inf, a normal distribution is used; otherwise, a t distribution with the returned degrees of freedom is used. Models not supported by insight use a normal distribution.

When a multivariate normal distribution is used, coefficients are drawn using mvnfast::rmvn() with means equal to the estimated coefficients and covariance matrix equal to the parameter covariance matrix. When a multivariate t distribution is used, coefficients are drawn using mvnfast::rmvt() with means equal to the estimated coefficients and scaling matrix equal to cov*(df - 2)/df, where cov is the parameter covariance matrix and df is the residual degrees of freedom for the model. A multivariate t distribution with scaling matrix S and df > 2 has covariance S*df/(df - 2), so this scaling makes the covariance of the simulated coefficients equal to cov.
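
The scaling described above can be illustrated with a short base R sketch. This is not the package's internal code: it replaces the mvnfast calls with a hand-rolled sampler (a Cholesky factor plus a chi-squared mixing variable) so that it runs without extra packages, and the mtcars model is an illustrative assumption.

```r
# Illustrative model; df.residual() is 29 here (32 rows, 3 coefficients)
fit <- lm(mpg ~ wt + hp, data = mtcars)

b  <- coef(fit)         # means of the sampling distribution
V  <- vcov(fit)         # parameter covariance matrix ("cov" in the text)
df <- df.residual(fit)  # residual degrees of freedom
n  <- 1000              # number of simulated coefficient vectors
p  <- length(b)

set.seed(123)

# Multivariate normal: standard normal draws times the Cholesky factor of V
draws_normal <- sweep(matrix(rnorm(n * p), nrow = n) %*% chol(V), 2, b, "+")

# Multivariate t: use the scaling matrix cov * (df - 2) / df, then divide each
# row by sqrt(chi-squared / df) so that the draws have covariance close to V
S <- V * (df - 2) / df
z <- matrix(rnorm(n * p), nrow = n) %*% chol(S)
w <- rchisq(n, df = df) / df
draws_t <- sweep(z / sqrt(w), 2, b, "+")

# Both empirical covariances should be near V, up to simulation error
cov(draws_normal)
cov(draws_t)
```

With a finite number of draws the empirical covariances only approximate V; increasing n narrows the gap.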

See also

  • misim() for simulating model coefficients after multiple imputation

  • sim_apply() for applying a function to each set of simulated coefficients

  • sim_ame() for computing average marginal effects in each simulation draw

  • sim_setx() for computing marginal predictions and first differences at typical values in each simulation draw

  • sim_adrf() for computing average dose-response functions in each simulation draw

Examples


data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat * (age + race + nodegree + re74), data = lalonde)

# Simulate coefficients
s <- sim(fit)
s
#> A `clarify_sim` object
#>  - 12 coefficients, 1000 simulated values
#>  - sampled distribution: multivariate t(602)
#>  - original fitting function call:
#> 
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)

# Could also use a robust covariance matrix, e.g.,
s <- sim(fit, vcov = "HC3")

# Simulate coefficients from a multivariate normal distribution;
# the default for `lm` objects is a t distribution
s <- sim(fit, dist = "normal")
s
#> A `clarify_sim` object
#>  - 12 coefficients, 1000 simulated values
#>  - sampled distribution: multivariate normal
#>  - original fitting function call:
#> 
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)