sim() simulates model parameters from a multivariate normal or t distribution that are then used by sim_apply() to calculate quantities of interest.
Arguments
- fit
a model fit, such as the output of a call to lm() or glm(). Can be left unspecified if coefs and vcov are not functions.
- n
the number of simulations to run; default is 1000. More simulations yield more precise estimates, but the resulting calculations will take longer.
- vcov
either a square covariance matrix of the coefficient covariance estimates or a function to use to extract it from fit. By default, uses stats::vcov() or insight::get_varcov() if that doesn't work.
- coefs
either a vector of coefficient estimates or a function to use to extract it from fit. By default, uses stats::coef() or insight::get_parameters() if that doesn't work.
- dist
a string containing the name of the multivariate distribution to use to draw simulated coefficients. Should be one of "normal" (multivariate normal distribution) or "t({#})" (multivariate t distribution), where {#} corresponds to the desired degrees of freedom (e.g., "t(100)"). If NULL, the right distribution to use will be determined based on heuristics; see Details.
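As an illustration of leaving fit unspecified, the sketch below passes a coefficient vector and covariance matrix directly. It assumes the clarify package is installed (the block is skipped otherwise) and uses the built-in mtcars data purely as a stand-in; the object names are illustrative.

```r
# Assumes the clarify package is installed; the block is skipped otherwise.
if (requireNamespace("clarify", quietly = TRUE)) {
  set.seed(42)

  # Any source of estimates works; lm() on mtcars is just a convenient stand-in
  fit <- lm(mpg ~ wt, data = mtcars)
  b <- coef(fit)   # named vector of coefficient estimates
  V <- vcov(fit)   # square covariance matrix of those estimates

  # No `fit` supplied: `coefs` and `vcov` are values, not functions.
  # `dist` is set explicitly so no model-based heuristic is needed.
  s_direct <- clarify::sim(n = 500, vcov = V, coefs = b, dist = "normal")

  # One row per simulation, one column per coefficient
  dim(s_direct$sim.coefs)
}
```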
Value
A clarify_sim object, which has the following components:
- sim.coefs
a matrix containing the simulated coefficients with a column for each coefficient and a row for each simulation
- coefs
the original coefficients extracted from fit or supplied to coefs
- vcov
the covariance matrix of the coefficients extracted from fit or supplied to vcov
- fit
the original model fit supplied to fit
The "dist" attribute contains "normal" if the coefficients were sampled from a multivariate normal distribution and "t(df)" if sampled from a multivariate t distribution. The "clarify_hash" attribute contains a unique hash generated by rlang::hash().
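The components and attributes described above can be inspected with standard list and attribute accessors. The following is a minimal sketch, assuming the clarify package is installed (the block is skipped otherwise) and using the built-in mtcars data as a stand-in model.

```r
# Assumes the clarify package is installed; the block is skipped otherwise.
if (requireNamespace("clarify", quietly = TRUE)) {
  set.seed(1)
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  s <- clarify::sim(fit, n = 200)

  # Matrix of simulated coefficients: rows are simulations, columns are coefficients
  dim(s$sim.coefs)
  head(s$sim.coefs, 3)

  # Original estimates and covariance matrix stored alongside the draws
  s$coefs
  s$vcov

  # Name of the distribution the coefficients were sampled from
  attr(s, "dist")
}
```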
Details
When dist is NULL, sim() samples from a multivariate normal or t distribution depending on the degrees of freedom extracted from insight::get_df(., type = "wald"). If the returned value is Inf, a normal distribution will be used; otherwise, a t-distribution with the returned degrees of freedom will be used. Models not supported by insight will use a normal distribution.
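The selection rule can be sketched in a few lines of base R. The helper choose_dist() below is hypothetical and is not part of clarify; it takes the degrees of freedom as an input rather than calling insight, and treats a missing value as "model not supported".

```r
# Hypothetical helper illustrating the rule described above (not clarify's code).
# `df` stands in for the value that would be extracted from the model.
choose_dist <- function(df) {
  # Unsupported model (no df available) or infinite df: multivariate normal
  if (is.null(df) || !is.finite(df)) {
    return("normal")
  }
  # Finite df: multivariate t with that many degrees of freedom
  sprintf("t(%s)", df)
}

choose_dist(Inf)   # "normal"
choose_dist(602)   # "t(602)"
choose_dist(NULL)  # "normal"
```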
When a multivariate normal distribution is used, coefficients are drawn using mvnfast::rmvn() with means equal to the estimated coefficients and covariance matrix equal to the parameter covariance matrix. When a multivariate t distribution is used, coefficients are drawn using mvnfast::rmvt() with means equal to the estimated coefficients and scaling matrix equal to cov*(df - 2)/df, where cov is the parameter covariance matrix and df is the residual degrees of freedom for the model.
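To see why the scaling matrix is cov*(df - 2)/df, recall that a multivariate t variable with scale matrix S and df degrees of freedom has covariance S*df/(df - 2), so this choice makes the covariance of the draws equal to cov. The sketch below reproduces both sampling schemes in base R rather than with mvnfast; it illustrates the idea and is not the package's implementation. The mtcars model and all object names are assumptions made for the example.

```r
set.seed(123)

fit <- lm(mpg ~ wt + hp, data = mtcars)
b  <- coef(fit)          # means of the sampling distribution
V  <- vcov(fit)          # parameter covariance matrix ("cov" in the text)
df <- df.residual(fit)   # residual degrees of freedom
n  <- 1000
p  <- length(b)

# Multivariate normal: standard normal draws times the Cholesky factor of V, shifted by b
Z <- matrix(rnorm(n * p), nrow = n)
draws_normal <- sweep(Z %*% chol(V), 2, b, "+")

# Multivariate t: normal draws with scale matrix S, each row divided by
# sqrt(chi-squared / df), then shifted by b
S  <- V * (df - 2) / df
Zt <- matrix(rnorm(n * p), nrow = n) %*% chol(S)
w  <- sqrt(rchisq(n, df) / df)   # one mixing value per simulation (row)
draws_t <- sweep(Zt / w, 2, b, "+")

colnames(draws_normal) <- colnames(draws_t) <- names(b)

# Ratio of simulated to target variances; values near 1 indicate agreement
diag(cov(draws_t)) / diag(V)
```

Using V itself as the scale matrix would instead inflate the variance of the draws by a factor of df/(df - 2).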
See also
- misim() for simulating model coefficients after multiple imputation
- sim_apply() for applying a function to each set of simulated coefficients
- sim_ame() for computing average marginal effects in each simulation draw
- sim_setx() for computing marginal predictions and first differences at typical values in each simulation draw
- sim_adrf() for computing average dose-response functions in each simulation draw
Examples
data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat * (age + race + nodegree + re74), data = lalonde)
# Simulate coefficients
s <- sim(fit)
s
#> A `clarify_sim` object
#> - 12 coefficients, 1000 simulated values
#> - sampled distribution: multivariate t(602)
#> - original fitting function call:
#>
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)
## Could also use a robust covariance matrix, e.g.,
s <- sim(fit, vcov = "HC3")
# Simulated coefficients assuming a normal distribution
# for coefficients; default for `lm` objects is a t-
# distribution
s <- sim(fit, dist = "normal")
s
#> A `clarify_sim` object
#> - 12 coefficients, 1000 simulated values
#> - sampled distribution: multivariate normal
#> - original fitting function call:
#>
#> lm(formula = re78 ~ treat * (age + race + nodegree + re74), data = lalonde)