sim_apply()
applies a function that produces quantities of
interest to each set of simulated coefficients produced by sim()
; these
calculated quantities form the posterior sampling distribution for the
quantities of interest. Capabilities are available for parallelization.
Arguments
- sim
a
clarify_sim
object; the output of a call tosim()
ormisim()
.- FUN
a function to be applied to each set of simulated coefficients. See Details.
- verbose
logical
; whether to display a text progress bar indicating progress and estimated time remaining for the procedure. Default isTRUE
.- cl
a cluster object created by
parallel::makeCluster()
, or an integer to indicate the number of child-processes (integer values are ignored on Windows) for parallel evaluations. Seepbapply::pblapply()
for details. IfNULL
, no parallelization will take place.- ...
optional arguments passed to
FUN
.
Value
A clarify_est
object, which is a matrix with a column for each
estimated quantity and a row for each simulation. The original estimates
(FUN
applied to the original coefficients or model fit object) are stored
in the attribute "original"
. The "sim_hash"
attribute contains the
simulation hash produced by sim()
.
Details
sim_apply()
applies a function, FUN
, to each set of simulated
coefficients, similar to apply()
. This function should return a numeric
vector containing one or more estimated quantities. This should be a named
vector to more easily keep track of the meaning of each estimated quantity.
Care should be taken to ensure that the returned vector is the same length
each time FUN
is called. NA
s are allowed in the output but should be
avoided if possible.
The arguments to FUN
can be specified in a few ways. If FUN
has an
argument called coefs
, a simulated set of coefficients will be passed to
this argument, and FUN
should compute and return a quantity based on the
coefficients (e.g., the difference between two coefficients if one wants to
test whether two coefficients are equal). If FUN
has an argument called
fit
, a model fit object of the same type as the one originally supplied
to sim()
(e.g., an lm
or glm
object) will be passed to this argument,
where the coefficients of the fit object have been replaced by the
simulated coefficients generated by sim()
, and FUN
should compute and
return a quantity based on the model fit (e.g., a computation based on the
output of predict()
). If neither coefs
nor fit
are the names of
arguments to FUN
, the model fit object with replaced coefficients will be
supplied to the first argument of FUN
.
When custom coefficients are supplied to sim()
, i.e., when the coefs
argument to sim()
is not left at its default value, FUN
must accept a
coefs
argument and a warning will be thrown if it accepts a fit
argument. This is because sim_apply()
does not know how to reconstruct
the original fit object with the new coefficients inserted. The quantities
computed by sim_apply()
must therefore be computed directly from the
coefficients.
If FUN
is not supplied at all, the simulated values of the coefficients will be returned in the output with a warning. Set FUN
to NULL
or verbose
to FALSE
to suppress this warning.
sim_apply()
with multiply imputed data
When using misim()
and sim_apply()
with multiply imputed data, the
coefficients are supplied to the model fit corresponding to the imputation
identifier associated with each set of coefficients, which means if FUN
uses a dataset extracted from a model (e.g., using insight::get_data()
), it will do so from the model fit in
the corresponding imputation.
The original estimates (see Value below) are computed as the mean of the
estimates across the imputations using the original coefficients averaged
across imputations. That is, first, the coefficients estimated in the
models in the imputed datasets are combined to form a single set of pooled
coefficients; then, for each imputation, the quantities of interest are
computed using the pooled coefficients; finally, the mean of the resulting
estimates across the imputations are taken as the "original" estimates.
Note this procedure is only valid for quantities with symmetric sampling
distributions, which excludes quantities like risk ratios and odds ratios,
but includes log risk ratios and log odds ratios. The desired quantities
can be transformed from their log versions using
transform()
.
See also
sim()
for generating the simulated coefficientssummary.clarify_est()
for computing p-values and confidence intervals for the estimated quantitiesplot.clarify_est()
for plotting estimated quantities and their simulated posterior sampling distribution.
Examples
data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat + age + race + nodegree + re74,
data = lalonde)
coef(fit)
#> (Intercept) treat age racehispan racewhite
#> 4596.7282858 1719.6323705 -6.5455292 1605.6558582 1338.9751895
#> nodegree re74
#> -1174.9861400 0.3855893
set.seed(123)
s <- sim(fit, n = 500)
# Function to compare predicted values for two units
# using `fit` argument
sim_fun <- function(fit) {
pred1 <- unname(predict(fit, newdata = lalonde[1,]))
pred2 <- unname(predict(fit, newdata = lalonde[2,]))
c(pred1 = pred1, pred2 = pred2)
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Add difference between predicted values as
# additional quantity
est <- transform(est, `diff 1-2` = pred1 - pred2)
# Examine estimates and confidence intervals
summary(est)
#> Estimate 2.5 % 97.5 %
#> pred1 4899 3527 6149
#> pred2 6603 4471 8706
#> diff 1-2 -1704 -3823 433
# Function to compare coefficients using `coefs`
# argument
sim_fun <- function(coefs) {
setNames(coefs["racewhite"] - coefs["racehispan"],
"wh - his")
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Examine estimates and confidence intervals
summary(est)
#> Estimate 2.5 % 97.5 %
#> wh - his -267 -1985 1678
# Another way to do the above:
est <- sim_apply(s, FUN = NULL)
est <- transform(est,
`wh - his` = `racewhite` - `racehispan`)
summary(est, parm = "wh - his")
#> Estimate 2.5 % 97.5 %
#> wh - his -267 -1985 1678