`sim_apply()`

applies a function that produces quantities of
interest to each set of simulated coefficients produced by `sim()`

; these
calculated quantities form the posterior sampling distribution for the
quantities of interest. Capabilities are available for parallelization.

## Arguments

- sim
a

`clarify_sim`

object; the output of a call to`sim()`

or`misim()`

.- FUN
a function to be applied to each set of simulated coefficients. See Details.

- verbose
`logical`

; whether to display a text progress bar indicating progress and estimated time remaining for the procedure. Default is`TRUE`

.- cl
a cluster object created by

`parallel::makeCluster()`

, or an integer to indicate the number of child-processes (integer values are ignored on Windows) for parallel evaluations. See`pbapply::pblapply()`

for details. If`NULL`

, no parallelization will take place.- ...
optional arguments passed to

`FUN`

.

## Value

A `clarify_est`

object, which is a matrix with a column for each
estimated quantity and a row for each simulation. The original estimates
(`FUN`

applied to the original coefficients or model fit object) are stored
in the attribute `"original"`

. The `"sim_hash"`

attributes contained the
simulation hash produced by `sim()`

.

## Details

`sim_apply()`

applies a function, `FUN`

, to each set of simulated
coefficients, similar to `apply()`

. This function should return a numeric
vector containing one or more estimated quantities. This should be a named
vector to more easily keep track of the meaning of each estimated quantity.
Care should be taken to ensure that the returned vector is the same length
each time `FUN`

is called. `NA`

s are allowed in the output but should be
avoided if possible.

The arguments to `FUN`

can be specified in a few ways. If `FUN`

has an
argument called `coefs`

, a simulated set of coefficients will be passed to
this argument, and `FUN`

should compute and return a quantity based on the
coefficients (e.g., the difference between two coefficients if one wants to
test whether two coefficients are equal). If `FUN`

has an argument called
`fit`

, a model fit object of the same type as the one originally supplied
to `sim()`

(e.g., an `lm`

or `glm`

object) will be passed to this argument,
where the coefficients of the fit object have been replaced by the
simulated coefficients generated by `sim()`

, and `FUN`

should compute and
return a quantity based on the model fit (e.g., a computation based on the
output of `predict()`

). If neither `coefs`

nor `fit`

are the names of
arguments to `FUN`

, the model fit object with replaced coefficients will be
supplied to the first argument of `FUN`

.

When custom coefficients are supplied to `sim()`

, i.e., when the `coefs`

argument to `sim()`

is not left at its default value, `FUN`

must accept a
`coefs`

argument and a warning will be thrown if it accepts a `fit`

argument. This is because `sim_apply()`

does not know how to reconstruct
the original fit object with the new coefficients inserted. The quantities
computed by `sim_apply()`

must therefore be computed directly from the
coefficients.

`sim_apply()`

with multiply imputed data

When using `misim()`

and `sim_apply()`

with multiply imputed data, the
coefficients are supplied to the model fit corresponding to the imputation
identifier associated with each set of coefficients, which means if `FUN`

uses a dataset extracted from a model, it will do so from the model fit in
the corresponding imputation.

The original estimates (see Value below) are computed as the mean of the
estimates across the imputations using the original coefficients averaged
across imputations. That is, first, the coefficients estimated in the
models in the imputed datasets are combined to form a single set of pooled
coefficients; then, for each imputation, the quantities of interest are
computed using the pooled coefficients; finally, the mean of the resulting
estimates across the imputations are taken as the "original" estimates.
Note this procedure is only valid for quantities with symmetric sampling
distributions, which excludes quantities like risk ratios and odds ratios,
but includes log risk ratios and log odds ratios. The desired quantities
can be transformed from their log versions using
`transform()`

.

## See also

`sim()`

for generating the simulated coefficients`summary.clarify_est()`

for computing p-values and confidence intervals for the estimated quantities`plot.clarify_est()`

for plotting estimated quantities and their simulated posterior sampling distribution.

## Examples

```
data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat + age + race + nodegree + re74,
data = lalonde)
coef(fit)
#> (Intercept) treat age racehispan racewhite
#> 4596.7282858 1719.6323705 -6.5455292 1605.6558582 1338.9751895
#> nodegree re74
#> -1174.9861400 0.3855893
set.seed(123)
s <- sim(fit, n = 100)
# Function to compare predicted values for two units
# using `fit` argument
sim_fun <- function(fit) {
pred1 <- unname(predict(fit, newdata = lalonde[1,]))
pred2 <- unname(predict(fit, newdata = lalonde[2,]))
c(pred1 = pred1, pred2 = pred2)
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Add difference between predicted values as
# additional quantity
est <- transform(est, `diff 1-2` = pred1 - pred2)
# Examine estimates and confidence intervals
summary(est)
#> Estimate 2.5 % 97.5 %
#> pred1 4899 3692 6116
#> pred2 6603 4972 8739
#> diff 1-2 -1704 -3802 297
# Function to compare coefficients using `coefs`
# argument
sim_fun <- function(coefs) {
c(`wh - his` = coefs["racewhite"] - coefs["racehispan"])
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Examine estimates and confidence intervals
summary(est)
#> Estimate 2.5 % 97.5 %
#> wh - his.racewhite -267 -1745 1740
```