Introduction
In this document, we demonstrate some common uses of
Zelig
(Imai, King, and Lau
2008) and how the same tasks can be performed using
clarify
. We’ll include examples for computing predictions
at representative values (i.e., setx()
and
sim()
in Zelig
), the rare-events logit model,
estimating the average treatment effect (ATT) after matching, and
combining estimates after multiple imputation.
The usual workflow in Zelig
is to fit a model using
zelig()
, specify quantities of interest to simulate using
setx()
on the zelig()
output, and then
simulate those quantities using sim()
. clarify
uses a similar approach, except that the model is fit outside
clarify
using functions in a different R package. In
addition, clarify
’s sim_apply()
allows for the
computation of any arbitrary quantity of interest. Unlike
Zelig
, clarify
follows the recommendations of
Rainey (2023) to use the estimates
computed from the original model coefficients rather than the average of
the simulated draws. We’ll demonstrate how to replicate a standard
Zelig
analysis using clarify
step-by-step.
Because simulation-based inference involves randomness and some of the
algorithms may not perfectly align, one shouldn’t expect results to be
identical, though in most cases, they should be similar.
Note that both Zelig
and clarify
have a
function called “sim()
”, so we will always make it clear
which package’s sim()
is being used.
Predictions at representative values
Here we’ll use the lalonde
dataset in
MatchIt and fit a linear model for re78
as a
function of the treatment treat
and covariates.
data("lalonde", package = "MatchIt")
We’ll be interested in the predicted values of the outcome for a typical unit at each level of treatment and their first difference.
Zelig
workflow
In Zelig
, we fit the model using
zelig()
:
fit <- zelig(re78 ~ treat + age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
model = "ls", cite = FALSE)
Next, we use setx()
and setx1()
to set our
values of treat
:
fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)
Next we simulate the values using sim()
:
fit <- Zelig::sim(fit)
Finally, we can print and plot the predicted values and first differences:
fit
plot(fit)
clarify
workflow
In clarify
, we fit the model using functions outside
clarify
, like stats::lm()
or
fixest::feols()
.
fit <- lm(re78 ~ treat + age + educ + married + race +
nodegree + re74 + re75, data = lalonde)
Next, we simulate the model coefficients using
clarify::sim()
:
s <- clarify::sim(fit)
Next, we use sim_setx()
to set our values of the
predictors:
Finally, we can summarize and plot the predicted values:
summary(est)
#> Estimate 2.5 % 97.5 %
#> treat = 0 6686.0 5304.8 7989.7
#> treat = 1 8234.3 6411.0 10027.2
#> FD 1548.2 -19.7 3057.5
plot(est)
Rare-events logit
Zelig
uses a special method for logistic regression with
rare events as described in King and Zeng
(2001). This is the primary implementation of the method in R.
However, newer methods have been developed that perform similarly to or
better than the method of King and Zeng (Puhr et
al. 2017) and are implemented in R packages that are compatible
with clarify
, such as logistf
and
brglm2
.
Here, we’ll use the lalonde
dataset with a constructed
rare outcome variable to demonstrate how to perform a rare events
logistic regression in Zelig
and in
clarify
.
data("lalonde", package = "MatchIt")
#Rare outcome: 1978 earnings over $20k; ~6% prevalence
lalonde$re78_20k <- lalonde$re78 >= 20000
Zelig
workflow
In Zelig
, we fit a rare events logistic model using
zelig()
with model = "relogit"
.
fit <- zelig(re78_20k ~ treat + age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
model = "relogit", cite = FALSE)
fit
We can compute predicted values at representative values using
setx()
and Zelig::sim()
as above.
fit <- setx(fit, treat = 0)
fit <- setx1(fit, treat = 1)
fit <- Zelig::sim(fit)
fit
plot(fit)
clarify
workflow
Here, we’ll use logistf::logistif()
with
flic = TRUE
, which performs a variation on Firth’s logistic
regression with a correction for bias in the intercept (Puhr et al. 2017).
fit <- logistf::logistf(re78_20k ~ treat + age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
flic = TRUE)
summary(fit)
#> logistf::logistf(formula = re78_20k ~ treat + age + educ + married +
#> race + nodegree + re74 + re75, data = lalonde, flic = TRUE)
#>
#> Model fitted by Penalized ML
#> Coefficients:
#> coef se(coef) lower 0.95 upper 0.95 Chisq
#> (Intercept) -9.313465e+00 2.036981e+00 -1.330595e+01 -5.3209820512 24.4427501
#> treat 1.106769e+00 6.165332e-01 -1.029333e-01 2.3139987641 3.2185792
#> age 4.144350e-02 2.147498e-02 -8.432745e-04 0.0825427498 3.6937075
#> educ 2.936673e-01 1.283624e-01 4.612509e-02 0.5477908006 5.4633187
#> married 2.909164e-01 4.769332e-01 -6.309243e-01 1.2240602694 0.3844676
#> racehispan 1.056388e+00 7.276772e-01 -4.150796e-01 2.4305568008 2.0440063
#> racewhite 5.431919e-01 6.160352e-01 -6.255328e-01 1.7838676828 0.8008622
#> nodegree 2.868319e-01 5.852738e-01 -8.531608e-01 1.4243907533 0.2462839
#> re74 1.108995e-04 2.821452e-05 5.775266e-05 0.0001664918 17.0427331
#> re75 4.457916e-05 4.720736e-05 -4.706292e-05 0.0001334568 0.9472092
#> p method
#> (Intercept) 7.655103e-07 1
#> treat 7.280680e-02 2
#> age 5.461808e-02 2
#> educ 1.941973e-02 2
#> married 5.352219e-01 2
#> racehispan 1.528067e-01 2
#> racewhite 3.708357e-01 2
#> nodegree 6.197040e-01 2
#> re74 3.654797e-05 2
#> re75 3.304307e-01 2
#>
#> Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
#>
#> Likelihood ratio test=65.80085 on 9 df, p=1.007627e-10, n=614
#> Wald test = 179.4243 on 9 df, p = 0
We can compute predictions at representative values using
clarify::sim()
and sim_setx()
.
s <- clarify::sim(fit)
est <- sim_setx(s, x = list(treat = 0), x1 = list(treat = 1),
verbose = FALSE)
summary(est)
#> Estimate 2.5 % 97.5 %
#> treat = 0 0.02341 0.00885 0.06229
#> treat = 1 0.06760 0.01958 0.22209
#> FD 0.04419 -0.00322 0.19271
plot(est)
Estimating the ATT after matching
Here we’ll use the lalonde
dataset and perform
propensity score matching and then fit a linear model for
re78
as a function of the treatment treat
, the
covariates, and their interaction. From this model, we’ll compute the
ATT of treat
using Zelig
and
clarify
.
data("lalonde", package = "MatchIt")
m.out <- MatchIt::matchit(treat ~ age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
method = "nearest")
Zelig
workflow
In Zelig
, we fit the model using zelig()
directly on the matchit
object:
fit <- zelig(re78 ~ treat * (age + educ + married + race +
nodegree + re74 + re75),
data = m.out, model = "ls", cite = FALSE)
Next, we use ATT()
to request the ATT of
treat
and simulate the values:
fit <- ATT(fit, "treat")
fit
plot(fit)
clarify
workflow
In clarify
, we need to extract the matched dataset and
fit a model outside clarify
using another package.
m.data <- MatchIt::match.data(m.out)
fit <- lm(re78 ~ treat * (age + educ + married + race +
nodegree + re74 + re75),
data = m.data)
Next, we simulate the model coefficients using
clarify::sim()
. Because we performed pair matching, we will
request a cluster-robust standard error:
s <- clarify::sim(fit, vcov = ~subclass)
Next, we use sim_ame()
to request the average marginal
effect of treat
within the subset of treated units:
est <- sim_ame(s, var = "treat", subset = treat == 1,
contrast = "diff", verbose = FALSE)
Finally, we can summarize and plot the ATT:
summary(est)
#> Estimate 2.5 % 97.5 %
#> E[Y(0)] 5228 4052 6335
#> E[Y(1)] 6349 5220 7483
#> Diff 1121 -418 2604
plot(est)
Combining results after multiple imputation
Here we’ll use the africa
dataset in
Amelia to demonstrate combining estimates after multiple
imputation. This analysis is also demonstrated using
clarify
at the end of vignette("clarify")
.
First we multiply impute the data using amelia()
using
the specification in the Amelia documentation.
# Multiple imputation
a.out <- amelia(x = africa, m = 10, cs = "country",
ts = "year", logs = "gdp_pc", p2s = 0)
Zelig
workflow
With Zelig
, we can supply the amelia
object
directly to the data
argument of zelig()
to
fit a model in each imputed dataset:
fit <- zelig(gdp_pc ~ infl * trade, data = a.out,
model = "ls", cite = FALSE)
Summarizing the coefficient estimates after the simulation can be
done using summary()
:
summary(fit)
We can use Zelig::sim()
and setx()
to
compute predictions at specified values of the predictors:
fit <- setx(fit, infl = 0, trade = 40)
fit <- setx1(fit, infl = 0, trade = 60)
fit <- Zelig::sim(fit)
Zelig
does not allow you to combine predicted values
across imputations.
fit
plot(fit)
clarify
workflow
clarify
does not combine coefficients, unlike
zelig()
; instead, the models should be fit using
Amelia::with()
. To view the combined coefficient estimates,
use Amelia::mi.combine()
.
#Use Amelia functions to model and combine coefficients
fits <- with(a.out, lm(gdp_pc ~ infl * trade))
mi.combine(fits)
#> # A tibble: 4 × 8
#> term estimate std.error statistic p.value df r miss.info
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -177. 117. -1.51 1.87e+ 0 130515. 0.00837 0.00832
#> 2 infl 12.3 9.36 1.31 1.91e- 1 1210309. 0.00273 0.00273
#> 3 trade 21.3 1.82 11.7 1.26e-31 83428. 0.0105 0.0104
#> 4 infl:trade -0.290 0.140 -2.07 1.96e+ 0 954984. 0.00308 0.00307
Derived quantities can be computed using
clarify::misim()
and sim_apply()
or its
wrappers on the with()
output, which is a list of
regression model fits:
#Simulate coefficients, 100 in each of 10 imputations
s <- misim(fits, n = 100)
#Compute predictions at specified values
est <- sim_setx(s, x = list(infl = 0, trade = 40),
x1 = list(infl = 0, trade = 60),
verbose = FALSE)
summary(est)
#> Estimate 2.5 % 97.5 %
#> trade = 40 675 570 784
#> trade = 60 1101 1024 1174
#> FD 426 352 498
plot(est)