Fredrik Sävje
Assistant Professor
Department of Political Science
Department of Statistics and Data Science
Yale University
Publications
Software

R package with tools for distance metrics.

Quick Generalized Full Matching in R.

Quick Threshold Blocking in R.

C library for size-constrained clustering.
Working papers

Matching methods are used to make units comparable on observed characteristics. Full matching can be used to derive optimal matches. However, the method has only been defined in the case of two treatment categories, it places unnecessary restrictions on the matched groups, and existing implementations are computationally intractable in large samples. As a result, the method has not been feasible in studies with large samples or complex designs. We introduce a generalization of full matching that inherits its optimality properties but allows the investigator to specify any desired structure of the matched groups over any number of treatment conditions. We also describe a new approximation algorithm to derive generalized full matchings. In the worst case, the maximum within-group dissimilarity produced by the algorithm is no worse than four times the optimal solution, but it typically performs close to on par with existing optimal algorithms when they exist. Despite its performance, the algorithm is fast and uses little memory: it terminates, on average, in linearithmic time using linear space. This enables investigators to derive well-performing matchings within minutes even in complex studies with samples of several million units.

A common method to reduce the uncertainty of causal inferences from experiments is to assign treatments in fixed proportions within groups of similar units: blocking. Previous results indicate that one can expect substantial reductions in variance if these groups are formed so as to contain exactly as many units as treatment conditions. This approach can be contrasted with threshold blocking which, instead of specifying a fixed size, requires that the groups contain a minimum number of units. In this paper, I investigate the advantages of the respective methods. In particular, I show that threshold blocking is superior to fixed-sized blocking in the sense that it always finds a weakly better grouping for any objective and sample. However, this does not necessarily hold when the objective function of the blocking problem is unknown, and a fixed-sized design can perform better in that case. I specifically examine the factors that govern how the methods perform in the common situation where the objective is to reduce the estimator's variance, but where groups are constructed based on covariates. This reveals that the relative performance of threshold blocking improves when the covariates become more predictive of the outcome.
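The claim that threshold blocking finds a weakly better grouping for a known objective can be illustrated with a small brute-force sketch. It is not the method of the paper: the six covariate values, the choice of two treatment conditions, and the objective (the maximum within-block covariate range) are all assumptions made for illustration. The point it shows is that every fixed-sized blocking is also a valid threshold blocking, so the best threshold blocking can never be worse.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose which of the remaining units share a block with `first`.
    for k in range(len(rest) + 1):
        for others in combinations(rest, k):
            block = (first,) + others
            remaining = [u for u in rest if u not in others]
            for tail in partitions(remaining):
                yield [block] + tail


def max_within_block_range(blocking, x):
    """Assumed objective: the largest covariate range inside any block."""
    return max(max(x[i] for i in b) - min(x[i] for i in b) for b in blocking)


# Hypothetical covariate values: two tight clusters of three units each.
x = [0, 1, 2, 50, 51, 52]
block_size = 2  # assumed: two treatment conditions

all_blockings = list(partitions(list(range(len(x)))))
# Fixed-sized blocking: every block has exactly `block_size` units.
fixed = [b for b in all_blockings if all(len(blk) == block_size for blk in b)]
# Threshold blocking: every block has at least `block_size` units.
threshold = [b for b in all_blockings if all(len(blk) >= block_size for blk in b)]

best_fixed = min(max_within_block_range(b, x) for b in fixed)
best_threshold = min(max_within_block_range(b, x) for b in threshold)

print("best fixed-sized objective:", best_fixed)
print("best threshold objective:  ", best_threshold)
```

With these values, any pairing must put one unit from each cluster in the same block, while threshold blocking can keep each cluster of three together. The comparison assumes the objective is known; as the abstract notes, the ordering need not hold when blocks are built from covariates and the true objective is the estimator's variance.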

Recent studies of the effects of political incumbency on election outcomes have almost exclusively used regression discontinuity designs. This shift from past methods has provided credible identification, but only for a specific type of incumbency effect: the effect for parties. The other effects in the literature, most notably the personal incumbency effect, have largely been abandoned together with the methods previously used to estimate them. This study aims to connect the new methodological strides with the effects discussed in the past literature. A causal model is first introduced which allows for formal definitions of several effects that have previously only been discussed informally. The model also allows previous methods to be revisited and the relations between their estimated effects to be derived. Several strategies are then introduced which, under suitable assumptions, can identify some of the newly defined effects. Last, using these strategies, the incumbency effects in Brazilian mayoral elections are investigated.

Scholars have theorized that congenital health endowment is a critical determinant of economic outcomes later in a person's life. In an important contribution, Field, Robles and Torero [American Economic Journal: Applied Economics, 1, 4 (2009)] use iodine supplementation programs in Tanzania to estimate the impact of fetal iodine deficiency on educational attainment. The study is one of the first validations of the fetal origins hypothesis. Based on their large estimated effects, the authors conclude that geographic variation in iodine deficiency plausibly accounts for a substantial share of the variation in educational attainment in the developing world. We revisit the Tanzanian iodine supplementation programs through a narrow and wide replication of Field, Robles and Torero's study. We are able to exactly replicate the original results, but we find that they rest on a set of undocumented and unmotivated specification choices and sample restrictions. With a better motivated specification, we cannot establish an effect of fetal iodine protection on educational attainment. The result is unchanged after we increase the sample size fourfold and improve the precision of the treatment variable by incorporating new institutional and medical insights. We conclude that the available data do not provide sufficient power to detect a possible effect since treatment cannot be measured with sufficient precision.
Works in progress

Average treatment effects under unknown interference,

The get out of jail card: The effect of political office holding on court rulings,

Blocking estimators and inference under the Neyman-Rubin model,

A two-factor approximation algorithm for paired threshold blocking.

Semiblocking: Cross block dependence to improve inference.

Assumption-free permutation tests for the existence of unit fixed effects.

Intuitive construction of distance matrices and metrics (software).

Hypothesis testing with the Synthetic Control Method,
Last updated July 3, 2017.