Fredrik Sävje
Postdoctoral fellow, Department of Political Science & Statistics, UC Berkeley.
Publications

Inferences from randomized experiments can be improved by blocking: assigning treatment in fixed proportions within groups of similar units. However, the use of the method is limited by the difficulty of deriving these groups. Current blocking methods are restricted to special cases or run in exponential time; are not sensitive to clustering of data points; and are often heuristic, providing an unsatisfactory solution in many common instances. We present an algorithm that implements a new, widely applicable class of blocking, threshold blocking, that solves these problems. Given a minimum required group size and a distance metric, we study the blocking problem of minimizing the maximum distance between any two units within the same group. We prove this is an NP-hard problem and derive an approximation algorithm that yields a blocking where the maximum distance is guaranteed to be at most four times the optimal value. This algorithm runs in O(n log n) time with O(n) space complexity. This makes it the first blocking method with a guaranteed level of performance that works in massive experiments. While many commonly used algorithms form pairs of units, our algorithm constructs the groups flexibly for any chosen minimum size. This facilitates complex experiments with several treatment arms and clustered data. A simulation study demonstrates the efficiency and efficacy of the algorithm; tens of millions of units can be blocked using a desktop computer in a few minutes.
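To make the objective in the abstract above concrete, the following is a minimal sketch of how a candidate blocking could be checked and scored. It is not the paper's approximation algorithm. It only evaluates the quantity being minimized (the maximum distance between two units in the same group) and the minimum-size constraint. The function names, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def max_within_block_distance(points, blocks):
    """Largest distance between any two units that share a block."""
    worst = 0.0
    for block in blocks:
        for i, j in combinations(block, 2):
            worst = max(worst, dist(points[i], points[j]))
    return worst


def is_threshold_blocking(n_units, blocks, min_size):
    """True if the blocks partition units 0..n_units-1 and each block
    contains at least min_size units."""
    assigned = sorted(i for block in blocks for i in block)
    covers_all_once = assigned == list(range(n_units))
    return covers_all_once and all(len(b) >= min_size for b in blocks)


# Hypothetical toy data: two visible clusters of covariate vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
blocks = [[0, 1, 2], [3, 4]]

valid = is_threshold_blocking(len(points), blocks, min_size=2)
score = max_within_block_distance(points, blocks)
```

Here the blocking is valid for a minimum size of two, and its score is the distance between units 1 and 2. A blocking algorithm would search over valid blockings for one with a small score.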
Working papers

A common method to reduce the uncertainty of causal inferences from experiments is to assign treatments in fixed proportions within groups of similar units: blocking. Previous results indicate that one can expect substantial reductions in variance if these groups are formed so as to contain exactly as many units as treatment conditions. This approach can be contrasted with threshold blocking which, instead of specifying a fixed size, requires that the groups contain a minimum number of units. In this paper, I investigate the advantages of each method. In particular, I show that threshold blocking is superior to fixed-sized blocking in the sense that it always finds a weakly better grouping for any objective and sample. However, this does not necessarily hold when the objective function of the blocking problem is unknown, and a fixed-sized design can perform better in that case. I specifically examine the factors that govern how the methods perform in the common situation where the objective is to reduce the estimator's variance, but where groups are constructed based on covariates. This reveals that the relative performance of threshold blocking improves when the covariates become more predictive of the outcome.
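The weak-dominance claim above can be illustrated by brute force on a tiny sample: every blocking with exactly k units per group also satisfies "at least k units per group", so the best threshold blocking can never score worse than the best fixed-sized one. The sketch below assumes one particular known objective (the maximum within-group difference in a single covariate) and made-up data. It enumerates all partitions, which is feasible only for very small samples and is not how either method would be run in practice.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + smaller


def objective(x, blocks):
    """Maximum covariate difference between two units in the same block.
    Assumes every block has at least two units."""
    return max(abs(x[i] - x[j]) for b in blocks for i, j in combinations(b, 2))


# Hypothetical covariate values: two clusters of three units each.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
k = 2  # number of treatment conditions

all_blockings = list(partitions(list(range(len(x)))))
fixed = [p for p in all_blockings if all(len(b) == k for b in p)]
threshold = [p for p in all_blockings if all(len(b) >= k for b in p)]

fixed_best = min(objective(x, p) for p in fixed)
threshold_best = min(objective(x, p) for p in threshold)
```

With three units in each cluster, any pairing must match one unit across the gap, whereas threshold blocking can keep each cluster together as a group of three.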

Recent research has reported positive effects on schooling, particularly for girls, due to in utero protection from iodine deficiency resulting from iodized oil capsule distribution in Tanzania. These results suggest that similar health interventions might have contributed to the reduction of the educational gender gap and, more generally, unveiled a mechanism through which the natural health environment affects social and economic development. We revisit the Tanzanian experience by investigating how these effects differ over time and across surveys; across different treatment specifications; and across additional educational outcome measures. Contrary to previous studies, we find that the estimated effects tend to be small and not robust across specifications or samples. Using all available data and a medically motivated iodine depletion function, we find no evidence of a positive long-run effect of iodine deficiency protection on educational attainment.
Works in progress
Defining and identifying incumbency effects.

The get out of jail card: The effect of political office holding on court rulings.

Blocking estimators and inference under the Neyman-Rubin model.

Fast, near-optimal matching for massive observational studies.
A two-factor approximation algorithm for paired threshold blocking.
Semi-blocking: Cross-block dependence to improve inference.

OrMatching: Reducing extrapolation error in sparse samples.
Identifying Average Causal Effects without the SUTVA in randomized experiments.
Assumption-free permutation tests for the existence of unit fixed-effect.
Intuitive construction of distance matrices and metrics (software).

Hypothesis testing with the Synthetic Control Method.