Regression discontinuity design (RDD), introduced by Thistlethwaite and Campbell (1960), is increasingly used to evaluate the impacts of development programs. Lee and Lemieux (2010), Imbens and Lemieux (2007), and Cook (2008) provide comprehensive reviews of the design and its applications in the social sciences. This post provides a summary. Part 2 compares user-written Stata estimation packages, and Part 3 discusses validation or falsification tests.

RDD is a quasi-experimental method for evaluating program impact when observation units (for example, households) can be sorted using some continuous metric (for example, income) and program assignment is based on a pre-determined threshold or cutoff point of the sorting metric, also called the running or assignment variable. Observations just below the cutoff are deemed similar to, and therefore comparable with, those just above the cutoff. In the absence of the program, one would expect the outcome variable to change smoothly with small changes in the running variable. Thus, a large jump in the outcome variable observed precisely at the threshold value of the running variable after program intervention can be attributed to the program itself.
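A minimal sketch of the sharp assignment rule described above, on simulated data. The variable names (`income`, `eligible`, `income_c`), the cutoff of 10,000, and the choice that units at or below the cutoff are the eligible ones are all assumptions made for illustration, not taken from any actual program.

```stata
// Hypothetical example: names, cutoff, and direction of eligibility are assumptions
clear
set seed 12345
set obs 500
generate income = runiform()*20000            // running (sorting) variable
local cutoff = 10000                          // hypothetical eligibility threshold
generate byte eligible = (income <= `cutoff') // sharp rule: eligible at or below the cutoff
generate income_c = income - `cutoff'         // centered running variable, 0 at the cutoff

// Eligibility is a deterministic step function of the running variable
tabulate eligible
summarize income if eligible == 1
summarize income if eligible == 0
```

Centering the running variable at the cutoff is a convenience: it makes the intercept of any regression on `income_c` refer to the threshold itself.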

Among the advantages of RDD are the weaker assumptions required for its validity: Hahn, Todd, and van der Klaauw (2001) showed that RDD requires milder assumptions than other non-experimental impact evaluation methods.

The main caveat in RDD is that program impact is estimated locally, using observations very close to the cutoff, so the generalizability of the estimated effect is limited. While evaluation results using RDD have strong internal validity, considered by many as second only to that of a randomized controlled trial (RCT), their external validity is limited to observation units near the eligibility threshold.

RDD can be characterized as an estimation of whether an outcome variable exhibits a discontinuous jump precisely at the cutoff of the running variable. The magnitude of the jump may be estimated using a local regression that limits the observations to a specified bandwidth around the cutoff, where the functional form is most likely linear. The figures illustrate a local linear regression RDD on simulated data, before and after program participation, within a specified bandwidth, *h*. In the right panel, the discontinuous jump, *tau*, at the cutoff is the estimated program impact.

The graphs above can be drawn with the following code:

// Before program participation
clear
set seed 2
set obs 1000
range x_obs -1 1 1000
generate y_pre = x_obs^3 + rnormal()
twoway scatter y_pre x_obs, msize(small) mcolor(gs10) ///
    || lfit y_pre x_obs if inrange(x_obs, -.35, .35), ///
        range(-.35 .35) lcolor(black) lwidth(thick) ///
    xline(0, lpattern(-)) ///
    ytitle("Outcome variable (Y)") ///
    xtitle("Assignment variable (X)") ///
    title("Before program participation") ///
    legend(off) ///
    xline(-.35 .35, lpattern(-) lcolor(gs10)) ///
    text(-4.5 -.35 "{it:-h}") ///
    text(-4.5 .35 "{it:h}")

// After program participation: units at or above the cutoff (x_obs >= 0) get a jump of tau
set seed 2
local tau = 1.25
capture drop y_post
generate y_post = x_obs + `tau' * (x_obs >= 0) + rnormal()
twoway scatter y_post x_obs, msize(small) mcolor(gs10) ylabel(-4(2)4) ///
    || lfit y_post x_obs if inrange(x_obs, -.35, 0), ///
        range(-.35 0) lcolor(black) lwidth(thick) ///
    || lfit y_post x_obs if inrange(x_obs, 0, .35), ///
        range(0 .35) lcolor(black) lwidth(thick) ///
    xline(0, lpattern(-)) ///
    ytitle("Outcome variable (Y)") ///
    xtitle("Assignment variable (X)") ///
    title("After program participation") ///
    legend(off) ///
    text(.9 0.05 "{&tau}", size(*2)) ///
    xline(-.35 .35, lpattern(-) lcolor(gs10)) ///
    text(-4.5 -.35 "{it:-h}") ///
    text(-4.5 .35 "{it:h}")
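The jump in the right panel can also be estimated numerically. The sketch below is one simple way to do it with built-in commands, assuming a sharp design, a uniform kernel, and a hand-picked bandwidth of 0.35. The `treat` dummy and the data-generating process are illustrative assumptions (the simulated jump is set to 1.25 at `x_obs = 0`).

```stata
// Simulated data: every value below is an assumption made for illustration
clear
set seed 2
set obs 1000
range x_obs -1 1 1000
local tau = 1.25                     // assumed true jump at the cutoff
local h   = 0.35                     // bandwidth picked by hand
generate byte treat = (x_obs >= 0)   // sharp assignment at the cutoff x_obs = 0
generate y_post = x_obs + `tau'*treat + rnormal()

// Local linear regression, uniform kernel: separate intercept and slope on each side
regress y_post i.treat##c.x_obs if abs(x_obs) <= `h', vce(robust)

// The coefficient on 1.treat is the estimated discontinuity at x_obs = 0
display "Estimated tau = " _b[1.treat] "  (SE = " _se[1.treat] ")"
```

Because `x_obs` is already centered at the cutoff, the coefficient on `1.treat` is the gap between the two fitted lines at `x_obs = 0`. The interaction term lets the slope differ on either side of the cutoff.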

How does one select the appropriate bandwidth, *h*, from which to estimate *tau*? Choosing the bandwidth involves a tradeoff between bias and variance. A wide bandwidth uses observations farther from the cutoff, where a linear approximation is less accurate, so bias increases. A narrow bandwidth keeps only observations near the cutoff, which lowers bias but raises variance because fewer observations are used. An optimal *h* therefore balances this tradeoff. Bandwidth selectors have been proposed by, among others, Imbens and Kalyanaraman (2012), Calonico, Cattaneo, and Titiunik (2014), and Ludwig and Miller (2007), which we will refer to as the IK, CCT, and CV (cross-validation) bandwidths, respectively.
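To see the tradeoff, the sketch below re-estimates the jump over several hand-picked bandwidths on simulated data with a curved (cubic) relationship. The bandwidth values, the variable names, and the simulated jump of 1.25 are assumptions for illustration. This is not one of the IK, CCT, or CV selectors, only a sensitivity check.

```stata
// Simulated data with a cubic baseline so that wide windows can be biased
clear
set seed 2
set obs 1000
range x_obs -1 1 1000
generate byte treat = (x_obs >= 0)
generate y = x_obs^3 + 1.25*treat + rnormal()   // assumed true jump: 1.25

// Re-fit the same local linear regression for each candidate bandwidth
foreach h of numlist 0.1 0.2 0.35 0.5 1 {
    quietly regress y i.treat##c.x_obs if abs(x_obs) <= `h'
    display "h = " %4.2f `h' "   N = " %4.0f e(N) ///
        "   tau_hat = " %6.3f _b[1.treat] "   SE = " %6.3f _se[1.treat]
}
```

One would expect the standard error to shrink as *h* widens, while the estimate from the widest window, where a straight line approximates the cubic shape poorly, drifts away from the simulated jump.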

In Stata, there are at least three user-written RD estimation packages: (1) Austin Nichols’s -rd- (ssc install rd); (2) CCT’s -rdrobust- (ssc install rdrobust); and (3) Boris Kaiser’s -rdcv- (ssc install rdcv). A comparison of these will be presented in Part 2.
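A rough sketch of how the first two packages might be called on simulated data, assuming a sharp design with the cutoff at zero. The options shown (`z0()` and `bwidth()` for -rd-, `c()` for -rdrobust-) are believed to match the packages' help files, but check `help rd` and `help rdrobust` for the installed version. The simulated data and the bandwidth of 0.35 are assumptions, and -rdcv- is not shown.

```stata
// Assumes both packages are already installed:
//   ssc install rd
//   ssc install rdrobust
clear
set seed 2
set obs 1000
range x_obs -1 1 1000
generate y_post = x_obs + 1.25*(x_obs >= 0) + rnormal()   // assumed jump: 1.25

// -rd-: outcome, then assignment variable; z0() is the cutoff, bwidth() the bandwidth
rd y_post x_obs, z0(0) bwidth(0.35)

// -rdrobust-: c() is the cutoff; with no h() the package picks the bandwidth itself
rdrobust y_post x_obs, c(0)
```

With no treatment variable listed, -rd- treats the design as sharp, with treatment switching on at the cutoff of the assignment variable.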

*Note: The discussion above is adapted from what I wrote for the report “Keeping children healthy and in school: Evaluating the Pantawid Pamilya Using Regression Discontinuity Design” (2014). The full report was written with Dr. Babes Orbeta, Mico del Mundo, Melba Tutor, Mai Valera, and Dama Yarcia. We benefited a lot from Matias Cattaneo (University of Michigan), who provided a short course on regression discontinuity here in Manila in September 2014 through the Asian Development Bank, and from Jed Friedman (World Bank) during the technical review sessions.*

References:

Bloom, Howard. 2012. Modern Regression Discontinuity Analysis. Journal of Research on Educational Effectiveness, 5(1):43-82.

Calonico, S., M. D. Cattaneo, and R. Titiunik. 2014. Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs. Econometrica, 82(6):2295–2326.

Calonico, Sebastian, Matias Cattaneo, and Rocio Titiunik. 2014b. Robust data-driven inference in the regression discontinuity design. The Stata Journal, vv(ii): 1–36.

Cook, Thomas. 2008. Waiting for Life to Arrive: A history of the regression-discontinuity design in Psychology, Statistics and Economics. Journal of Econometrics, 142(2): 636–654.

Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. Identification and Estimation of Treatment Effects with a Regression Discontinuity Design. Econometrica, 69(1): 201–209.

Imbens, Guido and Karthik Kalyanaraman. 2012. Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies, 79: 933–959.

Imbens, Guido and Thomas Lemieux. 2007. “Regression Discontinuity Designs: A Guide to Practice.” NBER Working Paper 13039. http://nber.org/papers/w13039

Lee, David S., and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature, 48(2): 281–355. http://www.aeaweb.org/articles.PhP?doi=10.1257/jel.48.2.281

McCrary, Justin. 2008. Manipulation of the Running Variable in the Regression Discontinuity Design: A Density Test. Journal of Econometrics, 142(2): 698–714.

Nichols, Austin. 2011. rd 2.0: Revised Stata module for regression discontinuity estimation. http://ideas.repec.org/c/boc/bocode/s456888.html

Thistlethwaite, D.L., and Campbell, D.T. 1960. Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment. Journal of Educational Psychology, 51: 309–317.


Yujun Lian, on 22 November 2016 at 5:39 PM said: The following command may be wrong: g y_post= x +`tau' + rnormal() if x=0. There are no observations with x==0.

Raquel Sampaio, on 1 September 2016 at 7:10 AM said: Good job! Looking forward to part II!

andes, on 1 June 2016 at 10:26 PM said: waiting for part II of RD; your work through is contribution is amazing!!!

All the best

Mitch Abdon, on 2 June 2016 at 6:22 PM said: hi andes. thank you for your support. just got back to 90% health. will probably have time to focus on writing again soon

rose, on 16 March 2016 at 12:19 AM said: Can you explain in detail how to get RD estimates in stata?

Yujun Lian, on 9 September 2017 at 9:55 AM said: There are several user commands to do RD estimates in stata: -rd-, -rdrobust-. Type -ssc install cmd- to install.

hans582, on 26 January 2016 at 6:59 PM said: Thanks for this interesting post. There is at least one error in the syntax. It has to be if x==0 (you missed a =).

Jorge Guzman, on 18 December 2015 at 8:09 AM said: the second part of your code does not work.

change this section in order to fix it.

// After program participation

set seed 2

*local tau = 1.25

cap drop y_post

g y_post= x_obs + 1.25 + rnormal()