Regression discontinuity design in Stata (Part 1)

Regression discontinuity design (RDD), introduced by Thistlethwaite and Campbell (1960), is increasingly used to evaluate the impacts of development programs. Lee and Lemieux (2010), Imbens and Lemieux (2007), and Cook (2008) provide comprehensive reviews of regression discontinuity design and its applications in the social sciences. This post provides a summary. Part 2 compares user-written Stata estimation packages. Part 3 discusses validation or falsification tests.


This is a shameless plug.

By now you may already know that StataCorp launched a contest on Facebook that started a week ago and ends tomorrow, Dec 2, at 9:30 a.m. CST.

I submitted an earlier blog post, “Using Stata to make sense of my Uber Data” (31 Aug 2015).

Please help me win Stata MP by clicking on the photo below and casting your vote. To vote please click the photo and ‘like’. (This will lead you to StataCorp’s Facebook page.)

Alternatively, you may click this link and ‘like’:

Thank you very much! :)


creative destruction: collapse and contract

Creative destruction, coined by Joseph Schumpeter in Capitalism, Socialism, and Democracy, refers to the process by which innovations kill off old, inefficient products or processes. That is not what this post is about. Instead, it is about destroying data to create more useful information. By destroying, we mean altering the data currently loaded in memory, with no undo button to rely on. When you load or open data in Stata, Stata stores the data in your machine’s RAM. Any changes you make are therefore not written to your hard drive until you call save. Even so, be careful not to overwrite your raw data files.
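As a minimal sketch of what "destroying" looks like in practice, the following uses the auto dataset that ships with Stata (an assumption for illustration; it is not the data used in the full post). Wrapping each command in preserve and restore is one way to get the missing undo button back.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* collapse replaces the data in memory with summary statistics by group
preserve
collapse (mean) mean_price = price (count) n_cars = price, by(foreign)
list
restore

* contract replaces the data in memory with frequencies of each combination
preserve
contract foreign rep78, freq(n_cars)
list
restore
```

After each restore, the original 74 observations are back in memory, so the destructive step can be inspected without touching the file on disk.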

UN Comtrade API in Stata

UN Comtrade is the largest repository of disaggregated trade statistics. It offers free access to detailed annual trade data starting from 1962 and monthly trade data from 2010. Free access is limited to 50,000 records per query. This limit is relaxed in some cases, but the API (as of this writing) allows a maximum of 50,000 records per query for all users.

The UN Comtrade data extraction API (currently in beta) is publicly available. How can we exploit it to download Comtrade data directly from Stata?

dates in Starbucks

In yesterday’s post, one of the cleaning steps needed was to extract the date and the day of the week from the string:
. list date in 1/5

     |                                       date |
  1. |  Date: August 31, 2015 at 1:42:41 PM GMT+8 |
  2. | Date: August 24, 2015 at 12:36:55 PM GMT+8 |
  3. |    Date: July 27, 2015 at 2:51:27 PM GMT+8 |
  4. |    Date: July 20, 2015 at 2:45:43 PM GMT+8 |
  5. |    Date: July 20, 2015 at 2:07:49 PM GMT+8 |
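
One possible way to do this, sketched here under the assumption that every string follows the "Date: Month DD, YYYY at ..." layout shown above (the variable names datepart, tripdate, dayofweek, and dayname are made up for illustration), is to cut out the text between "Date: " and " at " and pass it to date():

```stata
clear
input str45 date
"Date: August 31, 2015 at 1:42:41 PM GMT+8"
"Date: July 20, 2015 at 2:07:49 PM GMT+8"
end

* Keep the text after "Date: " (6 characters) and before " at "
generate datepart = substr(date, 7, strpos(date, " at ") - 7)

* Convert to a Stata daily date and give it a readable format
generate tripdate = date(datepart, "MDY")
format tripdate %td

* Day of week as a number (0 = Sunday, ..., 6 = Saturday) and as a name
generate dayofweek = dow(tripdate)
generate dayname = string(tripdate, "%tdDayname")

list datepart tripdate dayofweek dayname
```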


Using Stata to make sense of my Uber data

I tried Uber in late May, and since then I have taken 131 Uber rides covering 1,200 kilometers and 80 hours on the road. Uber (and GrabTaxi) have eliminated the wait under the heat (and rain) and the dealing with the assholeness of most taxi drivers here in Metro Manila. But what I love most about Uber, apart from their customer service, is the data they send. Trip receipts are sent automatically as soon as the trip ends. These show not only how much I was charged but also the time, the distance, the fare disaggregated by time and distance, and much more. GrabTaxi receipts, on the other hand, show only the amount paid, which is manually encoded by drivers.

The adventures of tin()

Using the if qualifier with time-series data is tricky. Until you meet tin(). Let us use the quarterly German macro data, lutkepohl2, from the Stata website to illustrate.
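A minimal sketch of the idea, assuming an internet connection for webuse; the date range is arbitrary and chosen only for illustration:

```stata
* Quarterly German macro data from the Stata website
webuse lutkepohl2, clear
tsset qtr

* Without tin(), the quarterly dates must be converted by hand
list qtr ln_inv if qtr >= tq(1960q1) & qtr <= tq(1961q4)

* tin() accepts the date literals directly (both endpoints included)
list qtr ln_inv if tin(1960q1, 1961q4)
```

Both list commands select the same observations. tin() simply saves the conversion step and relies on the data being tsset.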

-destring- uncomplicated

In a comment to the previous post destring complication: negative numbers, Nick Cox pointed out “the most important advise” in using destring: “never destring, replace unless you are absolutely sure that you are right or are willing to do things again if you made the wrong decision. The generate() option is there for a purpose.”

In addition, his comment points to a simpler solution than using regular expressions.


-destring- complication: negative numbers


In a Stata training session, one of the students wondered why, after importing an Excel file of financial indicators into Stata, some variables were read as strings. A quick browse of the data revealed hyphens (“-”) used in two different ways: one to indicate a negative number and another to indicate a missing observation.


How do we convert these variables to numeric, given that destring returns an error?
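
One way to approach this, sketched on made-up data (the variable names profit and profit_num and the values are hypothetical, and this is not necessarily the solution in the full post), is to blank out the cells that hold only a hyphen and then destring into a new variable with generate():

```stata
clear
input str8 profit
"1200"
"-350"
"-"
"48"
end

* A cell holding only a hyphen marks a missing value; a leading hyphen
* followed by digits is a genuine negative number and is left alone
replace profit = "" if trim(profit) == "-"

* Write the result to a new variable so the original string is kept
destring profit, generate(profit_num)

list
```

Keeping the original string variable makes it easy to check the conversion, or to redo it if the rule for lone hyphens turns out to be wrong.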


Import data from Excel sheets

How do we import data from all sheets in a number of Excel files? Each Excel file has a different number of sheets, with names that follow no discernible pattern, but (thankfully) every sheet has the same structure: the first observation is in the same row and the columns correspond to each other. An example is the set of 17 Excel files of census data on barangays (villages) that was provided to me. Each Excel file corresponds to one region, and within each file the sheets correspond to the provinces in that region. How do we consolidate all sheets in all files into one data file?
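
A rough sketch of one approach follows. It assumes the .xlsx files sit in the current working directory and that the first row of every sheet holds variable names; the variable names source_file and source_sheet are made up for illustration, and the real files may need a cellrange() option instead of firstrow.

```stata
* All Excel files in the working directory (assumed location)
local files : dir "." files "*.xlsx"

tempfile combined
local first = 1

foreach f of local files {
    * Ask Stata for the sheet names, and store them before the next
    * import excel call overwrites the r() results
    import excel using "`f'", describe
    local nsheets = r(N_worksheet)
    forvalues i = 1/`nsheets' {
        local sheet_`i' "`r(worksheet_`i')'"
    }

    forvalues i = 1/`nsheets' {
        * allstring avoids type conflicts when appending; destring later
        import excel using "`f'", sheet("`sheet_`i''") firstrow allstring clear
        generate source_file = "`f'"
        generate source_sheet = "`sheet_`i''"
        if `first' == 0 {
            append using `combined'
        }
        save `combined', replace
        local first = 0
    }
}

* Load the stacked result, if any sheets were found
if `first' == 0 {
    use `combined', clear
}
```

Recording the file and sheet of origin in each row keeps the region and province identifiable after everything is stacked.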
