creative destruction: collapse and contract


Creative destruction, coined by Joseph Schumpeter in Capitalism, Socialism, and Democracy, refers to the process by which new innovations kill old inefficient products or processes. But we are not talking about that but instead, of destroying data to create more useful information. By destroying, we mean altering the data currently loaded in memory with no undo button to rely to. When you load or open data into Stata, Stata stores the data in your machine’s RAM. Any changes made, therefore, are not permanent or saved in your hard drive until you call on save, but still be careful that you do not overwrite your raw data files.
Continue reading

dates in Starbucks


In yesterday’s post, one of the needed cleaning was to extract date and day of week from the string:
. list date in 1/5

     +--------------------------------------------+
     |                                       date |
     |--------------------------------------------|
  1. |  Date: August 31, 2015 at 1:42:41 PM GMT+8 |
  2. | Date: August 24, 2015 at 12:36:55 PM GMT+8 |
  3. |    Date: July 27, 2015 at 2:51:27 PM GMT+8 |
  4. |    Date: July 20, 2015 at 2:45:43 PM GMT+8 |
  5. |    Date: July 20, 2015 at 2:07:49 PM GMT+8 |
     +--------------------------------------------+

Continue reading

Using Stata to make sense of my Uber data


I tried Uber in late May and since then it has been 131 Uber rides covering 1,200 kilometers and 80 hours on the road. Uber (and GrabTaxi) has eliminated the wait under the heat (and rain) and the dealing with the assholeness of most taxi drivers here in Metro Manila. But what I love most about Uber, apart from their customer service, is the data they send. Trip receipts are automatically sent as soon as the trip has ended. These do not only show how much I am charged but include time, distance, fare disaggregated by time and distance, and many more. GrabTaxi receipts, on the other hand, only show amount paid and manually encoded by drivers.
Continue reading

The adventures of tin()


Using the if qualifer with time-series data is tricky. Until you meet tin(). Let us use quarterly German macro data, lutkepohl2, from Stata website to illustrate.
Continue reading

-destring- uncomplicated


In a comment to the previous post destring complication: negative numbers, Nick Cox pointed out “the most important advise” in using destring: “never destring, replace unless you are absolutely sure that you are right or are willing to do things again if you made the wrong decision. The generate() option is there for a purpose.”

In addition, his comment point to simpler solution than using regular expressions.

Continue reading

-destring- complication: negative numbers


Less than 2 hours flight…

In a Stata training, one of the students wondered why after importing an Excel file of financial indicators into Stata some were read as strings. A quick browse at the data indicates the presence of hyphens (“-“) and that these were used in different ways: one to indicate a negative number and another to indicate a missing observation.

hyphen

How do we convert these variables to numeric as destring returns an error?

Continue reading

Import data from Excel sheets


How do we import data from all sheets in a number of Excel files? Each Excel file has a different number of sheets with names of no discernible pattern, but (thankfully) each sheet has the same structure: the first observation is in the same row and the columns correspond to each other. An example is the set of 17 Excel files of census data of barangays (villages) that was provided to me. Each Excel file corresponds to one region and within each file are sheets corresponding to the province in the region.  How do we consolidate all sheets in all files into one data file?

Continue reading

Haiku: cls


clean results window
in Stata thirteen, typing
cls clears screen

 

break me


Sometimes we want to break a continuous variable into a smaller set of categories—into evenly spaced or equally sized groups, or groups based on limits we specify, or groups based on another variable or a set of variables.

Let us take for example the variable price of cars in auto.dta.
sysuse auto.dta, clear    // open a Stata built-in data
summ price

Continue reading

Put anything anywhere in Excel without sweat


putexcel has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, tabout, are also already available for similar purposes. What puts putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.

Continue reading