Haiku: cls

clean results window
in Stata thirteen, typing
cls clears screen


Where to buy Stata in the Philippines

In the Philippines, there has been an emerging demand in the public sector for data-driven analysis of various government programs (yes, emerging... don’t ask why... but the leadership has a lot to do with this... bless them). I have been lucky enough to be invited to introduce Stata to some government agencies (and some private institutions) and to show how it can be used with their data. Often the question of where to get a copy of Stata pops up. So, for reference, here is the sole distributor of Stata in the country:


break me

Sometimes we want to break a continuous variable into a smaller set of categories—into evenly spaced or equally sized groups, or groups based on limits we specify, or groups based on another variable or a set of variables.

Let us take for example the variable price of cars in auto.dta.
sysuse auto.dta, clear    // load one of Stata's example datasets
summ price                // summary statistics of price
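The kinds of groupings listed above can be sketched with official Stata tools. This is a minimal illustration rather than the full walkthrough behind the link: the new variable names and the cut points 3000, 5000, 10000 and 20000 are arbitrary choices made for the example. egen's cut() function, xtile, autocode() and egen's group() function are all part of official Stata.

```stata
sysuse auto, clear

* equally sized groups: 4 groups with roughly equal counts, coded 0 to 3
egen price_grp4 = cut(price), group(4)

* the same idea with xtile: quartile groups coded 1 to 4
xtile price_q4 = price, nquantiles(4)

* groups based on limits we specify (arbitrary example cut points);
* each observation is coded with the left-hand end of its interval,
* and values below 3000 or at/above 20000 would become missing
egen price_cut = cut(price), at(3000,5000,10000,20000)

* evenly spaced groups: 4 equal-width intervals from 3000 to 16000,
* each observation coded with the upper bound of its interval
generate price_even = autocode(price, 4, 3000, 16000)

* groups based on a set of other variables: one code per foreign/rep78 combination
egen origin_rep = group(foreign rep78)

tabulate price_q4
```

Equal-count groups (cut() with group(), or xtile) adapt to the distribution of price, while autocode() keeps the interval widths fixed, so its groups can have very different sizes.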


Put anything anywhere in Excel without sweat

putexcel has recently become a very good friend. If you find comfort in working with tables in Excel after processing data or running estimations in Stata (yes, there are others who don’t), or you work with people who do, and you are already on Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, and tabout, are also available for similar purposes. What sets putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.
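Here is a minimal sketch of the idea: compute a statistic in Stata and drop it into chosen cells of a workbook. Note the assumption about version: this uses the syntax of Stata 14.1 and later (putexcel set, then one putexcel cell = expression per command). The Stata 13 putexcel discussed in this post used an older syntax in which the file name was given with using, so check help putexcel for the syntax your version expects. The file name results.xlsx is hypothetical.

```stata
sysuse auto, clear

* store the results first, so later commands cannot overwrite r()
summarize price
local m = r(mean)
local s = r(sd)

* point putexcel at a workbook (results.xlsx is a hypothetical file name)
putexcel set results.xlsx, replace

* put text and numbers into whichever cells we like
putexcel A1 = "Variable"
putexcel B1 = "Mean"
putexcel C1 = "Std. dev."
putexcel A2 = "price"
putexcel B2 = `m'
putexcel C2 = `s'
```

Because every cell is addressed explicitly, the same pattern extends to any layout: change the cell references, not the Excel file by hand.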


No -usespss- for Mac

In Reading SPSS data file into Stata, I described Sergiy Radyakin’s -usespss-, which loads SPSS data (.sav) into Stata. I was on Windows then. Unfortunately, -usespss- is not available for Mac OS. Stat/Transfer can easily do this if you have the software. Another option is to use R.

A quick Google search led me to a simple R routine that does exactly this. Following the steps outlined in Daniel Marcelino’s Loading SPSS (.sav) into Stata, I managed to convert an SPSS data set I downloaded from the UCLA IDRE website into a Stata .dta file.

Try it out! It is easy to follow.

An interesting extension is Gabriel Rossman’s importspss.ado (requires R). It implements the R routine as an ado-file.

Tell me, where did I go wrong

If you are Filipino, you are most likely singing the title by now :)

Looking for a missing bracket, a misplaced comma, or a space that shouldn’t be there (or debugging in general) can be a pain. When the usual error message fails to point out where you messed up, try turning trace on to track down the error. trace literally traces the execution of programs: it echoes the lines that Stata executes internally. Reading through all of that output in the Results window can be daunting, but you don’t have to. You only need to find the line where execution stops and see why it stopped there. To turn on trace, type:

set trace on
And don’t forget to turn it off when you no longer need it; the trace output can get really long.

set trace off
Next time you wonder where you went wrong, use trace before you lose your mind. For more options, see help set trace.
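A small sketch of trace at work. The program sumdiff below is hypothetical, written only to have something to trace: its second call fails because the variable diff already exists, and the trace shows the expanded generate line right before the error. Two design choices are worth noting. capture noisily shows the error but keeps the do-file running, so set trace off is still reached; and set tracedepth limits how deeply nested programs are traced, which keeps the output short.

```stata
sysuse auto, clear

* hypothetical helper program, written only to have something to trace
capture program drop sumdiff
program define sumdiff
    args a b
    generate diff = `a' - `b'
    summarize diff
end

* trace only the top level of nested programs
set tracedepth 1
set trace on
capture noisily sumdiff price mpg    // runs fine
capture noisily sumdiff price mpg    // fails: variable diff already defined
set trace off
```

In the trace output of the second call, the last echoed line is the generate command with `a' and `b' already replaced by price and mpg, which points straight at the culprit.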

Now, continue with the singing.. “what did I do to make you change your mind completely..”


Statalist, too, has a new home

If you have been following Statalist (see Stuck? Hello Statalist), the email-based support system for Stata users, you probably know by now that it has moved to a new home and a new format. Statalist is now a forum hosted at Statalist.org, maintained by StataCorp but moderated by a “friendly group” (quoted from the site) of users.

…a forum where Stata users from experts to neophytes maintain a lively dialogue about all things statistical and Stata…

Everyone can browse through the forum but only registered users may participate in the discussions. You may register here. DON’T forget to read the FAQ before posting.

PS They really are nice people until you piss them off 😉



Haiku: do

run me ctrl-d
in pc, in ios
’tis cmd-shift-d

Splitting strings

In -destring- complication, Anup asked how to split a string variable. In his case, he has a variable of the form 28-18-0018-02183100-02-O-B, where 28 is the state code, 18 the district code, 0018 the subdistrict code, and 02183100 the village code. His problem is how to extract the state, district, and other codes from this variable and label them accordingly.

In response, Freddy provided a solution using the substr() function, assuming that each part of the code always has the same number of characters, i.e., a district code is always 2 characters and a subdistrict code is always 4 characters.
gen state = substr(yourvariablename, 1, 2)         // characters 1-2
gen district = substr(yourvariablename, 4, 2)      // characters 4-5 (the hyphen is at 3)
gen subdistrict = substr(yourvariablename, 7, 4)   // characters 7-10
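A self-contained version of the fixed-position approach, run on the example code from Anup’s question. The variable name code stands in for yourvariablename and is only for illustration; the village code position (characters 12 to 19) follows from the same fixed-length assumption. The last step shows one reason to keep these codes as strings: a numeric copy, for example one made for attaching value labels, loses the leading zeros.

```stata
clear
input str30 code
"28-18-0018-02183100-02-O-B"
end

* every part sits at a fixed position, so substr() can cut it out
generate state       = substr(code, 1, 2)
generate district    = substr(code, 4, 2)
generate subdistrict = substr(code, 7, 4)
generate village     = substr(code, 12, 8)

* a numeric copy drops the leading zeros: "0018" becomes 18
destring subdistrict, generate(subdistrict_num)

list state district subdistrict village subdistrict_num
```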

A similar solution was suggested in Splitting numbers, before the discussion turned to nsplit for numeric variables.

But what if the codes are not all the same length and are instead separated by a character, such as a hyphen, as in 8-18-18-02183100-02-O-B and 8-18-018-02183100-02-O-B? In this case, split is helpful. split literally splits a string variable into parts, using the specified characters or strings as separators. The basic syntax for split is (see help split):

split stringvariable, parse(stringseparator)

To split a code of the form 8-18-0018-02183100-02-O-B into 7 parts using the hyphen as the separator:

split yourvariablename, parse(-)

This will create 7 new variables: yourvariablename1, yourvariablename2, and so on. You may specify a different prefix using the gen() option. You may also want to rename the new variables afterward.
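Putting the pieces together on the three codes mentioned above, where the parts have different lengths. The variable name code and the prefix part are illustrative choices, as are the new names state, district, subdistrict and village. The new variables stay strings, so leading zeros such as the 0 in 018 survive; split also has a destring option, but using it here would turn 018 into 18.

```stata
clear
input str30 code
"28-18-0018-02183100-02-O-B"
"8-18-18-02183100-02-O-B"
"8-18-018-02183100-02-O-B"
end

* split on the hyphen; generate() sets the prefix of the new variables
split code, parse(-) generate(part)

* give the first four parts meaningful names (part5 to part7 are kept as is)
rename (part1 part2 part3 part4) (state district subdistrict village)

list state district subdistrict village
```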

In splitting variables, string or numeric, I would like to echo Nick Cox’s comment in Splitting numbers, “the bottom line is just standard: be careful.”

Rolling standard deviations and missing observations

In And we’re rolling, rolling; rolling on the river, Hasan asked how he could “keep only those values that were calculated using at least 3 observations” after he calculated the 4-period rolling standard deviation of a series. One solution is to tag the periods in which more than 1 of the observations within the window (in this case, 4 periods) is missing, and then set the calculated standard deviations for those periods to missing.

Two things to note are:

(1) rolling requires that your data have been declared as time-series data (see help tsset). Time-series operators, such as L. for lags, are allowed.

(2) The keep() option in rolling lets you keep the date variable (by default, its value at the end of each window), which you can then use as the identifier when merging files.

Here is an illustration (assuming nonrecursive analysis):
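clear
set obs 20
set seed 1
gen date = _n
gen v1 = 1+int(100*runiform())   // random integers from 1 to 100
gen v2 = v1
replace v2 = . in 1/4            // introduce some missing values
replace v2 = . in 10/12
replace v2 = . in 18/20
tsset date

rolling sd2 = r(sd), window(4) keep(date) saving(f2, replace): sum v2
merge 1:1 date using f2, nogenerate
sort date                        // keep the data in time order for the L. operators

// tag = 1 if more than 1 of the 4 values in the window ending at date is missing
gen tag = missing(l3.v2) + missing(l2.v2) + missing(l1.v2) + missing(v2) > 1
gen sd = sd2 if tag==0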

In the first block, we created an artificial data set of 20 uniformly distributed random integers between 1 and 100, set some observations to missing, and told Stata that we are dealing with time-series data.

In the second block, we calculated the 4-period rolling standard deviation. By using the saving() option rather than clear, we kept the current data in memory and saved the results of rolling in f2.dta, which we then merged into the current data on date.

In the last block, we generated the variable tag, which is 1 if the expression missing(l3.v2) + missing(l2.v2) + missing(l1.v2) + missing(v2) > 1 is true, i.e., if more than 1 observation within the 4-period window is missing, and 0 otherwise. Finally, in the last line, we created a new variable sd that equals sd2 when the window has at least 3 nonmissing observations and is missing otherwise.
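A variant worth considering, offered as an alternative sketch rather than the solution from the original thread: rolling can also post r(N), the number of observations summarize actually used in each window, so the “at least 3 observations” rule can be applied directly instead of being rebuilt from lags. The file f3.dta and the variable n are hypothetical names. Note the !missing(n) condition: Stata treats missing values as larger than any number, so n >= 3 on its own would also be true for the first three dates, where no window ends.

```stata
clear
set obs 20
set seed 1
generate date = _n
generate v2 = 1 + int(100*runiform())
replace v2 = . in 1/4
replace v2 = . in 10/12
replace v2 = . in 18/20
tsset date

* post both the standard deviation and the number of observations used
* (f3.dta and n are hypothetical names)
rolling sd2 = r(sd) n = r(N), window(4) keep(date) saving(f3, replace) nodots: summarize v2
merge 1:1 date using f3, nogenerate
tsset date    // re-declare and re-sort by date after the merge

* keep the rolling sd only where at least 3 observations entered the window
generate sd = sd2 if n >= 3 & !missing(n)
```

For a 4-period window, “at least 3 nonmissing” and “at most 1 missing” are the same condition, so this should give the same sd as the tag approach above.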