Put anything anywhere in Excel without sweat

putexcel has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, tabout, are also already available for similar purposes. What puts putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.

Continue reading

_n, its big brother _N, and Super -bysort-

_n and _N are Stata system variables—they exist whether you like them or not. They are also referred to as “underscore variables” for the obvious reason that they are written as _variable. Little  _n contains the line number of the current observation, while its big brother _N contains the total number of observations in the data.

_n is often used to generate unique codes for each observation:

gen code=_n          /* generates the variable code that contains the integer 1 (for the first obervation, _n==1) to _N (for the last observation, _n==_N) */

Or to refer to neighboring observations (also called subscripting):

gen gdplag=gdp[_n-1]          /* generates the variable gdplag, which is equal to the preceding observation’s gdp */

gen gdpgrowth=(gdp /gdp[_n-1] – 1)*100         /* generates the variable gdpgrowth, the growth rate for the variable gdp */

[Note: The gdplag and gdpgrowth for the first observation (_n==1) will be missing since observation [_n-1] does not exist for _n==1.]

Make sure, however, that you refer to the right neighbor! For example, if you are calculating the growth rate of variable gdp between 1999 and 2000, gdp must be in order such that the gdp subscripted by [_n-1] is the gdp for 1999. This is easily addressed by invoking the -sort- command, “sort year,” before generating the growth rate variable. There is another complication, however, when you are calculating this for different groups of observations, say by country. Will “sort country year” before generating the variable suffice? No. Why? Because the [_n-1] for the first observation of country B refers to the last observation of country A. Here is where Super -bysort- comes to the rescue:

bysort country year: gen gdplag=gdp[_n-1] 

bysort country year: gen gdpgrowth=(gdp /gdp[_n-1] – 1)*100       

// Another syntax for bysort is: by country (year), sort: … 

_n may also be used to keep the nth observation by group:

bysort householdid: keep if _n==1         /* keeps the first observation for each householdid */

Big brother _N, on the other hand, may be used to generate  a variable that contains the number of observations by group:

bysort householdid: gen householdsize=_N         /* generates the variable householdsize, which is equal to the number of observations  for each householdid. */

What we have illustrated above are just a few examples to showcase the potential of underscore variables _n and _N. For sure, you will find other uses of _n and _N. Another underscore variable is the beautiful number π, which, as you would’ve guessed, is written as _pi.