Import data from Excel sheets


How do we import data from all sheets in a number of Excel files? Each Excel file has a different number of sheets with names of no discernible pattern, but (thankfully) each sheet has the same structure: the first observation is in the same row and the columns correspond to each other. An example is the set of 17 Excel files of census data of barangays (villages) that was provided to me. Each Excel file corresponds to one region and within each file are sheets corresponding to the province in the region.  How do we consolidate all sheets in all files into one data file?

Continue reading

Put anything anywhere in Excel without sweat


putexcel has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, tabout, are also already available for similar purposes. What puts putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.

Continue reading

Saving variable labels before -collapse-


collapse literally collapses the dataset  into a dataset of summary statistics. After collapse, the dataset in memory is lost unless -preserve- was declared. Also, the labels of all variables in clist are replaced with (stat) variable name, where stat can be mean, sum, etc. (see help collapse).

Instead of retyping all the variable labels, you can use the extended macro function var label (see help extended_fcn) to  save the variable label of each variable in the varlist into a local macro before collapse. Restore the labels using label var (see help label). Example:
sysuse auto, clear

foreach var of varlist * {
    local vlab`var': var label `var'
    }

collapse price - gear_ratio, by(foreign)

foreach var of varlist * {
    label var `var' "`vlab`var''"
    }   

Counting occurrence of strings within strings


Somebody asked how to count the number of occurrences of a string within a string. For example, if I have the following data, I want to generate new variables countSS, countSM, and countSG that contains the number of occurrences of “SS”, “SM”, or “SG” in variable awards.

————————————————————————————
clear
input id str40 awards
1    “SS; SS; SM; SG”
2    “SM; SG”
3    “SG; SG; SG; SS”
4    “SS; SS; SG; SG; SS; SM; SG”
end
list
————————————————————————————

Here is one solution using the macro extended function -subinstr- (-help extended_fcn-).

————————————————————————————
local tocount SS SM SG
foreach t of local tocount{
gen countt'</em>=0
<strong>local </strong><em>N</em> = _N
<strong>forvalues </strong>i = 1/
N'{
local a = awards[i']
<strong>local </strong><em>c</em> : subinstr local  a  "
t'” “t'" , all  count(local <em>c2</em>)
<strong>replace </strong>count
t’ = c2' ini’
}
}
————————————————————————————




*Thanks to Jacob Reynolds (jlreynol@nps.edu) for the question. Although, for the best advise on Stata, Statalist is the best place to ask :). See Stuck? Hello Statalist .

-foreach- for all


If you are still doing this:

gen lnX=ln(X)
gen lnY=ln(Y)
gen lnZ=ln(Z)
gen lnX2=lnX^2
gen lnY2=lnY^2
gen lnZ2=lnZ^2

then you are wasting a lot of space (and time). Stata is a programming language. It does not only have the ready-to-use statistical/econometric packages but is also capable of doing a lot of things, and one of them is looping or doing repetitive tasks. For example, the 6 lines above can be written concisely as:

foreach var of varlist X Y Z{
gen ln<em>var</em>’=ln(var’)
gen ln<em>var</em>’<em>2</em>=(lnvar’)^2
}

[Note: the punctuations that enclose var are not the same. The first (`) is a backquote (the one below the tilde, ~, in the keyboard) and the second (‘) is the apostrophe (the one below the quotation mark “). Thanks to Clarence for pointing this out.]

The loop above repeats the 2 -gen- commands for each of the variables in varlist. The other variants of -foreach- are:
  1. List defined in macros (local or global). For example, we define a local macro called “country” with the list of countries phl, sgp, and tha:

local country phl sgp tha
foreach c of local country{
use data`c’.dta, clear
…..
}

What if you want a loop across a list of 100 items, is it necessary to list all the items? There are 2 ways to do this. One is to list all the items as the example above, and the other is to use -levelsof- For example,

levelsof varname, local(macroname)
foreach c of local macroname{
display “`c’”
}

If vaname is countrycode, i.e., the list of country 3-letter codes, the line “levelsof varname, local(macroname)” above is the same as:

local macroname afg bel chn … zam
  1. List of numbers. For example, years:

foreach year of numlist 1970 1980 1990{
….
}

or -forvalues- (future post).

The -foreach- command is very handy and is accessible to all. I find Nicolas Cox’s “How to face lists with fortitude” and “How to repeat yourself without going mad” and Christopher Baum’s “A little bit of Stata programming goes a long way…” very instructive.