Saving variable labels before -collapse-


-collapse- literally collapses the dataset in memory into a dataset of summary statistics. The original dataset is lost unless you issued -preserve- beforehand. In addition, the label of every variable in clist is replaced with (stat) variable name, where stat is mean, sum, etc. (see help collapse).
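To see the relabeling for yourself, a quick check along these lines should do. This is only a sketch using the auto dataset that ships with Stata; the choice of price and mpg is arbitrary.

```stata
sysuse auto, clear
* labels before the collapse
describe price mpg
collapse (mean) price mpg, by(foreign)
* labels after the collapse
describe price mpg
```

The second -describe- should show the labels (mean) price and (mean) mpg in place of the original ones.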

Instead of retyping all the variable labels, you can use the extended macro function var label (see help extended_fcn) to save the label of each variable into a local macro before collapse, then restore the labels afterwards with label var (see help label). Example:
sysuse auto, clear

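* store the label of each variable in a local macro named vlab<varname>
foreach var of varlist * {
    local vlab`var': var label `var'
}

collapse price-gear_ratio, by(foreign)

* reattach the saved labels to the variables that remain after the collapse
foreach var of varlist * {
    label var `var' "`vlab`var''"
}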

Counting occurrences of strings within strings


Somebody asked how to count the number of occurrences of a string within a string. For example, given the following data, I want to generate new variables countSS, countSM, and countSG that contain the number of occurrences of "SS", "SM", and "SG" in the variable awards.

————————————————————————————
clear
input id str40 awards
1    "SS; SS; SM; SG"
2    "SM; SG"
3    "SG; SG; SG; SS"
4    "SS; SS; SG; SG; SS; SM; SG"
end
list
————————————————————————————

Here is one solution using the extended macro function -subinstr- (-help extended_fcn-).

————————————————————————————
local tocount SS SM SG
foreach t of local tocount {
    gen count`t' = 0
    local N = _N
    forvalues i = 1/`N' {
        local a = awards[`i']
        local c : subinstr local a "`t'" "`t'", all count(local c2)
        replace count`t' = `c2' in `i'
    }
}
————————————————————————————
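As an alternative sketch (not part of the original answer), the same counts can be obtained without looping over observations by using the string functions -subinstr()- and -length()- (-help string functions-). Deleting every occurrence of the target from awards shortens the string by the number of occurrences times the length of the target, so dividing the difference in length by the length of the target gives the count. The variable names count2SS, count2SM, and count2SG are made up here so that they do not clash with the variables created above.

```stata
clear
input id str40 awards
1    "SS; SS; SM; SG"
2    "SM; SG"
3    "SG; SG; SG; SS"
4    "SS; SS; SG; SG; SS; SM; SG"
end

local tocount SS SM SG
foreach t of local tocount {
    * characters removed when the target is deleted, divided by the target length
    gen count2`t' = (length(awards) - length(subinstr(awards, "`t'", "", .))) / length("`t'")
}
list
```

For the first observation this should give 2, 1, and 1. The approach assumes that occurrences of the target cannot overlap, which holds for two-letter codes separated by semicolons.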




*Thanks to Jacob Reynolds (jlreynol@nps.edu) for the question. That said, for the best advice on Stata, Statalist is the best place to ask :). See "Stuck? Hello Statalist".

File names in a local macro


Last week I needed to convert a number of Stata data files into text files so that they could be uploaded to Google Docs (why Google Docs is another story). If my file names had followed a specific pattern, such as:

data2001.dta
data2002.dta
.
.
.
data2009.dta

-forvalues- would have done the trick.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
forvalues y = 1/9 {
    use "data200`y'.dta", clear
    outfile using "data200`y'.txt", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

My file names, however, look like the file names you get when you type -sysuse dir-:

auto.dta
autornd.dta
bplong.dta
bpwide.dta
cancer.dta

If there were only a few of them, it would have been fine to list all the file names in a local macro.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
local datafiles auto autornd bplong bpwide cancer
foreach file of local datafiles {
    use `file'.dta, clear
    outfile using "`file'.txt", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The latter, however, is not efficient, since I would have to type every file name by hand just to store them in a macro. Here is where Stata's extended macro functions (-help extended_fcn-) come to the rescue. Stata has exactly the right function for what I wanted to do.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
local datafiles : dir . files "*.dta"
foreach file of local datafiles {
    local filenew : subinstr local file ".dta" ".txt"
    * double quotes keep file names that contain spaces intact
    use "`file'", clear
    outfile using "`filenew'", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The first line, local datafiles : dir . files "*.dta", makes a list, named datafiles, of all files with the extension .dta in the current directory.

Inside the loop, local filenew : subinstr local file ".dta" ".txt" replaces the extension .dta with .txt in the current file name and stores the result in a new macro named filenew.
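Before converting anything, it may be worth doing a dry run that only displays what the two macros contain. The following is a sketch under the same assumption as above, namely that the .dta files sit in the current directory; the macro name n is introduced here purely for illustration.

```stata
local datafiles : dir . files "*.dta"
* count the file names stored in the macro
local n : word count `datafiles'
display "`n' .dta file(s) found"
foreach file of local datafiles {
    local filenew : subinstr local file ".dta" ".txt"
    * show the input and output file names without reading or writing anything
    display "`file' -> `filenew'"
}
```

If the displayed pairs look right, the loop with -use- and -outfile- can be run as is.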