Window stopbox: R U SURE?


I sometimes run a do-file by accident. This is usually not a problem, but in some cases it is irritating. Today, for example, I double-clicked on the wrong do-file, one that downloads data from UN Comtrade and saves them as Stata files (see ComtradeTools and Stata). Because UN Comtrade data are frequently updated, a rerun can silently replace the data I have already worked with. Good thing I had downloaded the data only yesterday and had not used them for anything yet.

To avoid the same problem in the future, I wrote proceed.ado. This is a simple program that I can call at the beginning of do-files that call for caution before they run. The ado-file uses -window stopbox- to show a message box asking whether or not to proceed (I got this idea from Eric Booth, who suggested -window stopbox- in Proxy settings in Stata). The next time I unwittingly run a do-file that has -proceed- at the beginning, I have the option not to continue. If I choose not to proceed, Stata terminates and exits. Otherwise, Stata reads and executes the rest of the do-file.

You may ask, “Why not just use the -break- key?” Because in some do-files I first create a blank file to which I append the files I later generate. This means that by the time I press -break-, the first few lines of the do-file have already overwritten the file I needed most. Now, I think I need to change this practice as well…

The ado-file for -proceed-, which I saved in C:\ado\personal, looks like this:

*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*
program define proceed
    version 11

    * Yes/No box; -capture- traps the nonzero return code raised by "No"
    capture window stopbox rusure "Do you want to proceed?" ///
        "This do-file takes hours to run or will overwrite important files."

    if _rc == 0 {
        exit            // "Yes": return to the calling do-file
    }
    else {
        exit, STATA     // "No": close Stata
    }
end
*- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -*

See -help window stopbox-.
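As a minimal sketch of how this might be used, a do-file that overwrites files could start with a call to -proceed-. The file names below are hypothetical, and the sketch assumes proceed.ado is saved somewhere on the ado-path.

```stata
* risky_job.do (hypothetical file name)
* Assumes proceed.ado is on the ado-path, e.g., in C:\ado\personal
proceed                       // clicking "No" closes Stata here

* Nothing below runs unless "Yes" was clicked
clear
set obs 0
save "master_blank.dta", replace   // hypothetical file that would be overwritten
```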

Counting occurrence of strings within strings


Somebody asked how to count the number of occurrences of a string within a string. For example, given the following data, I want to generate new variables countSS, countSM, and countSG that contain the number of occurrences of “SS”, “SM”, and “SG” in the variable awards.

————————————————————————————
clear
input id str40 awards
1    "SS; SS; SM; SG"
2    "SM; SG"
3    "SG; SG; SG; SS"
4    "SS; SS; SG; SG; SS; SM; SG"
end
list
————————————————————————————

Here is one solution using the extended macro function -subinstr- (-help extended_fcn-).

————————————————————————————
local tocount SS SM SG
foreach t of local tocount {
    gen count`t' = 0
    local N = _N
    forvalues i = 1/`N' {
        local a = awards[`i']
        local c : subinstr local a "`t'" "`t'", all count(local c2)
        replace count`t' = `c2' in `i'
    }
}
————————————————————————————
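The same counts can also be obtained without looping over observations. The sketch below is an alternative, not the solution above: it removes every occurrence of the target with the string function subinstr() and divides the drop in length by the length of the target. It assumes the targets do not overlap inside the text (for example, “SS” inside “SSS” would be counted once).

```stata
clear
input id str40 awards
1    "SS; SS; SM; SG"
2    "SM; SG"
3    "SG; SG; SG; SS"
4    "SS; SS; SG; SG; SS; SM; SG"
end

foreach t in SS SM SG {
    * characters removed, divided by characters per occurrence
    gen count`t' = (length(awards) - length(subinstr(awards, "`t'", "", .))) / length("`t'")
}
list
```

This version works on the whole variable at once, so it scales better than the observation-by-observation loop when the dataset is large.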




*Thanks to Jacob Reynolds (jlreynol@nps.edu) for the question. For the best advice on Stata, though, Statalist is the best place to ask :). See Stuck? Hello Statalist.

ComtradeTools and Stata: Automating UN Comtrade data downloads


ComtradeTools (developed by UN) is a command line program that allows you to obtain data via UN Web Services and to convert the data into CSV format (or to an SQL server). You can use Stata’s -shell- command to run ComtradeTools.

First, read the instructions here. Download the latest version (March 2010) of ComtradeTools here and the required Microsoft .NET Framework (version 1.1.4322.573) here.

The command line parameters are listed in /help which you can copy into a text file. To copy ComtradeTools /help contents to ComtradeTools_Help.txt:

cd "C:\Program Files\UNSD\ComtradeTools\"
shell ComtradeTools /help >>U:\Data\Comtrade\ComtradeTools_Help.txt

/help does not enumerate all possible entries for each parameter, but you can find them in UN Comtrade’s website. Some are listed below:

/r: Reporter Code. To get the list of countries and their corresponding codes:

copy http://comtrade.un.org/ws/refs/getCountryList.aspx U:\Data\Comtrade\Countrylist.xml //You can open this in Excel

/y: Year. 4-digit year.

/px: Classification.
HS2002=H2
HS1996=H1
HS1988=H0
SITC Rev.3=S3
SITC Rev.2=S2
SITC Rev.1=S1
BEC=BE

/cc: Commodity Code. For example, to get commodity codes for HS1996 and SITC Rev. 2:

local classification H1 S2
foreach c of local classification{
copy "http://comtrade.un.org/ws/refs/getCommodityList.aspx?px=`c'" U:\Data\Comtrade\Commoditylist_`c'.xml
}

/rg: Trade Flow.
Import=1
Export=2
Re-export=3
Re-import=4

Try to download trade data (SITC Rev. 2) for the Philippines for the year 2007 and load it to Stata.

shell ComtradeTools /r:608 /y:2007 /px:S2 /action:DownloadAndConvertToCSV /outputDirectory:U:\Data\Comtrade\
insheet using U:\Data\Comtrade\S22007608_CSV.txt, comma clear

Note that if you do not specify /outputDirectory, the files will be saved in the current working directory, that is, where the program ComtradeTools is saved.

Now, it is easy to download data for multiple countries and years by using loops. For example, to download trade data (SITC Rev. 2) for the Philippines every 5 years from 1980 to 2005 and save them as dta files:

forvalues y=1980(5)2005{
shell ComtradeTools /r:608 /y:`y' /px:S2 /action:DownloadAndConvertToCSV /outputDirectory:U:\Data\Comtrade\
insheet using U:\Data\Comtrade\S2`y'608_CSV.txt, comma clear
save U:\Data\Comtrade\S2`y'608.dta, replace
erase U:\Data\Comtrade\S2`y'608_CSV.txt
}
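To cover several countries as well, the year loop can be nested inside a loop over reporter codes. This is only a sketch: 608 (Philippines) comes from the example above, while 360 and 764 are assumed codes that should be checked against Countrylist.xml. The output file name pattern is inferred from S22007608_CSV.txt, and the -confirm file- check is an added safeguard for reporter-year pairs that return no file.

```stata
* Hypothetical reporter list; verify each code in Countrylist.xml
local reporters 608 360 764

cd "C:\Program Files\UNSD\ComtradeTools\"
foreach r of local reporters {
    forvalues y = 1980(5)2005 {
        shell ComtradeTools /r:`r' /y:`y' /px:S2 /action:DownloadAndConvertToCSV /outputDirectory:U:\Data\Comtrade\

        * Move on if no CSV file was produced for this reporter and year
        capture confirm file "U:\Data\Comtrade\S2`y'`r'_CSV.txt"
        if _rc continue

        insheet using "U:\Data\Comtrade\S2`y'`r'_CSV.txt", comma clear
        save "U:\Data\Comtrade\S2`y'`r'.dta", replace
        erase "U:\Data\Comtrade\S2`y'`r'_CSV.txt"
    }
}
```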

Marek Hlavac’s Hangman


I was going to post a different article today, but changed my mind after I received an e-mail from Marek Hlavac with a link to a game called Hangman, an adaptation of the classic paper-and-pencil game, which he wrote using Stata.

I tried the game… lost my first 2 and won my next 3. It was a good break from work.

To play the game, you need two files: a Stata do-file (hangman.do) and a Stata database of words to be guessed and their categories (hangman_data.dta). You may download these files here, where a Youtube demo is also available (view it in full screen for the texts to be readable).

If there is a way to incorporate sounds in it, it would be more fun.

Holidays are near (actually Christmas season started September here in the Philippines)… take a break… play Hangman. Merry Christmas!

File names in a local macro


Last week I needed to convert a number of Stata data files into text files so that they could be uploaded to Google Docs (why Google Docs is another story). If my file names had followed a specific pattern, such as:

data2001.dta
data2002.dta
.
.
.
data2009.dta

-forvalues- would have done the trick.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
forvalues y=1/9{
use "data200`y'.dta", clear
outfile using "data200`y'.txt", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

My file names, however, look like the file names you get when you type -sysuse dir-:

auto.dta
autornd.dta
bplong.dta
bpwide.dta
cancer.dta

If there were few of them, it would have been alright to place them in a local macro by listing all the file names.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
local datafiles auto autornd bplong bpwide cancer
foreach file of local datafiles{
use `file'.dta, clear
outfile using "`file'.txt", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The latter, however, is not efficient, since I would have to type many file names by hand just to store them in a macro. Here is where Stata’s extended macro functions (-help extended_fcn-) come to the rescue. Stata has exactly the right function for what I wanted to do.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
local datafiles : dir . files "*.dta"
foreach file of local datafiles{
local filenew : subinstr local file ".dta" ".txt"
* quotes protect file names that contain spaces
use "`file'", clear
outfile using "`filenew'", dictionary replace
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

local datafiles : dir . files "*.dta": makes a list, named datafiles, of all files with the extension .dta in the current directory.

local filenew : subinstr local file ".dta" ".txt": replaces the extension .dta with .txt in the current file name and stores the new name in the local macro filenew.
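Before converting anything, it can help to check what the macro actually holds. The short sketch below only counts and displays the names; it assumes nothing beyond the current directory and does not open or change any file.

```stata
local datafiles : dir . files "*.dta"

* each file name is stored as one quoted word in the macro
local n : word count `datafiles'
display "Found `n' .dta file(s) in " c(pwd)

foreach file of local datafiles {
    display `"`file'"'
}
```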

Pairwise comparison of means


The -ttest- command compares means between groups; its syntax is:

ttest varname [if] [in] , by(groupvar) [options1]

However, this only works if groupvar has exactly 2 distinct groups. What would you do if you have more than 2 groups and you want to compare the means for each pairwise combination? This problem was presented to me recently. Since I am not aware of a single command that does this, it seems that the solution is to loop over pairs of groups.

Below, I used student2.dta (used in Statistics with STATA: Version 12) to illustrate one way of solving the problem. In this example, I want to (1) test whether the mean gpa is the same between students taking different majors, and (2) save the results I need into a tab-delimited text file. Since there are 7 groups in major (coded as 1,2,…,7), 21 pairs of means will be tested.




Since comparing the means of gpa for i=1 and j=2 is the same as comparing the means for i=2 and j=1, the -if- command is specified so that these duplicates are excluded. The -makematrix- (Nick Cox) command produces a matrix of the results of the command specified after the “:”. Here, we have specified that it will keep in memory 3 saved results from -ttest-: (1) mean for the 1st group, (2) mean for the 2nd group, and (3) the two-sided p-value. -matrix colnames-, on the other hand, is specified to indicate the names for each column. If this is not indicated, the default column names are the names of the saved results — “r(mu_1),” “r(mu_2),” and “r(p)”. Lastly, -mat2txt- (Michael Blasnik and Ben Jann) writes the matrix into a text file. Note that -mat2txt- needs to be installed. To install, type:

ssc install mat2txt
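The pairwise loop can also be sketched with built-in commands only. The block below is not the -makematrix- and -mat2txt- code discussed above. To stay runnable, it uses auto.dta as a stand-in, with price as the outcome and rep78 (coded 1 to 5, so 10 pairs) as the group variable; for the example in this post, gpa and major with groups 1 to 7 would take their place. The matrix name results and its column names are illustrative.

```stata
sysuse auto, clear

* one row per pair: group codes, the two means, and the two-sided p-value
matrix results = J(10, 5, .)
matrix colnames results = group1 group2 mean1 mean2 p

local row = 0
forvalues i = 1/4 {
    local start = `i' + 1
    forvalues j = `start'/5 {
        * starting j at i+1 skips duplicate pairs such as (2,1)
        quietly ttest price if inlist(rep78, `i', `j'), by(rep78)
        local ++row
        matrix results[`row', 1] = `i'
        matrix results[`row', 2] = `j'
        matrix results[`row', 3] = r(mu_1)
        matrix results[`row', 4] = r(mu_2)
        matrix results[`row', 5] = r(p)
    }
}
matrix list results
```

Starting the inner loop at i+1 plays the same role as the -if- condition described above: each pair is tested once.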

Now, what if I need to test the means for more than 1 variable, say for both gpa and study?  I can just add another loop for this:


The code above tests for means of gpa and study between each pair of groups in major.

I am sure there is a shorter and better way to do this. Until I find that solution, I will have to bear with what I have come up with.


Note: The code above is “wrapped”. If the line is long, its continuation is indented in the next line. Thanks to Cuong Nguyen for giving me the opportunity to learn something new over the weekend.

-levelsof-


-levelsof- lists the unique values of varname. It is particularly helpful when you loop over a variable with many distinct values by using -foreach-. For example, you want to loop over all countries in the world:

local countryiso AFG    AGO    AIA    ALB    AND    ANT    ARB    ARE    ARG    ARM    ATG    AUS    AUT    AZE    BDI    BEL    BEN    BFA    BGD    BGR    BHR    BHS    BIH    BLR    BLZ    BMU    BOL    BRA    BRB    BRN    BTN    BWA    CAF    CAN    CHE    CHL    CHN    CIV    CMR    COG    COK    COL    COM    CPV    CRI    CSK    CUB    CYM    CYP    CZE    DEU    DJI    DMA    DNK    DOM    DZA    ECU    EGY    ERI    ESP    EST    ETH    FIN    FJI    FRA    FRO    GAB    GBR    GEO    GHA    GIN    GLP    GMB    GNB    GRC    GRD    GRL    GTM    GUF    GUY    HKG    HND    HRV    HTI    HUN    IDN    IND    IRL    IRN    IRQ    ISL    ISR    ITA    JAM    JOR    JPN    KAZ    KEN    KGZ    KHM    KIR    KNA    KOR    KWT    LBN    LBR    LBY    LCA    LKA    LSO    LTU    LUX    LVA    MAC    MAR    MDA    MDG    MDV    MEX    MKD    MLI    MLT    MMR    MNG    MOZ    MRT    MSR    MTQ    MUS    MWI    MYS    MYT    NAM    NCL    NER    NGA    NIC    NIU    NLD    NOR    NPL    NZL    OMN    PAK    PAN    PER    PHL    PNG    POL    PRT    PRY    PSE    PYF    QAT    REU    ROM    RUS    RWA    SAU    SCG    SDN    SEN    SGP    SLB    SLE    SLV    SOM    SPM    SRB    STP    SUR    SVK    SVN    SWE    SWZ    SYC    SYR    TCA    TGO    THA    TJK    TKM    TMP    TON    TTO    TUN    TUR    TUV    TZA    UGA    UKR    URY    USA    UZB    VCT    VEN    VNM    VUT    WSM    YEM    YMD    YUG    ZAF    ZMB    ZWE

foreach iso of local countryiso {

}

This long list of reporter ISO codes (reporteriso) can be avoided by using -levelsof-:

levelsof reporteriso, local(countryiso)
foreach iso of local countryiso{

}

The general syntax for -levelsof- is:

levelsof varname [if] [in] [, options]
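As a small runnable illustration, the sketch below uses auto.dta, which ships with Stata, in place of the country data: rep78 stands in for reporteriso, and the loop body simply reports the mean price for each level.

```stata
sysuse auto, clear

* store the distinct nonmissing values of rep78 in the local macro levels
levelsof rep78, local(levels)

foreach l of local levels {
    quietly summarize price if rep78 == `l'
    display "rep78 = `l': mean price = " %9.2f r(mean)
}
```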

Double loops


Can you loop inside a loop? Yes. Actually, you can have loop inside a loop that is inside another loop that is inside another loop… and so on. For example, try the following:

forvalues i=1/4{
    forvalues j=1/4{
        local sum=`i'+`j'
        display "`i' + `j' = " `sum'
    }
}

In the example above, each element in i is added, one by one, to each element in j. This will return 16 (4×4) results:

1 + 1 = 2
1 + 2 = 3
1 + 3 = 4
1 + 4 = 5
2 + 1 = 3
2 + 2 = 4
2 + 3 = 5
2 + 4 = 6
3 + 1 = 4
3 + 2 = 5
3 + 3 = 6
3 + 4 = 7
4 + 1 = 5
4 + 2 = 6
4 + 3 = 7
4 + 4 = 8

Double loops, such as the example above, are commonly (though not only) used as subscripts to identify matrix elements in matrix operations. In this context, each ij-pair corresponds to the element of a matrix in row i, column j.
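A minimal sketch of the matrix use: the block below fills a 4×4 matrix so that the element in row i, column j holds i + j. The matrix name A is arbitrary.

```stata
* start from a 4 x 4 matrix of zeros
matrix A = J(4, 4, 0)

forvalues i = 1/4 {
    forvalues j = 1/4 {
        * row i, column j
        matrix A[`i', `j'] = `i' + `j'
    }
}
matrix list A
```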

You can also do the following, which will return 256 results:

forvalues i=1/4{
    forvalues j=1/4{
        forvalues k=1/4{
            forvalues l=1/4{
                display "`i'`j'`k'`l'"
            }
        }
    }
}

My -fpref-: anti -fren-


Yesterday, I used -fren- to delete the prefix “mus” in the file names of all data and do-files used in the book “Microeconometrics Using Stata” (MUS). Now, I want them back! -fren- is not helpful. So, here is the little program -fpref- that I wrote (my first ado-file) to add a prefix to file names by batch:

fpref.ado


[Note: I had to paste it as an object to retain indentations. While indentations are purely aesthetics, a program (like a mathematical proof) doesn’t look right without them. Have not found a way yet to keep indentations (and font style) here for free.]

The first line “capture prog drop fpref” drops the program called -fpref-, if it exists. If it does not exist, Stata will not return an error and will continue to read the next line because of the command -capture- (this line can be deleted in the final version of the ado file). In the next line “prog define fpref”, the command is named as -fpref-. By typing “version 10” in the third line, the command that was named -fpref- is compatible with Stata 10 or newer versions of Stata. The next line defines the syntax. The body of the program follows the syntax. In this program, the first part of the body, with the while{} and if{} loops, returns an error if the operating system used is not Windows or the prefix is missing or separated by blank spaces. The second part of the body concatenates prefix to the old file names, which we have listed in a local macro. Finally, a Stata program always ends with an “end”.

I have saved my fpref.ado (and its corresponding help file fpref.sthlp) and changed my directory. I can now type “help fpref”, which will open the -help- window with the following information:

-help fpref-


To bring back the prefix “mus” to my files:

fpref dta, prefix(mus)  /* adds the prefix “mus” to all *.dta files in the current directory */

fpref do, prefix(mus)  /* adds the prefix “mus” to all *.do files in the current directory */

[Note: The file extension can be anything, e.g., doc, xls, txt; and the prefix can be any character or string (without spaces) that is allowed in a file name.]

Now, all my MUS file names are as they were. Happiness…

Variable names and transformers


What are acceptable variable names? How do you use the commands -rename-, -renpfix-, and -renvars- to rename variable names?

A Stata variable name can contain up to 32 characters in any flavor of Stata (Small, Stata/IC, Stata/SE, or Stata/MP). Not all characters on the keyboard, however, are allowed in variable names. A variable name may contain only letters (A to Z, upper or lower case), the digits 0 to 9, and the underscore (_); and the first character cannot be a digit. For example,

population
_Income
_1997
income0001
INC

are allowed, but not the following:

pop#
2gdppc
2007
~1997
POP/2001

Renaming variables is made easy by the commands -rename-, -renpfix-, and -renvars-. -renvars- (Weesie and Cox, 2005) is not part of official Stata, but you may download and install it by typing:

net install dm88_1.pkg (see note below)


Do you need all these 3 renaming commands? Not really, but you will see that in some cases one is more convenient to use than the others.

If you only need to rename one variable, use -rename-. The syntax for -rename- is:

rename old_variable_name new_variable_name

For example, to rename the variable _Income to income, type:

rename _Income income

If, instead, you just want to replace the prefix of variables, use -renpfix-. The syntax for -renpfix- is:

renpfix old_prefix [new_prefix]

[Note: If new_prefix is not declared, -renpfix- returns the new variable names without the prefix provided they are allowed as variable names.]

For example, to change the prefix exportshare of the variables

exportshare1970
exportshare1980
exportshare1990
exportshare2000

to s_, type:

renpfix exportshare s_

This transforms the variables above to:

s_1970
s_1980
s_1990
s_2000

To drop the prefix export in the variable names, type:

renpfix export

This transforms the variables to:

share1970
share1980
share1990
share2000

But, you cannot do this:

renpfix exportshare

Why? Because dropping the whole prefix would leave names such as 1970, and the first character of a variable name cannot be a digit.

-renvars-, if installed, is the most flexible. It allows you to rename more than one variable at once, it can change the case of the variable names, and more. The syntax for -renvars- is:

renvars varlist \ new_varlist
renvars varlist, options

Examples:

renvars population income \ pop inc
/* renames population to pop and income to inc */

renvars population income, upper
/* renames population to POPULATION and income to INCOME */

renvars population income, trim(3)
/*renames population to pop and income to inc */

renvars population income, prefix(x_)
/* renames population to x_population and income to x_income */

For more -renvars- options, type:

help renvars

Note: Sometimes “net install dm88_1” returns the error:

file http://www.stata.com/dm88_1.pkg not found
server says file temporarily redirected to http://www.stata.com/error/404.html
could not load dm88_1.pkg from http://www.stata.com/
r(601);

You may also install -renvars- by typing “search renvars”, then click on “dm88_1” and then click “(click here to install)”.
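If -renvars- cannot be installed, the built-in -rename- may cover several of the same cases, assuming Stata 12 or later, where -rename- accepts groups of variables and wildcards. The sketch below builds a one-observation toy dataset with the variable names used in this post, so it can be run as is.

```stata
* toy dataset with the variable names used in this post
clear
set obs 1
gen population = 1
gen income = 2
gen exportshare1970 = 3
gen exportshare1980 = 4

* rename several variables at once
rename (population income) (pop inc)

* replace the prefix exportshare with s_
rename exportshare* s_*

* change all variable names to upper case
rename *, upper

describe
```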