Pairwise comparison of means

The -ttest- command compares means between groups. Its syntax is:

ttest varname [if] [in] , by(groupvar) [options1]

However, this only works if groupvar has exactly 2 distinct groups. What would you do if you have more than 2 groups and want to compare the means for each pairwise combination? This problem was presented to me recently. Since I am not aware of a single command that does this, the solution seems to be to loop over pairs of groups.

Below, I used student2.dta (used in Statistics with STATA: Version 12) to illustrate one way of solving the problem. In this example, I want to (1) test whether mean gpa is the same for students in different majors, and (2) save the results I need into a tab-delimited text file. Since there are 7 groups in major (coded as 1,2,…,7), 21 pairs of means will be tested.

Since comparing the means of gpa for i=1 and j=2 is the same as comparing the means for i=2 and j=1, the -if- command is specified so that these duplicates are excluded. The -makematrix- (Nick Cox) command produces a matrix of the results of the command specified after the “:”. Here, we have specified that it will keep in memory 3 saved results from -ttest-: (1) the mean for the 1st group, (2) the mean for the 2nd group, and (3) the two-sided p-value. -matrix colnames- sets the name of each column. If it is omitted, the default column names are the names of the saved results: “r(mu_1),” “r(mu_2),” and “r(p)”. Lastly, -mat2txt- (Michael Blasnik and Ben Jann) writes the matrix into a text file. Note that -mat2txt- needs to be installed. To install, type:

ssc install mat2txt
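
The loop described above can be sketched as follows. This is only a minimal illustration, not the original listing: it uses base Stata matrix commands in place of -makematrix- and -mat2txt-, and it assumes that student2.dta is in the working directory, that major is coded 1 to 7 with observations in every group, and that gpa is numeric. The matrix name T and the column names are made up for the example.

```stata
* Sketch only: assumes student2.dta is available and major is coded 1,...,7
use student2, clear

capture matrix drop T            // start from an empty result matrix
forvalues i = 1/6 {
    local start = `i' + 1        // j > i, so each pair is tested once
    forvalues j = `start'/7 {
        * keep only the two groups being compared
        quietly ttest gpa if inlist(major, `i', `j'), by(major)
        * append one row: group i, group j, both means, two-sided p-value
        matrix T = nullmat(T) \ (`i', `j', r(mu_1), r(mu_2), r(p))
    }
}
matrix colnames T = group1 group2 mean1 mean2 pvalue
matrix list T                    // 21 rows, one per pair of majors
```

Because i is always smaller than j, r(mu_1) is the mean for group i and r(mu_2) is the mean for group j. The resulting matrix T could then be passed to a command such as -mat2txt- to write it to a text file.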

Now, what if I need to test the means for more than 1 variable, say for both gpa and study?  I can just add another loop for this:

The code above tests for means of gpa and study between each pair of groups in major.
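
As a rough sketch of what such an outer loop might look like (again using base Stata matrix commands rather than -makematrix-, and assuming student2.dta with major coded 1 to 7 and numeric gpa and study), one could wrap the pairwise loop in a -foreach- over the variables. The matrix name T and the column names are hypothetical.

```stata
* Sketch only: assumes student2.dta is available and major is coded 1,...,7
use student2, clear

foreach v of varlist gpa study {
    capture matrix drop T                // fresh result matrix per variable
    forvalues i = 1/6 {
        local start = `i' + 1            // test each pair only once
        forvalues j = `start'/7 {
            quietly ttest `v' if inlist(major, `i', `j'), by(major)
            matrix T = nullmat(T) \ (`i', `j', r(mu_1), r(mu_2), r(p))
        }
    }
    matrix colnames T = group1 group2 mean1 mean2 pvalue
    display as text _newline "Pairwise t tests for `v'"
    matrix list T
}
```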

I am sure there is a shorter and better way to do this. Until I find that solution, I will have to bear with what I have come up with.

Note: The code above is “wrapped”. If the line is long, its continuation is indented in the next line. Thanks to Cuong Nguyen for giving me the opportunity to learn something new over the weekend.

Generating scalars for coefficients or standard errors after regression

Besides displaying output in the results window, Stata stores results that you can use as inputs to subsequent commands. We have shown examples of using saved results in “Writing Greek letters and other symbols in graphs” and “Ways to count the number of unique values in a variable”, where we used results stored in r(). In this post, we will use estimation results saved in e() after -regress- to generate a scalar (or a local macro) for coefficients and standard errors. (See the note below.)

sysuse auto /* opens example data auto.dta */

reg price mpg /* estimates the equation price = b0 + b1*mpg + e ; to display all saved results after -regress-, type “ereturn list” */

matrix b=e(b)
matrix V=e(V)
/* defines matrix b equal to the row vector of estimated coefficients, e(b); and  matrix V equal to the variance-covariance matrix, e(V). */

matrix list b // or matrix list e(b)
matrix list V // or matrix list e(V)
/* displays b and V */

scalar c_mpg=b[1,1]
scalar se_mpg=sqrt(V[1,1])
/* defines scalar c_mpg equal to element (1,1)  of vector b; and defines scalar se_mpg equal to the square root of element (1,1) of matrix V */

scalar list /* displays c_mpg and se_mpg */

Alternatively, you may define c_mpg and se_mpg as local macros instead of scalars:

local c_mpg=b[1,1]
local se_mpg=sqrt(V[1,1])
display "c_mpg= `c_mpg' ; se_mpg= `se_mpg'"
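
The matrix approach above depends on knowing that mpg is in position (1,1). As a small complementary sketch, the same two numbers can also be read by variable name with _b[] and _se[], which are available after estimation commands such as -regress-. The scalar names match those used above.

```stata
sysuse auto, clear
quietly regress price mpg

* _b[varname] is the coefficient on varname; _se[varname] is its standard error
scalar c_mpg  = _b[mpg]
scalar se_mpg = _se[mpg]

display "c_mpg = " c_mpg " ; se_mpg = " se_mpg
display "constant = " _b[_cons]
```

Referring to coefficients by name avoids having to recount positions when regressors are added or reordered.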

Note: Where Stata saves the results depends on the type of command executed. Stata commands can be classified into 5 classes: r-, e-, s-, n-, and c-class commands.

r-class: general commands that do not require parameter estimation (example: -summarize-); results are stored in r()
e-class: parameter estimation commands (example: -regress-); results are stored in e()
s-class: programming commands that assist in parsing; results are stored in s()
n-class: commands that do not save other results except those that are explicitly generated (example: -generate-); no results stored
c-class: stores system parameters and some constants (example: c(pi) returns the value of pi); values are stored in c() (try typing “creturn list”)
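
To see where each class keeps its results, a short session like the following may help. It is only an illustration using the auto.dta example data.

```stata
sysuse auto, clear

* r-class: results of -summarize- are left in r()
quietly summarize price
display "mean of price = " r(mean)
return list

* e-class: results of -regress- are left in e()
quietly regress price mpg
display "R-squared = " e(r2)
ereturn list

* c-class: system values and constants live in c()
display "pi = " c(pi)
```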

Loading and storing data to and from Stata and Mata

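mkmat varlist, matrix(M)    /* Stata: creates Stata matrix M from the variables in varlist */
M=st_matrix("M")        /* Mata: copies Stata matrix M into Mata matrix M */

st_matrix("M", M)    /* Mata: copies Mata matrix M to Stata matrix M */
svmat M    /* Stata: creates new variables from the columns of Stata matrix M */

M=st_data(.,.)   /* Mata: reads the current Stata data as matrix M (assumes all variables are numeric) */

st_store(.,1..cols(M), M)    /* Mata: writes M back to the first cols(M) variables of the Stata data */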
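
The lines above mix commands typed in Stata with commands typed in Mata. A minimal round trip that can be run from a do-file, using auto.dta, might look like the sketch below. The names P, X, idx, and price_k are made up for the illustration.

```stata
sysuse auto, clear

* Route 1: go through a Stata matrix
mkmat price mpg, matrix(M)          // Stata matrix M, one column per variable
mata:
M = st_matrix("M")                  // copy Stata matrix M into Mata
M[., 1] = M[., 1] :/ 1000           // rescale price to thousands
st_matrix("P", M)                   // copy the result to Stata matrix P
end
svmat P                             // creates variables P1 and P2

* Route 2: read and write the dataset directly
mata:
X = st_data(., ("price", "mpg"))    // selected numeric variables as a matrix
idx = st_addvar("double", "price_k")   // new variable to receive results
st_store(., idx, X[., 1] :/ 1000)   // write one column into price_k
end

list price P1 price_k in 1/5
```

Storing into a newly added double variable, rather than overwriting price, avoids having to worry about the storage type of the existing variable.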