Pairwise comparison of means

The -ttest- command compares means between two groups. Its syntax is:

ttest varname [if] [in] , by(groupvar) [options1]

However, this only works if groupvar has exactly 2 distinct groups. What would you do if you have more than 2 groups and you want to compare the means for each pairwise combination? This problem was presented to me recently. Since I am not aware of a single command that does this, it seems that the solution is to loop over pairs of groups.

Below, I use student2.dta (used in Statistics with STATA: Version 12) to illustrate one way of solving the problem. In this example, I want to (1) test whether mean gpa is the same for students in different majors, and (2) save the results I need to a tab-delimited text file. Since there are 7 groups in major (coded 1, 2, …, 7), 7 × 6 / 2 = 21 pairs of means will be tested.

Since comparing the means of gpa for i=1 and j=2 is the same as comparing them for i=2 and j=1, an -if- condition is specified so that these duplicates are excluded. The -makematrix- command (Nick Cox) produces a matrix of the results of the command specified after the “:”. Here, it keeps 3 saved results from -ttest-: (1) the mean for the 1st group, (2) the mean for the 2nd group, and (3) the two-sided p-value. -matrix colnames- then sets the name of each column; without it, the default column names are the names of the saved results: “r(mu_1)”, “r(mu_2)”, and “r(p)”. Lastly, -mat2txt- (Michael Blasnik and Ben Jann) writes the matrix to a text file. Both -makematrix- and -mat2txt- are user-written commands available from SSC and need to be installed. To install -mat2txt-, type:

ssc install mat2txt
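As a rough sketch of the loop described above: the version below swaps -makematrix- for built-in -matrix- commands, so it is not the original code. It assumes student2.dta is in the working directory and that major is coded 1 to 7 with observations in every group. The matrix name, row labels, and output file name are made up for illustration, and the -mat2txt- options follow its help file as I understand it.

```stata
* Sketch only: pairwise t-tests of gpa across the 7 majors.
* Assumes student2.dta is in the working directory (hypothetical setup).
use student2, clear

matrix results = J(21, 3, .)              // 7*6/2 = 21 pairs, 3 results each
matrix colnames results = mean_i mean_j p
local rownames
local row = 0

forvalues i = 1/7 {
    forvalues j = 1/7 {
        if `i' < `j' {                    // skip duplicates and i == j
            local ++row
            quietly ttest gpa if inlist(major, `i', `j'), by(major)
            matrix results[`row', 1] = r(mu_1)   // mean of the 1st group
            matrix results[`row', 2] = r(mu_2)   // mean of the 2nd group
            matrix results[`row', 3] = r(p)      // two-sided p-value
            local rownames `rownames' m`i'v`j'   // hypothetical row label
        }
    }
}

matrix rownames results = `rownames'
matrix list results

* Write the matrix to a text file (file name is hypothetical)
mat2txt, matrix(results) saving(ttest_gpa.txt) replace
```

Each row of the matrix holds one pair, so the row label m1v2 is the comparison of major 1 with major 2.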

Now, what if I need to test the means for more than 1 variable, say for both gpa and study? I can just add another loop for this:
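A minimal sketch of that extra loop, again using built-in -matrix- commands in place of -makematrix-, and assuming student2.dta contains gpa, study, and major coded 1 to 7. The matrix and file names are hypothetical.

```stata
* Sketch only: one results matrix and one text file per variable.
use student2, clear

foreach var of varlist gpa study {
    matrix res_`var' = J(21, 3, .)        // 21 pairs, 3 results each
    matrix colnames res_`var' = mean_i mean_j p
    local rownames
    local row = 0

    forvalues i = 1/7 {
        forvalues j = 1/7 {
            if `i' < `j' {                // skip duplicates and i == j
                local ++row
                quietly ttest `var' if inlist(major, `i', `j'), by(major)
                matrix res_`var'[`row', 1] = r(mu_1)
                matrix res_`var'[`row', 2] = r(mu_2)
                matrix res_`var'[`row', 3] = r(p)
                local rownames `rownames' m`i'v`j'
            }
        }
    }

    matrix rownames res_`var' = `rownames'
    mat2txt, matrix(res_`var') saving(ttest_`var'.txt) replace
}
```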

The code above tests the means of gpa and study between each pair of groups in major.

I am sure there is a shorter and better way to do this. Until I find that solution, I will have to bear with what I have come up with.
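One shorter route worth trying, offered as a suggestion and not something tested in this post: Stata 12 added the official -pwmean- command, and -oneway- has multiple-comparison options. Both pool the variance across all groups, so their p-values will not match those from separate two-group -ttest- calls. The Bonferroni adjustment shown here is only one possible choice, and the data set and variable names are assumed as before.

```stata
* Sketch only: assumes student2.dta with gpa and major, as above.
use student2, clear

* All pairwise differences in mean gpa, with Bonferroni-adjusted tests
pwmean gpa, over(major) mcompare(bonferroni) effects

* One-way ANOVA followed by a table of Bonferroni-adjusted comparisons
oneway gpa major, bonferroni
```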

Note: The code above is “wrapped”. If the line is long, its continuation is indented in the next line. Thanks to Cuong Nguyen for giving me the opportunity to learn something new over the weekend.

5 Responses

  1. As I said, this is a contentious matter. Some think the problem lies with an unduly rigid interpretation of P-values, and so long as the researcher waves a flag and expresses due caution, no great harm will be done. Some think that no significance test is legitimate unless it corresponds to a hypothesis laid down on substantive grounds in advance. It is difficult (for me) to see how you could learn something totally new just by looking at data if that’s your stance.

    Many researchers do far more analyses than they ever report, and the temptation is usually to report the stronger relationships and the stronger differences, so the problem, if there is one, is even more general. In fields in which nothing is publishable unless P < .05 or .01 there is a bias whereby a fraction of studies inevitably end up in the file drawer. The bias is thus against confirmations of null hypotheses.

    The bottom line is that there are big differences between

    (a) this is my only hypothesis and the P-value is whatever

    (b) I tested lots of hypotheses and these are the most interesting results

    (c) I tested lots of hypotheses but I am not even going to hint about the ones I am not going to discuss.

    You won't get much consensus about whether (b) is a good way to do statistical science. (c) as stated is an example of hypocrisy or deceit but in practice it is very difficult to distinguish from (b).

  2. Thanks. This is really informative. I just found out that this “multiple comparison problem” is a common pitfall. It could lead to the false conclusion of rejecting H0, and the probability of getting significant results by chance increases with the number of groups!

    Does this mean that pairwise comparison of multiple groups must be avoided? When you do need to compare multiple groups, what would you suggest?

  3. Suppose you fire up k (k – 1) / 2 simultaneous significance tests for the differences between k distinct variables. This is like firing a shotgun: you are more likely to hit a target (say P < 0.05) with so many tests than with one. With k = 10, a modest number, that is 45 tests.

    After analysis of variance there is often a more or less elaborate machinery to adjust P-values accordingly (and indeed a contentious literature over several decades over whether and how to do that adjustment). But the same problem would arise here for many statistical people.

    Otherwise put, you could ask why this isn't a lot easier in Stata and one answer is that it's a dubious thing to do!

  4. Author mix-up corrected. Thanks (again) :) May I ask why there is a concern about the interpretation of P-values in this case?

  5. -mat2txt- (SSC) is by Michael Blasnik and Ben Jann.

    -makematrix- (SSC) is by Nick Cox.

    A bigger issue here is that of multiple comparisons. It is difficult to sustain the usual interpretation of P-values when numerous tests are being conducted simultaneously.
