-tabout- and -svy-

Yesterday, I was trying to create tables from a survey dataset. Given the number of variables (and the possibility that I will repeat the same process many more times), doing it by hand (i.e., copy-pasting results from -tab- to Excel) is not an option. For this task, I turned to Ian Watson’s -tabout-. It is probably the best Stata program for creating very neat tables and exporting them to text files (which spreadsheets, such as Excel, can read). Since I am using a complex survey dataset, I checked whether -svy- allows -tabout-, i.e., whether I can write something like: svy, subpop(var): tabout vars…

This is not possible: -tabout- cannot be used with the -svy- prefix. But -tabout- has its own svy option, which makes use of the survey design variables specified in -svyset-. First problem solved.
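As a minimal sketch of the svy option, assuming the nhanes2 example dataset that ships with Stata (its variables psu, finalwgt, strata, agegrp and race, and the output file name table1.txt, are illustrative choices and not from my own data):

```stata
* Load an example complex survey dataset (assumed: Stata's nhanes2)
webuse nhanes2, clear

* Declare the survey design; -tabout- reads these settings via its svy option
svyset psu [pweight=finalwgt], strata(strata)

* Two-way table of row percentages with confidence intervals,
* written to a tab-delimited text file that Excel can open
tabout agegrp race using table1.txt, cells(row ci) format(1 1) svy percent replace
```

The options shown follow the pattern used in Watson's tutorial; check the help file of your installed version of -tabout- before relying on them.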

My second problem was how to generate estimates for subsamples using -tabout-. In Use subpop() to generate subsample estimates using survey data, we said that the subpop() option of -svy-, not the -if- qualifier, provides the correct standard errors. But -tabout- does not have a subpop option, only the -if- and -in- qualifiers. Fortunately, when the svy option is used in -tabout-, the -if- and -in- qualifiers work the same way as the subpop option (see note below). Second problem solved.
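A subsample table might then look like the sketch below. It again assumes the nhanes2 example dataset, where female is taken to be a 0/1 indicator; the variable names and the file name are illustrative.

```stata
* Assumed example data and survey design (Stata's nhanes2)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* With the svy option, the -if- qualifier is treated like subpop(),
* so the standard errors still reflect the full survey design
tabout agegrp race if female == 1 using table_female.txt, ///
    cells(row ci) format(1 1) svy percent replace
```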

To install -tabout-, type: ssc install tabout

The -tabout- command (how to use it, common problems and errors, etc.) is well discussed on Statalist. The best way to start learning about -tabout- is by reading Publication quality tables in Stata: a tutorial for the tabout program (Watson 2007).

Note: Thanks to Ian Watson for pointing out footnote #3 (which I had missed) on page 3 of Publication quality tables in Stata: a tutorial for the tabout program.

Use subpop() to generate subsample estimates using survey data

Suppose you have complex survey data and you want to generate estimates for a specific subgroup, say females (coded as female==1). The -if- qualifier seems like the obvious choice to exclude the male population (female==0):

svy: tab agegroup if female==1, ci

Unfortunately, this is not correct. The correct way of generating estimates for subpopulations is to use the subpop() option of -svy-. The difference lies in how Stata treats the excluded category when calculating the standard errors. With the -if- qualifier, the excluded cases are dropped from the estimation sample altogether. With subpop(), the excluded cases (in our example, “male”) are still included in the calculation of the standard errors, as they should be, because they are part of the survey design. Thus:

svy, subpop(female): tab agegroup, ci
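To see the difference for yourself, the two approaches can be run side by side. This is a minimal sketch assuming Stata's nhanes2 example dataset, where the age-group variable is named agegrp and female is taken to be a 0/1 indicator; substitute your own variables and -svyset- declaration.

```stata
* Assumed example data and survey design (Stata's nhanes2)
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Not recommended: -if- drops the males before variance estimation
svy: tabulate agegrp if female == 1, ci

* Recommended: subpop() keeps the males in the variance calculation
svy, subpop(female): tabulate agegrp, ci
```

The point estimates should agree between the two runs; it is the standard errors, and therefore the confidence intervals, that can differ.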

For the math of all of these, see Stata’s Survey Data Reference Manual: subpopulation estimation (pp. 53-58, Stata 11 documentation). I also find section 4 of Jeff Pitblado’s Survey Data Analysis in Stata (2009) helpful.