Use subpop() to generate subpopulation estimates from survey data


Suppose you have complex survey data and you want to generate estimates for a specific subgroup, say females (coded as female==1). The -if- qualifier seems like the obvious way to exclude the male population (female==0):

svy: tab agegroup if female==1, ci

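Unfortunately, this is not correct. The correct way of generating estimates for subpopulations is to use -svy-’s subpop() option. The difference lies in how Stata treats the excluded category when calculating the standard errors. With -if-, the excluded observations are dropped before estimation, so Stata no longer sees the full survey design (and some PSUs or strata may lose cases entirely). With subpop(), the excluded cases (in our example, males) contribute nothing to the point estimates but are still included in the calculation of the standard errors, which is how it should be. Note that subpop(female) treats every observation where female is nonzero and nonmissing as a member of the subpopulation. Thus: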

svy, subpop(female): tab agegroup, ci
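
To try the comparison yourself, here is a minimal sketch using Stata's nhanes2 example dataset. I am assuming that the dataset comes with its survey design already declared and that it contains a 0/1 variable named female and an age-group variable named agegrp; check with -svyset- and -describe- and adjust the names if your copy differs.

```stata
* Sketch only: the dataset and variable names (female, agegrp) are assumptions
webuse nhanes2, clear

* Show the survey design settings stored with the data
svyset

* Not recommended: -if- drops the males before the variance is computed
svy: tabulate agegrp if female==1, ci

* Recommended: males stay in the design but are excluded from the estimates
svy, subpop(female): tabulate agegrp, ci

* The same subpopulation, defined with an -if- expression inside subpop()
svy, subpop(if female==1): tabulate agegrp, ci
```

The proportions from the first two commands should agree; what to compare is the standard errors and confidence intervals. How much they differ depends on the design and on how the subgroup is spread across PSUs and strata.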

For the math behind this, see Stata’s Survey Data Reference Manual: subpopulation estimation (pp. 53-58, Stata 11 documentation). I also find section 4 of Jeff Pitblado’s Survey Data Analysis in Stata (2009) helpful.
