Suppose you have complex survey data and you want to generate estimates for a specific subgroup, say females (coded as *female*==1). The -if- qualifier seems like the obvious choice to exclude the male population (*female*==0):

**svy: tab** *agegroup* if *female*==1, ci

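Unfortunately, this is not correct. The correct way of generating estimates for subpopulations is to use -svy-’s subpop() option. The difference lies in how Stata treats the excluded cases when calculating the standard errors. With -if-, the excluded cases (in our example, males) are dropped before estimation, so Stata computes the variance as though they had never been part of the sample. With subpop(), the excluded cases are still included in the calculation of the standard errors, as they should be, because they belong to the survey design. The point estimates are the same either way; it is the standard errors that can differ. Thus: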

**svy**, subpop(*female*)**: tab** *agegroup*, ci
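To see the two approaches side by side, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the design variables *stratum*, *psu* and *pw*, the sample size, and the step that empties one PSU of females so that -if- drops that PSU entirely. It is not taken from any real survey.

```stata
* Hypothetical toy design: 4 strata, 5 PSUs per stratum, 20 obs per PSU
clear
set seed 12345
set obs 400
gen stratum  = ceil(_n/100)              // strata 1-4
gen psu      = ceil(_n/20)               // PSUs 1-20, nested in strata
gen pw       = 50 + 100*runiform()       // made-up sampling weights
gen byte female   = runiform() < 0.5     // 1 = female, 0 = male
gen byte agegroup = 1 + floor(4*runiform())   // four age groups

* Make PSU 1 all male, so that -if- removes that PSU from the design
replace female = 0 if psu == 1

svyset psu [pweight = pw], strata(stratum)

* Not recommended: males are dropped before estimation
svy: tabulate agegroup if female == 1, ci

* Recommended: males stay in the design, estimates are for females only
svy, subpop(female): tabulate agegroup, ci
```

Compare the headers of the two outputs. The -if- version reports only the female cases as the number of observations and counts one PSU fewer, while the subpop() version reports all 400 observations and all 20 PSUs, with the female count shown separately as the subpopulation size. If the subgroup is defined by a condition rather than a single 0/1 variable, subpop() also accepts an -if- expression of its own, for example **svy**, subpop(if *female*==1 & *agegroup*>=2)**: tab** *agegroup*, ci.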

For the math behind all of this, see Stata’s *Survey Data Reference Manual: subpopulation estimation* (pp. 53-58, Stata 11 documentation). I also find section 4 of Jeff Pitblado’s Survey Data Analysis in Stata (2009) helpful.
