Putting observations in order


Is it necessary to put observations in a certain order? In a number of cases, yes. The most obvious case is when you use the -in- qualifier to specify a subset of your data. For example,

drop in 1/100      /* Drops observations 1 through 100 */
keep in 30/l       /* Keeps observations from 30 through the last one, denoted by the lowercase letter l */

If the observations were in arbitrary order, you wouldn't know which ones were dropped or kept, would you? This is where -sort- and -gsort- come in handy: both put the observations in a specified order. The -sort- command puts the observations in ascending order of a variable or a set of variables. The basic syntax for -sort- is:

sort varlist

If varlist contains a single variable, Stata sorts the observations in ascending order of that variable. With two variables, say var1 and var2, Stata sorts the observations by var1 first and then, among observations that share the same value of var1, by var2. With more than two variables the same logic extends: the observations are sorted by the first variable, then by the second within ties on the first, and so on. -gsort-, on the other hand, can sort the observations in either ascending or descending order. The basic syntax for -gsort- is:
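As a minimal sketch, the two-variable case can be tried on the auto dataset that ships with Stata. The variables foreign (0 = domestic, 1 = foreign) and price are used only for illustration. Note how the -in- qualifier from the start of this post becomes meaningful once the order is known:

```stata
sysuse auto, clear
sort foreign price              // sort by foreign first, then by price within each value of foreign
list make foreign price in 1/5  // so observations 1-5 are the five cheapest domestic cars
```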

gsort [+ or -] varname [+ or -] varname [+ or -] varname

A plus sign (+) before a variable name sorts the observations in ascending order of that variable, while a minus sign (-) sorts them in descending order; a variable with no sign is sorted in ascending order. For example, if each observation is a country, to sort the countries by geographical region (regn) in alphabetical order and, within each region, by GDP per capita (gdppc) from highest to lowest:

gsort +regn -gdppc
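The country data behind regn and gdppc are not shown here, so the sketch below swaps in the auto dataset as an assumption: foreign stands in for regn and price stands in for gdppc.

```stata
sysuse auto, clear
gsort +foreign -price           // domestic cars first; within each group, most expensive first
list make foreign price in 1/5  // the five most expensive domestic cars
```

Because price carries a minus sign, the within-group order is reversed compared with the -sort- example, while foreign still runs in ascending order.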

The -by varlist:- prefix also requires the observations to be sorted by varlist. But, as we discussed in "_n, its big brother _N, and Super -bysort-," you can sort and apply the prefix in one step by writing:

bysort varlist:

or

by varlist, sort:
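A short sketch of both forms, again assuming the auto dataset; n_in_group and N_in_group are hypothetical variable names chosen for this example. Because the data are sorted by foreign before -by- runs, _n and _N are computed within each group of foreign:

```stata
sysuse auto, clear
bysort foreign: generate n_in_group = _n       // position within each group, restarting at 1
by foreign, sort: generate N_in_group = _N     // group size: 52 domestic, 22 foreign cars
```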
