Is it necessary to put observations in a certain order? In a number of cases, yes. The most obvious case is when you are using the qualifier -in- to specify a subset of your data. For example,

**drop** in *1/100* /* Drops the observations from line 1 to line 100 */

**keep** in *30/l* /* Keeps the observations from line 30 to the last line, denoted by small letter l */
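To see why the order matters for -in-, here is a small sketch. It assumes Stata's shipped example dataset auto.dta, which is not part of the original example, and uses its *price* variable purely for illustration. Once the data are sorted by price, lines 1 to 5 have a known meaning: they hold the five lowest prices.

```stata
* Assumption: auto.dta, the example dataset shipped with Stata
sysuse auto, clear

* Put the cheapest cars first, so line numbers have a known meaning
sort price

* Keep lines 1 through 5, which now hold the five lowest prices
keep in 1/5
list make price
```

Without the -sort- line, the same -keep- would simply retain whichever five cars happen to come first in the file.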

If the observations were in arbitrary order, then you wouldn’t know which ones were dropped or kept, would you? This is where -sort- and -gsort- come in handy. These two commands put the observations in a certain order. The -sort- command puts the observations in ascending order based on a specific variable or a set of variables. The basic syntax for -sort- is:

**sort** *varlist*

If *varlist* contains only one variable, Stata sorts the observations in ascending order of that variable. If there are two variables, *var1* and *var2*, Stata sorts the observations by *var1* first. Then, among observations with the same value of *var1*, it sorts them by *var2*. If there are more than two variables, the observations are sorted by the first variable, then by the second, and so on.

-gsort-, on the other hand, can sort the observations in either ascending or descending order. The basic syntax for -gsort- is:

**gsort** [+ or -] *varname* [+ or -] *varname* [+ or -] *varname* …

A plus sign (+) before the varname instructs Stata to order the observations in ascending order, while a minus sign (-) implies descending order of observations. For example, to sort the countries by their geographical region (*regn*) in alphabetical order and by GDP per capita (*gdppc*), from highest to lowest:

**gsort** +*regn* -*gdppc*
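If you don’t have the country data at hand, the same idea can be tried on Stata's shipped auto.dta. This is only a sketch under that assumption: *foreign* stands in for the region and *price* for GDP per capita, and neither variable comes from the example above.

```stata
* Assumption: auto.dta, the example dataset shipped with Stata
sysuse auto, clear

* sort: ascending only. Origin first, then price within each origin
sort foreign price
list make foreign price in 1/5

* gsort: origin ascending (+), price descending (-) within each origin
gsort +foreign -price
list make foreign price in 1/5
```

After -sort-, the first lines show the cheapest domestic cars. After -gsort-, they show the most expensive domestic cars instead.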

The -by varlist:- prefix also requires the observations to be sorted according to the varlist. But, as we have discussed in “_n, its big brother _N, and Super -bysort-,” the sorting and the prefix can be conveniently combined into one step:

**bysort** *varlist***:** …

or

**by** *varlist***, sort:** …
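As a quick check that the two-step and one-step versions do the same thing, here is a sketch that again assumes Stata's shipped auto.dta. The variable names *n_group* and *n_group2* are made up for this illustration.

```stata
* Assumption: auto.dta, the example dataset shipped with Stata
sysuse auto, clear

* Two-step version: sort first, then use the by prefix
sort foreign
by foreign: generate n_group = _N

* One-step version: bysort sorts the data, then runs the command by group
bysort foreign: generate n_group2 = _N

* Both variables hold the group size, so this assertion passes silently
assert n_group == n_group2
```

If you skip the -sort- line in the two-step version on unsorted data, Stata stops with a "not sorted" error, which is exactly what -bysort- saves you from.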
