There are at least 3 convenient ways to count the number of distinct values contained in a variable: -tab-, -inspect-, and -codebook-.

**tab** *varname*, nofreq

**display** r(r)

The nofreq option suppresses the frequency table. Besides displaying output in the Results window, Stata stores the results of some commands so that you can use them in subsequent commands. Results of r-class commands, such as -tab-, are stored in r(). In the example above, **display** r(r) returns the number of rows in the table, that is, the number of distinct values of *varname* (missing values are not counted unless you add the missing option). The problem with using -tab- to count distinct values is its row limit: 12,000 rows (Stata/MP and Stata/SE), 3,000 rows (Stata/IC), or 500 rows (Small Stata).
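As a minimal sketch of the -tab- approach, the lines below use the auto dataset that ships with Stata; the dataset, the variable rep78, and the names ndistinct and first are only stand-ins for your own. The last part shows one possible workaround for the row limit using -egen- with tag(), which is not one of the three commands discussed in this post.

```stata
* Load an example dataset shipped with Stata (assumption: stands in for your data)
sysuse auto, clear

* Tabulate without printing the table, then read the stored row count
tabulate rep78, nofreq
display r(r)

* Save the count in a local macro before another r-class command overwrites r()
local ndistinct = r(r)
display "rep78 has `ndistinct' distinct nonmissing values"

* Count missing as its own category, if that is what you want
tabulate rep78, nofreq missing
display r(r)

* Possible workaround for the row limit (not one of the three commands above):
* flag one observation per distinct nonmissing value, then count the flags
egen byte first = tag(rep78)
count if first
display r(N)
```

Copying r(r) into a local macro right away is a cautious habit, since the next r-class command replaces everything in r().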

**inspect** *varlist*

**display** r(N_unique)

Besides the number of unique values, -inspect- also reports the number of negative, zero, positive, and missing values, and it draws a small histogram. When a variable has 99 or fewer unique values, -inspect- prints the actual number, and the same number is stored in r(N_unique) for use in later commands. When a variable has more than 99 unique values, -inspect- prints only “More than 99 unique values” and r(N_unique) is set to missing (.), so in that case you need -tab- or -codebook- to get the count.
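The following do-file sketch shows one way to pick up the stored result from -inspect- and to guard against the 99-value cap. It assumes the auto dataset and the variable rep78 as placeholders, and it relies on r(N_unique) being set to missing (.) when a variable has more than 99 unique values; the local macro name nuniq is made up for illustration.

```stata
* Example data shipped with Stata (assumption: stands in for your data)
sysuse auto, clear

* Run inspect, then copy the stored result before it is overwritten
inspect rep78
local nuniq = r(N_unique)

* r(N_unique) is missing (.) when there are more than 99 unique values
if missing(`nuniq') {
    display "More than 99 unique values; use -tab- or -codebook- for the count"
}
else {
    display "Unique values: `nuniq'"
}
```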

**codebook** *varlist*

-codebook- also provides other summaries besides unique values: the type of the variable (numeric, etc.), the range of values, the mean, the standard deviation, the number of missing values, and some percentiles.
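A short sketch of -codebook- in use, again assuming the auto dataset and its variables rep78 and price as placeholders. The compact option prints a one-line summary per variable that includes a column of unique-value counts, which can be easier to scan when you check several variables at once.

```stata
* Example data shipped with Stata (assumption: stands in for your data)
sysuse auto, clear

* Full report for one variable, including its number of unique values
codebook rep78

* One line per variable, with a column of unique-value counts
codebook rep78 price, compact
```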

Note: If *varlist* is not specified in -inspect- or -codebook-, the command reports on all variables.
