The adventures of tin()


Using the if qualifier with time-series data is tricky until you meet tin(). Let us use the quarterly German macro data set lutkepohl2, from the Stata website, to illustrate.
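
A minimal sketch of the idea, assuming the data are loaded with webuse and that qtr and inv are the quarterly date variable and the investment series in lutkepohl2. The 1970q1 to 1971q4 window is an arbitrary choice for illustration.

* Load the data and declare the time variable (tin() needs tsset data)
webuse lutkepohl2, clear
tsset qtr

* Without tin(): convert each date literal by hand with tq()
list qtr inv if qtr >= tq(1970q1) & qtr <= tq(1971q4)

* With tin(): the same inclusive window, written directly
list qtr inv if tin(1970q1, 1971q4)

Both list commands should select the same observations; tin() simply spares you the date conversion.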

Ways to count the number of unique values in a variable


There are at least three convenient ways to count the number of distinct values in a variable: -tab-, -inspect-, and -codebook-.

tab varname, nofreq
display r(r)

The nofreq option suppresses the frequency table. Besides displaying output in the Results window, Stata stores the results of some commands so that you can use them in subsequent commands. Results of r-class commands, such as -tab-, are stored in r(). In the example above, display r(r) returns the number of rows in the table, that is, the number of distinct values of varname. The drawback of using -tab- to count distinct values is its row limit: 12,000 rows (Stata/MP and Stata/SE), 3,000 rows (Stata/IC), or 500 rows (Small Stata).
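
As a rough illustration of reusing the stored result, the sketch below assumes the auto data set that ships with Stata and its variable rep78; the local macro name k is arbitrary. Note that -tab- leaves out missing values unless you add the missing option.

* Load an example data set shipped with Stata
sysuse auto, clear

* Build the table quietly, then read the number of rows from r(r)
tab rep78, nofreq
display r(r)

* Keep the count in a local macro for use in later commands
local k = r(r)
display "rep78 has `k' distinct nonmissing values"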

inspect varlist
display r(N_unique)

Besides the number of unique values, -inspect- reports the number of negative, zero, positive, and missing values, and it draws a small histogram. When a variable has 99 or fewer unique values, the count appears in the output and is also stored in r(N_unique). When it has more than 99, -inspect- reports only “More than 99 unique values” and r(N_unique) is set to missing, so use -tab- or -codebook- to get the exact count.
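
A short sketch of what -inspect- leaves behind, again assuming the auto data set and its variable rep78, which has only a handful of distinct values:

* Load an example data set shipped with Stata
sysuse auto, clear

* Report on one variable, then look at everything stored in r()
inspect rep78
return list

* Pick out the count of unique values
display r(N_unique)

If you pass several variables to -inspect-, check return list before relying on r(N_unique), since r() holds only one set of results at a time.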

codebook varlist

-codebook- also provides other summaries besides the number of unique values: the variable type (numeric, etc.), the range of values, the mean, the standard deviation, the number of missing values, and some percentiles.
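
For example, assuming the auto data set once more, the sketch below shows the full report for one variable and then the compact option, which prints one line per variable with a column of unique-value counts:

* Load an example data set shipped with Stata
sysuse auto, clear

* Full report for a single variable
codebook rep78

* One line per variable, including the number of unique values
codebook rep78 foreign price, compact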

Note: If varlist is omitted, -inspect- and -codebook- report on all variables in the data set.