Ways to count the number of unique values in a variable


There are at least 3 convenient ways to count the number of distinct values contained in a variable: -tab-, -inspect-, and -codebook-.

tab varname, nofreq
display r(r)

The option nofreq supresses the reporting of the frequency table. Besides displaying output in the results window, Stata stores the results of some commands so that you can use them in subsequent commands. Results of r-class commands, such as -tab-, are stored in r(). In the expample above, display r(r) returns the number of rows in the table, that is, the number of unique observations for variable varname. The problem with using -tab- to count the unique number of values is its row limits: 12,000 rows (Stata/MP and Stata/SE), 3,000 rows (Stata/IC), or 500 rows (Small Stata).

inspect varlist
display r(N_unique)

Besides reporting the number of unique values, -inspect- also reports: the number of negative, zero,  positive, and missing values. It also draws a histogram. There is no need for r(N_unique) if the number of unique values is less than or equal to 99 as -inspect- reports the actual number. But if the number of unique values is more than 99, it will return “More than 99 unique values”. In this case, you need to type the second line.

codebook varlist

-codebook- also provide other summaries besides unique values: type of variable (numeric, etc), the range of values, mean, standard deviation, missing values, and some percentiles.

Note: If varlist is not specified in -inspect- and -codebook-, the commands will return the reports for all variables.

12 Responses

  1. i used unique
    type – ssc install unique
    then run – unique (var name)

  2. How can I create a new variable that equals the frequencies of each value within a given variable?

    Example: PhysicianID “001” appears 3 times in my database of 100 patients. This means that physician 001 saw 3 patients in total. For each patient that was seen by physician 001, I want to create a variable that will say: “3”, because 001 is repeated 3 times throughout the database.

    That way, I can identify that patient x was seen by a physician who saw 3 patients in total.

  3. Thank you! I create a state time-varying fixed effect for 51 states and 17 years, resulting in 867 unique combinations, and I needed to check whether or not gen state*year had in fact created that many dummy variables so I could use them as fixed effects in a

    reg ________________ i.stateyear

    regression :)

  4. […] 参考资源:http://statadaily.ikonomiya.com/2010/10/01/ways-to-count-the-number-of-unique-values-in-a-variable/ […]

  5. How can I count the number of occurrences of a particular value (or combo of values) within a string variable?
    Example: I want to gen a variable “awards_SS” that counts the occurrence of “SS” for each unique ID

    ID Awards awards_SS
    1 SS; SS; SM; SG 2
    2 SM; SG 0
    3 SG; SG; SG; SS 1

    • Hi. Try this:
      local tocount SS SM SG
      foreach t of local tocount{
      gen count`t’=0
      local N = _N
      forvalues i = 1/`N'{
      local a = awards[`i’]
      local c : subinstr local a “`t'” “`t'” , all count(local c2)
      replace count`t’ = `c2′ in `i’
      }
      }

  6. Thanks for this tip. -distinct- is indeed very neat and very straight forward!

  7. There is a fairly detailed discussion of this question in

    SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
    (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
    Q4/08 SJ 8(4):557–568
    shows how to answer questions about distinct observations
    from first principles; provides a convenience command

  8. there is also a command called “duplicates report” that does this…

    btw… luv the helmet diving! magkano yun?

    • I think duplicates report will not give you the number of unique values if there are duplicates… but it can be computed by adding the differences between the number of observations and the surplus. Now, there are at least 4 ways… thanks.

      Gabby, the helmet dive was P500/pax (off-peak price) :)

Leave a Reply