Clever way to dummy

Relational operators (>, <, >=, <=, ==, !=) evaluate to 1 if the expression is true and 0 if false. Given this definition, a dummy variable can be created using, for example:

gen newvar = (oldvar <= somethreshold) if !missing(oldvar)

Instead of the longer alternative:

gen newvar = 1 if oldvar <= somethreshold
replace newvar = 0 if oldvar > somethreshold & !missing(oldvar)

Why bother with the if !missing() qualifier? If it is excluded, i.e.,

gen newvar = (oldvar <= somethreshold)

and oldvar contains observations with missing values, newvar will take the value 0 (because a missing value is treated as a very large number) where oldvar is missing, which may not be your intention. Stay safe!
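To see the difference, here is a minimal sketch on a three-observation toy dataset. The values (1, 5, and a missing) and the threshold of 3 are made up for illustration, and the names newvar_unsafe and newvar_safe are hypothetical:

```stata
* Toy data (illustrative values only): one value below the threshold,
* one above it, and one missing
clear
input oldvar
1
5
.
end

* Hypothetical threshold, stored in a local macro
local somethreshold = 3

* Without the qualifier: the missing observation is coded 0
gen byte newvar_unsafe = (oldvar <= `somethreshold')

* With the qualifier: the missing observation stays missing
gen byte newvar_safe = (oldvar <= `somethreshold') if !missing(oldvar)

* newvar_unsafe is 1, 0, 0 and newvar_safe is 1, 0, .
list oldvar newvar_unsafe newvar_safe
```

Because the local macro only lives for the duration of one run, execute the lines together from a do-file rather than one at a time.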

2 Responses

  1. Another way to do it

    gen newvar = cond(missing(oldvar), ., (oldvar <= somethreshold))

    This makes explicit that missings map to missings.

  2. That’s fine, and a good way for really competent people to do things more efficiently. But I often share code with neophytes or near-neophytes, so the simpler and more obvious I can make it, the better. So I’ll stick to the two-line approach. For that matter, to keep them in good habits, I generally would use three lines, the first one being “gen newvar=.” or “gen newvar=.a” then using replace commands. Easier on newbies.
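For reference, the three-line pattern described in the second response might look like the sketch below. The toy values and the threshold of 3 are assumptions for illustration, not part of the original comment:

```stata
* Toy data (illustrative values only)
clear
input oldvar
1
5
.
end

* Hypothetical threshold, stored in a local macro
local somethreshold = 3

* Step 1: start with everything missing
gen byte newvar = .

* Step 2: code 1 where the condition holds (a missing oldvar never
* satisfies <= because missing sorts above every number)
replace newvar = 1 if oldvar <= `somethreshold'

* Step 3: code 0 only for non-missing values above the threshold
replace newvar = 0 if oldvar > `somethreshold' & !missing(oldvar)

* newvar is 1, 0, .
list oldvar newvar
```

The !missing() check in the last step is what keeps the missing observation from being recoded to 0.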
