*“There are only 10 types of people in the world: those who understand binary, and those who don’t.”*

It is almost always the case that dummy variables are defined using the two digits of the binary system: 0 and 1. To illustrate how to create dummies, we will use the dataset *auto.dta*, available on Stata’s website.

**webuse** *auto* /* -webuse- loads dataset from Stata web site. Type “help webuse” for more details. */

**gen** *fuelecon*=(*mpg*<=20) /* define *fuelecon*=1 if the condition *mpg*<=20 miles per gallon holds and *fuelecon*=0 if it does not */

This is equivalent to:

**gen** *fuelecon*=1 if *mpg*<=20 /* define *fuelecon*=1 if *mpg*<=20 miles per gallon */

and

**replace** *fuelecon*=0 if *mpg*>20 /* *fuelecon*=0 if *mpg*>20 miles per gallon */
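One way to confirm that the two approaches agree is to build both versions side by side and compare them. The following is a minimal sketch; the names *fuelecon_a* and *fuelecon_b* are hypothetical and are used only so that the two versions can coexist in the same dataset.

```stata
* load a fresh copy of the data
webuse auto, clear

* one-step version: the logical expression evaluates to 1 or 0
gen fuelecon_a = (mpg <= 20)

* two-step version: set the 1s first, then fill in the 0s
gen fuelecon_b = 1 if mpg <= 20
replace fuelecon_b = 0 if mpg > 20

* -assert- stops with an error if any observation differs
assert fuelecon_a == fuelecon_b
```

If -assert- returns silently, the two variables are identical in every observation.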

CAUTION: Missing values. Stata treats a numeric missing value (.) as larger than any nonmissing number, so a condition such as *rep78*>4 is true when *rep78* is missing. See the example below.

**tab** *rep78*, m

| rep78 | Freq. | Percent | Cum. |
|---|---|---|---|
| 1 | 2 | 2.70 | 2.70 |
| 2 | 8 | 10.81 | 13.51 |
| 3 | 30 | 40.54 | 54.05 |
| 4 | 18 | 24.32 | 78.38 |
| 5 | 11 | 14.86 | 93.24 |
| . | 5 | 6.76 | 100.00 |
| Total | 74 | 100.00 | |

**gen** *repmorethan4*=(*rep78*>4)

**tab** *repmorethan4*, m

| rep~4 | Freq. | Percent | Cum. |
|---|---|---|---|
| 0 | 58 | 78.38 | 78.38 |
| 1 | 16 | 21.62 | 100.00 |
| Total | 74 | 100.00 | |

We have just instructed Stata to code the 5 cars with missing values as if they had been repaired more than 4 times: the 16 cars coded 1 are the 11 cars with *rep78*==5 plus the 5 cars with missing *rep78*. Not cool. The solution is to add a missing-value check to the condition or to use the -if- qualifier:

**replace** *repmorethan4*=(*rep78*>4 & *rep78*~=.)

OR

**replace** *repmorethan4*=*rep78*>4 if *rep78*~=.

OR

**replace** *repmorethan4*=*rep78*>4 if *rep78*<.

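With the first form, the missing values will be coded as 0 instead of 1. The two -if- forms behave differently: -replace- changes only the observations that satisfy the -if- condition, so the cars with missing *rep78* keep whatever value *repmorethan4* already had (1, if the variable came from the -gen- command above). Used with -gen- on a new variable, the same -if- qualifier would leave those observations missing. Note that coding them as 0 is correct only if we know that a missing value means that the car was not repaired. Unfortunately, without prior information, a missing value could also mean that the information is indeed missing. ALWAYS KNOW WHAT MISSING VALUES MEAN AND KNOW WHERE THEY GO.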
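When it is not known what a missing value means, one cautious option is to keep the dummy missing wherever *rep78* is missing. The sketch below assumes a fresh copy of the data, so that *repmorethan4* does not exist yet, and uses the missing() function, which also catches extended missing values such as .a.

```stata
* load a fresh copy of the data
webuse auto, clear

* create the dummy only where rep78 is observed;
* observations with missing rep78 stay missing
gen byte repmorethan4 = (rep78 > 4) if !missing(rep78)

* check where the missing values went
tab repmorethan4, m

* the dummy should be missing exactly where rep78 is missing
assert missing(repmorethan4) == missing(rep78)
```

Given the frequencies tabulated earlier, the -tab- output should show 58 zeros, 11 ones, and 5 missing values.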

If we have many categories, it is easier to use the -tab- command and its gen() option. For example, if we want to create dummy variables for each of the 5 values of *rep78*, we type:

**tab** *rep78*, gen(*rep78_*)

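This command will create 5 dummy variables: *rep78_1*, *rep78_2*, *rep78_3*, *rep78_4*, and *rep78_5*. *rep78_1* is 1 if *rep78*==1 and 0 otherwise, and so on up to *rep78_5*, which is 1 if *rep78*==5 and 0 otherwise. No variable was created for the missing values, and the 5 dummies are themselves missing for the cars with missing *rep78*. If you want to create a variable for the missing values, just specify the missing option for -tab- (drop the existing *rep78_** variables first if you have already run the command above, because -tab- will not overwrite them):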

**tab** *rep78*, gen(*rep78_*) m
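A quick sanity check on the result is to confirm that every car falls in exactly one category. The following is a rough sketch, assuming a fresh copy of the data and that the missing category becomes the sixth dummy, *rep78_6*; the variable name *ncat* is hypothetical.

```stata
* load a fresh copy of the data so that no rep78_* variables exist yet
webuse auto, clear

* one dummy per category of rep78, including the missing category
tab rep78, gen(rep78_) m

* list the dummies that were created
describe rep78_*

* each observation should have exactly one dummy equal to 1
egen byte ncat = rowtotal(rep78_*)
assert ncat == 1
```

Because the missing option gives the missing values their own category, none of the dummies should contain missing values, which is why the row total equals 1 for every car.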
