break me


Sometimes we want to break a continuous variable into a smaller set of categories—into evenly spaced or equally sized groups, or groups based on limits we specify, or groups based on another variable or a set of variables.

Take, for example, the variable price in auto.dta.
sysuse auto.dta, clear    // load a dataset that ships with Stata
summ price
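
As a rough sketch of the first three kinds of grouping mentioned above, one possible approach is shown below. The new variable names (price_width, price_q4, price_band) and the cutpoints passed to at() are assumptions made up for illustration; they are not taken from the original post.

```stata
sysuse auto.dta, clear

* Evenly spaced: 4 equal-width intervals over the observed range of price.
* autocode() returns the upper bound of the interval each value falls in.
quietly summarize price
gen price_width = autocode(price, 4, r(min), r(max))

* Equally sized: quartile groups with roughly the same number of cars each.
xtile price_q4 = price, nq(4)

* Limits we specify (hypothetical cutpoints); each car gets the lower bound
* of its band, and prices outside 3000-16000 would be set to missing.
egen price_band = cut(price), at(3000, 5000, 10000, 16000)

tabulate price_width
tabulate price_q4
tabulate price_band
```

Which of the three is appropriate depends on whether the groups should share a width, a count, or a substantive meaning.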


The operator + and the egen function rsum


What’s the difference between using the arithmetic operator + and the -egen- function rsum (equivalent to rowtotal)? Both return the row sum of the variables but treat missing values differently. For example, suppose we have the following data:



gen v3 = v1 + v2
The arithmetic operator + returns a missing value if any of the values is missing.

egen v4 = rsum(v1 v2)
The -egen- function rsum, on the other hand, treats missing values as zeros.

If the missing option is specified, however, rsum returns a missing value only when all values in the varlist are missing.

egen v5 = rsum(v1 v2), missing
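
To see the three behaviors side by side, here is a small self-contained sketch. The four rows of data are invented for illustration and are not the data from the original post. It uses rowtotal, the documented name, which the text above describes as equivalent to rsum.

```stata
clear
* Hypothetical data: no missings, one missing in each position, all missing
input v1 v2
1 2
3 .
. 4
. .
end

gen  v3 = v1 + v2                    // missing if either value is missing
egen v4 = rowtotal(v1 v2)            // missing values treated as zeros
egen v5 = rowtotal(v1 v2), missing   // missing only if all values are missing

list, clean
* v3:  3  .  .  .
* v4:  3  3  4  0
* v5:  3  3  4  .
```

The last row is where v4 and v5 differ: without the missing option the row total is 0, which cannot be told apart from a true sum of zero.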