_n and _N are Stata system variables—they exist whether you like them or not. They are also referred to as “underscore variables” for the obvious reason that they are written as

*_variable*. Little _n contains the line number of the current observation, while its big brother _N contains the total number of observations in the data.

_n is often used to generate unique codes for each observation:

**gen** *code*=_n /* generates the variable code that contains the integer 1 (for the first obervation, _n==1) to _N (for the last observation, _n==_N) */

Or to refer to neighboring observations (also called subscripting):

**gen** *gdplag*=

*gdp*[_n-1] /* generates the variable

*gdplag*, which is equal to the preceding observation’s gdp */

**gen** *gdpgrowth*=(

*gdp* /

*gdp*[_n-1] – 1)*100 /* generates the variable

*gdpgrowth, *the growth rate for the variable

*gdp* **/*
[Note: The

*gdplag* and

*gdpgrowth* for the first observation (_n==1) will be missing since observation [_n-1] does not exist for _n==1.]

Make sure, however, that you refer to the right neighbor! For example, if you are calculating the growth rate of variable

*gdp* between 1999 and 2000,

*gdp* must be in order such that the

*gdp* subscripted by [_n-1] is the

*gdp* for 1999. This is easily addressed by invoking the -sort- command, “

**sort** *year*,” before generating the growth rate variable. There is another complication, however, when you are calculating this for different groups of observations, say by country. Will “

**sort** *country year*” before generating the variable suffice? No. Why? Because the [_n-1] for the first observation of country B refers to the last observation of country A. Here is where Super -bysort- comes to the rescue:

**bysort** *country year***:** **gen** *gdplag*=

*gdp*[_n-1]

**bysort ***country year***: gen ***gdpgrowth*=(

*gdp* /

*gdp*[_n-1] – 1)*100

** **
// Another syntax for bysort is:

**by** *country *(

*year*),

**sort:** …

_n may also be used to keep the nth observation by group:

**bysort** *householdid*:

**keep** if _n==1 /* keeps the first observation for each

*householdid **/

Big brother _N, on the other hand, may be used to generate a variable that contains the number of observations by group:

**bysort** *householdid*:

**gen** *householdsize*=_N /* generates the variable

*householdsize*, which is equal to the number of observations for each

*householdid*. */

What we have illustrated above are just a few examples to showcase the potential of underscore variables _n and _N. For sure, you will find other uses of _n and _N. Another underscore variable is the beautiful number π, which, as you would’ve guessed, is written as _pi.

Filed under: Basic functions | Tagged: _n, bysort, subscripting, system variables | 1 Comment »