And we’re rolling, rolling; rolling on the river*

I just learned about -rolling- today. Thanks to a friend for asking about moving averages and standard deviations yesterday. The problem was how to generate a new variable that contains the average and standard deviation of the previous 10 period. For example, the generated data for 1961 would be the average and the standard deviation for the period 1951 to 1960. I knew -tssmooth ma- can be used for moving averages, but I was not aware of a similar command for standard deviations so I did the following exercise for the moving standard deviation yesterday:

/* Create hypothetical data */
set obs 50
gen year = 1951 if _n==1
replace year = year[_n-1] + 1 if _n!=1
set seed 528
gen data1 = runiform()
set seed 285
gen data2 = runiform()

/* Calculate moving standard deviation /
sort year
foreach d of varlist data
qui gen sdd' = .
local N = _N
local i = 1
local j = 10
forvalues k=11/
qui sum d' ini’/j'
qui replace sd
d’ = r(sd) if _n==k'
local i =
i’ + 1
local j = `j’ + 1

I should have googled first. If I had, I should have found Nick Cox’s reply to the Statalist post “calculating moving standard deviation” by Ravi Yatawara where he suggested -rolling-. By reshaping the data into panel format and applying -xtset-, I can now use -rolling-.

/* Create the same hypothetical data as above /
Organize the data and apply xtset */
reshape long data, i(year) j(group 1 2)
xtset group year

/* Calculate moving standard deviation */
rolling sd=r(sd), window(10) keep(group) clear: sum data
gen year = end + 1
keep group year sd

It is also possible to generate more than one statistics. For example, if I also want to calculate the moving average, I can write:

rolling sd=r(sd) mean=r(mean), window(10) keep(group) clear: sum data

See -help rolling- for more options.

The most important advantage of -rolling- (aside from its simplicity), I think, is that you do not have to worry about the order of your data because -xtset- or -tsset- already took charge of that. Note that by using -in- in my unnecessary code, I have to make sure that the data is sorted by year, otherwise I will be getting the standard deviations for the wrong time periods.

Lesson: Google first!

*”Proud Mary” [not “Rolling” as I used to think] by Tina Turner.

15 Responses

  1. Thanks a lot dear Mitch Abdon for the code. I was struggling with this for the last so many days. thanks again

  2. […] And we’re rolling, rolling; rolling on the river* […]

  3. Hi! I followed your instruction but I got invalid syntax in Stata and did’t know why and I need your help. I calculate zscore=(roa+capital ratio)/standard deviation(roa), but this sd(roa) is for 3 years. My data set is from 1984 to 2012. Sd(roa) will be started for calculation from 1986 as sd(roa) in 1986 is standard deviation in 3 years of 1984,1985 and 1986.

    foreach d of roa{
    2. qui gen sd`d’ = .
    3. local N=_N
    4. local j=1
    5. local j=3
    6. for k=3/’N'{
    7. qui sum ‘d’ in ‘i’/’j’
    8. qui replace sd’d’=r(sd) if _n==’k’
    9. local i=i+1
    10. local j=j+1
    11. bysort fdiccert: egen zscore=( roa+ cap)/ sd’d’
    12. drop if k=1/2}}

    Thank you very much!

  4. Update: If speed is important when you have a very large dataset, then mvsumm is not very efficient. See for a note on how to calculate a moving mean or stdev he fastest way (in my case: 5 seconds versus 5+ hours!)

  5. Rolling is not the most optimal way of generating a moving average of standard deviation because (i) it is a “destructive” function that changes your dataset, (ii) it does not handle missing values correctly (e.g. it will calculate the mean of [10;”.”;15] as 8.33 whereas it should be “.” in my opinion) and (iii) is not very efficient (takes a long time to execute on long panels). I suggest you use mvsumm and mvcorr (use findit to install them). An example:
    xtset id year // set panel
    tsfill, full // required for mvsumm
    mvsumm VAR, stat(mean) win(3) gen(meanVAR) end // Will generate a 3-year moving average of the variable VAR and will place this value at the final year of the window considered
    mvcorr VAR1 VAR2, win(5) gen(corVARS) end // Will generate a 5-year moving correlation between the two variables VAR1 and VAR2 and will place this value at the final year of the window considered

  6. hi, i have a months experience with Stata. I’ve been running mainly logisitc and probit models. Now i am at the point of making out-of-sample estimation and goodness-of-fit (mainly MSE). I’ve run rolling recursively [adding one observation at a time and producing an estimate each time] and have been successful generating the coefficient parameters. The last step is to generate predicted values and residuals. Is there a way to this by simple commands other than creating a program?

  7. is there a way to use -rolling- without dropping all the data?

    Reason i’m asking is that i’m trying to calculate rolling standard devitiations of a variable in a large unbalanced panel, and would rather avoid having to re-merge the results back into the main database…

  8. How would you use this command for panel data? I have 200 countries and data from 1960-2009 and I need to calculate a 5-period moving std. dev. for this data. It works well for a single country but when I try to write a do-file for multiple countries it doesnt work

  9. Proud Mary is by John Fogerty (Creedence Clearwater Revival)…

Leave a Reply