I just learned about -rolling- today. Thanks to a friend for asking about moving averages and standard deviations yesterday. The problem was how to generate a new variable that contains the average and standard deviation of the previous 10 period. For example, the generated data for 1961 would be the average and the standard deviation for the period 1951 to 1960. I knew -tssmooth ma- can be used for moving averages, but I was not aware of a similar command for standard deviations so I did the following exercise for the moving standard deviation yesterday:

/* Create hypothetical data */

clear

set obs 50

gen year = 1951 if _n==1

replace year = year[_n-1] + 1 if _n!=1

set seed 528

gen data1 = runiform()

set seed 285

gen data2 = runiform()

/* Calculate moving standard deviation

*/*

sort year

foreach d of varlist data{

sort year

foreach d of varlist data

qui gen sd

`d' = .`

local N = _N

local i = 1

local j = 10

forvalues k=11/

N'{qui sum

`d' in`

i’/`j'`

qui replace sd

d’ = r(sd) if _n==`k'`

local i =

i’ + 1local j = `j’ + 1

}

}

I should have googled first. If I had, I should have found Nick Cox’s reply to the Statalist post “calculating moving standard deviation” by Ravi Yatawara where he suggested -rolling-. By reshaping the data into panel format and applying -xtset-, I can now use -rolling-.

/* Create the same hypothetical data as above

*/*

/Organize the data and apply xtset */

/

reshape long data, i(year) j(group 1 2)

xtset group year

/* Calculate moving standard deviation */

rolling sd=r(sd), window(10) keep(group) clear: sum data

gen year = end + 1

keep group year sd

It is also possible to generate more than one statistics. For example, if I also want to calculate the moving average, I can write:

rolling sd=r(sd) mean=r(mean), window(10) keep(group) clear: sum data

See -help rolling- for more options.

The most important advantage of -rolling- (aside from its simplicity), I think, is that you do not have to worry about the order of your data because -xtset- or -tsset- already took charge of that. Note that by using -in- in my unnecessary code, I have to make sure that the data is sorted by year, otherwise I will be getting the standard deviations for the wrong time periods.

Lesson: Google first!

*”Proud Mary” [not “Rolling” as I used to think] by Tina Turner.

Filed under: Basic functions Tagged: | rolling, standard deviation, tssmooth ma

Hassan Abbas, on 21 April 2014 at 8:07 PM said:Thanks a lot dear Mitch Abdon for the code. I was struggling with this for the last so many days. thanks again

Rolling standard deviations and missing observations | Stata Daily, on 31 March 2014 at 1:01 AM said:[…] And we’re rolling, rolling; rolling on the river* […]

Kim, on 24 November 2013 at 5:24 AM said:Hi! I followed your instruction but I got invalid syntax in Stata and did’t know why and I need your help. I calculate zscore=(roa+capital ratio)/standard deviation(roa), but this sd(roa) is for 3 years. My data set is from 1984 to 2012. Sd(roa) will be started for calculation from 1986 as sd(roa) in 1986 is standard deviation in 3 years of 1984,1985 and 1986.

foreach d of roa{

2. qui gen sd`d’ = .

3. local N=_N

4. local j=1

5. local j=3

6. for k=3/’N'{

7. qui sum ‘d’ in ‘i’/’j’

8. qui replace sd’d’=r(sd) if _n==’k’

9. local i=i+1

10. local j=j+1

11. bysort fdiccert: egen zscore=( roa+ cap)/ sd’d’

12. drop if k=1/2}}

Thank you very much!

Mitch Abdon, on 27 November 2013 at 5:00 PM said:hi kim,

you may use -rolling- first to calculate for the standard deviations before generating your zscore variable.

hth

mitch

Yann de Mey, on 19 October 2012 at 4:50 PM said:Update: If speed is important when you have a very large dataset, then mvsumm is not very efficient. See http://www.stata.com/statalist/archive/2008-08/msg00153.html for a note on how to calculate a moving mean or stdev he fastest way (in my case: 5 seconds versus 5+ hours!)

Hasan, on 30 March 2014 at 5:14 PM said:Hi, after calculating the rolling standard deviation using 4 window, if I want to keep only those values that were calculated using atleast 3 observation, how cna I do so.

Any help/reply will be highly appreciated.

Mitch Abdon, on 1 April 2014 at 10:15 AM said:Hi, Hasan. Checkout http://statadaily.ikonomiya.com/2014/03/31/rolling-standard-deviations-and-missing-observations/

Yann de Mey, on 8 October 2012 at 9:43 PM said:Rolling is not the most optimal way of generating a moving average of standard deviation because (i) it is a “destructive” function that changes your dataset, (ii) it does not handle missing values correctly (e.g. it will calculate the mean of [10;”.”;15] as 8.33 whereas it should be “.” in my opinion) and (iii) is not very efficient (takes a long time to execute on long panels). I suggest you use mvsumm and mvcorr (use findit to install them). An example:

xtset id year // set panel

tsfill, full // required for mvsumm

mvsumm VAR, stat(mean) win(3) gen(meanVAR) end // Will generate a 3-year moving average of the variable VAR and will place this value at the final year of the window considered

mvcorr VAR1 VAR2, win(5) gen(corVARS) end // Will generate a 5-year moving correlation between the two variables VAR1 and VAR2 and will place this value at the final year of the window considered

Thuomas, on 26 August 2015 at 8:49 PM said:Valid points, but the mv-commands do not yield a solution for a rolling sd..

paul palugod, on 1 September 2012 at 10:13 AM said:hi, i have a months experience with Stata. I’ve been running mainly logisitc and probit models. Now i am at the point of making out-of-sample estimation and goodness-of-fit (mainly MSE). I’ve run rolling recursively [adding one observation at a time and producing an estimate each time] and have been successful generating the coefficient parameters. The last step is to generate predicted values and residuals. Is there a way to this by simple commands other than creating a program?

Mika, on 2 August 2012 at 9:21 PM said:is there a way to use -rolling- without dropping all the data?

Reason i’m asking is that i’m trying to calculate rolling standard devitiations of a variable in a large unbalanced panel, and would rather avoid having to re-merge the results back into the main database…

Ghazal, on 24 January 2012 at 10:07 PM said:How would you use this command for panel data? I have 200 countries and data from 1960-2009 and I need to calculate a 5-period moving std. dev. for this data. It works well for a single country but when I try to write a do-file for multiple countries it doesnt work

Thanks!

Mitch Abdon, on 25 January 2012 at 7:58 AM said:Hi, Ghazal. I would think that once you have -xtset- your data, -rolling- will calculate the moving sd by country for a specified window (eg 5). I may be wrong.

Ghazal, on 25 January 2012 at 8:24 AM said:Mitch you are right it did work just fine I had to reshape my data first

Phil Clayton, on 22 June 2011 at 12:45 PM said:Proud Mary is by John Fogerty (Creedence Clearwater Revival)…