Rolling standard deviations and missing observations


In “And we’re rolling, rolling; rolling on the river”, Hasan asked how he could “keep only those values that were calculated using at least 3 observations” after calculating the 4-period rolling standard deviation of a set of observations. One solution is to tag the periods in which more than 1 observation within the window (here, 4 periods) is missing, then set the calculated standard deviations for those periods to missing.

Two things to note are:

(1) rolling requires that your data be declared as a time-series dataset (see help tsset). Once that is done, time-series operators, such as L. for lags, are allowed.

(2) The keep() option in rolling lets you keep the date variable, which you can then use as an identifier when merging files.

Here is an illustration (assuming nonrecursive analysis):
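* Block 1: artificial data with some missing values, declared as time series
clear
set obs 20
set seed 1
gen date = _n
gen v1 = 1+int((100)*runiform())
gen v2 = v1
replace v2 = . in 1/4
replace v2 = . in 10/12
replace v2 = . in 18/20
tsset date

* Block 2: 4-period rolling sd saved to f2.dta, then merged back on date
rolling sd2 = r(sd), window(4) keep(date) saving(f2, replace): sum v2
merge 1:1 date using f2, nogenerate

* Block 3: tag = 1 if more than 1 of the 4 values in the window is missing
gen tag = missing(l3.v2) + missing(l2.v2) + missing(l1.v2) + missing(v2) > 1
gen sd = sd2 if tag==0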

In the first block, we created an artificial dataset of 20 uniformly distributed random integers between 1 and 100, set some observations to missing, and told Stata that we are dealing with a time-series dataset.

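In the second block, we calculated the 4-period rolling standard deviation. Because we used the saving() option rather than clear, the data in memory were not replaced; the results from rolling were instead saved in f2.dta, which we then merged into our current data using date as the identifier. By default, keep(date) records the date at the end of each window, so each value of sd2 lines up with the last period of its window.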

In the last block, we generated the variable tag, which equals 1 if the expression missing(l3.v2) + missing(l2.v2) + missing(l1.v2) + missing(v2) > 1 is true, i.e., if more than 1 observation within the 4-period window is missing. Otherwise, tag is 0. Finally, in the last line, we created a new variable sd that is missing whenever fewer than 3 observations were used in the window.
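
A possible alternative, sketched here rather than taken from the original exchange, is to let rolling return the number of observations used in each window alongside the standard deviation, since summarize stores that count in r(N). The names n2, sd_alt and f2n below are made up for illustration, and the data are regenerated so that the sketch runs on its own.

```stata
* Sketch only: n2, sd_alt and f2n are illustrative names, not from the post
clear
set obs 20
set seed 1
gen date = _n
gen v2 = 1 + int(100*runiform())
replace v2 = . in 1/4
replace v2 = . in 10/12
replace v2 = . in 18/20
tsset date

* summarize stores the number of nonmissing observations in r(N)
rolling sd2 = r(sd) n2 = r(N), window(4) keep(date) saving(f2n, replace): summarize v2
merge 1:1 date using f2n, nogenerate

* keep the sd only if at least 3 observations entered the window;
* !missing(n2) guards against Stata treating missing as larger than 3
gen sd_alt = sd2 if n2 >= 3 & !missing(n2)
```

With this approach the threshold does not depend on writing out one lag per period, so changing window(4) to a longer window would only require changing the cutoff in the last line.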