If the TSRV is given by:
$$TSRV = \frac{1}{K} \sum_{i=K}^{n} (S_i - S_{i-K})^2 - \frac{\bar{n}}{n}\sum_{i=1}^n (S_i - S_{i-1})^2 $$
where $\bar{n} = \frac{n - K + 1}{K}$, $n$ is the number of data points, and $K$ is the subsampling lag in ticks.
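A minimal sketch of this formula as a function may make the indexing concrete. The name `tsrv` is hypothetical, and the sketch assumes the array holds $S_0, \dots, S_n$, so that $n$ is the number of one-tick returns in whatever window is passed in. That convention is an assumption, not something the formula above settles.

```python
import numpy as np

def tsrv(S, K):
    """Two-scale estimator from the formula above, evaluated on the whole array S.

    Assumption: S holds S_0, ..., S_n, so n is the number of one-tick returns.
    """
    S = np.asarray(S, dtype=float)
    n = S.size - 1
    if K < 1 or n < K:
        raise ValueError("need 1 <= K <= number of returns")
    n_bar = (n - K + 1) / K
    slow = np.sum((S[K:] - S[:-K]) ** 2)   # sum over i = K..n of (S_i - S_{i-K})^2
    fast = np.sum(np.diff(S) ** 2)         # sum over i = 1..n of (S_i - S_{i-1})^2
    return slow / K - (n_bar / n) * fast

# Tiny hand-checkable example: slow = 12, fast = 4, n_bar = 1.5
print(tsrv([0.0, 1.0, 2.0, 3.0, 4.0], K=2))  # 4.5
```

Passing a prefix such as `S[:j]` evaluates the estimator with $n$ equal to the number of returns up to index `j`, which is one possible reading of the formula.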
The spot variance on the interval $[t-h,t]$ is given by:
$$\sigma_t^2 = \frac{TSRV_t - TSRV_{t-h}}{h} $$
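As a rough illustration of this finite difference, the sketch below takes a cumulative TSRV series on a time grid and divides its increments by the elapsed time. The function name `spot_variance` and the `step` parameter are hypothetical, and the linear TSRV series is made-up data used only to check the arithmetic.

```python
import numpy as np

def spot_variance(tsrv_values, times, step=1):
    """Finite-difference spot variance (TSRV_t - TSRV_{t-h}) / h.

    tsrv_values[i] is the cumulative TSRV at times[i]; h spans `step` grid points.
    """
    tsrv_values = np.asarray(tsrv_values, dtype=float)
    times = np.asarray(times, dtype=float)
    h = times[step:] - times[:-step]              # elapsed time for each difference
    return (tsrv_values[step:] - tsrv_values[:-step]) / h

# Made-up example: a TSRV growing linearly at rate 0.04 has constant spot variance
times = np.linspace(0.0, 1.0, 11)
sigma2 = spot_variance(0.04 * times, times, step=2)
print(sigma2)
```

The differences here overlap; slicing the result with `[::step]` keeps only the non-overlapping ones.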
Let’s say I have $n = 10{,}000$ data points and I am trying to find $\sigma_t^2$ as an array over varying times $t$. With $S$ of size $n = 10{,}000$ and $K = 10$, I let the TSRV array have a size of something like 800.
My Python code is below. I am confused about the $n$ in the TSRV. Should $n$ be the number of points up to time $t$, or is it always the total number of points up to the maximum time, $n = size(t_{max})$?
I ask because I sometimes get a negative TSRV, meaning the second term is bigger than the first, but from what I understand, the variance of $S_i - S_{i-K}$ should always be bigger than that of $S_i - S_{i-1}$.
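That intuition can be checked on simulated data. The sketch below assumes a driftless Gaussian random walk with no microstructure noise (a hypothetical series, not the data in question), for which the variance of $S_i - S_{i-K}$ should be roughly $K$ times that of $S_i - S_{i-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed; simulated data only
K = 10
# Hypothetical noise-free random walk with i.i.d. Gaussian increments
S = np.cumsum(rng.normal(0.0, 0.01, size=100_000))

var_K = np.var(S[K:] - S[:-K])            # variance of K-tick differences
var_1 = np.var(np.diff(S))                # variance of 1-tick differences
ratio = var_K / var_1
print(ratio)                               # close to K for a random walk
```

Note that this compares variances of individual differences, whereas the two terms in the TSRV formula carry different weights ($1/K$ and $\bar{n}/n$), so the ratio alone does not determine the sign of the estimator.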
```python
import numpy as np

# S (1-D array of size 10,000) and time_max (total time span) are assumed to be defined earlier.
H = 800  # size of TSRV array
K = 10
n = np.size(S)
time_index = np.linspace(0, n, H + 1, dtype=int)
time_index = time_index[1:]  # new time scale for the TSRV
TSRV = np.empty(H)
for idx, j in enumerate(time_index):
    n_j = np.size(S[K:j])  # number of K-lagged differences up to index j
    n_bar = (n_j - K + 1) / K
    TSRV_1 = np.sum((S[K:j] - S[:j - K]) ** 2)
    TSRV_2 = np.sum((S[1:j] - S[:j - 1]) ** 2)
    TSRV[idx] = 1 / K * TSRV_1 - n_bar / n_j * TSRV_2

# Here we are choosing h = 2 grid steps, so TSRV_2 - TSRV_0, TSRV_4 - TSRV_2, ...
T = H // 2
h = time_max / T
variance = np.empty(T - 1)  # only T - 1 differences exist; TSRV[2 * T] is out of bounds
for t in range(T - 1):
    variance[t] = (TSRV[2 * (t + 1)] - TSRV[2 * t]) / h
```