$\begingroup$

If the TSRV is given by:

$$TSRV = \frac{1}{K} \sum_{i=K}^{n} (S_i - S_{i-K})^2 - \frac{\bar{n}}{n}\sum_{i=1}^n (S_i - S_{i-1})^2 $$

where $\bar{n} = \frac{n - K + 1}{K}$, $n$ is the number of data points, and $K$ is the number of ticks spanned by the slow time scale.

The spot variance on the interval $[t-h,t]$ is given by:

$$\sigma_t^2 = \frac{TSRV_t - TSRV_{t-h}}{h} $$

Say I have $n = 10{,}000$ data points and want to compute $\sigma_t^2$ as an array over a grid of times $t$. With $S$ of size $n = 10{,}000$ and $K = 10$, I let the TSRV array have a size of about 800.

My Python code is below. I am confused about the $n$ in the TSRV formula. Should $n$ be the number of points up to time $t$, or is it always the total number of points up to the maximum time, $n = size(t_{max})$?

I ask because I sometimes get a negative TSRV, which happens when the 1st term is smaller than the 2nd; but from what I understand, the variance of $S_i - S_{i-K}$ should always be bigger than that of $S_i - S_{i-1}$.


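```python
import numpy as np

# S (array of size 10,000) and time_max (end of the observation window)
# are assumed to be defined earlier.
H = 800  # size of the TSRV array
K = 10   # slow time scale, in ticks

n_total = np.size(S)

# New time scale for the TSRV: H end indices spread evenly over (0, n_total]
time_index = np.linspace(0, n_total, H + 1, dtype=int)[1:]

TSRV = np.empty(H)

for idx, j in enumerate(time_index):
    n = np.size(S[K:j])  # points up to index j that enter the K-tick sum
    n_bar = (n - K + 1) / K

    TSRV_1 = np.sum((S[K:j] - S[:j - K]) ** 2)  # K-tick returns
    TSRV_2 = np.sum((S[1:j] - S[:j - 1]) ** 2)  # one-tick returns

    TSRV[idx] = TSRV_1 / K - n_bar / n * TSRV_2

# Here h spans 2 grid steps: TSRV_2 - TSRV_0, TSRV_4 - TSRV_2, ...
h = 2 * time_max / H
T = (H - 1) // 2  # keeps the largest index 2 * T inside the TSRV array
variance = np.empty(T)

for t in range(T):
    variance[t] = (TSRV[2 * (t + 1)] - TSRV[2 * t]) / h
```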
$\endgroup$

1 Answer

$\begingroup$

Determining $n$ in your simulation procedure:

$n$ is the number of intraday observations within one day, $[t-1, t]$. If we assume a 1-second frequency during NYSE opening hours, we have $n = 60 \cdot 60 \cdot 6.5 = 23400$ intraday data points (NYSE hours are 9:30 AM to 4:00 PM, giving us 6.5 hours). Restricting the data to the market's opening hours is the usual empirical choice in high-frequency finance papers.

For many simulation procedures concerning realized volatility, whether we gather the 23400 data points over a span of 6.5 hours or assume they are distributed across a full 24-hour period does not significantly affect the results. Most of the time, the goal of the simulation is to study the realized estimator in a way that is independent of the specific intraday sampling period of the data. This is the case if, for example, you want to check whether your realized estimator converges to the "true" volatility of your stochastic process.

As such, you can let $n=23400$ and assume the observations are distributed over the entire 24-hour period.
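As a minimal sketch of that setup (the variable names and the choice to treat one day as the unit interval are my own assumptions, not something the estimator requires), the sample size and time grid could be built like this:

```python
import numpy as np

seconds_per_hour = 60 * 60
trading_hours = 6.5                        # NYSE session, 9:30 AM to 4:00 PM
n = int(seconds_per_hour * trading_hours)  # one observation per second

# Assumption: the day [t-1, t] is rescaled to [0, 1] and split into n equal steps
dt = 1.0 / n
times = np.linspace(0.0, 1.0, n + 1)       # n + 1 grid points give n returns
```

Whether the grid represents 6.5 hours or 24 hours only changes the interpretation of `dt`, not the number of increments used by the estimator.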


Negative TSRV estimate:

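For brevity, let's define the TSRV estimator as:
\begin{align}
TSRV &= \frac{1}{K} \sum_{i=K}^{n} (S_i - S_{i-K})^2 - \frac{\bar{n}}{n}\sum_{i=1}^n (S_i - S_{i-1})^2 \\
&= RV_t^{subavg} - \frac{\bar{n}}{n} RV_t^{noisy},
\end{align}
where $RV_t^{subavg}$ is the subsampled-and-averaged realized variance on the slow scale and $RV_t^{noisy}$ is the realized variance computed from all observations.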

The fact that TSRV can give negative estimates is well documented in the literature.

The estimator is designed to work for very high-frequency noisy data; for typical financial data, this means raw data sampled every few seconds. If this is not the case, the bias-correction term might overcorrect the volatility, leading to negative estimates.

An excerpt from Aït-Sahalia, Yacine, and Jean Jacod, High-Frequency Financial Econometrics (2014), pp. 234-235, highlights this issue (I have changed the notation to my own):

Remark 7.12 (Practical considerations) Estimators such as TSRV are designed to work for highly liquid assets. Indeed, the bias correction relies on the idea that RV computed with all the high-frequency observations, $RV_t^{noisy}$, consists primarily of noise. But if the full data sample frequency is low to begin with (for example, a stock sampled every minute instead of every second), $RV_t^{noisy}$, will not be entirely noise, and bias correcting as above may overcorrect, including in extreme cases possibly yielding a negative estimator in (7.38). So care must be taken to apply the estimator to settings which are appropriate. [...]

If you are running a simulation experiment without any (additive) noise process on your log-prices, you are bound to run into the same problem. I would advise you to assume an additive noise process, $v_t$, so that we observe $Y_t = S_t + v_t$, and to calculate the TSRV estimator on the noisy log-price process, $Y_t$, instead. This might alleviate the problem.
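To illustrate the suggestion, here is a rough sketch that simulates a constant-volatility log-price, adds i.i.d. Gaussian noise, and applies the TSRV formula from the question to the noisy series. The helper name `tsrv`, the volatility level, the noise standard deviation, the seed, and $K = 300$ are all hypothetical choices of mine for the example, not recommended values.

```python
import numpy as np

def tsrv(Y, K):
    """TSRV of log-prices Y (length n + 1) with a slow scale of K ticks."""
    Y = np.asarray(Y, dtype=float)
    n = Y.size - 1                                 # number of one-tick returns
    n_bar = (n - K + 1) / K                        # average subsample size
    rv_slow = np.sum((Y[K:] - Y[:-K]) ** 2) / K    # subsampled-and-averaged RV
    rv_all = np.sum(np.diff(Y) ** 2)               # RV from all observations
    return rv_slow - n_bar / n * rv_all

rng = np.random.default_rng(0)       # arbitrary seed
n = 23400                            # one observation per second over 6.5 hours
sigma = 0.2 / np.sqrt(252)           # hypothetical daily volatility
noise_sd = 5e-4                      # hypothetical noise standard deviation

# Efficient log-price S: Brownian motion with constant volatility over one day
increments = sigma * np.sqrt(1.0 / n) * rng.standard_normal(n)
S = np.concatenate(([0.0], np.cumsum(increments)))

# Observed log-price Y = S + v with i.i.d. Gaussian noise v
Y = S + noise_sd * rng.standard_normal(n + 1)

true_iv = sigma ** 2                 # integrated variance over the day
rv_all = np.sum(np.diff(Y) ** 2)     # plain RV, dominated by the noise
estimate = tsrv(Y, K=300)

print(f"true IV: {true_iv:.3e}, plain RV: {rv_all:.3e}, TSRV: {estimate:.3e}")
```

With noise of this size, the plain realized variance is driven mostly by the noise term (roughly $2 n \, Var(v)$), which is the situation the bias correction is built for.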


Note that you can correct the TSRV estimator for finite sample bias:

$$ TSRV^{adjusted}_t = \left(1 - \frac{\bar{n}}{n}\right)^{-1} TSRV, $$ giving you a slightly better version of the TSRV estimator, especially for "smaller" sample sizes.
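As a small sketch of this adjustment (the function name `tsrv_adjusted` and the example value $K = 300$ are my own, hypothetical choices), the correction is just a rescaling of an already computed TSRV value:

```python
def tsrv_adjusted(tsrv_value, n, K):
    """Small-sample adjusted TSRV: (1 - n_bar / n)^(-1) * TSRV."""
    n_bar = (n - K + 1) / K
    return tsrv_value / (1.0 - n_bar / n)

# Scaling factor applied to the raw estimate for n = 23400 and K = 300
factor = tsrv_adjusted(1.0, n=23400, K=300)
print(factor)
```

Since $\bar{n}/n$ is roughly $1/K$, the factor is close to $(1 - 1/K)^{-1}$, so the adjustment matters most when $K$ is small.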

$\endgroup$
