Outline: I am using a vector autoregression (VAR) model from the statsmodels package: https://www.statsmodels.org/stable/vector_ar.html#var.
My two time series, call them ts1 and ts2, are each 774 sampling points long. I train the VAR on the first 80% of the data and use the remaining 20% for prediction/forecasting. The optimal lag order was determined using the Bayesian information criterion (BIC).
Moreover, since both time series were non-stationary, I first applied first-order differencing and then a linear detrending. The additional detrending was needed because a small linear trend remained in the data after differencing.
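For reference, a minimal, self-contained sketch of this preprocessing step (the synthetic ts1/ts2 here are stand-ins for my real series, which the snippet below does not include; I use scipy's detrend for the linear detrending):

```python
import numpy as np
from scipy.signal import detrend

rng = np.random.default_rng(0)
# Synthetic stand-ins: random walks with a small linear trend,
# each 774 points long like the real series
ts1 = np.cumsum(rng.normal(size=774)) + 0.05 * np.arange(774)
ts2 = np.cumsum(rng.normal(size=774)) + 0.03 * np.arange(774)

def preprocess(x):
    # First-order differencing removes the stochastic (unit-root) trend
    diffed = np.diff(np.asarray(x))
    # Linear detrending removes the residual linear trend
    # (least-squares line is subtracted, so the result is zero-mean)
    return detrend(diffed, type="linear")

ts1_stat = preprocess(ts1)
ts2_stat = preprocess(ts2)
```

Differencing shortens each series by one point, so the processed series are 773 points long.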
Code: Here is an example of my code.
from statsmodels.tsa.api import VAR
import pandas as pd

df_train = pd.DataFrame({"ts2": ts2,
                         "ts1": ts1})

# Train/test split: first 80% for fitting, last 20% for forecasting
len_train = round(len(df_train) * 0.8)
len_test = len(df_train) - len_train
train = df_train[:len_train]
test = df_train[len_train:]

# Fit the VAR on the training data; lag order selected by BIC
model = VAR(train)
bic_lag = model.select_order(maxlags=15).bic  # search up to 15 lags, for example
model_fit = model.fit(bic_lag)

# Forecast the test period, seeding the recursion with the last
# bic_lag observations of the training data
y_input = train.to_numpy()[-model_fit.k_ar:]
y_pred = model_fit.forecast(y=y_input, steps=len_test)

# Select first column (ts2)
y_pred = y_pred[:, 0]

# True values
y_true = test["ts2"].to_numpy()
Problem:
Now we can plot the predicted values y_pred (in red) against the true values y_true (in blue), shown in the plot below. As the plot shows, the prediction y_pred quickly converges to zero and stays there.
Question: As someone still new to prediction and forecasting: why does the model not predict properly beyond roughly 20 points? What are possible reasons for the prediction converging to zero so quickly?
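For what it is worth, I can reproduce the same qualitative behavior in a toy univariate AR(1) model (coefficient phi chosen arbitrarily for illustration): the multi-step forecast shrinks geometrically toward the process mean, which is zero for my differenced, detrended data.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = 0.6  # illustrative AR coefficient, chosen arbitrarily
n = 500

# Simulate a zero-mean AR(1) series: x_t = phi * x_{t-1} + noise
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Recursive multi-step forecast from the last observation:
# E[x_{t+h} | x_t] = phi**h * x_t, which decays geometrically to 0
horizon = 30
forecasts = [phi**h * x[-1] for h in range(1, horizon + 1)]
```

After 30 steps the forecast is essentially indistinguishable from zero, much like my y_pred.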
Finally, if required, I can add the two time series here for reproducibility. I have refrained from doing so for now because I did not want to "spam" the question with a long list of numbers.
