I have the following data table, and I want to predict DE prices from the other variables in it using a GLM (Generalized Linear Model).

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
                      'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
                      'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1), 
                      'solarDE' = rnorm(731, 1, 1), check.names = FALSE)


dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)


fromTestDate <- "2019-12-31"
fromDateTest <- base::toString(fromTestDate)      


## Create train and test date-vectors depending on fromDateTest: ##
v.train <- which(dt.forecastData$date <= fromDateTest)
v.test <- which(dt.forecastData$date == as.Date(fromDateTest)+1)

## Create data tables for train and test data with specific date range (fromTestDate): ##
dt.train <- dt.forecastData[v.train]
v.trainDate <- dt.train$date
dt.test <- dt.forecastData[v.test]
v.testDate <- dt.test$date

## Delete column "date" of train and test data for model fitting: ##
dt.train <- dt.train[, c("date") := NULL]
dt.test <- dt.test[, c("date") := NULL]


## MODEL FITTING: ##
## Generalized Linear Model: ##
xgbModel <- stats::glm(DE ~ .-1, data = dt.train, 
                       family = quasi(link = "identity", variance = "constant"))


## Train and Test Data PREDICTION with xgbModel: ##
dt.train$prediction <- stats::predict.glm(xgbModel, dt.train)
dt.test$prediction <- stats::predict.glm(xgbModel, dt.test)


## Add date columns to dt.train and dt.test: ##
dt.train <- data.table(date = v.trainDate, dt.train)
dt.test <- data.table(date = v.testDate, dt.test)

In this code, I train the model on the data from 2019-01-01 to 2019-12-31 and test it on the day-ahead forecast for 2020-01-01. Now I want to write a for-loop that runs my model 365 times in total, as follows:

Run 1:

a) use 01-01-2019 to 31-12-2019 to train my model

b) predict for 01-01-2020 (test data)

c) use the actual data point for 01-01-2020 to evaluate the prediction

Run 2:

a) use 01-01-2019 to 01-01-2020 to train my model

b) predict for 02-01-2020

c) use the actual data point for 02-01-2020 to evaluate the prediction

etc.

In the end, I want to plot, for example, the cumulative sum of the individual prediction performances or a histogram of them, and compute some summary statistics (mean, median, sd, etc.).

Unfortunately, I don't know how to start with the loop or where to save the predictions of each run. I hope someone can help me with this!

1 Answer

Basically, you have to construct a vector that contains the end dates for each run. Then, you can pick one of the end dates in each iteration of the loop, run the model and predict one day ahead. Using your code, this may look something like this:

library(data.table)

set.seed(123)
dt.data <- data.table(date = seq(as.Date('2019-01-01'), by = '1 day', length.out = 731),
                      'DE' = rnorm(731, 30, 1), 'windDE' = rnorm(731, 10, 1),
                      'consumptionDE' = rnorm(731, 50, 1), 'nuclearDE' = rnorm(731, 8, 1), 
                      'solarDE' = rnorm(731, 1, 1), check.names = FALSE)


dt.forecastData <- dt.data
dt.forecastData <- na.omit(dt.forecastData)

Here, I construct a vector holding all days from Dec 31, 2019 to Jan 15, 2020. Each of these is the last training day of one run; adapt the range as needed:

# vector of all end dates
eval.dates <- seq.Date(from = as.Date("2019-12-31"), 
                       to   = as.Date("2020-01-15"),
                       by   = 1)
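The question asks for 365 runs rather than 16. A minimal sketch of the corresponding end-date vector, assuming the 731 simulated days from above (2019-01-01 to 2020-12-31); `n.runs` and `last.data.date` are names introduced here for illustration only:

```r
# Hypothetical names: n.runs and last.data.date are not part of the original code.
n.runs <- 365  # number of one-day-ahead forecasts

# One end date per run, starting with the last day of 2019
eval.dates <- seq.Date(from = as.Date("2019-12-31"), by = "day", length.out = n.runs)

# Last day available in the simulated data (731 days starting 2019-01-01)
last.data.date <- as.Date("2019-01-01") + 730

# Each run predicts the day after its end date, so that day must exist in the data
stopifnot(max(eval.dates) + 1 <= last.data.date)

print(range(eval.dates))
```

The check at the end guards against choosing more runs than the data can support, since the forecast target is always the end date plus one day.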

Next, I create a storage vector for the one-day-ahead predictions:

# storage vector for all predictions
test.predictions  <- numeric(length = length(eval.dates))

Now, run the loop using your code and pick one of the end dates in each iteration:

for (ii in seq_along(eval.dates)) { # loop start

  fromTestDate <- eval.dates[ii] # last training day for this iteration
  fromDateTest <- base::toString(fromTestDate)

  ## Create train and test date-vectors depending on fromDateTest: ##
  v.train <- which(dt.forecastData$date <= fromDateTest)
  v.test <- which(dt.forecastData$date == as.Date(fromDateTest) + 1)

  ## Create data tables for train and test data with specific date range (fromTestDate): ##
  dt.train <- dt.forecastData[v.train]
  v.trainDate <- dt.train$date
  dt.test <- dt.forecastData[v.test]
  v.testDate <- dt.test$date

  ## Delete column "date" of train and test data for model fitting: ##
  dt.train <- dt.train[, c("date") := NULL]
  dt.test <- dt.test[, c("date") := NULL]

  ## MODEL FITTING: ##
  ## Generalized Linear Model: ##
  xgbModel <- stats::glm(DE ~ . - 1, data = dt.train,
                         family = quasi(link = "identity", variance = "constant"))

  ## Test data PREDICTION (one day ahead) with xgbModel: ##
  test.predictions[ii] <- stats::predict.glm(xgbModel, dt.test)

  # print progress
  print(ii)

} # loop end

As you can see, this is a shortened version of your code; I omitted the predictions for the training set for brevity. They can be added along the lines of the code in your question.

You did not specify which measures you want to use to evaluate your out-of-sample predictions. The object test.predictions holds all your one-step-ahead predictions, and you can use it to compute RMSEs, LPS, or whatever measure of predictive power you prefer.
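As one possible way to do that evaluation, the sketch below stores each forecast next to the actual DE value and then computes the RMSE, the MAE, summary statistics, a cumulative sum, and a histogram of the errors. It is an illustration under stated assumptions, not the only way to do it: it uses a plain data.frame instead of a data.table so that it runs in base R, and `df`, `results`, `rmse`, and `mae` are names introduced here.

```r
# Assumption: a base R data.frame stands in for the data.table of the question.
set.seed(123)
n <- 731
df <- data.frame(date          = seq(as.Date("2019-01-01"), by = "1 day", length.out = n),
                 DE            = rnorm(n, 30, 1),
                 windDE        = rnorm(n, 10, 1),
                 consumptionDE = rnorm(n, 50, 1),
                 nuclearDE     = rnorm(n, 8, 1),
                 solarDE       = rnorm(n, 1, 1))

eval.dates <- seq.Date(as.Date("2019-12-31"), as.Date("2020-01-15"), by = 1)

# Hypothetical results table: one row per one-day-ahead forecast
results <- data.frame(date       = eval.dates + 1,
                      actual     = NA_real_,
                      prediction = NA_real_)

for (ii in seq_along(eval.dates)) {
  # Expanding training window up to the end date; test set is the following day
  train <- df[df$date <= eval.dates[ii],     names(df) != "date"]
  test  <- df[df$date == eval.dates[ii] + 1, names(df) != "date"]

  fit <- glm(DE ~ . - 1, data = train,
             family = quasi(link = "identity", variance = "constant"))

  results$actual[ii]     <- test$DE
  results$prediction[ii] <- predict(fit, newdata = test)
}

# Forecast errors and overall accuracy measures across all runs
results$error <- results$actual - results$prediction
rmse <- sqrt(mean(results$error^2))
mae  <- mean(abs(results$error))
print(c(RMSE = rmse, MAE = mae))

# Summary statistics of the individual errors
print(summary(results$error))
print(sd(results$error))

# Cumulative sum of squared errors over time, and histogram of the errors
results$cumSqErr <- cumsum(results$error^2)
plot(results$date, results$cumSqErr, type = "l",
     xlab = "forecast date", ylab = "cumulative squared error")
hist(results$error, main = "One-day-ahead forecast errors", xlab = "actual - prediction")
```

Keeping the forecast date, the actual value, and the prediction in one table makes it easy to swap in another error measure later without rerunning the loop.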

6 Comments

To evaluate my day-ahead forecasts, I would like to compare them with the actual DE prices, which are in the second column of the data table dt.data.
How would this work for the training data set? I have already tried it, but I always get the same values for each date of the training data set.
For the training data set, you'd need to construct a matrix or an array, because you make more than one prediction per iteration of the loop, so a vector is no longer suitable.
Unfortunately, I can't get it to work for the training data set.
How do I get the overall RMSE across the iterations?
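For the training-set predictions discussed in these comments, one option is sketched below. Because the training window grows by one day per run, the sketch uses a list with one table per iteration rather than the matrix or array suggested above; that substitution, the base R data.frame, and the names `df`, `train.predictions`, and `train.rmse` are all assumptions made for illustration.

```r
# Assumption: a base R data.frame stands in for the data.table of the question.
set.seed(123)
n <- 731
df <- data.frame(date          = seq(as.Date("2019-01-01"), by = "1 day", length.out = n),
                 DE            = rnorm(n, 30, 1),
                 windDE        = rnorm(n, 10, 1),
                 consumptionDE = rnorm(n, 50, 1),
                 nuclearDE     = rnorm(n, 8, 1),
                 solarDE       = rnorm(n, 1, 1))

eval.dates <- seq.Date(as.Date("2019-12-31"), as.Date("2020-01-05"), by = 1)

# Hypothetical storage: one table of in-sample predictions per run,
# plus one in-sample RMSE per run
train.predictions <- vector("list", length(eval.dates))
train.rmse        <- numeric(length(eval.dates))

for (ii in seq_along(eval.dates)) {
  is.train <- df$date <= eval.dates[ii]
  train    <- df[is.train, names(df) != "date"]

  fit <- glm(DE ~ . - 1, data = train,
             family = quasi(link = "identity", variance = "constant"))

  # Predict every training day of this run with the model fitted in this run
  pred <- unname(predict(fit, newdata = train))

  train.predictions[[ii]] <- data.frame(date       = df$date[is.train],
                                        actual     = train$DE,
                                        prediction = pred)
  train.rmse[ii] <- sqrt(mean((train$DE - pred)^2))
}

# Number of training days per run, and the in-sample RMSE of each run
print(sapply(train.predictions, nrow))
print(train.rmse)
```

Each list element can be inspected on its own, for example `train.predictions[[1]]` for the first run, and `train.rmse` gives one in-sample error figure per run.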