I am trying to build an LSTM model that receives a sequence of integers as input and outputs the probability of each integer appearing. If this probability is low, the integer should be considered an anomaly. I tried to follow this tutorial - https://towardsdatascience.com/lstm-autoencoder-for-extreme-rare-event-classification-in-keras-ce209a224cfb - which is where my model comes from. My input looks like this:
[[[3]
  [1]
  [2]
  [0]]

 [[3]
  [1]
  [2]
  [0]]

 [[3]
  [1]
  [2]
  [0]]
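Each block above is one window of 4 timesteps with a single feature; assuming the same reshape that appears in the model code below, the shapes are:

print(train_keys_reshaped.shape)  # (91, 4, 1): 91 windows, 4 timesteps, 1 feature
print(test_keys_reshaped.shape)   # (25, 4, 1)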
However, I can't understand what I get as output:
[[[ 2.7052343 ]
  [ 1.0618575 ]
  [ 1.8257084 ]
  [-0.54579014]]

 [[ 2.9069736 ]
  [ 1.0850943 ]
  [ 1.9787762 ]
  [ 0.01915958]]

 [[ 2.9069736 ]
  [ 1.0850943 ]
  [ 1.9787762 ]
  [ 0.01915958]]
Is this the reconstruction error, or the probabilities for each integer? And if they are probabilities, why aren't they in the range 0-1? I would be grateful if someone could explain this.
The model:
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
from keras import optimizers

time_steps = 4
features = 1

# reshape the integer-encoded keys into (samples, time_steps, features)
train_keys_reshaped = train_integer_encoded.reshape(91, time_steps, features)
test_keys_reshaped = test_integer_encoded.reshape(25, time_steps, features)

# LSTM autoencoder: the encoder compresses each window, the decoder reconstructs it
model = Sequential()
model.add(LSTM(32, activation='relu', input_shape=(time_steps, features), return_sequences=True))
model.add(LSTM(16, activation='relu', return_sequences=False))
model.add(RepeatVector(time_steps))  # convert the 2D encoder output into the 3D input the decoder expects
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(features)))

adam = optimizers.Adam(0.0001)
model.compile(loss='mse', optimizer=adam)

# train the autoencoder to reconstruct its own input
model_history = model.fit(train_keys_reshaped, train_keys_reshaped,
                          epochs=700,
                          validation_split=0.1)

predicted_probs = model.predict(test_keys_reshaped)
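What I currently plan to do with predicted_probs is to treat it as a reconstruction of the input and score each window by its reconstruction error, the way the tutorial does. Below is a rough sketch of that idea; the variable names and the percentile-based anomaly_threshold are placeholders I made up, not something from the tutorial:

import numpy as np

# per-window mean squared reconstruction error (a sketch, assuming the model
# output is a reconstruction of the input rather than probabilities)
reconstruction_errors = np.mean(np.square(test_keys_reshaped - predicted_probs), axis=(1, 2))

# flag windows with unusually large reconstruction error; the 95th-percentile
# threshold is a hypothetical placeholder
anomaly_threshold = np.percentile(reconstruction_errors, 95)
anomalies = reconstruction_errors > anomaly_threshold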
