
I am trying to use an LSTM autoencoder to do sequence-to-sequence learning with variable-length sequences as inputs, using the following code:

from keras.layers import Input, Masking, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(None, input_dim))
masked_input = Masking(mask_value=0.0, input_shape=(None, input_dim))(inputs)
encoded = LSTM(latent_dim)(masked_input)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

where inputs are raw sequence data padded with zeros to the same length (timesteps). With the code above, the output also has length timesteps, but when computing the loss we only want the first Ni elements of the output (where Ni is the length of input sequence i, which may differ between sequences). Does anyone know a good way to do that?

Thanks!

  • Have you tried to pad the outputs with zeros? Commented Oct 9, 2017 at 23:32
  • @DanielMöller The length of the output is already timesteps; would it be even longer if I padded it with zeros? Commented Oct 10, 2017 at 2:07
  • Sorry, pad the "targets" with zeros. Commented Oct 10, 2017 at 2:28
  • @DanielMöller Yes, that is what I did, and the problem is related to the padding. For instance, if a specific input has 5 elements and timesteps is 10, it is padded with 5 zeros before being fed into the autoencoder. Ideally, when calculating the loss we only need to care about the first 5 elements of the output, but because of the last 5 elements (which are almost never exactly zero), the loss will be larger. So I wonder if I could "mask out" the last 5 elements of the output when calculating the loss? Commented Oct 10, 2017 at 2:33
  • Now I get it... how about another Masking after "RepeatVector"? I'll write an option... Commented Oct 10, 2017 at 2:44

2 Answers


Option 1: you can always train without padding if you accept training on separate batches of equal length.

See this answer for a simple way of separating batches of equal length: Keras misinterprets training data shape

In this case, all you have to do is perform the "repeat" operation in another manner, since you don't know the exact length when building the model.

So, instead of RepeatVector, you can use this:

import keras.backend as K
from keras.layers import Lambda

def repeatFunction(x):

    #x[0] is encoded: (batch, latent_dim)
    #x[1] is inputs: (batch, length, features)

    latent = K.expand_dims(x[0], axis=1)        #shape (batch, 1, latent_dim)
    inpShapeMaker = K.ones_like(x[1][:,:,:1])   #shape (batch, length, 1)

    return latent * inpShapeMaker

#instead of RepeatVector:
decoded = Lambda(repeatFunction, output_shape=(None, latent_dim))([encoded, inputs])
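
To actually feed such a model batch by batch as Option 1 suggests, here is a minimal sketch (the helper name train_by_length and the sequences list are my own, assuming you keep the raw, unpadded sequences as a list of 2D arrays): group the sequences by length, stack each group, and call train_on_batch with the batch as both input and target:

import numpy as np

def train_by_length(model, sequences, epochs=10):
    #sequences: list of 2D arrays, each of shape (length_i, input_dim)
    groups = {}
    for seq in sequences:
        groups.setdefault(len(seq), []).append(seq)   #group by unpadded length

    for epoch in range(epochs):
        for length, seqs in groups.items():
            batch = np.stack(seqs)                    #shape (n_seqs, length, input_dim)
            #autoencoder: input and target are the same batch, so no padding is needed
            model.train_on_batch(batch, batch)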

Option 2 (doesn't smell good): use another Masking after RepeatVector.

I tried this and it works, but we don't get zeros at the end; we get the last value repeated until the end. So you will have to apply an unusual padding to your target data, repeating the last step until the end (a sketch of that target padding follows the model code below).

Example: target [[[1,2],[5,7]]] will have to be [[[1,2],[5,7],[5,7],[5,7]...]]

This may unbalance your data a lot, I think....

def makePadding(x):

    #x[0] is the encoded vector, already repeated over time
    #x[1] is inputs

    #padding mask = 1 for actual data in inputs, 0 for padded steps
    padding = K.cast(K.not_equal(x[1][:,:,:1], 0), dtype=K.floatx())
        #assuming you don't have 0 for non-padded data

    #repeat the mask across latent_dim
    padding = K.repeat_elements(padding, rep=latent_dim, axis=-1)

    return x[0]*padding

inputs = Input(shape=(timesteps, input_dim))
masked_input = Masking(mask_value=0.0)(inputs)
encoded = LSTM(latent_dim)(masked_input)

decoded = RepeatVector(timesteps)(encoded)
decoded = Lambda(makePadding,output_shape=(timesteps,latent_dim))([decoded,inputs])
decoded = Masking(mask_value=0.0)(decoded)

decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
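
As noted above, the targets for this option need the last valid step repeated until the sequence reaches full length. A minimal sketch of that target preparation (repeat_pad_targets is my own hypothetical helper, assuming the raw sequences are a list of 2D arrays):

import numpy as np

def repeat_pad_targets(sequences, timesteps):
    #repeat each sequence's last valid step until it reaches length timesteps
    padded = []
    for seq in sequences:                             #seq: (length_i, input_dim)
        tail = np.repeat(seq[-1:], timesteps - len(seq), axis=0)
        padded.append(np.concatenate([seq, tail], axis=0))
    return np.stack(padded)                           #shape (n_seqs, timesteps, input_dim)

With timesteps = 4, the example target [[1,2],[5,7]] would become [[1,2],[5,7],[5,7],[5,7]], matching the behaviour described above.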

Option 3 (best): crop the outputs directly using the inputs; this also zeroes the gradients for the padded steps.

def cropOutputs(x):

    #x[0] is the decoded output
    #x[1] is inputs
    #both have the same shape

    #padding mask = 1 for actual data in inputs, 0 for padded zeros
    padding = K.cast(K.not_equal(x[1], 0), dtype=K.floatx())
        #if you have zeros in non-padded data, those positions will lose their backpropagation

    return x[0]*padding

....
....

decoded = LSTM(input_dim, return_sequences=True)(decoded)
decoded = Lambda(cropOutputs,output_shape=(timesteps,input_dim))([decoded,inputs])
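
For completeness, a minimal sketch of how training with Option 3 could look (padded_inputs is a hypothetical zero-padded array of shape (n_seqs, timesteps, input_dim)). Because the cropped outputs are zero wherever the inputs are padded, plain zero-padded targets add nothing to the squared-error sum at those positions:

sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(optimizer='adam', loss='mse')

#padded_inputs: zero-padded sequences, shape (n_seqs, timesteps, input_dim)
#the autoencoder reconstructs its own (cropped) input, so targets are the inputs themselves
sequence_autoencoder.fit(padded_inputs, padded_inputs, epochs=10, batch_size=32)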

16 Comments

But maybe the truly best approach is combining options 2 and 3 (you spare processing when you have the intermediate mask, and you eliminate the nonsense repeated values at the end that would(?) influence your loss function).
A test that I won't try now is: create a model with masking and see if the repeated outputs participate in backpropagation.
The backend functions often map 1 to 1 with theano or tensorflow functions. They're here: github.com/fchollet/keras/tree/master/keras/backend --- I don't know how the backpropagation works, but I assume Keras leaves it all for tensorflow/theano to do.
I always assumed that the results of equal / not_equal are constants. They don't backpropagate, but they don't change the backpropagation of the tensors they modify, unless they're 0, of course. So far, my attempts have been working properly.
I mean latent dim.

For this LSTM autoencoder architecture, the mask is lost at the RepeatVector step because the LSTM encoder layer has return_sequences=False.

So another option, instead of cropping as above, is to create a custom bottleneck layer that propagates the mask (a minimal sketch follows).
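
A minimal sketch of such a layer, assuming the Keras 2 custom-layer API (the class name MaskedRepeat is my own); it repeats the encoded vector over the time dimension of the masked input and re-emits that input's mask so the decoder LSTM still sees it:

import keras.backend as K
from keras.engine.topology import Layer

class MaskedRepeat(Layer):
    #hypothetical bottleneck layer: repeats the encoded vector over time and propagates the mask
    def __init__(self, **kwargs):
        super(MaskedRepeat, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        encoded, reference = inputs                  #(batch, latent_dim), (batch, time, features)
        latent = K.expand_dims(encoded, axis=1)      #(batch, 1, latent_dim)
        ones = K.ones_like(reference[:, :, :1])      #(batch, time, 1)
        return latent * ones                         #(batch, time, latent_dim)

    def compute_mask(self, inputs, mask=None):
        #pass on the mask that came with the (masked) reference input
        return None if mask is None else mask[1]

    def compute_output_shape(self, input_shape):
        encoded_shape, reference_shape = input_shape
        return (reference_shape[0], reference_shape[1], encoded_shape[-1])

#usage instead of RepeatVector, e.g.:
#decoded = MaskedRepeat()([encoded, masked_input])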

