
I am trying to use an LSTM autoencoder to do sequence-to-sequence learning with variable-length sequences as inputs, using the following code:

from keras.layers import Input, Masking, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(None, input_dim))
masked_input = Masking(mask_value=0.0, input_shape=(None, input_dim))(inputs)
encoded = LSTM(latent_dim)(masked_input)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

where inputs are raw sequence data padded with zeros to the same length (timesteps). With the code above, the output also has length timesteps, but when computing the loss we only want the first Ni elements of the output (where Ni is the length of input sequence i, which may differ between sequences). Does anyone know a good way to do that?

Thanks!

  • Have you tried to pad the outputs with zeros? Commented Oct 9, 2017 at 23:32
  • @DanielMöller The length of the output is already timesteps; would it be even longer if I padded it with zeros? Commented Oct 10, 2017 at 2:07
  • Sorry, pad the "targets" with zeros. Commented Oct 10, 2017 at 2:28
  • @DanielMöller Yes, that is what I did, and the problem is related to the padding. For instance, if a specific input has 5 elements and timesteps is 10, it is padded with 5 zeros before being fed into the autoencoder. Ideally, when calculating the loss we only need to care about the first 5 elements of the output, but because of the last 5 elements (which are almost never exactly zero), the loss will be larger. So I wonder if I could "mask out" the last 5 elements of the output when calculating the loss? Commented Oct 10, 2017 at 2:33
  • Now I get it... how about another Masking after "RepeatVector"? I'll write an option... Commented Oct 10, 2017 at 2:44

2 Answers


Option 1: you can always train without padding if you accept training on separate batches of equal length.

See this answer for a simple way of separating batches of equal length: Keras misinterprets training data shape

In this case, all you have to do is perform the "repeat" operation in another manner, since you don't know the exact length when building the model.

So, instead of RepeatVector, you can use this:

import keras.backend as K
from keras.layers import Lambda

def repeatFunction(x):

    #x[0] is encoded: (batch, latent_dim)
    #x[1] is inputs: (batch, length, features)

    latent = K.expand_dims(x[0], axis=1)        #shape (batch, 1, latent_dim)
    inpShapeMaker = K.ones_like(x[1][:,:,:1])   #shape (batch, length, 1)

    return latent * inpShapeMaker

#instead of RepeatVector:
decoded = Lambda(repeatFunction, output_shape=(None, latent_dim))([encoded, inputs])
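
To actually feed such a model batch by batch as Option 1 suggests, here is a minimal sketch (the helper name train_by_length and the sequences list are my own, assuming you keep the raw, unpadded sequences as a list of 2D arrays): group the sequences by length, stack each group, and call train_on_batch with the batch as both input and target:

import numpy as np

def train_by_length(model, sequences, epochs=10):
    #sequences: list of 2D arrays, each of shape (length_i, input_dim)
    groups = {}
    for seq in sequences:
        groups.setdefault(len(seq), []).append(seq)   #group by unpadded length

    for epoch in range(epochs):
        for length, seqs in groups.items():
            batch = np.stack(seqs)                    #shape (n_seqs, length, input_dim)
            #autoencoder: input and target are the same batch, so no padding is needed
            model.train_on_batch(batch, batch)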

Option 2 (doesn't smell good): use another Masking after RepeatVector.

I tried this and it works, but we don't get zeros at the end; we get the last value repeated until the end. So you will have to apply an unusual padding to your target data, repeating the last step until the end (a sketch of that target padding follows the model code below).

Example: target [[[1,2],[5,7]]] will have to be [[[1,2],[5,7],[5,7],[5,7]...]]

This may unbalance your data a lot, I think....

def makePadding(x):

    #x[0] is the encoded vector, already repeated over time
    #x[1] is inputs

    #padding mask = 1 for actual data in inputs, 0 for padded steps
    padding = K.cast(K.not_equal(x[1][:,:,:1], 0), dtype=K.floatx())
        #assuming you don't have 0 for non-padded data

    #repeat the mask across latent_dim
    padding = K.repeat_elements(padding, rep=latent_dim, axis=-1)

    return x[0]*padding

inputs = Input(shape=(timesteps, input_dim))
masked_input = Masking(mask_value=0.0)(inputs)
encoded = LSTM(latent_dim)(masked_input)

decoded = RepeatVector(timesteps)(encoded)
decoded = Lambda(makePadding,output_shape=(timesteps,latent_dim))([decoded,inputs])
decoded = Masking(mask_value=0.0)(decoded)

decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
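
As noted above, the targets for this option need the last valid step repeated until the sequence reaches full length. A minimal sketch of that target preparation (repeat_pad_targets is my own hypothetical helper, assuming the raw sequences are a list of 2D arrays):

import numpy as np

def repeat_pad_targets(sequences, timesteps):
    #repeat each sequence's last valid step until it reaches length timesteps
    padded = []
    for seq in sequences:                             #seq: (length_i, input_dim)
        tail = np.repeat(seq[-1:], timesteps - len(seq), axis=0)
        padded.append(np.concatenate([seq, tail], axis=0))
    return np.stack(padded)                           #shape (n_seqs, timesteps, input_dim)

With timesteps = 4, the example target [[1,2],[5,7]] would become [[1,2],[5,7],[5,7],[5,7]], matching the behaviour described above.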

Option 3 (best): crop the outputs directly using the inputs; this also zeroes the gradients for the padded steps.

def cropOutputs(x):

    #x[0] is the decoded output
    #x[1] is inputs
    #both have the same shape

    #padding mask = 1 for actual data in inputs, 0 for padded zeros
    padding = K.cast(K.not_equal(x[1], 0), dtype=K.floatx())
        #if you have zeros in non-padded data, those positions will lose their backpropagation

    return x[0]*padding

....
....

decoded = LSTM(input_dim, return_sequences=True)(decoded)
decoded = Lambda(cropOutputs,output_shape=(timesteps,input_dim))([decoded,inputs])
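
For completeness, a minimal sketch of how training with Option 3 could look (padded_inputs is a hypothetical zero-padded array of shape (n_seqs, timesteps, input_dim)). Because the cropped outputs are zero wherever the inputs are padded, plain zero-padded targets add nothing to the squared-error sum at those positions:

sequence_autoencoder = Model(inputs, decoded)
sequence_autoencoder.compile(optimizer='adam', loss='mse')

#padded_inputs: zero-padded sequences, shape (n_seqs, timesteps, input_dim)
#the autoencoder reconstructs its own (cropped) input, so targets are the inputs themselves
sequence_autoencoder.fit(padded_inputs, padded_inputs, epochs=10, batch_size=32)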

16 Comments

But maybe the truly best approach is combining options 2 and 3 (you spare processing when you have the intermediate mask, and you eliminate the nonsense repeated values at the end that would(?) influence your loss function).
A test that I won't try now is: create a model with masking and see if the repeated outputs participate in backpropagation.
The backend functions often map 1 to 1 with theano or tensorflow functions. They're here: github.com/fchollet/keras/tree/master/keras/backend --- I don't know how the backpropagation works, but I assume Keras leaves it all for tensorflow/theano to do.
I always assumed that the results of equal / not_equal are constants. They don't backpropagate, but they don't change the backpropagation of the tensors they modify, unless they're 0, of course. So far, my attempts have been working properly.
I mean latent dim.

For this LSTM autoencoder architecture, the mask is lost at the RepeatVector step because the LSTM encoder layer has return_sequences=False.

So another option, instead of cropping as above, is to create a custom bottleneck layer that propagates the mask (a minimal sketch follows).
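
A minimal sketch of such a layer, assuming the Keras 2 custom-layer API (the class name MaskedRepeat is my own); it repeats the encoded vector over the time dimension of the masked input and re-emits that input's mask so the decoder LSTM still sees it:

import keras.backend as K
from keras.engine.topology import Layer

class MaskedRepeat(Layer):
    #hypothetical bottleneck layer: repeats the encoded vector over time and propagates the mask
    def __init__(self, **kwargs):
        super(MaskedRepeat, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        encoded, reference = inputs                  #(batch, latent_dim), (batch, time, features)
        latent = K.expand_dims(encoded, axis=1)      #(batch, 1, latent_dim)
        ones = K.ones_like(reference[:, :, :1])      #(batch, time, 1)
        return latent * ones                         #(batch, time, latent_dim)

    def compute_mask(self, inputs, mask=None):
        #pass on the mask that came with the (masked) reference input
        return None if mask is None else mask[1]

    def compute_output_shape(self, input_shape):
        encoded_shape, reference_shape = input_shape
        return (reference_shape[0], reference_shape[1], encoded_shape[-1])

#usage instead of RepeatVector, e.g.:
#decoded = MaskedRepeat()([encoded, masked_input])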

