
I am working on a new PyTorch model which takes sequential data as input, and I need it to output just a single value, which I will then evaluate with a binary cross-entropy loss as the probability of 1 or 0.

To be more concrete, let's say my sequence is 1000 time steps and only 2 dimensions, like a 2-dimensional sine wave, so the data shape would be 1000 x 2.

I have done something like this before using an RNN, for which there is a lot of content online. Because of the recurrent structure of the RNN, we just look at the final output of the RNN after it has processed the whole sequence. That final step's output is 2-dimensional, so we can apply a linear layer to convert 2 -> 1 dimension, et voilà, it's done.
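For reference, here is roughly what I mean by that; this is just a minimal sketch, and the layer sizes, batch size, and names are illustrative:

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, input_dim=2, hidden_dim=2):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)      # 2 -> 1

    def forward(self, x):                       # x: (batch, 1000, 2)
        out, _ = self.rnn(x)                    # out: (batch, 1000, hidden_dim)
        last = out[:, -1, :]                    # final time step only
        return self.fc(last)                    # (batch, 1) logit

model = RNNClassifier()
logits = model(torch.randn(4, 1000, 2))         # (4, 1)
loss = nn.BCEWithLogitsLoss()(logits, torch.ones(4, 1))
```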

MY PROBLEM:

What I am attempting to do now does not use a recurrent network, but instead an encoder with attention (a Transformer). So the output of the encoder is still 1000 steps long by whatever my embedding dimension is, let's say 8. So the output of the sequential encoder has shape 1000 x 8. My issue is that I need to convert this output to a single value, to which I can apply the binary cross-entropy function, and I am not finding an obvious way to do this.

IDEAS:

Traditionally with this kind of sequential model, the encoder feeds into a decoder, and the decoder can then output a variable-length sequence (this is used for language-translation problems). My problem is different in that I don't want to output another sequence, just a single value. Maybe I need to adapt the decoder so that this works? The decoder usually takes a target as well as the output from the encoder as input, and the decoder's output then has the same shape as this target. One idea would be to use the traditional decoder and give it a length-1 target; I would then get a length-1 output and could use a standard linear layer to convert this to my desired output. However, this doesn't seem entirely logical, because I really am not interested in outputting a sequence, just one value.
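To make that idea concrete, here is a rough sketch of what I am imagining with nn.Transformer and a single learnable length-1 "target" token; everything here (dimensions, head counts, the query parameter) is just my own guess:

```python
import torch
import torch.nn as nn

class Seq2OneTransformer(nn.Module):
    def __init__(self, d_model=8, nhead=2):
        super().__init__()
        self.input_proj = nn.Linear(2, d_model)                  # 2-dim input -> d_model
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d_model))    # the length-1 "target"
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):                                 # x: (batch, 1000, 2)
        src = self.input_proj(x)                          # (batch, 1000, d_model)
        tgt = self.query.expand(x.size(0), -1, -1)        # (batch, 1, d_model)
        out = self.transformer(src, tgt)                  # (batch, 1, d_model)
        return self.fc(out[:, 0, :])                      # (batch, 1) logit
```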

Anyway, just looking for some more ideas from the community, if you have any. Thanks!

  • I believe you need to step back from implementation (i.e. drop the pytorch part) -- and then it's more of a stats.stackexchange.com question ;-) Commented Dec 30, 2020 at 13:38

1 Answer


I think this paper does what you want :) (It's probably not the first paper that does this, but it is the one I read recently.)

  1. Prepend an extra token to your sequence. The token can have a learnable embedding.
  2. After the transformer, discard (or don't compute) the output at the other positions. Take only the output from the first position and transform it to the target that you need.

[Image taken from the paper]
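A minimal sketch of that idea with a plain nn.TransformerEncoder; the layer counts, head count, and the cls_token name are my own illustrative choices, not something prescribed by the paper:

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    def __init__(self, input_dim=2, d_model=8, nhead=2, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, d_model)
        self.cls_token = nn.Parameter(torch.randn(1, 1, d_model))   # learnable prepended token
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fc = nn.Linear(d_model, 1)

    def forward(self, x):                        # x: (batch, 1000, 2)
        h = self.input_proj(x)                   # (batch, 1000, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        h = torch.cat([cls, h], dim=1)           # (batch, 1001, d_model)
        h = self.encoder(h)                      # (batch, 1001, d_model)
        return self.fc(h[:, 0, :])               # logit from the prepended position

model = TransformerClassifier()
logits = model(torch.randn(4, 1000, 2))          # (4, 1)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 1))
```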


1 Comment

After posting I kept mulling it over, and it does seem completely reasonable to just predict the "first position" of the output sequence like this. You still get to take advantage of the attention mechanism and everything. I am going to try it.
