How to concatenate BERT-like sentence representation and word embeddings - Keras & huggingface
I am following this Keras tutorial to combine Hugging Face transformers with other layers: https://keras.io/examples/nlp/text_extraction_with_bert/
I want to concatenate the transformer embedding layer (included in the tutorial) with some regular word embeddings:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Embedding, Concatenate
from transformers import TFBertModel

encoder = TFBertModel.from_pretrained("bert-base-uncased")

# Transformer branch (sub-word level)
input_ids = layers.Input(shape=(max_transformer_len,), dtype=tf.int32)
token_type_ids = layers.Input(shape=(max_transformer_len,), dtype=tf.int32)
attention_mask = layers.Input(shape=(max_transformer_len,), dtype=tf.int32)
embedding1 = encoder(
    input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask
)[0]  # (batch, max_transformer_len, hidden_size)

# Word-embedding branch (word level)
input_wordembedding = Input(shape=(max_sentence_len,), dtype='int32',
                            name='we_input')
embedding2 = Embedding(output_dim=wordembedding_VECTOR_SIZE,
                       input_dim=wordembedding_VOCAB_SIZE,
                       input_length=max_sentence_len,
                       weights=[emb_matrix],
                       name='emb1')(input_wordembedding)

z = Concatenate(name='merged')([embedding1, embedding2])
My problem is that embedding1 contains sub-word representations while embedding2 contains word representations, so the two sequences have different lengths and are not aligned. What I want is to max-pool the sub-word vectors in embedding1 over each word, so that I end up with one transformer-based vector per word before concatenating.
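The rough idea I have is a custom layer that, besides the BERT output, receives a tensor telling for every sub-word position which word it belongs to, and then max-pools per word. This is only a sketch and I have not verified it; the layer name SubwordToWordMaxPool and the word_ids alignment input are my own, and the alignment itself would have to be precomputed outside the model (e.g. from a fast tokenizer's word_ids(), replacing None with -1):

class SubwordToWordMaxPool(layers.Layer):
    # Hypothetical custom layer (name is mine): max-pools sub-word vectors
    # into one vector per word, given an alignment tensor word_ids of shape
    # (batch, max_transformer_len) where word_ids[b, s] is the index of the
    # word that sub-word s belongs to, or -1 for special tokens / padding.
    # The alignment is assumed to be computed outside the model.
    def __init__(self, max_sentence_len, **kwargs):
        super().__init__(**kwargs)
        self.max_sentence_len = max_sentence_len

    def call(self, inputs):
        subword_embeddings, word_ids = inputs          # (B, S, H), (B, S)
        # one-hot alignment: (B, max_sentence_len, S); rows for word_ids == -1
        # are all zero because out-of-range indices one-hot to zeros
        mask = tf.one_hot(word_ids, depth=self.max_sentence_len, axis=1)
        mask = tf.expand_dims(mask, -1)                # (B, W, S, 1)
        sub = tf.expand_dims(subword_embeddings, 1)    # (B, 1, S, H)
        # push sub-words that do not belong to the word to -1e9, then take max
        masked = tf.where(mask > 0, sub, tf.fill(tf.shape(sub), -1e9))
        pooled = tf.reduce_max(masked, axis=2)         # (B, W, H)
        # zero out word slots that have no sub-words (sentence padding)
        has_subword = tf.reduce_max(mask, axis=2)      # (B, W, 1)
        return pooled * has_subword

# extra model input carrying the sub-word -> word alignment
word_ids_input = layers.Input(shape=(max_transformer_len,), dtype=tf.int32,
                              name='word_ids')
pooled_bert = SubwordToWordMaxPool(max_sentence_len)([embedding1, word_ids_input])
# both tensors are now (batch, max_sentence_len, dim), so they can be concatenated
z = Concatenate(name='merged')([pooled_bert, embedding2])

I am not sure this is the right or idiomatic way to do it, hence the question.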
Does anyone know how to approach this problem in Keras? If it cannot be done in Keras, is it doable in PyTorch, and how?