I am new to image processing from the embedding perspective. A lot of research is currently being published in the deep learning community on object detection with transformers. I have a basic question: let's assume I have an image frame and the corresponding embedding for that frame. A few milliseconds later I get a new frame from the same camera. Is it possible to predict the embedding of the new frame using the information in the embedding of the previous frame?
For simplicity, assume the camera is fixed and no new agents (i.e., objects) appear or disappear between the two frames. However, the objects that were present in the first frame keep moving and may be at different positions in the second frame.
I don't want to run a transformer on the second frame the way I did for the first one to get its embedding. Is there a faster method than applying the transformer to the new frame to estimate its embedding?
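To make the question concrete, here is a rough sketch of what I have in mind (in PyTorch; `FramePredictor` and its tiny two-layer architecture are purely hypothetical placeholders I made up for illustration, not something from a paper):

```python
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Hypothetical lightweight module that maps the previous frame's
    embedding to an estimate of the next frame's embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, prev_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(prev_embedding)

dim = 256
transformer_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=6,
)
predictor = FramePredictor(dim)

# frame_1_tokens: patch tokens of the first frame, shape (batch, num_patches, dim)
frame_1_tokens = torch.randn(1, 196, dim)

# Expensive path: full transformer pass for the first frame.
embedding_1 = transformer_encoder(frame_1_tokens).mean(dim=1)  # (1, dim)

# Cheap path (what I am asking about): estimate the second frame's embedding
# from the first frame's embedding without running the transformer again.
embedding_2_estimate = predictor(embedding_1)
```

Is something along these lines feasible, or is there a standard approach for this?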