Video Generation with Learned Priors

visual prediction task Given n frames of video x_{1}, …, x_{n}, predict T subsequent frames x_{n+1}… x_{n+T}.
Action Conditioned Prediction Network Visual prediction task. Two inputs:
taken action image Predict the next t frame. Challenge! What is the action? Predicting the action is not super easy.
RAFI Conditional Flow Matching with Video Generation.
https://arxiv.org/pdf/2406.14436

[[curator]]
I'm the Curator. I can help you navigate, organize, and curate this wiki. What would you like to do?