Many video generation methods have been proposed in recent years. However, most practical applications require that the user be able to control the generated content.
A recent paper introduces an approach that lets users generate videos of complex scenes while controlling the movements of specific objects through mouse clicks.
First, a feature representation is extracted from the first frame and its segmentation map. Then, motion information is predicted from the user inputs and the image features. The output is a video sequence in which the object movements are consistent with the user inputs. A graph neural network models object interactions and infers plausible displacements that respect the user's constraints.
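To make the idea concrete, here is a minimal, illustrative sketch (not the authors' code) of how sparse user clicks can be propagated to all objects in a scene via one round of graph message passing, which is the role the paper's graph neural network plays. The inverse-distance adjacency, the single-step weighted update, and all function names are assumptions made for illustration only.

```python
import math

def build_adjacency(centroids):
    """Dense adjacency weighted by inverse distance between object centroids
    (an assumed stand-in for the learned graph edges)."""
    n = len(centroids)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                (xi, yi), (xj, yj) = centroids[i], centroids[j]
                adj[i][j] = 1.0 / (1.0 + math.hypot(xi - xj, yi - yj))
    return adj

def propagate_motion(centroids, clicks):
    """clicks: {object_index: (dx, dy)} from the user's mouse input.
    Clicked objects keep their vector (hard constraint); the others receive
    an adjacency-weighted sum of the clicked displacements, so nearby
    objects move more than distant ones."""
    adj = build_adjacency(centroids)
    motions = []
    for i in range(len(centroids)):
        if i in clicks:
            motions.append(clicks[i])  # user's constraint is kept exactly
        else:
            dx = sum(adj[i][j] * clicks[j][0] for j in clicks)
            dy = sum(adj[i][j] * clicks[j][1] for j in clicks)
            motions.append((dx, dy))
    return motions

# Three objects in the scene; the user drags object 0 five pixels right.
centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
clicks = {0: (5.0, 0.0)}
motions = propagate_motion(centroids, clicks)
```

In this toy version the nearby object (index 1) inherits a larger share of the click's displacement than the distant one (index 2); in the actual method the GCN learns such interactions from data rather than using a fixed distance heuristic.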
The experiments show that the proposed method outperforms competing approaches in video quality and successfully generates videos whose object movements follow the user inputs.
This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) modelling the movements of all the objects in the scene in a holistic manner and effectively combining the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available.
Research paper: Ardino, P., De Nadai, M., Lepri, B., Ricci, E., and Lathuilière, S., “Click to Move: Controlling Video Generation with Sparse Motion”, 2021. Link: https://arxiv.org/abs/2108.08815