Papers
arxiv:1812.01717

Towards Accurate Generative Models of Video: A New Metric & Challenges

Published on Dec 3, 2018
Authors:
,
,
,
,
,

Abstract

Recent advances in deep generative models have lead to remarkable progress in synthesizing high quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene, in addition to the visual presentation of objects. While recent attempts at formulating generative models of video have had some success, current progress is hampered by (1) the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples, and (2) the wide gap between purely synthetic video data sets and challenging real-world data sets in terms of complexity. To this extent we propose Fr\'{e}chet Video Distance (FVD), a new metric for generative models of video, and StarCraft 2 Videos (SCV), a benchmark of game play from custom starcraft 2 scenarios that challenge the current capabilities of generative models of video. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/1812.01717 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1812.01717 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.