Temporal and Contextual Transformer for
Multi-Camera Editing of TV Shows
ECCVW 2022

Anyi Rao1     Xuekun Jiang2     Sichen Wang3     Yuwei Guo2     Zihao Liu3    
Bo Dai2     Long Pang3     Xiaoyu Wu3     Dahua Lin2,4     Libiao Jin3    



The ability to choose an appropriate camera view among multiple cameras plays a vital role in TV show delivery. However, it is hard to discover the underlying statistical patterns and apply intelligent processing due to the lack of high-quality training data. To address this issue, we first collect a novel benchmark for this setting with four diverse scenarios, including concerts, sports games, gala shows, and contests, where each scenario contains 6 synchronized tracks recorded by different cameras. The benchmark contains 88 hours of raw videos that contribute to 14 hours of edited videos. Based on this benchmark, we further propose a new approach, the temporal and contextual transformer, which utilizes clues from historical shots and other views to make shot transition decisions and predict which view to use next. Extensive experiments show that our method outperforms existing methods on the proposed multi-camera editing benchmark.
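The core idea above, scoring candidate camera views using both temporal clues (historical shots) and contextual clues (the other synchronized views), can be illustrated with a minimal attention-based sketch. This is not the paper's actual model: the `attend` and `score_views` functions, the single-head attention, and all shapes are illustrative assumptions standing in for the full transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    # Single-head scaled dot-product attention (simplified stand-in
    # for the transformer layers described in the paper).
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores) @ values

def score_views(history, views):
    """history: (T, d) embeddings of past shots (temporal clues);
    views: (6, d) embeddings of the 6 synchronized camera views
    (contextual clues). Returns a probability over candidate views."""
    query = history[-1]                               # most recent shot
    temporal_ctx = attend(query, history, history)    # attend over history
    contextual_ctx = attend(temporal_ctx, views, views)  # attend over views
    logits = views @ (temporal_ctx + contextual_ctx)  # score each view
    return softmax(logits)

rng = np.random.default_rng(0)
probs = score_views(rng.normal(size=(8, 16)), rng.normal(size=(6, 16)))
print(probs.shape)  # one probability per candidate view
```

In this sketch the view with the highest probability would be selected as the next shot; the actual model additionally handles shot transition timing, which is omitted here.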

TVMCE Dataset


Overview of the TV shows Multi-Camera Editing (TVMCE) Dataset. We reached out to colleges with film and TV production majors and followed their professional production teams to acquire data covering various scenarios, including concerts, sports games, contests, and gala shows held in universities and city theaters/stadiums. Our dataset holds a balanced coverage ratio among the scenario categories, ranging from 39% for gala shows to 14% for sports. Most shots in our dataset last between 0 and 8 seconds, while a few long shots last longer than 32 seconds.



The website template was borrowed from Michaël Gharbi.