Yudian Zheng · Xiaodong Cun · Menghan Xia · Chi-Man Pun
Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos. To address this, we propose a novel optimization-based framework for sketching videos represented by frame-wise Bézier curves. Specifically, we first propose a cross-frame stroke initialization approach to warm up the location and width of each curve. Then, we optimize the locations of these curves using a semantic loss based on CLIP features and a newly designed consistency loss based on the self-decomposed 2D atlas network. Built upon these design elements, the resulting sketch video showcases impressive visual abstraction and temporal coherence. Furthermore, by transforming a video into SVG lines through the sketching process, our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition, as exemplified in the teaser.
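To make the "frame-wise Bézier curves" representation concrete, here is a minimal sketch of how one stroke could be evaluated. It assumes cubic curves with four 2D control points per stroke, which the text above does not state; `bezier_point`, `sample_curve`, and the `frames` layout are hypothetical names used only for illustration, not the repository's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_point(ctrl: List[Point], t: float) -> Point:
    """Evaluate a Bezier curve at t in [0, 1] using de Casteljau's algorithm."""
    if not ctrl:
        raise ValueError("at least one control point is required")
    pts = list(ctrl)
    # Repeatedly interpolate between neighbouring points until one remains.
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_curve(ctrl: List[Point], n: int = 5) -> List[Point]:
    """Return n evenly spaced samples (in t) along the curve, n >= 2."""
    return [bezier_point(ctrl, i / (n - 1)) for i in range(n)]


# Hypothetical layout of a sketch video: one list of strokes per frame,
# each stroke being four control points (an assumed cubic curve).
frames = [
    [[(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]],  # frame 0
    [[(0.1, 0.0), (0.1, 1.0), (1.1, 1.0), (1.1, 0.0)]],  # frame 1, shifted
]

if __name__ == "__main__":
    for idx, strokes in enumerate(frames):
        print(idx, sample_curve(strokes[0], n=3))
```

In this picture, optimizing a sketch video amounts to adjusting the control points in every frame, which is why each frame carries its own copy of the strokes.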
If you only want to optimize the example, run steps (1.2) and (5).
Full training requires the Layered Neural Atlases and diffvg projects.
# install all and train from beginning
sh scripts/install.sh
# (1.1) install NLA
sh scripts/install_atlas.sh
# (1.2) install diffvg and CLIP (sufficient for optimizing the example models)
sh scripts/install_clipavideo.sh
(2) Download the dataset, or use your own data (fewer than 70 frames, with masks extracted), and put it in the folder:
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip
unzip DAVIS-2017-Unsupervised-trainval-Full-Resolution.zip
Alternatively, use the example data (car-turn) and extract the masks:
sh scripts/process_dataset.sh
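If you bring your own video, it may help to confirm that it stays under the 70-frame limit before training. The following is a minimal sketch: the helper names, the accepted image extensions, and the idea that frames sit as individual image files in one folder are all assumptions, not something the repository scripts are known to enforce.

```python
from pathlib import Path

# Assumed frame formats; adjust to match how your frames were extracted.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
# The instructions ask for fewer than 70 frames.
MAX_FRAMES = 70


def count_frames(folder) -> int:
    """Count image files directly inside `folder` (non-recursive)."""
    return sum(
        1
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def within_frame_limit(folder, limit: int = MAX_FRAMES) -> bool:
    """Return True when the folder holds strictly fewer than `limit` frames."""
    return count_frames(folder) < limit
```

For example, `within_frame_limit("path/to/your/frames")` returns `False` for a clip that should be trimmed or subsampled first; the path here is a placeholder.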
sh scripts/operate_atlas.sh <video_name>
The trained models should be located at 'data/dataset/<video_name>/results/<epoch_num>' and 'data/dataset/<video_name>/results/checkpoint'.
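Before moving on to the sketch optimization, you can check that atlas training left its outputs where expected. This is a minimal sketch built only on the path layout quoted above; the function names are hypothetical, and whether `checkpoint` is a file or a directory is not assumed, so the check only tests that it exists.

```python
from pathlib import Path


def atlas_results_dir(video_name: str, root: str = "data/dataset") -> Path:
    """Build the results folder path: <root>/<video_name>/results."""
    return Path(root) / video_name / "results"


def atlas_is_trained(video_name: str, root: str = "data/dataset") -> bool:
    """Return True if <root>/<video_name>/results/checkpoint exists."""
    return (atlas_results_dir(video_name, root) / "checkpoint").exists()


if __name__ == "__main__":
    name = "car-turn"  # the example video mentioned in this README
    print(name, "atlas trained:", atlas_is_trained(name))
```

Run it from the repository root so that the relative `data/dataset` path resolves correctly.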
sh scripts/operate_clipavideo.sh <video_name>
See arguments.txt for the full list of arguments.
@article{zheng2023sketch,
title={Sketch Video Synthesis},
author={Yudian Zheng and Xiaodong Cun and Menghan Xia and Chi-Man Pun},
year={2023},
eprint={2311.15306},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
The code borrows heavily from CLIPasso and CLIPScene. Thanks to the authors for their wonderful work!