Researchers at the University of Washington have developed a deep learning method that only needs a single photo to make a believable video. If you have a photo of a waterfall, a river, smoke, or clouds, it predicts the frames that would come before and after it and turns them into a pretty cool animation.
The team described their method in a paper and will present it at the Conference on Computer Vision and Pattern Recognition on 22 June. Aleksander Hołyński is a doctoral student in the Paul G. Allen School of Computer Science & Engineering and the lead author of the paper, and he spoke a bit about the project for UW News.
“What’s special about our method is that it doesn’t require any user input or extra information,” Hołyński said. “All you need is a picture. And it produces as output a high-resolution, seamlessly looping video that quite often looks like a real video.”
As you probably know, this isn’t the first program that turns a photo into a video or a cinemagraph. You can do it yourself in Photoshop, Premiere Pro, and After Effects. There are also Photoshop plugins like Artymate, and other predictive algorithms that are still far from perfect. The key is to make the end video believable, and there are a bunch of challenges in doing it right.
Hołyński explains that turning a photo into a video requires the algorithm to predict the future. “And in the real world, there are nearly infinite possibilities of what might happen next,” he adds. So, he and his team trained a neural network with thousands of videos of waterfalls, rivers, oceans, and other material with fluid motion. They would first ask the network to predict the motion of a video given only its first frame. The network would then compare its prediction with the actual video, which helped it learn to identify clues about what was going to happen next (such as ripples in a stream).
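To make the "compare its prediction with the actual video" step a bit more concrete, here is a minimal sketch of one common way to score a motion prediction: the average endpoint error between a predicted motion field and a reference one. The article does not say which loss the team actually used, so the function name `endpoint_error`, the array layout, and the toy numbers below are all assumptions for illustration only.

```python
import numpy as np

def endpoint_error(predicted_flow, reference_flow):
    """Mean Euclidean distance between two motion fields.

    Both arrays are assumed to have shape (H, W, 2), where the last axis
    holds the (x, y) displacement of each pixel per frame. This layout is
    an illustrative assumption, not something stated in the article.
    """
    diff = predicted_flow - reference_flow
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# Toy reference motion: every pixel moves 2 px per frame downward.
reference = np.zeros((4, 4, 2))
reference[..., 1] = 2.0

# A guess in the right direction but a bit too slow.
good_guess = np.zeros((4, 4, 2))
good_guess[..., 1] = 1.5

# A guess that moves pixels sideways instead of down.
bad_guess = np.zeros((4, 4, 2))
bad_guess[..., 0] = 2.0

good_error = endpoint_error(good_guess, reference)
bad_error = endpoint_error(bad_guess, reference)
```

In a training loop, a number like this would be what the network tries to push down: the closer the predicted motion is to the motion seen in the real video, the smaller the error.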
The researchers tried to use “splatting,” a technique that moves each pixel according to its predicted motion. However, it posed another set of challenges. “Think about a flowing waterfall,” Hołyński told UW News. “If you just move the pixels down the waterfall, after a few frames of the video, you’ll have no pixels at the top!” So they had to come up with a solution for this, and they called it “symmetric splatting.” It doesn’t only predict the future, but also “the past” of an image, creating a seamless animation.
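The waterfall problem and the symmetric fix can be illustrated with a toy example. The sketch below is not the team's implementation: it works on a single column of pixels, assumes a constant whole-pixel downward motion, and uses a simple linear cross-fade, all of which are simplifying assumptions. The helper names `splat` and `symmetric_frame` are hypothetical.

```python
import numpy as np

def splat(column, shift):
    """Move every pixel of a 1-D column by `shift` rows (positive = down).

    Returns (values, coverage). Coverage is 0 wherever no pixel landed,
    which is exactly the "no pixels at the top" hole problem.
    """
    n = column.shape[0]
    values = np.zeros(n, dtype=float)
    coverage = np.zeros(n, dtype=float)
    targets = np.arange(n) + shift
    inside = (targets >= 0) & (targets < n)   # drop pixels that leave the column
    values[targets[inside]] = column[inside]
    coverage[targets[inside]] = 1.0
    return values, coverage

def symmetric_frame(column, velocity, t, num_frames):
    """Blend a 'future' splat and a 'past' splat of the same photo.

    The future splat pushes pixels forward by t steps; the past splat
    treats the photo as the end of the loop and pulls pixels back by
    (num_frames - t) steps. Holes in one are filled by the other.
    """
    fwd_vals, fwd_cov = splat(column, velocity * t)
    bwd_vals, bwd_cov = splat(column, velocity * (t - num_frames))
    alpha = t / num_frames                     # 0 at loop start, 1 at loop end
    w_fwd = (1.0 - alpha) * fwd_cov
    w_bwd = alpha * bwd_cov
    total = w_fwd + w_bwd
    blended = w_fwd * fwd_vals + w_bwd * bwd_vals
    return np.divide(blended, total, out=np.zeros_like(blended), where=total > 0)

# Toy "waterfall": 8 pixels, flowing down 1 row per frame, 4-frame loop.
column = np.arange(1, 9, dtype=float)
frames = [symmetric_frame(column, 1, t, 4) for t in range(5)]
```

With forward splatting alone, `splat(column, 3)` leaves the top three rows empty. In the blended version, the first and last entries of `frames` are both the original photo, so the sequence can loop, and the rows that one splat leaves empty are covered by the other.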
For now, the algorithm works best with flowing materials, like rivers, waterfalls, smoke, or clouds: essentially anything that has predictable, flowing motion. But in the future, the team would like to extend its possibilities and animate other things, for example, someone’s hair blowing in the wind.
[via UW News]