AI-generated imagery and 3D content have come a long way in a very short space of time. It was only two years ago that Google researchers revealed NeRF, or Neural Radiance Fields, and less than two weeks ago NVIDIA blew us away with almost real-time generation of 3D scenes from just a few dozen still photographs using their “Instant NeRF” techniques.
Well, now, a new paper has been released by the folks at Waymo describing “Block-NeRF”, a technique for “scalable large scene neural view synthesis” – basically, generating really, really large environments. And as a proof of concept, they recreated the city of San Francisco from 2.8 million photographs. In this video, Károly Zsolnai-Fehér at Two Minute Papers explains how it all works.
It’s a very impressive achievement, and while it’s massively ahead of where NeRF technology was just two years ago, it still isn’t quite perfect. According to Waymo, the images were captured using cameras mounted on its self-driving vehicles. The 2.8 million images were then fed into their Block-NeRF code to generate a 3D representation of the city that they could freely explore, without being confined to the vehicle’s path.
Waymo says that the images were captured over several trips in a 3-month period, both during the day and at night. This wide range of imagery, taken at different times and in different lighting conditions, allows Block-NeRF to simulate the look of any part of the environment at any hour of the day or night. And for any little gaps in the image sequences, the AI is smart enough to figure out what’s likely in those spots and fill them in for a pretty seamless look.
You can see in the video above – and in the original footage – that the representations aren’t perfect. There are definitely some limits to resolution and detail. But this technology could easily be used even today, in its current form, to reproduce large outdoor locations for virtual sets in a studio – like those from The Mandalorian. Used as the exterior view through a car window, for example, you might recognise the buildings, but you’re not going to spot the AI artifacts as they whizz past at a simulated 50mph. Nor are you going to spot them when they’re serving as the background of a static scene behind live actors, blurred slightly out of focus by a wide-aperture lens.
I can see this making virtual sets of real-world locations a LOT more common in the future. It might even become cheaper than flying the crew to those locations. And in the case of locations like San Francisco, probably a whole lot safer, too!