On Tuesday, Google announced Lumiere, an AI video generator that it calls "a space-time diffusion model for realistic video generation" in the accompanying preprint paper. But let's not kid ourselves: It does a great job at creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.
According to Google, Lumiere utilizes unique architecture to generate a video's entire temporal duration in one go. Or, as the company put it, "We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve."
In layperson terms, Google's tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.