YouTube’s Real-Time Generative AI Effects on Mobile Devices

Google Cloud engineers Andrey Vakunov and Adam Svystun have developed a solution to deliver real-time generative AI effects on mobile devices using knowledge distillation and on-device optimization with MediaPipe.

Large generative AI models, such as those behind cartoon style transfer, are too large and slow to run in real time on creators’ phones. The solution is a compact model that runs directly on the device and processes video frame by frame. Building it takes three stages: data curation, training, and on-device setup.

High-quality data is essential for building effects that work well for everyone. The team built a face dataset from properly licensed images and filtered it so that it is diverse and evenly distributed across demographic groups.
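One simple way to get an even distribution is to downsample every group to the size of the smallest one. The sketch below illustrates that idea only; the source does not describe the actual filtering procedure, and the function name `balance_by_group`, the `group` field, and the sample records are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_group(samples, key, seed=0):
    """Downsample so every group contributes the same number of samples.

    `key` maps a sample to its group label (a hypothetical annotation).
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for sample in samples:
        buckets[key(sample)].append(sample)
    if not buckets:
        return []
    # The smallest group determines how many samples each group keeps.
    per_group = min(len(bucket) for bucket in buckets.values())
    balanced = []
    for label in sorted(buckets):
        balanced.extend(rng.sample(buckets[label], per_group))
    return balanced


# Hypothetical records: group "a" is over-represented.
records = (
    [{"id": i, "group": "a"} for i in range(5)]
    + [{"id": i, "group": "b"} for i in range(5, 7)]
    + [{"id": i, "group": "c"} for i in range(7, 10)]
)
balanced_records = balance_by_group(records, key=lambda r: r["group"])
```

Downsampling discards data, so a real curation pipeline might instead collect more images for under-represented groups; that choice is outside what the source states.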

Training relies on knowledge distillation, a “teacher-student” method. The teacher is a large, powerful pre-trained generative model, while the student is a smaller, more efficient model that runs on the device. The student is taught iteratively through rounds of data generation, training, and fine-tuning, with the goal of preserving the user’s identity.
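The teacher-student idea can be sketched in a few lines. This is a toy illustration under strong assumptions, not the production training code: the "teacher" is a fixed function standing in for a large pre-trained model, the "student" is a two-parameter linear model, and all names (`teacher`, `distill`) are hypothetical.

```python
import random


def teacher(x: float) -> float:
    """Stand-in for a large pre-trained model (assumption: a fixed function)."""
    return 3.0 * x + 1.0


def distill(num_pairs=200, epochs=300, lr=0.1, seed=0):
    """Train a small student to imitate the teacher's outputs."""
    rng = random.Random(seed)
    # Data generation: the teacher labels unlabeled inputs,
    # producing (input, teacher output) training pairs.
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(num_pairs)]
    pairs = [(x, teacher(x)) for x in inputs]

    # Training: fit the student y = w * x + b to the teacher's outputs
    # by gradient descent on the mean squared error.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in pairs:
            err = (w * x + b) - target
            grad_w += 2.0 * err * x / len(pairs)
            grad_b += 2.0 * err / len(pairs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


student_w, student_b = distill()
```

The key property is that the student never needs ground-truth labels: its only supervision is what the teacher produces. Real image-to-image distillation uses far richer losses than the squared error shown here.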

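A generative model can lose a person’s identity when an input face is inverted, that is, encoded into the model’s latent space. To address this inversion problem, the team uses pivotal tuning inversion (PTI). PTI preserves facial identity and detail by fine-tuning the generator so that it performs better for a specific face and its neighborhood in embedding space.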
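The two phases of pivotal tuning can be illustrated with a deliberately tiny model. In this sketch, which is an assumption-laden toy rather than the team's implementation, the "generator" maps a scalar latent `w` to a two-pixel "image" through weights `theta`. Phase one finds the best latent with the generator frozen (the pivot); phase two freezes the pivot and tunes the generator weights. The locality regularization used in full PTI is omitted, and all function names are hypothetical.

```python
def generate(theta, w):
    """Toy generator: each output pixel is weight * latent."""
    return [t * w for t in theta]


def sq_error(image, target):
    return sum((a - b) ** 2 for a, b in zip(image, target))


def invert(theta, target, steps=100, lr=0.1):
    """Phase 1: optimize the latent code with the generator frozen."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (t * w - x) * t for t, x in zip(theta, target))
        w -= lr * grad
    return w


def pivotal_tune(theta, pivot, target, steps=100, lr=0.1):
    """Phase 2: keep the pivot latent fixed and fine-tune generator weights."""
    theta = list(theta)
    for _ in range(steps):
        theta = [
            t - lr * 2.0 * (t * pivot - x) * pivot
            for t, x in zip(theta, target)
        ]
    return theta


target_face = [2.0, 1.0]      # the specific "face" to reconstruct
pretrained = [1.0, 1.0]       # frozen pre-trained generator weights

pivot = invert(pretrained, target_face)
err_inversion_only = sq_error(generate(pretrained, pivot), target_face)

tuned = pivotal_tune(pretrained, pivot, target_face)
err_after_tuning = sq_error(generate(tuned, pivot), target_face)
```

With the frozen generator, no latent value can reproduce the target exactly, so inversion alone leaves a residual error. Tuning the generator around the pivot removes it, which mirrors why PTI helps retain identity details for a specific face.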

The trained model is integrated into an on-device pipeline built with MediaPipe. The pipeline detects faces in the video stream, applies the effect, and composites the result onto the original frame in real time. To sustain 30 frames per second, the whole pipeline must finish in under 33 milliseconds per frame (1000 ms / 30 ≈ 33.3 ms).
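The detect, apply, and composite flow and the per-frame time budget can be sketched as follows. This is plain Python on a toy grayscale frame, not the MediaPipe API: `detect_face` is a hypothetical stub that returns a fixed box, and `apply_effect` inverts pixel values as a stand-in for the student model.

```python
import time

FPS = 30
FRAME_BUDGET_MS = 1000.0 / FPS  # per-frame time budget at 30 frames per second


def detect_face(frame):
    """Hypothetical detector stub: returns a fixed central box
    as (top, left, bottom, right)."""
    height, width = len(frame), len(frame[0])
    return height // 4, width // 4, 3 * height // 4, 3 * width // 4


def apply_effect(crop):
    """Stand-in for the student model: inverts 8-bit pixel values."""
    return [[255 - pixel for pixel in row] for row in crop]


def process_frame(frame):
    """Detect the face, stylize the crop, and composite it back."""
    top, left, bottom, right = detect_face(frame)
    crop = [row[left:right] for row in frame[top:bottom]]
    styled = apply_effect(crop)
    output = [row[:] for row in frame]  # leave the input frame untouched
    for dy, styled_row in enumerate(styled):
        output[top + dy][left:right] = styled_row
    return output


def timed_process(frame):
    """Run the pipeline and report whether it met the frame budget."""
    start = time.perf_counter()
    output = process_frame(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return output, elapsed_ms, elapsed_ms < FRAME_BUDGET_MS


frame = [[10] * 8 for _ in range(8)]
result, elapsed_ms, on_time = timed_process(frame)
```

Only the pixels inside the detected box change; everything else is copied from the original frame, which is what lets the effect blend into the live video. A real pipeline would run the detector and model on hardware accelerators and would blend edges rather than paste a hard rectangle.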

This technology has enabled YouTube Shorts to launch numerous popular features since 2023, expanding creative possibilities for creators. The team is working on integrating newer models and reducing latency for entry-level devices, further democratizing access to cutting-edge generative AI in YouTube Shorts.

Source: https://research.google/blog/from-massive-models-to-mobile-magic-the-tech-behind-youtube-real-time-generative-ai-effects