Rhymes AI releases Allegro
🎬 Rhymes AI just released Allegro, the very first open-source commercial-grade video generation model!
It matches or beats most commercial solutions, ranking just behind Hailuo and Kling in overall quality.
Key insights:
Innovative two-stage architecture
- Video VAE converts pixels to compressed visual tokens (4x8x8 compression: 4x along time, 8x8 in space)
- Video Diffusion Transformer predicts frames with self- and cross-attention
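
To make the 4x8x8 figure concrete, here is a minimal sketch of how much smaller the latent grid gets. It reads 4x8x8 as time x height x width and uses plain floor division; the clip size and the `latent_shape` helper are illustrative assumptions, not something taken from the Allegro code.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 4, s_factor: int = 8) -> tuple[int, int, int]:
    """Latent grid size after 4x temporal and 8x8 spatial compression.

    Assumes plain floor division; the real VAE may pad the clip or
    treat boundary frames differently.
    """
    return frames // t_factor, height // s_factor, width // s_factor


# Illustrative clip size (an assumption, not a number from this post).
frames, height, width = 88, 720, 1280
t, h, w = latent_shape(frames, height, width)

# How many fewer positions the diffusion transformer has to attend over.
reduction = (frames * height * width) // (t * h * w)
print(f"latent grid: {t} x {h} x {w}, {reduction}x fewer positions")
```

Under these assumptions the transformer works on 256x fewer positions than the raw pixel grid, which is what makes attention over a whole clip affordable.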
Training dataset
- 106M images and 48M videos, each paired with closely matching captions
- Complex filtering pipeline for top quality: aesthetics, motion, semantics…
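
As a rough picture of what such a filtering pipeline does, the sketch below keeps only clips that clear a minimum score on every criterion. The `Clip` fields, the scores, and the thresholds are all hypothetical; the post does not say how Allegro scores or thresholds its data.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    aesthetics: float  # hypothetical score in [0, 1]
    motion: float      # hypothetical score in [0, 1]
    semantics: float   # hypothetical caption-match score in [0, 1]


# Hypothetical minimum score per criterion.
THRESHOLDS = {"aesthetics": 0.5, "motion": 0.3, "semantics": 0.6}


def passes_filters(clip: Clip, thresholds: dict = THRESHOLDS) -> bool:
    """A clip is kept only if it clears every threshold."""
    return all(getattr(clip, key) >= minimum
               for key, minimum in thresholds.items())


def filter_clips(clips: list, thresholds: dict = THRESHOLDS) -> list:
    return [clip for clip in clips if passes_filters(clip, thresholds)]


clips = [
    Clip("a", aesthetics=0.8, motion=0.5, semantics=0.9),
    Clip("b", aesthetics=0.9, motion=0.1, semantics=0.9),  # too static
    Clip("c", aesthetics=0.4, motion=0.6, semantics=0.7),  # low aesthetics
]
kept = filter_clips(clips)
print([clip.name for clip in kept])
```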
Progressive training in 4 stages
- Image pre-training
- Basic video at 360p
- HD video at 720p
- Fine-tuning on premium data
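
The four stages above can be sketched as a simple curriculum where each stage starts from the state the previous one produced. This is only a schematic: `Stage`, `run_curriculum`, and the stand-in `fake_train_stage` are hypothetical names, and resolutions are filled in only where the post states them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    data: str
    resolution: Optional[int]  # vertical resolution; None where the post gives none


STAGES = [
    Stage("image pre-training", "images", None),
    Stage("basic video", "videos", 360),
    Stage("HD video", "videos", 720),
    Stage("fine-tuning", "premium data", None),
]


def run_curriculum(stages: list, train_stage: Callable):
    """Run the stages in order, handing each one the previous stage's state."""
    state = None
    for stage in stages:
        state = train_stage(stage, state)
    return state


def fake_train_stage(stage: Stage, state):
    """Stand-in for real training: just records which stages have run."""
    history = list(state or [])
    history.append(stage.name)
    return history


print(run_curriculum(STAGES, fake_train_stage))
```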
Results put it among the very best
- Beats all open-source models on the VBench benchmark
- Only behind Hailuo & Kling among commercial models
- 91% of users preferred it to Dream Machine, 96% preferred it to OpenSora
This is huge: we now have detailed insights into what makes high-end video generation tick, from data curation to architecture choices. And it's all open-source under Apache 2.0, available on GitHub and HF!
Demo gallery: rhymes.ai/allegro_gallery
Model on the Hub: https://huggingface.co/rhymes-ai/Allegro