Hunyuan-Large
Hunyuan-Large just released by Tencent: the largest open MoE LLM to date, with only 52B active parameters, yet it beats Llama 3.1-405B on most academic benchmarks
Key insights:
⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any given input (see the parameter-accounting sketch after this list)
🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data
Architecture: a novel "recycle routing" scheme prevents token dropping when experts are overloaded (rough sketch after this list)
Great benchmark results: surpasses Llama 3.1-405B-Instruct on most benchmarks despite having ~8x fewer active parameters
‣ Impressive performance on MATH: 77.4
Large context length: up to 256K tokens
License:
‣ Commercial use allowed, except for products with more than 100M monthly active users
‣ Not available for use in the EU
🤗 Model weights available on the Hugging Face Hub! (loading snippet at the end of this post)
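
To make the total-vs-active parameter gap concrete, here is a minimal parameter-accounting sketch for a single MoE feed-forward layer. All sizes, expert counts, and the top-k value below are toy values chosen for illustration, not Hunyuan-Large's actual configuration.

```python
# Toy illustration of MoE parameter accounting: every expert's weights count
# toward the total parameter budget, but a token only runs through the shared
# expert(s) plus its top-k routed experts, so the "active" count is much smaller.
# All sizes below are made-up toy values, not Hunyuan-Large's real config.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of one feed-forward expert (up-projection + down-projection)."""
    return d_model * d_ff + d_ff * d_model

d_model, d_ff = 4096, 16384       # toy hidden sizes
n_routed_experts = 16             # experts the router chooses among (assumption)
n_shared_experts = 1              # always-on shared expert(s) (assumption)
top_k = 1                         # routed experts activated per token (assumption)

total_ffn  = (n_routed_experts + n_shared_experts) * ffn_params(d_model, d_ff)
active_ffn = (top_k + n_shared_experts) * ffn_params(d_model, d_ff)

print(f"total FFN params per layer : {total_ffn / 1e6:.0f}M")
print(f"active FFN params per token: {active_ffn / 1e6:.0f}M "
      f"({active_ffn / total_ffn:.0%} of total)")
```

The same accounting at Hunyuan-Large's scale (plus the attention and embedding weights, which are always active) is what gives 389B parameters on disk but only ~52B used per token.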
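On "recycle routing": in capacity-limited MoE training, each expert can only take a fixed number of tokens per step, and vanilla capped routing silently drops the overflow. The sketch below is a loose reading of the idea (reassign overflow tokens to randomly chosen experts that still have room, instead of dropping them); the function and variable names are made up and this is not the authors' implementation.

```python
import random
from collections import defaultdict

def route_with_recycling(token_ids, preferred_expert, n_experts, capacity):
    """Assign each token to its router-preferred expert; when that expert is
    full, 'recycle' the token to a random expert with spare capacity instead
    of dropping it. Illustrative sketch only, not the paper's exact algorithm."""
    load = defaultdict(int)                 # tokens assigned per expert
    assignment = {}
    for tok in token_ids:
        expert = preferred_expert[tok]
        if load[expert] >= capacity:        # preferred expert is overloaded
            spare = [e for e in range(n_experts) if load[e] < capacity]
            if not spare:                   # nothing has room: drop as last resort
                continue
            expert = random.choice(spare)   # recycle instead of dropping
        assignment[tok] = expert
        load[expert] += 1
    return assignment

# Toy usage: 8 tokens that all prefer expert 0, capacity 2, 4 experts.
prefs = {t: 0 for t in range(8)}
print(route_with_recycling(list(range(8)), prefs, n_experts=4, capacity=2))
```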
Read the full paper here: https://huggingface.co/papers/2411.02265
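
To try the weights, a standard transformers loading pattern should look roughly like this. The repo id is an assumption, so check the exact name on the model card; trust_remote_code is likely needed if the custom MoE modeling code ships inside the repo.

```python
# Minimal loading sketch with Hugging Face transformers. The repo id is an
# assumption; verify the actual name on the Hub before running. A 389B-parameter
# checkpoint needs multi-GPU sharding (device_map="auto") and a lot of memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"   # assumed repo id, verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",          # use the dtype stored in the checkpoint
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # custom MoE modeling code may live in the repo
)

inputs = tokenizer(
    "Explain mixture-of-experts in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```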