Hunyuan-Large
Hunyuan-Large just released by Tencent: the largest open MoE LLM to date, with only 52B active parameters, yet it beats Llama 3.1-405B on most academic benchmarks
Key insights:
⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any given input (see the parameter-accounting sketch after this list)
🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data
Architecture: a novel "recycle routing" scheme prevents token dropping when experts are overloaded (rough sketch after this list)
Great benchmark results: surpasses Llama 3.1-405B-Instruct on most benchmarks despite having ~8x fewer active parameters
‣ Impressive performance on MATH: 77.4
Large context length: up to 256K tokens
License:
‣ Commercial use allowed, except for products with more than 100M monthly active users
‣ Not available for use in the EU
🤗 Model weights available on the Hugging Face Hub! (loading snippet at the end of this post)
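
To make the total-vs-active parameter gap concrete, here is a minimal parameter-accounting sketch for a single MoE feed-forward layer. All sizes, expert counts, and the top-k value below are toy values chosen for illustration, not Hunyuan-Large's actual configuration.

```python
# Toy illustration of MoE parameter accounting: every expert's weights count
# toward the total parameter budget, but a token only runs through the shared
# expert(s) plus its top-k routed experts, so the "active" count is much smaller.
# All sizes below are made-up toy values, not Hunyuan-Large's real config.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of one feed-forward expert (up-projection + down-projection)."""
    return d_model * d_ff + d_ff * d_model

d_model, d_ff = 4096, 16384       # toy hidden sizes
n_routed_experts = 16             # experts the router chooses among (assumption)
n_shared_experts = 1              # always-on shared expert(s) (assumption)
top_k = 1                         # routed experts activated per token (assumption)

total_ffn  = (n_routed_experts + n_shared_experts) * ffn_params(d_model, d_ff)
active_ffn = (top_k + n_shared_experts) * ffn_params(d_model, d_ff)

print(f"total FFN params per layer : {total_ffn / 1e6:.0f}M")
print(f"active FFN params per token: {active_ffn / 1e6:.0f}M "
      f"({active_ffn / total_ffn:.0%} of total)")
```

The same accounting at Hunyuan-Large's scale (plus the attention and embedding weights, which are always active) is what gives 389B parameters on disk but only ~52B used per token.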
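On "recycle routing": in capacity-limited MoE training, each expert can only take a fixed number of tokens per step, and vanilla capped routing silently drops the overflow. The sketch below is a loose reading of the idea (reassign overflow tokens to randomly chosen experts that still have room, instead of dropping them); the function and variable names are made up and this is not the authors' implementation.

```python
import random
from collections import defaultdict

def route_with_recycling(token_ids, preferred_expert, n_experts, capacity):
    """Assign each token to its router-preferred expert; when that expert is
    full, 'recycle' the token to a random expert with spare capacity instead
    of dropping it. Illustrative sketch only, not the paper's exact algorithm."""
    load = defaultdict(int)                 # tokens assigned per expert
    assignment = {}
    for tok in token_ids:
        expert = preferred_expert[tok]
        if load[expert] >= capacity:        # preferred expert is overloaded
            spare = [e for e in range(n_experts) if load[e] < capacity]
            if not spare:                   # nothing has room: drop as last resort
                continue
            expert = random.choice(spare)   # recycle instead of dropping
        assignment[tok] = expert
        load[expert] += 1
    return assignment

# Toy usage: 8 tokens that all prefer expert 0, capacity 2, 4 experts.
prefs = {t: 0 for t in range(8)}
print(route_with_recycling(list(range(8)), prefs, n_experts=4, capacity=2))
```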
Read the full paper here: https://huggingface.co/papers/2411.02265
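
To try the weights, a standard transformers loading pattern should look roughly like this. The repo id is an assumption, so check the exact name on the model card; trust_remote_code is likely needed if the custom MoE modeling code ships inside the repo.

```python
# Minimal loading sketch with Hugging Face transformers. The repo id is an
# assumption; verify the actual name on the Hub before running. A 389B-parameter
# checkpoint needs multi-GPU sharding (device_map="auto") and a lot of memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"   # assumed repo id, verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",          # use the dtype stored in the checkpoint
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # custom MoE modeling code may live in the repo
)

inputs = tokenizer(
    "Explain mixture-of-experts in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```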