Hunyuan-Large

less than 1 minute read

🚀 Hunyuan-Large just released by Tencent: the largest open MoE LLM to date, with only 52B active parameters, yet it beats Llama 3.1-405B on most academic benchmarks

Key insights:

⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any input

🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data

๐Ÿ—๏ธ Architecture : Novel โ€œrecycle routingโ€ prevents token dropping when experts are overrloaded

📊 Great benchmark results: surpasses Llama 3.1-405B-Instruct on most benchmarks despite having ~8x fewer active parameters

‣ Impressive perf on MATH: 77.4

๐Ÿ‹ย Large context length: up to 256K tokens

🔒 License:

‣ Commercial use allowed, except if your products have >100M monthly active users

‣ No access in the EU

🤗 Model weights available on HF!
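
To make the routing idea concrete, here is a minimal sketch in plain PyTorch (not Tencent's code): each token goes to its highest-scoring expert, and when that expert is already at capacity the token is "recycled" to its next-best choice instead of being dropped. The class name, capacity value, and next-best fallback below are illustrative assumptions; the paper describes the actual strategy.

```python
# Toy top-1 MoE with a "recycle"-style fallback for overloaded experts.
# Illustrative only — not Hunyuan-Large's implementation.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, capacity=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        self.capacity = capacity  # max tokens each expert accepts per batch

    def forward(self, x):               # x: (n_tokens, d_model)
        probs = self.router(x).softmax(dim=-1)          # routing scores
        ranked = probs.argsort(dim=-1, descending=True) # experts per token, best first
        out = torch.zeros_like(x)
        load = [0] * len(self.experts)
        for t in range(x.size(0)):
            # Try the best expert; if it is full, recycle the token to the
            # next-best one instead of dropping it.
            for e in ranked[t].tolist():
                if load[e] < self.capacity:
                    load[e] += 1
                    out[t] = probs[t, e] * self.experts[e](x[t])
                    break
        return out

tokens = torch.randn(32, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([32, 64])
```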

Read the full paper here 👉 https://huggingface.co/papers/2411.02265
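
If you want to try the released weights yourself, a standard transformers loading pattern should work. The repo ID and generation settings below are assumptions — check the model card on the Hub for the exact ID and recommended usage, and keep in mind that even with only 52B active parameters per token, all 389B parameters still have to fit across your GPUs.

```python
# Hedged example of loading the released checkpoint with Hugging Face transformers.
# The repo ID is an assumption — verify it on the Hub before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"  # assumed repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,
)

inputs = tokenizer(
    "Explain mixture-of-experts in one sentence.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```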

Capture dโ€™eฬcran 2024-11-05 aฬ€ 10.29.37.png