Qwen 2.5


🔥 Qwen releases their 2.5 family of models: new SOTA for all sizes up to 72B!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:

  • Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • Qwen2.5-Coder: 1.5B and 7B, with 32B on the way
  • Qwen2.5-Math: 1.5B, 7B, and 72B

Key insights:

๐ŸŒ All models have 128k token context length

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship Qwen2.5-72B is roughly competitive with Llama-3.1-405B, and leads Llama-3.1-70B by a 3-5% margin on most benchmarks.

🇫🇷 On top of this, it takes the #1 spot on multilingual tasks, so it might become my standard model for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder-33B-Instruct)

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by “aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles.”

📄 Technical report to be released “very soon”

🔓 All models come with the most permissive license, Apache 2.0, except the 72B models, which have a custom license mentioning “you can use it for free EXCEPT if your product has over 100M users”

🤗 All models are available on the HF Hub! ➡️ https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e
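
If you want to try one right away, here is a minimal sketch using the `transformers` library; the repo id `Qwen/Qwen2.5-7B-Instruct` and the generation settings are my assumptions, so swap in whichever size from the collection above fits your hardware.

```python
# Minimal sketch: load a Qwen2.5 instruct checkpoint from the HF Hub and run one chat turn.
# The repo id below is an assumption; any size from the linked collection should work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed repo id; e.g. swap for a 0.5B or 72B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the model's chat template, then generate a short reply.
messages = [{"role": "user", "content": "Summarize the Qwen2.5 release in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```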