Blog post for Claude 3.5 Sonnet
Categories: 🤖 Agents
Complete: Done
Key insights: Claude 3.5 Sonnet is really strong on agentic behaviour, much stronger than GPT-4o.
Publication: 2024/10/22
Rating: ⭐️⭐️
Read: 2024/10/22
New agentic powerhouse: upgraded Claude 3.5 Sonnet leaves GPT-4o in the dust 🚀
Anthropic just announced two extremely impressive releases: an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku! (Haiku previously existed only in version 3.)
⚡️ Claude 3.5 Sonnet: a reasoning powerhouse
▸ In coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models, including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding.
▸ Yet on hard problems such as math olympiad questions (AIME ’24), even though it is ahead of GPT-4o (16% vs 5%), it still seems far behind the 74% that OpenAI announced for o1.
▸ Available for the same price and speed as the previous Claude 3.5 Sonnet
🏎️ New Claude 3.5 Haiku:
▸ As blazing fast as the previous Haiku, but matches Claude 3 Opus performance
▸ Amazing 40.6% on SWE-bench Verified (better than the original 3.5 Sonnet's 33.4%!)
🖥️ On top of this, they are releasing a revolutionary “computer use” feature:
▸ Claude can now control computers the way humans do: by clicking, scrolling and typing!
▸ Evaluated on OSWorld, an environment for evaluating end-to-end agents that operate an OS with mouse and keyboard, Claude reached a 22% score, leaving the previous best of 8% in the dust
The new Claude 3.5 Sonnet sounds like a game-changer for agentic applications. I really look forward to trying it in agentic workflows, especially on the GAIA benchmark.
The game is on: if OpenAI doesn’t react, they’ll have a really hard time selling GPT-4o! 👀
