Blog post for Claude 3.5 Sonnet


Categories: 🤖 Agents
Complete: Done
Key insights: Claude 3.5 Sonnet is really strong on agentic behaviour, much stronger than GPT-4o.
Publication: 2024/10/22
Rating: ⭐️⭐️
Read: 2024/10/22

New agentic powerhouse: upgraded Claude 3.5 Sonnet leaves GPT-4o in the dust 🚀

Anthropic just announced two extremely impressive releases: an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku! (Haiku previously existed only in version 3.)

⚡️ Claude 3.5 Sonnet: a reasoning powerhouse

▸ On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models, including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding.

▸ Yet on hard problems like math olympiad problems (AIME '24), although it beats GPT-4o (16% vs 5%), it remains far behind the 74% that OpenAI announced for o1.

▸ Available at the same price and speed as the previous Claude 3.5 Sonnet

🏎️ New Claude 3.5 Haiku:

▸ As blazing fast as the previous Haiku, yet it matches Claude 3 Opus performance

▸ An amazing 40.6% on SWE-bench Verified (better than the original 3.5 Sonnet!)
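
To put the SWE-bench Verified numbers quoted above in perspective, here is a quick back-of-the-envelope sketch of the gains, both in percentage points and relative to the original 3.5 Sonnet. The `gain` helper and the constant names are my own illustration; only the three scores come from the announcement.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


# SWE-bench Verified scores quoted in this post (percent of tasks solved).
ORIGINAL_SONNET = 33.4
UPGRADED_SONNET = 49.0
NEW_HAIKU = 40.6

sonnet_gain = gain(ORIGINAL_SONNET, UPGRADED_SONNET)
haiku_vs_original_sonnet = gain(ORIGINAL_SONNET, NEW_HAIKU)

print(sonnet_gain)               # (15.6, 46.7)
print(haiku_vs_original_sonnet)  # (7.2, 21.6)
```

So the upgraded Sonnet solves roughly 47% more tasks than its predecessor, and the new Haiku beats the original 3.5 Sonnet by about 7 points.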

🖥️ On top of this, they released a revolutionary “computer use” feature:

▸ Claude can now control a computer the way a human does: by clicking, scrolling and typing!

▸ Evaluated on OSWorld, an environment made for evaluating end-to-end agents that use an OS with mouse and keyboard, Claude scored 22%, far ahead of the previous best of 8%
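
To make the "clicking, scrolling and typing" idea concrete, here is a minimal sketch of what an agent harness does with each step: the model proposes an action, the harness executes it on the machine, and an observation goes back to the model. Everything in it is hypothetical: the action names, the dictionary shape and the `FakeDesktop` class are illustrative stand-ins, not Anthropic's actual tool schema, and a scripted list plays the role of the model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen; it only records what happened."""
    cursor: tuple = (0, 0)
    scroll_offset: int = 0
    typed: str = ""
    log: list = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.cursor = (x, y)
        self.log.append(f"click at ({x}, {y})")

    def scroll(self, amount: int) -> None:
        self.scroll_offset += amount
        self.log.append(f"scroll by {amount}")

    def type_text(self, text: str) -> None:
        self.typed += text
        self.log.append(f"type {text!r}")


def execute(desktop: FakeDesktop, action: dict) -> str:
    """Run one proposed action and return an observation string for the model."""
    kind = action["action"]
    if kind == "click":
        desktop.click(*action["coordinate"])
    elif kind == "scroll":
        desktop.scroll(action["amount"])
    elif kind == "type":
        desktop.type_text(action["text"])
    else:
        # Unknown actions are reported back instead of crashing the loop.
        return f"error: unknown action {kind!r}"
    return desktop.log[-1]


# Scripted actions standing in for what a model would propose step by step.
scripted_actions = [
    {"action": "click", "coordinate": [120, 45]},
    {"action": "type", "text": "Claude 3.5 Sonnet"},
    {"action": "scroll", "amount": 3},
]

desktop = FakeDesktop()
observations = [execute(desktop, a) for a in scripted_actions]
print(observations)
```

In a real setup I'd expect the observation to be a fresh screenshot rather than a log line, and the next action to come from the model rather than a script, but the execute-and-report loop keeps the same shape.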

The new Claude 3.5 Sonnet sounds like a game-changer for agentic applications. I really look forward to trying it in agentic workflows, especially on GAIA.

The game is on: if OpenAI doesn’t react, they’ll start having a really hard time selling GPT-4o! 👀
