Anthropic Claude 3.5 Sonnet Launched, Beats GPT-4o
Anthropic has kept the API price the same for Claude 3.5 Sonnet, which has a 200K-token context window. For general users, the model is available for free on claude.ai and supports both image and document uploads, though free users are subject to a rate limit.
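If you want to try the model through the API rather than the chat interface, a minimal sketch using Anthropic's Python SDK looks like the one below. The model ID and the prompt are illustrative assumptions on my part, not details taken from Anthropic's announcement, so check the current model list before using it.

```python
import anthropic

# The client reads the ANTHROPIC_API_KEY environment variable by default.
client = anthropic.Anthropic()

# Model ID assumed here; verify the current ID in Anthropic's docs.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Summarize the plot of Hamlet in three sentences."}
    ],
)

print(response.content[0].text)
```

The same call works for long documents too, since the 200K-token context window applies to API requests as well as the chat interface.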
Coming to benchmarks, Claude 3.5 Sonnet beats GPT-4o on nearly every benchmark except MMLU and MATH, and even there the difference is marginal. On HumanEval, which tests coding ability, Claude 3.5 Sonnet scores 92% whereas GPT-4o scores 90.2%. On GPQA Diamond, which evaluates graduate-level reasoning, the new Sonnet model achieves 59.4% whereas GPT-4o stands at 53.6%.
With 0-shot prompting on the MMLU test, Claude 3.5 Sonnet gets 88.3% while OpenAI's GPT-4o gets 88.7%. Taken together, the benchmark results show that Anthropic has developed a highly capable model that outperforms both GPT-4o and Gemini 1.5 Pro.
Next, Claude 3.5 Sonnet is also a powerful vision model and again does better than GPT-4o on various visual reasoning tests. It's very good at understanding and transcribing text from hard-to-read images, and it's excellent at interpreting charts, graphs, and illustrations.
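The vision capability is also exposed through the API, so you can send a chart or graph alongside a question. Here is a minimal sketch under the same assumptions as above; the filename and the question are placeholders I made up for illustration.

```python
import base64
import anthropic

client = anthropic.Anthropic()

# "chart.png" is a placeholder filename; swap in any chart or graph you want analyzed.
with open("chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model ID, as above
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "What trend does this chart show?"},
            ],
        }
    ],
)

print(response.content[0].text)
```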
Moreover, Anthropic has announced a new Artifacts tool for Claude, which works much like OpenAI's Code Interpreter. Artifacts displays generated code and other AI-generated content in a separate panel alongside the chat. It's not limited to Python either; it can work with other programming languages as well. For example, I created an SVG image of the Taj Mahal with the Artifacts tool on Claude Chat.
Anthropic says Claude 3.5 Haiku and Claude 3.5 Opus are coming later this year. Overall, I am very impressed with Claude 3.5 Sonnet's speed and intelligence. It seems I can finally replace ChatGPT (GPT-4o) with Anthropic's new model for my everyday tasks.