Opus 4.5’s Coding Woes: 50% Failure Rate, Glitchy File Handling

I tested Anthropic’s Opus 4.5 to see if it’s truly the best in the world at coding, but things quickly got weird. Despite Anthropic’s bold claims, Opus 4.5 failed half of my coding tests, struggled with basic file handling, and showed lingering reliability issues.

My experience with Claude Code was different: it worked well in an agentic environment under a professional programmer’s supervision. Testing Opus 4.5 in a chatbot interface, however, produced a 50% failure rate. The model’s failures to deliver downloadable files correctly, combine them into a single file, and run the code made for a frustrating session.

Opus 4.5 passed some tests, such as identifying bugs in code, but its performance was inconsistent. It struggled with simple tasks like rewriting a string function or fixing basic errors in JavaScript, and its limitations became apparent on edge cases, such as accepting numbers with up to two decimal places or rounding values correctly.
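
To make that edge case concrete, here is a minimal sketch of the kind of string-validation task described. It is not code from the article; the function name, regex, and exact rules are my own illustrative assumptions about a task that accepts amounts with at most two decimal places:

```javascript
// Hypothetical example of the edge case described above: accept a numeric
// string with at most two decimal places, and reject anything longer.
// The function name and rules are illustrative, not from the article.

function parseDollarAmount(input) {
  // Allow an optional leading $, whole-number digits, and an optional
  // fractional part of one or two digits.
  const match = /^\$?(\d+)(?:\.(\d{1,2}))?$/.exec(input.trim());
  if (!match) {
    return null; // reject three+ decimal places, letters, empty input, etc.
  }
  const dollars = Number(match[1]);
  // Pad a single fractional digit so ".5" is treated as 50 cents.
  const cents = match[2] ? Number(match[2].padEnd(2, "0")) : 0;
  return dollars + cents / 100;
}

console.log(parseDollarAmount("19.99"));  // 19.99
console.log(parseDollarAmount("19.5"));   // 19.5 (treated as 19.50)
console.log(parseDollarAmount("19.999")); // null — more than two decimal places
```

Small details like the `^`/`$` anchors or the optional fractional group are exactly where this kind of task tends to go wrong, which is why it makes a useful edge-case test.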

While Anthropic claims Opus 4.5 is the “best model in the world for coding,” I disagree. Its performance was far from perfect, and further improvement is needed before it can be considered reliable.

Opus 4.5’s issues highlight the limitations of AI-powered coding tools. While they show promise, they still require significant development to overcome their current shortcomings. As a coder, I’ll continue to test these models and provide feedback to help improve their performance.

Source: https://www.zdnet.com/article/i-tested-opus-4-5-to-see-if-its-really-the-best-in-the-world-at-coding-and-things-got-weird-fast