Working with large language models like Claude on consumer-grade PCs is challenging due to hardware constraints. However, recent advances in model efficiency have produced smaller LLMs that are practical for local development. The Qwen3.5 model series is a prime example of this trend.
To test these compact models, I set up my desktop with an AMD Ryzen 5 processor, 32GB of RAM, and an RTX 5060 GPU, then used the Continue extension to connect VS Code to the LM Studio provider.
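Under the hood, LM Studio exposes an OpenAI-compatible HTTP server that tools like Continue talk to. The sketch below shows a minimal way to query a locally served model directly; it assumes LM Studio's default port (1234) and uses one of the Qwen3.5 model names mentioned here, so adjust both to match your setup.

```python
# Minimal sketch of querying a model served by LM Studio's local
# OpenAI-compatible server. Port 1234 is LM Studio's default; the
# model name is assumed to match a loaded Qwen3.5 build.
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits code-review tasks
    }

def ask(model: str, prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        LM_STUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a model loaded in LM Studio, calling `ask("qwen3.5-4b", "Suggest one refactor for this function: ...")` returns the model's reply as a string.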
The Qwen3.5 models come in various sizes, including qwen3.5-9b@q5_1 (6.33GB), qwen3.5-9b-claude-4.6-opus-reasoning-distilled (4.97GB), and qwen3.5-4b (3.15GB).
I ran each model in LM Studio, testing different token limits and GPU offload settings to optimize performance. The results showed that the smaller models can deliver performance comparable to larger ones.
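The comparison across models and offload settings boils down to measuring generation throughput. Below is a hedged sketch of the kind of timing harness that comparison implies; the `generate` callable is a hypothetical stand-in for whatever client function actually hits the local server.

```python
# Sketch of a throughput-measurement harness for comparing local models.
# tokens/sec is derived from the completion token count and wall-clock
# time; `generate` is any callable returning (text, completion_tokens).
import time

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generation throughput; guards against a zero-length timing window."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s

def time_generation(generate, prompt: str):
    """Time one generation call and report its throughput."""
    start = time.perf_counter()
    text, completion_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return text, tokens_per_second(completion_tokens, elapsed)
```

Running the same prompt through each model and offload setting, then comparing the tokens/sec figures, makes the trade-off between model size and responsiveness concrete.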
The Qwen3.5 models delivered constructive suggestions for improving the codebase, including refactoring main entry points and adding support for environment variables. However, there were issues applying these changes, particularly when the models attempted to modify the code autonomously.
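To make the environment-variable suggestion concrete, here is a minimal sketch of that kind of refactor: reading configuration from the environment with sensible defaults instead of hard-coding values. The variable names are hypothetical, not taken from the codebase under review.

```python
# Illustrative sketch of the environment-variable refactor the models
# suggested. The APP_* variable names and defaults are hypothetical.
import os

def load_config() -> dict:
    """Read runtime settings from the environment, falling back to defaults."""
    return {
        "api_base": os.environ.get("APP_API_BASE", "http://localhost:1234/v1"),
        "model": os.environ.get("APP_MODEL", "qwen3.5-4b"),
        "timeout_s": int(os.environ.get("APP_TIMEOUT_S", "30")),
    }
```

A refactor like this is easy to review by hand, which matters given the trouble the models had applying changes autonomously.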
In conclusion, using compact local models for code development and analysis can be effective but requires careful consideration of hardware constraints and model performance limitations.
Source: https://www.infoworld.com/article/4144487/i-ran-qwen3-5-locally-instead-of-claude-code-heres-what-happened.html