GitHub’s claim that its AI-powered coding tool, Copilot, produces high-quality code has been challenged by software developer Dan Cîmpianu, who argues that the study GitHub cites in support of that claim lacks statistical rigor.
The study found that developers using Copilot had a 56% greater likelihood of passing unit tests and wrote 13.6% more lines of code without errors. Cîmpianu questions the validity of these results, however, pointing out that the sample size was small and that tasks such as writing a basic CRUD app are so common that they are likely well represented in Copilot’s training data.
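The sample-size objection is, at bottom, about statistical uncertainty: a pass rate measured on a small group of developers carries a wide confidence interval. The sketch below is a minimal illustration of that point, using purely hypothetical group sizes and a hypothetical 60% pass rate rather than figures from GitHub’s study.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - margin, centre + margin

# Hypothetical numbers, not taken from GitHub's study: the same observed
# 60% pass rate measured in progressively smaller groups of developers.
for n in (200, 100, 50):
    low, high = wilson_interval(successes=round(0.6 * n), n=n)
    print(f"n={n}: observed 60%, 95% CI {low:.1%}-{high:.1%}")
```

With 50 developers per group the interval spans roughly 46% to 72%, which is the kind of detail a reader would need in order to judge whether a headline difference between groups is meaningful.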
Furthermore, Cîmpianu criticizes GitHub’s graph showing that 60.8% of developers using Copilot passed all unit tests, attributing that figure to selection bias among the reviewers. He also argues that GitHub’s claim of improved code quality is misleading, since it refers to coding-style issues rather than an actual reduction in errors.
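That distinction matters because style checks and functional correctness measure different things. As a purely illustrative sketch (not code from the study), the function below would pass a typical style checker yet still fail a unit test:

```python
def average(values: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    # Stylistically clean (docstring, type hints, readable names) but
    # functionally wrong: dividing by len(values) - 1 is not the mean.
    return sum(values) / (len(values) - 1)

def test_average() -> None:
    # A unit test catches what a style checker never would: this
    # assertion fails because average([2, 4, 6]) returns 6.0, not 4.0.
    assert average([2.0, 4.0, 6.0]) == 4.0
```

A style checker flags formatting and naming issues, not arithmetic, which is why fewer readability warnings is not the same measurement as fewer bugs.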
Cîmpianu cites a 2023 report from GitClear that found GitHub Copilot reduced code quality, as well as another paper that found errors in code generated by ChatGPT, GitHub Copilot, and Amazon Q Developer. He concludes that if developers can’t write good code without AI tools like Copilot, they shouldn’t be using them.
The criticism highlights the need for more rigorous testing and evaluation of AI-powered coding tools to ensure their accuracy and reliability. While some developers see value in using AI tools as an alternative to web searching, others are concerned about the potential impact on code quality and security.
Source: https://www.theregister.com/2024/12/03/github_copilot_code_quality_claims