The Huge Power and Potential Danger of AI-Generated Code

Programming can be faster when algorithms help out, but there is evidence AI coding assistants also make bugs more common.

In June 2021, GitHub announced Copilot, a kind of auto-complete for computer code powered by OpenAI’s text-generation technology. It provided an early glimpse of the impressive potential of generative artificial intelligence to automate valuable work. Two years on, Copilot is one of the most mature examples of how the technology can take on tasks that previously had to be done by hand.

This week GitHub released a report, based on data from almost a million programmers paying to use Copilot, that shows how transformational generative AI coding has become. On average, they accepted the AI assistant’s suggestions about 30 percent of the time, an indication that the system is remarkably good at predicting useful code.

A striking chart in GitHub’s report shows how users tend to accept more of Copilot’s suggestions as they spend more months using the tool. The report also concludes that AI-enhanced coders see their productivity increase over time, based on the fact that a previous Copilot study reported a link between the number of suggestions accepted and a programmer’s productivity. GitHub’s new report says that the greatest productivity gains were seen among less experienced developers.

On the face of it, that’s an impressive picture of a novel technology quickly proving its value. Any technology that enhances productivity and boosts the abilities of less skilled workers could be a boon for both individuals and the wider economy. GitHub goes on to offer some back-of-the-envelope speculation, estimating that AI coding could boost global GDP by $1.5 trillion by 2030.

But GitHub’s chart showing programmers bonding with Copilot reminded me of another study I heard about recently, while chatting with Talia Ringer, a professor at the University of Illinois at Urbana-Champaign, about coders’ relationship with tools like Copilot.

Late last year, a team at Stanford University posted a research paper that looked at how using a code-generating AI assistant they built affects the quality of code that people produce. The researchers found that programmers getting AI suggestions tended to include more bugs in their final code—yet those with access to the tool tended to believe that their code was more secure. “There are probably both benefits and risks involved” with coding in tandem with AI, says Ringer. “More code isn't better code.”

When you consider the nature of programming, that finding is hardly surprising. As Clive Thompson wrote in a 2022 WIRED feature, Copilot can seem miraculous, but its suggestions are based on patterns in other programmers’ work, which may be flawed. These guesses can create bugs that are devilishly difficult to spot, especially when you are bewitched by how good the tool often is.
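To make that concrete, here is a minimal sketch of the kind of flaw that is easy to accept without noticing. It is not taken from Copilot or from the Stanford study; the table, the function names, and the crafted input are all hypothetical, chosen only to illustrate a pattern that appears widely in public code. The first function pastes user input straight into a SQL string, and it works perfectly on ordinary names. The second uses a parameterized query.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Plausible-looking suggestion: the input is interpolated into the SQL
    # text, so a crafted name can change the meaning of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the driver passes the name as data, not as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    crafted = "x' OR '1'='1"  # hypothetical malicious input

    # Both versions behave identically on an ordinary name.
    print(find_user_unsafe(conn, "alice"))
    print(find_user_safe(conn, "alice"))

    # Only the unsafe version leaks every row for the crafted input.
    print(find_user_unsafe(conn, crafted))
    print(find_user_safe(conn, crafted))
```

Both functions pass a casual test with a normal name, which is what makes the first one hard to catch in review. The difference only shows up when someone supplies input the programmer did not think to try.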

We know from other areas of engineering that humans can be lulled into overreliance on automation. The US Federal Aviation Administration has repeatedly warned that some pilots are becoming so dependent on autopilot that their flying skills are atrophying. A similar phenomenon is familiar from self-driving cars, where extraordinary vigilance is required to guard against rare yet potentially deadly glitches.

This paradox may be central to the developing story of generative AI—and where it will take us. The technology already appears to be driving a downward spiral in the quality of web content, as reputable sites are flooded with AI-generated dross, spam websites proliferate, and chatbots try to artificially juice engagement.

None of this is to say that generative AI is a bust. There is a growing body of research that shows how generative AI tools can boost the performance and happiness of some workers, such as those who handle customer support calls. Some other studies have also found no increase in security bugs when developers use an AI assistant. And to its credit, GitHub is researching the question of how to safely code with AI assistance. In February it announced a new Copilot feature that tries to catch vulnerabilities generated by the underlying model.

But the complex effects of code generation provide a cautionary tale for companies working to deploy generative algorithms for other use cases.

Regulators and lawmakers showing more concern about AI should also take note. With so much excitement about the technology’s potential, and wild speculation about how it could take over the world, subtler and yet more substantive evidence of how AI deployments are working out could be overlooked. Just about everything in our future will be underpinned by software, and if we’re not careful, it might also be riddled with AI-generated bugs.