Researches have discovered that Copilot can leak the content of private GitHub repositories if they were originally posted as public.
The problem can be attributed to the cache mechanism in Bing. Essentially, Copilot would index web pages when these repositories were public, but would not remove them if they were privated later.
Microsoft’s Copilot AI assistant is exposing the contents of more than 20,000 private GitHub repositories from companies including Google, Intel, Huawei, PayPal, IBM, Tencent and, ironically, Microsoft.
These repositories, belonging to more than 16,000 organizations, were originally posted to GitHub as public, but were later set to private, often after the developers responsible realized they contained authentication credentials allowing unauthorized access or other types of confidential data. Even months later, however, the private pages remain available in their entirety through Copilot.
AI security firm Lasso discovered the behavior in the second half of 2024. After finding in January that Copilot continued to store private repositories and make them available, Lasso set out to measure how big the problem really was.
Privacy-violations aside, this can also be a major security risk for your project.
Developers frequently embed security tokens, private encryption keys and other sensitive information directly into their code, despite best practices that have long called for such data to be inputted through more secure means. This potential damage worsens when this code is made available in public repositories, another common security failing. The phenomenon has occurred over and over for more than a decade.