Code Snippet Detection is Critical for Software Supply Chain Risk Management

Nov 5, 2024

Software Composition Analysis (SCA) has long been a key tool in managing software supply chain risk, particularly open source license compliance and security vulnerabilities. As software dependencies have grown more complex, and as developers lean on community and contribution sites and now adopt GenAI coding assistants, a critical capability of SCA tooling is the detection of code fragments, also known as “snippets.” Our extensive software audit experience at FossID has revealed an increasing prevalence of open source code that would otherwise go undetected by basic dependency analysis tools.

The Prevalence of Code Fragments

Looking just at the past four years of software audits conducted by FossID, we found that 89% of the codebases we analyzed contained snippets or fragments of open source software libraries. That’s nearly 9 out of 10 codebases containing pieces of code that could pose significant risks if not properly identified, inventoried, and managed.

While entire open source components are typically scrutinized for license compliance, code snippets often fly under the radar. They are frequently the result of copy-paste coding practices, of reusing example code found on user contribution sites and in documentation, or of generative AI coding tools like GitHub Copilot (which sometimes suggest code fragments from the open source repositories they were trained on). These code fragments, whether they are a small subset of the original or significantly modified from it, can have profound legal and security implications.

The Legal Risk: Copyleft and License Propagation

A key risk associated with snippets is the potential legal exposure, especially when they come from open source libraries with copyleft licenses. Copyleft licenses, like the GPL family, impose obligations that can require proprietary code to be shared under the same license if the conditions (for what is considered a derivative work) are met. In our experience, 25% of the snippets we find in open source audits come from libraries with copyleft licenses, posing significant legal risk. This means that 1 in 4 code snippets can lead to license propagation, potentially forcing companies to open their proprietary software.

FossID Audit Findings

Considering that 89% of codebases audited over the last four years included snippets of open source software, and 25% of the snippets found carried copyleft licenses, a rough estimate is that your codebase has about a 22% chance (0.89 × 0.25 ≈ 0.22) of carrying legal risk that you would be unaware of without code snippet detection.
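The estimate above is simply the product of the two audit figures. A minimal sketch of that arithmetic follows; the variable names are illustrative, and the calculation assumes, as the estimate itself does, that the snippet-level copyleft share can be applied per codebase.

```python
# Back-of-the-envelope estimate only. Assumption: the 25% copyleft share,
# measured across detected snippets, is applied as if it were a per-codebase rate.
codebases_with_snippets = 0.89  # share of audited codebases containing open source snippets
copyleft_share = 0.25           # share of detected snippets from copyleft-licensed libraries

estimated_exposure = codebases_with_snippets * copyleft_share
print(f"Estimated exposure: {estimated_exposure:.2%}")  # prints "Estimated exposure: 22.25%"
```

The product comes to 22.25%, or roughly 1 in 5 codebases under these assumptions.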

Growing Industry Awareness: GitHub’s Code Referencing for Copilot

FossID isn’t the only one raising the alarm on snippets. GitHub recently introduced Code Referencing for GitHub Copilot, a feature designed to improve the transparency around code suggestions generated by AI. This tool identifies the origins of code snippets (provided that the origin is a GitHub repository created before November 2021), emphasizing the importance of tracking the licensing and copyright information even in automatically generated code. GitHub’s move to highlight the source and license of snippets used by developers only reinforces the need for SCA tools to effectively identify and manage these fragments.

Future Growth in Snippet Detection

The software industry is trending towards increased reliance on AI-assisted development tools, open source components, and collaborative platforms like GitHub. This growth will only amplify the prevalence of code snippets in software projects. The scope of SCA must include robust snippet detection capabilities to ensure that legal, security, and compliance risks are mitigated early in the development process.

Conclusion

The significance of detecting snippets during software audits is no longer debatable – it’s a necessity. The risk of copyleft license propagation through small fragments of code can have wide-reaching consequences for companies, and the need to manage these risks is only growing as software development evolves.

Jon Aldama, Chief Product Officer

Jon Aldama, Chief Product Officer and co-founder of FossID, enjoys speaking and writing on topics related to open source software license compliance and security vulnerabilities, software development lifecycle management, and user experience (UX).
