
What You Don’t Know Can Hurt You: Open Source Compliance Surprises from Real Audits

Aug 21, 2025

The Hidden Dangers Beneath the Surface

Many teams believe they’re on top of open source compliance. The SBOM is in place, packages are tracked, and internal policies are being followed. But real-world audits often tell a different story. What’s in the codebase and what teams think is in there rarely match up completely.

At FossID, we conduct open-source audits across different industries, company sizes and development cultures (e.g. agile/waterfall, open/closed-source, centralized/decentralized). Despite that diversity, the same patterns appear time and again: overlooked components, reused code fragments, mismatched licenses and missing obligations. The surprises are not just technical; they also carry legal, operational and reputational risks.

This article walks through five of the most common and surprising compliance issues we’ve uncovered during real audits. These aren’t theoretical risks. They are practical examples of how even well-intentioned teams can miss the mark and what you can do to address those gaps before they become problems.

Surprise 1: Hidden Licenses in Unexpected Places

What happens: Many assume that if a project does not include an obvious license in the root directory or on the main code files, then it is safe to use. In reality, open-source licenses can be hidden in subdirectories, meta files, documentation, or even embedded in comments within code. Some licenses also apply by reference, meaning the legal obligation exists even if the license text is not in the main files.

Real-world example: A common source of snippets in audits is GitHub Gists, where licenses are non-existent, unclear or specified separately. For example, a Python module shared via the Gist s-zeid/transparentwindow.py provides code for creating transparent application windows but includes no license in the file itself (see Figure 1.0). The author has another Gist containing a license file stating that all their Gists are licensed under the X11 license (see Figure 1.1). While permissive, the code still carries legal obligations, demonstrating that posting a snippet on Gist does not place it in the public domain.

Figure 1.0 – Screenshot of s-zeid/transparentwindow.py showing no license included in the file.

Figure 1.1 – Screenshot of separate license file in the same author’s Gists stating X11 license applies.

The clearest way for authors to declare a license is to include the full text directly in the source file, typically at the top in a comment block with copyright information. This removes ambiguity and ensures users understand the permissions and obligations. For guidance, see Jeff Luszcz’s article Getting the Gist of GitHub Gist Licensing.

Why it matters: Even permissive code carries obligations. Licenses can be hidden in subdirectories, comments, documentation, or separate files, and some apply by reference rather than being included in the main source. Without careful review, development teams can inadvertently introduce legal obligations into projects. That’s why audits need experts who know where to look and SCA tools that can detect licenses and copyright notices within source files.
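
To make the "where to look" point concrete, here is a minimal Python sketch that walks every file in a source tree, not just the root, and reports license signals found in file names, comments and headers. The three patterns and the function name find_license_hints are illustrative assumptions, not the detection logic of any SCA product; a real scanner relies on a far larger knowledge base and will catch much more.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumed for this sketch); real tools match far more.
LICENSE_HINTS = {
    "SPDX tag": re.compile(r"SPDX-License-Identifier:\s*(\S+)"),
    "copyright notice": re.compile(r"Copyright\s+(?:\(c\)|©)?\s*\d{4}", re.IGNORECASE),
    "license reference": re.compile(r"licen[sc]ed\s+under\s+(?:the\s+)?([^\n.]+)", re.IGNORECASE),
}
LICENSE_FILENAMES = re.compile(r"^(LICEN[SC]E|COPYING|NOTICE)(\..*)?$", re.IGNORECASE)


def find_license_hints(root):
    """Return (relative path, kind of signal, matched text) for every file under root."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # A license file buried in a subdirectory is a signal in its own right.
        if LICENSE_FILENAMES.match(path.name):
            findings.append((relative, "license file", path.name))
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the whole scan
        for kind, pattern in LICENSE_HINTS.items():
            match = pattern.search(text)
            if match:
                findings.append((relative, kind, match.group(0).strip()))
    return findings
```

Calling find_license_hints("path/to/checkout") returns a list that a reviewer can sort by path. Note that this sketch cannot see licenses that apply by reference from a different location, such as the separate Gist in the example above; those still need a human or a knowledge-base lookup.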

Surprise 2: Reused Code That No One Realized Was Open Source

What happens: Developers often copy snippets from Stack Overflow, GitHub repositories or even old internal projects without preserving the context or license. Over time these fragments get modified, extended and embedded deep into a codebase, and eventually no one remembers they were originally open source.

Real-world example: In a recent audit, we identified a snippet taken from a Stack Overflow discussion titled “How can I make sticky headers in RecyclerView (without external libraries)?” (see Figure 2.0). The question asked how to implement sticky headers in mobile development without bringing in external libraries, possibly to avoid license complications. Ironically, code from the answers was copied directly into a project. Because all contributions on Stack Overflow are published under the Creative Commons Attribution-ShareAlike license (CC BY-SA), a weak copyleft license, this resulted in licensing obligations for the project.

Figure 2.0 – Screenshot of the Stack Overflow question: “How can I make sticky headers in RecyclerView (without external libraries)?”

Stack Overflow makes this clear in their site footer, which states: “user contributions licensed under CC BY-SA.” The link directs users to the licensing help page, “What is the license for the content I post?” (Figure 2.1), which explains that contributions posted today are licensed under CC BY-SA 4.0, while earlier posts remain under the 3.0 or 2.5 versions.

Figure 2.1 – Screenshot of the Stack Overflow licensing help page: “What is the license for the content I post?”

Why it matters: Even small fragments of code carry copyright and licensing obligations. In this case, a developer set out to avoid external dependencies but still introduced a license obligation through copied code. Reusing snippets without recognizing their license can result in missing attribution, incompatible obligations or reputational exposure. This is a reminder that even the smallest fragments require the same level of diligence as full components.
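
As a rough illustration of the diligence a copied snippet needs, the Python sketch below maps a post date to the CC BY-SA version that applies and produces an attribution comment to keep next to the code. The cut-over dates are not stated in this article; they are assumed from Stack Overflow's licensing help page and should be confirmed there before use. The function names and the wording of the comment are hypothetical, and whether a comment like this satisfies the license in your situation is a question for legal review.

```python
from datetime import date

# Assumed cut-over dates (UTC), newest first; verify against the Stack Overflow help page.
CC_BY_SA_VERSIONS = [
    (date(2018, 5, 2), "CC BY-SA 4.0"),
    (date(2011, 4, 8), "CC BY-SA 3.0"),
]
OLDEST_VERSION = "CC BY-SA 2.5"


def license_for_post(posted_on):
    """Return the CC BY-SA version assumed to apply to a post made on the given date."""
    for start, name in CC_BY_SA_VERSIONS:
        if posted_on >= start:
            return name
    return OLDEST_VERSION


def attribution_comment(title, url, author, posted_on, modified=True):
    """Build a source-code comment recording origin, author, license and changes."""
    lines = [
        f"# Adapted from Stack Overflow: {title}",
        f"# Source: {url}",
        f"# Author: {author}",
        f"# License: {license_for_post(posted_on)}",
    ]
    if modified:
        lines.append("# Changes: modified from the original post")
    return "\n".join(lines)
```

Recording this at the moment of copying is cheap; reconstructing it years later, after the fragment has been modified and moved, is what makes these findings expensive in an audit.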

Surprise 3: Deprecated or Abandoned Components Still in Use

What happens: During audits, it’s common to uncover open-source components that are no longer maintained or even hosted by their original project. These often remain in a codebase because they were added years ago and never removed, or because teams didn’t even realize they were there. In some cases, the upstream repository has been archived or deleted entirely.

Real-world example: In a recent audit, our team identified full matches to the Wappalyzer component, originally an open-source project licensed under the MIT License that later transitioned to the GNU General Public License v3.0. The project was deprecated and went private in August 2023. It is no longer hosted on GitHub and shows as deprecated on npm (see Figure 3.0), with historical pages accessible only via the Internet Archive Wayback Machine (see Figure 3.1).

Figure 3.0 – Wappalyzer project page on npm showing deprecation notice.

Figure 3.1 – Archived GitHub repository for Wappalyzer via the Internet Archive Wayback Machine.

Despite the original source code no longer being publicly available, our findings show that it can still be used or sourced from other places, illustrating how abandoned or deprecated components still appear in active codebases.

Why it matters: Using deprecated or abandoned components creates more than license risk; it also introduces security and operational risk. If a vulnerability is discovered, there is no upstream community to issue patches. From a compliance standpoint, license obligations still apply, yet you are relying on unmaintained code that could expose your business to serious consequences. This example also highlights the importance of using an SCA tool backed by a comprehensive knowledge base, such as FossID’s OSS Knowledge Base, which references billions of open-source projects, files and snippets, allowing it to detect components that are no longer publicly hosted.
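
For components that do come through a package manager, deprecation can at least be checked mechanically. The Python sketch below reads the deprecation message that the public npm registry attaches to a package version in its metadata document. The helper names are hypothetical, and the sketch only covers declared npm dependencies; it does nothing for code that was copied into the tree, which is where a knowledge-base scan is needed.

```python
import json
import urllib.parse
import urllib.request

REGISTRY = "https://registry.npmjs.org/"


def fetch_packument(package_name):
    """Download a package's metadata from the public npm registry (needs network access)."""
    # Scoped names such as "@scope/name" must have the slash percent-encoded.
    url = REGISTRY + urllib.parse.quote(package_name, safe="@")
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def deprecation_notice(packument):
    """Return the deprecation message of the latest version, or None if there is none."""
    latest = packument.get("dist-tags", {}).get("latest")
    version_info = packument.get("versions", {}).get(latest, {})
    return version_info.get("deprecated") or None


def report(package_name):
    """Print a one-line status for a declared npm dependency."""
    notice = deprecation_notice(fetch_packument(package_name))
    status = f"DEPRECATED: {notice}" if notice else "no deprecation notice"
    print(f"{package_name}: {status}")
```

A package that has been removed from the registry altogether will raise an HTTP error in fetch_packument rather than return a notice, which is itself a signal worth recording.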

Surprise 4: Declared Licenses Don’t Always Match the Code

What happens: Projects often include multiple signals about licensing, such as in a README, LICENSE file, package.json, or elsewhere. These signals do not always align. We have seen cases where a README claims MIT, while the actual license file is GPLv3, or where embedded files carry different or conflicting licenses from the top-level declaration.

Real-world example: The component @pkgjs/parseargs on npm has a declared license of MIT, taken from the package.json, while the LICENSE file specifies the Apache License 2.0 (see Figure 4.0).

Figure 4.0 – @pkgjs/parseargs on npm showing a declared license of MIT, pulled from package.json, and a LICENSE file containing the Apache License 2.0.

Earlier this year, issues were raised on the project’s GitHub, showing the mismatch and asking contributors if they were happy to release their code under MIT.

The change has been made in the main branch (see Figure 4.1), but as there has not been a tagged release, the project continues to show a mismatch on npm. So what license is this project actually under? Without a tagged release, the previous versions remain ambiguous, leaving users unsure which license applies. This discrepancy has been identified in multiple audits, highlighting how easily such conflicts can persist unnoticed in active codebases.

Figure 4.1 – GitHub page for pkgjs/parseargs showing the license change in the main branch.

Why it matters: A license declared incorrectly can mislead your compliance process and create serious downstream risk. You may assume a component is permissively licensed when it is not, or worse, ship copyleft code without fulfilling its obligations. These inconsistencies often go unnoticed until a detailed audit exposes the conflict.
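
A basic cross-check between the declared license and the license text can be automated. The Python sketch below compares the license field in package.json with a crude guess at what the LICENSE file contains. The marker phrases, the three licenses covered and the function name check_license_consistency are assumptions made for this illustration; it does not inspect per-file headers and is no substitute for full license detection.

```python
import json
from pathlib import Path

# Crude marker phrases (assumed for this sketch), keyed by SPDX-style identifier.
LICENSE_MARKERS = {
    "MIT": "Permission is hereby granted, free of charge",
    "Apache-2.0": "Apache License Version 2.0",
    "GPL-3.0": "GNU GENERAL PUBLIC LICENSE Version 3",
}


def check_license_consistency(package_dir):
    """Compare package.json's declared license with the text of the LICENSE file."""
    package_dir = Path(package_dir)
    manifest = json.loads((package_dir / "package.json").read_text(encoding="utf-8"))
    declared = manifest.get("license")

    detected = None
    license_path = package_dir / "LICENSE"
    if license_path.is_file():
        raw = license_path.read_text(encoding="utf-8", errors="ignore")
        text = " ".join(raw.split())  # collapse line breaks so wrapped phrases still match
        detected = next(
            (spdx for spdx, marker in LICENSE_MARKERS.items() if marker in text), None
        )

    # Prefix match so that a declaration like "GPL-3.0-only" agrees with "GPL-3.0".
    consistent = bool(declared and detected and str(declared).startswith(detected))
    return {"declared": declared, "detected": detected, "consistent": consistent}
```

Any result with consistent set to False deserves a manual look, including the case where nothing could be detected at all.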

Surprise 5: “Unknown” Licenses Aren’t Neutral

What happens: During audits, tools often flag files with “unknown” or “unclassified” licenses. These are sometimes overlooked or treated as low risk. But “unknown” doesn’t mean “safe”. It usually means the project hasn’t declared a license at all, the license text has been stripped out, or the terms are ambiguous.

Real-world example: In a recent audit, our team identified a snippet from the component jQuery.connectingLine. The project had no declared license on GitHub. There was no license file and no mention of licensing in the README or elsewhere in the repository (see Figure 5.0). This lack of clarity made it impossible to know whether the code could be used safely, highlighting the legal uncertainty that “unknown” licenses create.

Figure 5.0 – jquery.connectingLine on GitHub with no license file, no declared license and no mention of licensing in the readme or elsewhere in the repository

Why it matters: You cannot comply with a license if you do not know what it is. When a project lacks a declared license, no permission to use the code has been granted, and in most jurisdictions using code without clear permission can expose you to legal risk. “Unknown” should never be treated as low risk; these components need closer investigation rather than being overlooked.

Generative AI: A New Twist on Old Risks

Generative AI doesn’t create new categories of compliance risk, but it can make all five surprises more common. AI-generated code may reuse open-source fragments without attribution, copy deprecated libraries, or introduce “unknown” licenses. In other words, AI accelerates development, but it can also accelerate risk.

What You Can Do: From Risk to Readiness

Whether these risks are introduced by human developers or by AI tools, they can be managed with the right practices.

Use SCA tooling with a comprehensive knowledge base: Choose tools that go beyond package managers and can detect embedded or copied code, even if the original project is no longer hosted. This improves visibility and reduces the chance of hidden surprises.
Improve your SBOM coverage: Don’t just rely on package managers. Scan source code to uncover hidden or copied components.
Investigate unknowns: Any component with an unclear license should be reviewed and documented. If you can’t verify the origin, consider removing it.
Schedule internal audits: Even once a year can make a difference. Codebases drift, and so does compliance.
Train developers: Make sure the team knows what they can use, what they can’t and what to watch out for when copying code.
Integrate into CI/CD: Compliance needs to be part of your workflow. Don’t just scan – triage and resolve.
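
As one way to picture the CI/CD and "investigate unknowns" practices above, here is a minimal Python sketch of a pipeline gate that blocks a build when a component has an unknown license or one outside policy. The inventory format (a list of dictionaries with name and license keys), the allow-list and the function names are all hypothetical; in practice the input would come from your SCA tool's report and the policy from your legal team.

```python
# Hypothetical policy for this sketch; a real allow-list comes from legal review.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}
UNKNOWN_VALUES = {"", "unknown", "unclassified", "noassertion"}


def triage(components, allowed=ALLOWED_LICENSES):
    """Return (component name, reason) for every component that should block the build."""
    blockers = []
    for component in components:
        name = component["name"]
        declared = component.get("license")
        license_id = declared.strip() if isinstance(declared, str) else ""
        if license_id.lower() in UNKNOWN_VALUES:
            # "Unknown" is treated as a finding to investigate, never as low risk.
            blockers.append((name, "unknown license: investigate origin or remove"))
        elif license_id not in allowed:
            blockers.append((name, f"{license_id}: not on the allow-list, needs review"))
    return blockers


def gate(components):
    """Print each blocker and return an exit code suitable for a CI step."""
    blockers = triage(components)
    for name, reason in blockers:
        print(f"BLOCKED {name}: {reason}")
    return 1 if blockers else 0
```

In a pipeline, the step would end with something like sys.exit(gate(inventory)), so that unresolved findings stop the build instead of accumulating in a report that no one reads.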

Conclusion: Visibility Comes First

The biggest failure in open-source compliance is assuming everything has already been handled. Most surprises aren’t caused by recklessness, but by a lack of visibility and follow-up.

At FossID, we help companies uncover what’s actually in their code. From reused snippets to legacy components, our audits reveal what build tools often overlook. An advanced SCA tool with a comprehensive knowledge base can even detect components whose original source is no longer hosted, helping you identify risks before they become issues. Because if you don’t know what’s there, you can’t manage the risk. And with open source, what you don’t know really can hurt you.

Generative AI makes all these risks more complex. AI tools can suggest code that reuses open-source fragments without attribution, recommend libraries that are deprecated, or introduce components with unclear licensing. In practice, this means every one of the five surprises can appear more often and with less clarity. That’s why visibility and careful review are more important than ever.

Gary Armstrong is dedicated to empowering businesses to harness the benefits of open source software while ensuring legal compliance and security confidence. Backed by more than a decade of experience delivering open source security and compliance services, Gary shares his insights and best practices through writing and speaking engagements with the open source community.