As you plan the short- and long-term goals for your business, keep in mind four open source compliance challenges that the software industry at large needs to address: scale, accuracy, cost, and speed.
Depending on your company size, your open source policies, and the types of products you build, the solutions to these challenges may vary. Regardless, the challenges are often intertwined and manifest themselves in similar ways.
As a company that offers compliance and security products and services, we constantly identify current compliance challenges and think about how to remediate them. This is what we found:
If you are a small startup, you may not experience the challenge of scale as much as larger, established companies do. We work with Fortune 100 companies with tens of thousands of developers who wake up every day and do two things:
- Develop new software, and
- Reuse and repurpose open source software for deployment in the company’s products and services. This reuse comes in two forms: using open source components as a whole, or using parts of them as code snippets (i.e., copying a varying number of lines from an open source component into another component).
With thousands of developers actively focusing on these two tasks, the scaling concern is a real one, and it raises several questions:
- Can we scale the organization’s processes to manage this large influx of open source software?
- Can we scale the organization’s processes to manage contributions to open source software?
- Are the tools deployed able to handle the level of open source activity?
- Are there any unnecessary checkpoints that need to be optimized or even eliminated to allow more fluidity in how the organization deals with open source software?
- Do the compliance tools provide facilities to integrate with build systems and make the process of identifying source code more seamless and transparent?
- Are the compliance tools in place programming-language agnostic?
All of these are valid questions that organizations need to ask themselves in order to optimize their open source compliance operations, so that those operations can scale up or down with the organization’s use of (and contribution to) open source software.
If you are an open source compliance professional, one of your top concerns is the accuracy of identifying the origin and license of source code. At its core, the primary goal of the open source compliance effort is to identify the origin of the code and its license and, from there, to plan how to fulfill the license obligations. However, accuracy remains a challenge for the following reasons:
- Some of the existing tools do not maintain what is commonly called a “knowledge base”, that is, a database of all known open source code. Instead, they rely on scanning the code itself and presenting whatever license and copyright information they discover in it.
- Other tool providers (those who maintain a knowledge base) face the challenge of keeping that knowledge base up to date with the fast pace of open source development. Many update their knowledge base only every few months, which is often too late for early adopters of open source software.
- Many tool providers do not support snippet search functionality. They are only able to discover open source components that are used as is. Generally speaking, such tools are unable to discover any code snippets that may have been copied from one open source component to another or to a component licensed under a proprietary license.
- Which code match is the right one? This is a familiar question for the compliance professional who scans code and gets dozens to hundreds of hits as possible matches for the code in question. As an example, let’s use zlib, a very popular data compression library whose code is reused in thousands of other open source components. If you scan a software component whose code originates from zlib, most tools on the market today have a real problem identifying the originating source of that code and its license. They typically present the user with hundreds of matches under different licenses (zlib has a liberal license, and code copied from zlib often gets re-licensed under the license of the target component). How can a compliance professional, who may not be a developer and may not be familiar with all of the matched components, resolve this issue? It’s a challenge that tool providers need to address.
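To make the snippet-matching and "which match is the right one?" problems concrete, here is a minimal sketch in Python. It is not how any particular compliance tool works. The component names, license labels, dates, and the `fingerprints` and `find_matches` helpers are all hypothetical, and ranking matches by earliest publication date is only one possible heuristic, assumed here for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


def fingerprints(source: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive non-blank lines.

    Whitespace inside each line is normalized so that re-indented
    copies of a snippet still produce the same fingerprints.
    """
    lines = [" ".join(line.split()) for line in source.splitlines()]
    lines = [line for line in lines if line]
    return {
        hashlib.sha1("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


@dataclass(frozen=True)
class Component:
    """One entry in a toy knowledge base (all fields are illustrative)."""
    name: str
    license: str
    first_published: date
    source: str


def find_matches(scanned: str, knowledge_base: list[Component]) -> list[Component]:
    """Return components sharing a fingerprint with `scanned`, earliest first."""
    target = fingerprints(scanned)
    hits = [c for c in knowledge_base if target & fingerprints(c.source)]
    return sorted(hits, key=lambda c: c.first_published)


# Hypothetical knowledge base: one original component, one component that
# copied and re-licensed the same lines, and one unrelated component.
snippet = "int a = 1;\nint b = 2;\nint c = a + b;\nreturn c;"
knowledge_base = [
    Component("derivedlib", "copyleft", date(2015, 6, 1), "// copied\n" + snippet),
    Component("origlib", "permissive", date(2001, 1, 1), snippet),
    Component("otherlib", "permissive", date(2010, 3, 1), "puts(\"a\");\nputs(\"b\");\nputs(\"c\");"),
]

# The scanned file contains the snippet with different indentation.
scanned = "void f() {\n    int a = 1;\n    int b = 2;\n    int c = a + b;\n    return c;\n}"

for component in find_matches(scanned, knowledge_base):
    print(component.name, component.license, component.first_published)
```

Both `origlib` and `derivedlib` match, under different licenses, which is the ambiguity described above. Sorting by first publication date puts the likely origin first, but that only helps if the knowledge base records reliable dates and is current, so a human reviewer still needs to confirm the result.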
Open source compliance is often regarded as a supporting function, and every engineering executive wants to limit spending on it while still ensuring compliance with the applicable open source licenses. But how do you keep the cost of ensuring compliance reasonably low? There are many considerations, including:
- Cost of the licensing fees for the tool
- Cost of any specialized customization and integration within the organization’s infrastructure
- The initial cost of the server hardware required to run the tool and the ongoing cost of maintaining these servers
- Cost of resources dedicated to ensuring compliance, which includes the staff who must go through all the false positives and clear them out
Ensuring compliance can get costly, and one of the challenges going forward is how we as an industry can keep those costs in check.
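The cost considerations listed above can be added up in a simple model. The sketch below is only an illustration: the field names, the per-match review time, and every figure in the example are placeholders rather than real data, and the model assumes that staff cost is driven by the number of scan matches someone has to review.

```python
from dataclasses import dataclass


@dataclass
class ComplianceCosts:
    """Toy annual cost model; all fields are illustrative assumptions."""
    tool_licensing: float
    customization_and_integration: float
    server_hardware: float
    server_maintenance: float
    matches_to_review: int       # scan hits that need a human decision
    false_positive_rate: float   # fraction of those hits that are false positives
    minutes_per_match: float
    hourly_rate: float

    def review_cost(self) -> float:
        """Staff cost of reviewing every match, true or false."""
        hours = self.matches_to_review * self.minutes_per_match / 60
        return hours * self.hourly_rate

    def false_positive_cost(self) -> float:
        """Portion of the review cost spent clearing false positives."""
        return self.review_cost() * self.false_positive_rate

    def total(self) -> float:
        return (
            self.tool_licensing
            + self.customization_and_integration
            + self.server_hardware
            + self.server_maintenance
            + self.review_cost()
        )


# Placeholder figures, chosen only to make the arithmetic easy to follow.
example = ComplianceCosts(
    tool_licensing=1000.0,
    customization_and_integration=500.0,
    server_hardware=300.0,
    server_maintenance=200.0,
    matches_to_review=120,
    false_positive_rate=0.5,
    minutes_per_match=10,
    hourly_rate=60.0,
)
print(example.total(), example.false_positive_cost())
```

Separating out the false-positive portion makes it visible how much of the staff cost depends on tool accuracy, which is one way the cost and accuracy challenges are intertwined.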
Just as “location, location, location” is the common saying in real estate, “speed, speed, speed” is a common theme in open source compliance. How can your compliance effort keep pace with your development effort instead of lagging behind it? When you have thousands, and in some cases tens of thousands, of developers writing original code and reusing open source code, how can you ensure compliance for that huge body of code, all of which needs to be scanned, identified, and tracked?