Protect Yourself from Litigation due to Unexpected Python License Agreements
Did you know approximately 13.6% of packages on PyPI have a GPL-3 license? That means there are over 45,000 packages distributed under a Copyleft license, and violating its terms can lead to a lawsuit.
Copyleft material freely allows the distribution and modification of the associated intellectual property and requires derivative works to hold the same Copyleft policy. Basically, you’re free to use the Copylefted material, but anything you produce using the original material must also be freely distributed and open to modification.
A Korean software company faced real-world legal consequences over a Copyleft clause: in Artifex v. Hancom, the plaintiff alleged that the defendant violated the open-source GPL-3 license agreement by selling software that used GPL-3 licensed code.
Casually browsing and downloading from PyPI.org can have major consequences. Verifying which license is included in open-source Python packages could save you a trip to court.
Incorporating Packages into Established Policies
Most companies already enforce policies about downloading third-party software without prior approval from managers or IT. Obviously, no one wants employees to download malicious software.
Expanding this policy to cover packages will help the development team safely find and download packages from PyPI and other open-source Python sites.
Consider pip; it ships with Python and can install packages from multiple sources (usually PyPI.org and a local or internal index). A simple command like pip install my-package can pull a generic package (my-package) from either internal or public servers. This creates a serious vulnerability, as anyone can upload a malicious package to a public server.
To improve security, an organization should extend its third-party software policy to open-source packages. This also helps avoid packages with unwanted licenses.
Python Packages are Legally Binding
Even a trusted package that is clearly not malicious can pose a threat because of its license. The GPL-3 license could cost your company legal fees and could oblige you to release your proprietary software under the same terms.
Python packages are legally copyrighted material; when someone creates a package, they own its copyright, unless they release it into the public domain. The author decides how others can use their Python package through the license agreement.
By downloading a Python package, you, or you on behalf of your company, are agreeing to the terms – especially if the contents of the package make it into your final application.
On a PyPI project page, the license is found under “Meta” along with other information like package authors and requirements.
Over 10% of packages on PyPI don’t outright list the license used; listing a license isn’t required when adding a package to PyPI. An unlisted license means a user would have to download the package and investigate further, which opens the door to a malicious attack.
Your Licensing Policy Needs To Include Packages
Adding packages to an existing licensing policy is easy. Just ask for permission.
When an unapproved package is deemed necessary, an authorized person or “approver” (like a manager, the legal department, or even a CEO) can check the contents of the package’s license and approve or disapprove its use. Once approved, the developers can immediately download and use the package.
Fortunately, many Python packages on PyPI.org use the same license agreements. So, once one package with a certain license is approved (e.g., the MIT License), any package with the same license is good to go.
It sounds easy enough to “check a package’s license,” but there are multiple places a Python author can list a license, so an approver must check all these locations to ensure there’s no foul play.
How Licenses are Expressed in Python Packages
When creating a Python package, there are three places an author can express the license:
- License classifier (most common)
- License metadata field
- LICENSE file
The License Classifier
“Trove classifiers” help categorize and describe each package released. They can be used to state what a package does, who it’s directed towards, and various metadata points like framework, language, and license.
Several license classifiers are available, for example:
- License :: OSI Approved :: Apache Software License
- License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
- License :: OSI Approved :: MIT License
- License :: Free To Use But Restricted
- License :: Freeware
Licenses listed under “License :: OSI Approved” are open-source licenses approved by the Open Source Initiative, and they are widely used by packages that are themselves open-source. MIT and Apache are two of the most popular.
The license classifier helps point a package user toward the license. By including the classifier, the author doesn’t have to include the license text in the package metadata. Instead, a user could browse the license themselves, or recognize the classifier and already know the contents.
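To make the classifier format concrete, here is a minimal sketch of how a script might pull the license name out of a package's trove classifiers. The function name license_classifiers and the sample list are illustrative assumptions, not part of any PyPI tooling.

```python
def license_classifiers(classifiers):
    """Return the license names found among a package's trove classifiers."""
    licenses = []
    for classifier in classifiers:
        # Classifiers are " :: "-separated paths, e.g.
        # "License :: OSI Approved :: MIT License".
        parts = [part.strip() for part in classifier.split("::")]
        if parts[0] == "License" and len(parts) > 1:
            # The last segment is the human-readable license name.
            licenses.append(parts[-1])
    return licenses


# Hypothetical classifier list for an example package.
example = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Intended Audience :: Developers",
]
print(license_classifiers(example))  # ['MIT License']
```

An empty result means the package declares no license classifier, which is a cue to check the other two locations.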
The License Metadata Field
Authors who choose to use the license field can input the text of a license within their Python package. The license field is typically used by authors who have special or proprietary licenses that cannot be categorized by a pre-set trove classifier.
As seen on the Python.org site, you can write whatever you wish:
License: This software may only be obtained by sending the author a postcard, and then the user promises not to redistribute it.
Most authors, however, will use the field to duplicate the classifier and copy/paste the open-source license text.
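Package metadata is stored as email-style "Key: value" headers, so the license field and the classifiers can be read with Python's standard email parser. The sketch below assumes a made-up metadata snippet for my-package; the helper name read_license_metadata is hypothetical.

```python
from email import message_from_string


def read_license_metadata(metadata_text):
    """Extract the License field and all Classifier entries from metadata text."""
    msg = message_from_string(metadata_text)
    return {
        "license": msg.get("License"),  # None when the field is absent
        "classifiers": msg.get_all("Classifier") or [],
    }


# Made-up metadata for illustration only.
sample = """\
Metadata-Version: 2.1
Name: my-package
Version: 1.0.0
License: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
"""

info = read_license_metadata(sample)
print(info["license"])      # MIT
print(info["classifiers"])
```

For a package that is already installed, importlib.metadata.metadata("my-package") returns a similar message object, so the same get and get_all calls should apply there.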
The LICENSE file
Finally, a Python author can include a LICENSE file within the package containing the license text. Many open-source projects on PyPI include a copy of the open-source license.
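An approver who wants to locate that file can scan the list of paths inside a package archive. The sketch below is a simple filename heuristic; the list of name prefixes and the example paths are assumptions for illustration, and a match still needs to be opened and read.

```python
from pathlib import PurePosixPath

# Assumed filename prefixes that commonly indicate license text.
LICENSE_PREFIXES = ("license", "licence", "copying", "notice")


def find_license_files(paths):
    """Return the paths whose file name looks like a license file."""
    matches = []
    for path in paths:
        name = PurePosixPath(path).name.casefold()
        if name.startswith(LICENSE_PREFIXES):
            matches.append(path)
    return matches


# Hypothetical contents of a built package archive.
archive_paths = [
    "my_package/__init__.py",
    "my_package-1.0.0.dist-info/METADATA",
    "my_package-1.0.0.dist-info/LICENSE",
]
print(find_license_files(archive_paths))
# ['my_package-1.0.0.dist-info/LICENSE']
```

Because the match is by prefix, a source file such as license_utils.py would also be flagged, so treat the result as a list of candidates rather than a verdict.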
⚠ Don’t Always Trust the License Listed
A best practice while downloading packages from PyPI is treating the classifier as the package’s ‘true’ license, but be sure to check all potential license fields (e.g. classifier, field, and file) because someone may act maliciously.
Consider this example: a package has a “License :: OSI Approved :: MIT License” classifier and the license field will say “see license file,” but the LICENSE file says “by using this package you agree to send the authors $1K.”
This type of package isn’t traditionally malicious (like “colourama” vs. “colorama”), but it can still be reported as malicious by a user. Packages uploaded to PyPI.org are not vetted before the project page is published, so users are advised to confirm contents and licenses before downloading.
Most likely, however, licenses like our example would be unenforceable, as the terms are vague and there could never be a mutual agreement between the two parties.
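A mismatch like the one in this example can be flagged for a human to look at. The following sketch assumes a small, hand-picked table of phrases that appear in the standard text of each license; the table, the function name needs_manual_review, and the sample texts are illustrative, and a missing phrase only signals that someone should read the file.

```python
# Assumed marker phrases, keyed by the license name from the classifier.
EXPECTED_PHRASES = {
    "MIT License": "permission is hereby granted, free of charge",
    "Apache Software License": "apache license",
    "GNU General Public License v3 or later (GPLv3+)": "gnu general public license",
}


def needs_manual_review(classifier_license, license_file_text):
    """Return True when the LICENSE file does not look like the declared license."""
    phrase = EXPECTED_PHRASES.get(classifier_license)
    if phrase is None:
        # Unknown license: nothing to compare against, so escalate.
        return True
    # Lowercase and collapse whitespace so line wrapping does not matter.
    normalized = " ".join(license_file_text.casefold().split())
    return phrase not in normalized


suspicious = "by using this package you agree to send the authors $1K."
genuine = "Permission is hereby granted, free of charge, to any person obtaining a copy"

print(needs_manual_review("MIT License", suspicious))  # True
print(needs_manual_review("MIT License", genuine))     # False
```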
It’s best to consult a stakeholder or the company’s legal department about a Python license. In fact, a license approval process can easily be a part of an overall package approval workflow for your third-party Python packages.
Automatically Scan and Approve Package Licenses
A manual workflow can be automated with a tool that can scan a package, check what license it has, and approve or disapprove its download.
Take ProGet, for example. It can read a package’s metadata, and you can set rules that allow or block package downloads based on that metadata.
For licenses, ProGet ships with a comprehensive list of SPDX codes. You can configure ProGet to filter licenses at the feed level and block inappropriate licenses, like GPL-3, from downloading.
Because Python expresses licenses through classifiers, ProGet will not inherently recognize a package’s license. Instead, you can enter the license manually once, and ProGet will recognize it from then on.
An automatic system configured by a team lead, with approval by an authorized person, will greatly cut down on time spent manually sifting through Python package licenses.
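The approve-or-block decision described in this section can be sketched independently of any particular tool. The example below is not how ProGet is implemented; the allow and block sets are hypothetical policy choices keyed by SPDX identifiers, and anything not listed is routed to an approver.

```python
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    BLOCKED = "blocked"
    NEEDS_REVIEW = "needs review"


# Hypothetical policy: SPDX identifiers an approver has already ruled on.
APPROVED_LICENSES = {"MIT", "Apache-2.0"}
BLOCKED_LICENSES = {"GPL-3.0-only", "GPL-3.0-or-later"}


def decide(spdx_id):
    """Map a license identifier (or None if unlisted) to a policy decision."""
    if spdx_id in BLOCKED_LICENSES:
        return Decision.BLOCKED
    if spdx_id in APPROVED_LICENSES:
        return Decision.APPROVED
    # Unlisted or unrecognized licenses go to a human approver.
    return Decision.NEEDS_REVIEW


for license_id in ("MIT", "GPL-3.0-or-later", None):
    print(license_id, "->", decide(license_id).value)
# MIT -> approved
# GPL-3.0-or-later -> blocked
# None -> needs review
```

Defaulting to "needs review" keeps the approver in the loop for every license the policy has not seen before.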
Small Steps to Better Efficiency
Integrating Python packages into a company’s existing third-party software policy allows developers to download from PyPI.org without constantly checking metadata – because it’s already been done for them!
A team lead or department head can set standards via a manual approval process or an automatic tool like ProGet and know they’re not at risk of legal trouble.
Filtering by licenses, along with other best Python practices like package approval workflows and SemVer for Python packages, will boost your team’s Python development.