Identifying and Managing Vulnerabilities in Python Packages
by Crista Perlton, on Feb 11, 2022 8:26:00 AM
Did you hear about the malicious PyPI package that collects environment details and sends the info to an unknown web service? Or the Trojan horse that installs, fetches a .exe file from a nondescript domain, then attempts to execute it? What about the package that targeted British-English speakers and tried to steal bitcoin?
Malicious packages must be avoided while browsing open-source sites. They can cause serious harm to your organization and your systems.
But did you know you should also be on the lookout for vulnerable packages?
You may be surprised there are two separate types! Don’t worry, I was the same before diving headfirst into Python package security.
It was difficult to see the forest for the trees on this one, but I talked with my team and we agreed there was a lot of value in writing an article discussing the difference between a malicious package and a vulnerable one.
Especially since there are two noticeably different ways to fight against them.
Python, Malicious Packages, and Vulnerabilities
A malicious package and a vulnerable package are similar but not equal.
A malicious package contains malware. These packages use tricks such as typosquatting to hide themselves, and once installed they do bad things (like log keystrokes).
Look at the colorama and colourama typosquatting case for example.
colorama was first released in 2010 to “make ANSI escape character sequences work under MS Windows.” It is a perfectly fine and normal package meant to help Python users.
The latter, colourama, is a typosquatting package deliberately made to trick British-English users looking for "colorama." Released in 2017, it copied the original’s code and added malware that checks the Windows clipboard for bitcoin addresses.
Python is the second most popular coding language, drawing in non-developers thanks to its low barrier of entry and the huge number of libraries available on open-source sites.
The convenience of hundreds of thousands of free packages opens Python up to serious security breaches. Users new to Python can easily fall for typosquatting, not realizing they’ve used pip to install the wrong package because of a simple typo.
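To make the typo risk concrete, here is a minimal sketch of a near-miss name check using Python's standard difflib module. The KNOWN_PACKAGES list, the possible_typosquat function, and the 0.85 cutoff are all assumptions made for illustration; this is not how any particular scanning tool works.

```python
import difflib

# Hypothetical list of package names your team actually intends to use.
KNOWN_PACKAGES = ["colorama", "requests", "numpy", "pandas"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return the known package that `name` closely resembles, or None.

    An exact match is treated as safe; a near miss is treated as suspicious.
    """
    normalized = name.strip().lower()
    if normalized in known:
        return None
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(possible_typosquat("colourama"))  # prints: colorama
print(possible_typosquat("colorama"))   # prints: None
```

A check like this only catches names that look like something you already know about, so it complements a review process rather than replacing one.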
You’re probably already avoiding malicious packages (typosquatters, wheel jackers, and dependency confusion). Obviously, no one wants malware on their system. You can build up defenses against these types of packages using tools like pypi-scan or a package approval workflow.
In a nutshell, a package approval workflow will ensure you or your team have a system to vet and verify any packages downloaded to your system. This can be done manually, via human intelligence, or with a tool and programmed rules.
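The "programmed rules" version of an approval workflow could look something like the sketch below. It assumes your team keeps a list of vetted package versions and only accepts exactly pinned requirements; the APPROVED mapping and the review_requirements function are hypothetical names used for illustration.

```python
# Hypothetical approval list: package name -> versions your team has vetted.
APPROVED = {
    "colorama": {"0.4.4"},
    "requests": {"2.27.1"},
}


def review_requirements(lines, approved=APPROVED):
    """Split pinned requirements ("name==version") into approved and rejected."""
    ok, rejected = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        name = name.strip().lower()
        if sep and version.strip() in approved.get(name, set()):
            ok.append(line)
        else:
            rejected.append(line)  # unpinned, unknown, or unvetted version
    return ok, rejected


ok, rejected = review_requirements(["colorama==0.4.4", "colourama==0.1", "requests"])
print("approved:", ok)
print("needs review:", rejected)
```

Anything that lands in the rejected list would go to a person for review, which is where the human-intelligence half of the workflow comes in.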
Vulnerable packages, meanwhile, are normal packages, designed with integrity and free of malware, in which vulnerabilities have been reported (usually long after publication) that can then be exploited. Sources such as CVE and GitHub advisories collect, evaluate, and categorize these vulnerabilities.
As a WhiteSource paper reported, a large majority of Python vulnerabilities are low-severity and some are even trivial, but high-severity vulnerabilities still exist.
Any Python user (non-developer or veteran) can be affected by a vulnerable package through no fault of their own. Vulnerabilities occur naturally, and they’re often discovered by researchers or NGOs proactively looking for them.
The real danger of vulnerable packages is not being aware of their effect. Consider Log4Shell in Java's Log4j library. The remedy is simple enough: upgrade to the latest version. But which of your applications were affected, and which other libraries depend on the vulnerable one? Not only that, how can you patch all the affected libraries and ensure nothing in your application breaks?
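For Python, the "what depends on this?" question can be roughly answered for a single environment with the standard importlib.metadata module. The sketch below lists only direct dependents of a package; the function names are made up for this example, and the simple regular expression is an assumption that covers common requirement strings rather than a full parser.

```python
import re
from importlib import metadata


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string such as 'urllib3<1.27'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return normalize(match.group(0)) if match else None


def installed_requirements():
    """Map each installed distribution to its declared requirement strings."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            result[normalize(name)] = dist.requires or []
    return result


def direct_dependents(target, requirements):
    """Return the packages that directly declare a dependency on `target`."""
    target = normalize(target)
    return sorted(
        name
        for name, reqs in requirements.items()
        if any(requirement_name(req) == target for req in reqs)
    )
```

Calling direct_dependents("urllib3", installed_requirements()) would list the installed packages that declare urllib3 as a requirement. It says nothing about indirect dependents or about other environments and deployed applications, which is exactly the gap described above.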
Protecting Against Vulnerable Packages
It is impossible to fully “protect” against vulnerabilities, since many are discovered long after a package is published. kramerica scanner is fine today, but in two years someone may discover and report a vulnerability.
No team or tool can predict this.
Instead, Python users must scan packages both at the beginning of a build and regularly after they are pushed to production and incorporated into an application or library.
Googling “Python package scanner” gets results for static analysis (scanning your own code for errors) or for pypi-scan, a tool that checks PyPI.org for potential typosquatters.
You may stumble across pip-audit: a great open-source tool to use during a build. It scans “Python environments for packages with known vulnerabilities” that have been reported to the Python Packaging Advisory Database, which is hosted on GitHub.
pip-audit could be considered the new standard. It’s Google-backed, requires no paid subscription, and operates well in both user and automated workflows. The authors even hope to integrate it fully into pip one day.
There are a few shortcomings, however. pip-audit doesn’t include a vulnerability severity rating, so users must assess severity themselves. It can also only scan for known vulnerabilities at build time; for example, it can’t scan packages already in use in published applications.
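As a rough sketch of using pip-audit in an automated build, the snippet below runs it against a requirements file and pulls the findings out of its JSON report. The -r and -f json options are real pip-audit options, but the report layout assumed here (a "dependencies" list whose entries carry "name", "version", and "vulns") may differ between releases, so check the output of the version you have installed. The function names are hypothetical.

```python
import json
import subprocess


def run_pip_audit(requirements_path):
    """Run pip-audit on a requirements file and return its JSON report as text.

    pip-audit exits with a non-zero status when it finds vulnerabilities,
    so the return code is deliberately not treated as an error here.
    """
    completed = subprocess.run(
        ["pip-audit", "-r", requirements_path, "-f", "json"],
        capture_output=True,
        text=True,
    )
    return completed.stdout


def summarize(report_text):
    """Return (name, version, vulnerability id, fix versions) tuples."""
    report = json.loads(report_text)
    # Assumption: newer releases wrap results in {"dependencies": [...]},
    # while older ones emitted the list directly.
    deps = report["dependencies"] if isinstance(report, dict) else report
    findings = []
    for dep in deps:
        for vuln in dep.get("vulns", []):
            findings.append(
                (dep["name"], dep.get("version"), vuln["id"], vuln.get("fix_versions", []))
            )
    return findings
```

In a build script you might call summarize(run_pip_audit("requirements.txt")) and fail the build if the list is not empty. Note that no severity appears in the result, which is the first shortcoming mentioned above.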
Automate Vulnerability Scanning with ProGet
pip-audit is great for a build, but you need a tool that will routinely scan new and old packages for vulnerabilities. Otherwise, you could be operating a time-bomb application with a flagrantly vulnerable package.
ProGet’s Vulnerability Scanning feature can automate both your scanning and assessment. You can set rules so packages with high-severity vulnerabilities will be blocked from being downloaded. ProGet also remembers previously made decisions on specific vulnerabilities, so it automatically applies the same decisions to newly scanned packages.
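To illustrate the general idea of severity rules combined with remembered decisions, here is a small sketch. It is not ProGet's implementation or API; the severity scale, the decisions mapping, and the should_block function are all hypothetical names invented for this example.

```python
# Hypothetical severity scale; none of these names come from ProGet.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def should_block(vulnerabilities, decisions, threshold="high"):
    """Decide whether a package should be blocked.

    `vulnerabilities` is a list of (vulnerability_id, severity) pairs.
    `decisions` maps vulnerability ids to an earlier verdict, "allow" or
    "block", so a vulnerability only has to be assessed once.
    """
    for vuln_id, severity in vulnerabilities:
        verdict = decisions.get(vuln_id)
        if verdict == "block":
            return True
        if verdict == "allow":
            continue  # previously reviewed and accepted
        if SEVERITY_RANK[severity] >= SEVERITY_RANK[threshold]:
            return True
    return False


# "VULN-1" is a made-up identifier that someone has already reviewed.
print(should_block([("VULN-1", "high")], {"VULN-1": "allow"}))  # prints: False
print(should_block([("VULN-1", "high")], {}))                   # prints: True
```

The point of storing decisions is that a newly scanned package carrying an already-assessed vulnerability gets the same verdict without a second review.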
ProGet draws on CVE and NVD data, which cover a much broader range of vulnerabilities than pip-audit does. Tools like ProGet also let users scan for vulnerabilities and identify which applications have been affected using the package consumers feature.
For more details on using ProGet to scan for vulnerabilities, check out our step-by-step guide.
Vulnerability Scanning Is Not a Catch-All
Tools are great for cutting down on manual processes and automating routine, pesky tasks, but they shouldn’t be considered the be-all and end-all. A package without reported vulnerabilities today may still have one reported tomorrow.
Routine scanning, along with human assessment of the results (like deciding how to act once a vulnerability is identified), can help you maintain secure libraries and applications.
Malicious packages are easy to avoid; it’s the vulnerable packages, unknown until discovered, that can seriously impact your applications.