PyPI admins try their best to identify and remove malicious Python packages, but many inevitably slip through. Python package aws-login0tool was recently discovered to be a malicious package attempting to typo-squat and has already been downloaded 600 times. Meanwhile, dpp-client was exposed as a malicious package attempting to collect environment details during installation and send them to an unknown web service. dpp-client is especially concerning as it had a good reputation (indicated by its large number of GitHub stars and forks) and had been downloaded 10,000 times in 2021!
Pretty scary, right? Well, here’s the real kicker: even if you’re an ultra-responsible DevOps team with a great package approval workflow, malicious packages can often sneak into your code because of package dependencies. This can also lead to the all-too-familiar “It worked when I built it yesterday” situation.
This article will explain how unwanted packages sneak into your code, describe how to use requirements.txt files to ensure repeatable builds, and show how Package Consumers can quickly identify which applications are using a specific package.
The Unintended Side Effects of Dependency Resolution
Python packages often depend on other packages known as dependencies. These dependencies can have their own dependencies, resulting in a complicated dependency tree.
If you’re building an application with Python and two packages require incompatible versions of the same package, pip will report a version conflict and your project may not build!
When downloading projects from pypi.org, the package dependencies are not listed. Instead, package dependencies can be seen using one of the following commands:
- pip show: List dependencies of Python packages that have already been installed.
- pipdeptree: List the dependencies in a tree form.
- pip list: List installed packages, optionally filtered by conditions such as outdated packages only.
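The same information can also be read from inside Python. The following is a minimal sketch using the standard library's importlib.metadata module (Python 3.8 and later); the helper name declared_dependencies is an illustrative assumption and not part of pip.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings an installed package declares.

    Returns an empty list if the package is not installed or if it
    declares no dependencies.
    """
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []
    # requires() returns None when no dependencies are declared.
    return list(requirements or [])


if __name__ == "__main__":
    # Prints one requirement string per line if urllib3 is installed.
    for requirement in declared_dependencies("urllib3"):
        print(requirement)
```

Like pip show, this only sees packages that are already installed in the current environment.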
When installing a Python package using pip install, pip will attempt to automatically work out the dependencies of the requested packages. During this process, pip will make assumptions about the needed package versions and then check if the assumptions are correct. If any assumptions are discovered to be incorrect, backtracking will be used.
Backtracking reduces the risk of a new package install accidentally breaking an existing package and messing up your environment. Pip will try its best to make sure it installs the ideal package dependencies, but it’s far from perfect and version conflicts will inevitably occur. This is normally solved by using version specifiers.
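To make the idea of "assume, check, backtrack" concrete, here is a toy sketch. It is not pip's actual resolver; the package names, versions, and the INDEX structure are all hypothetical and exist only for illustration.

```python
# Hypothetical package index: name -> {version: {dependency: allowed versions}}
INDEX = {
    "app": {1: {"lib-a": {1, 2}, "lib-b": {1, 2}}},
    "lib-a": {2: {"shared": {2}}, 1: {"shared": {1}}},
    "lib-b": {2: {"shared": {1}}, 1: {"shared": {1}}},
    "shared": {1: {}, 2: {}},
}


def resolve(requirements, chosen=None):
    """Depth-first search with backtracking.

    requirements: list of (package, allowed_versions) pairs still to satisfy.
    chosen: dict of package -> version picked so far.
    Returns a complete dict of choices, or None if no combination works.
    """
    chosen = chosen or {}
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already picked: the earlier guess either fits or forces a backtrack.
        return resolve(rest, chosen) if chosen[name] in allowed else None
    # Guess the newest allowed version first, then fall back to older ones.
    for version in sorted(INDEX[name], reverse=True):
        if version not in allowed:
            continue
        dependencies = list(INDEX[name][version].items())
        result = resolve(rest + dependencies, {**chosen, name: version})
        if result is not None:
            return result
        # The guess led to a conflict, so the loop backtracks to the next version.
    return None


if __name__ == "__main__":
    print(resolve([("app", {1})]))
```

In this made-up index, the first guess (lib-a version 2) needs shared version 2, which conflicts with what lib-b needs, so the search backs up and settles on lib-a version 1.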
Version specifiers are used to dictate what versions of a package are acceptable. This allows you to have more control over what packages are automatically added to your project, but it can have some undesired side effects.
Let’s say you install urllib3 from pypi.org. The required dependencies for urllib3 can be seen below.
One of the packages urllib3 depends on is cryptography. A version specifier is used to tell urllib3 to accept any version of cryptography that is greater than or equal to 1.3.4. This ensures that most version conflicts will be avoided, but as mentioned earlier it comes with some unintended consequences.
The screenshot above shows some of the updates for package cryptography. Since version specifiers were used to tell urllib3 to accept any version of cryptography that is greater than or equal to 1.3.4, a fresh install of your application would automatically pull in a new, unreviewed release any time cryptography is updated.
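The effect of a "greater than or equal to" specifier can be sketched in a few lines. This is a deliberately simplified illustration that only understands plain dotted version numbers and the >= and == operators; the function names and the version list are assumptions made for the example, and real specifier handling follows PEP 440.

```python
def parse_version(text):
    """Turn '1.3.4' into (1, 3, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specifier):
    """Check a version against a single '>=' or '==' specifier."""
    operator, required = specifier[:2], specifier[2:]
    if operator == ">=":
        return parse_version(version) >= parse_version(required)
    if operator == "==":
        return parse_version(version) == parse_version(required)
    raise ValueError(f"unsupported specifier: {specifier}")


if __name__ == "__main__":
    # Illustrative version numbers only.
    candidates = ["1.3.3", "1.3.4", "3.4.7", "35.0.0"]
    for specifier in (">=1.3.4", "==1.3.4"):
        accepted = [v for v in candidates if satisfies(v, specifier)]
        print(specifier, "accepts", accepted)
```

The open-ended specifier accepts every later release, including ones that did not exist when the project was last tested, while the == pin accepts exactly one version.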
The automatic acceptance of unverified third-party packages can lead to:
- Unpredictable builds that lead to “it worked when I built it yesterday” situations.
- Malicious or unsafe packages sneaking into your code.
Thankfully, Python has a tool that gives you more control over your package dependencies and helps mitigate the risks that come with version specifiers: requirements.txt files.
Best Practice: Use Requirements.txt for Repeatable Builds
When sharing a project with others, using a build system, or copying a project to any other location where you need to restore an environment, you need to specify the external packages that the project requires.
This is normally done by using pip freeze > requirements.txt, which records an environment’s current package list into requirements.txt. Since requirements.txt contains a pinned version of everything that was installed when pip freeze was run, it can be used to ensure that the packages originally used in development are the same ones used when someone else builds the application.
Utilizing requirements.txt files can be done in two simple steps:
1. Use pip freeze to output installed packages suitable for a requirements file:
C:\> py -m pip freeze
2. Generate a requirements file and then install it into another environment:
env1\Scripts\python -m pip freeze > requirements.txt
env2\Scripts\python -m pip install -r requirements.txt
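A requirements file only gives repeatable builds if every entry is pinned to an exact version. As a rough sketch, a check like the following could flag entries that are not pinned with ==. The function name is hypothetical, and the parsing is intentionally simplified: it skips comments and pip option lines but does not handle every form a requirements file can contain, such as direct URL references.

```python
def unpinned_requirements(lines):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in lines:
        requirement = line.split("#", 1)[0].strip()  # drop comments
        if not requirement or requirement.startswith("-"):
            continue  # skip blank lines and pip options such as '-r other.txt'
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


if __name__ == "__main__":
    example = [
        "urllib3==1.26.5",
        "cryptography>=1.3.4  # open-ended specifier",
        "requests",
    ]
    for requirement in unpinned_requirements(example):
        print("not pinned:", requirement)
```

Output produced by pip freeze should normally pass this check, whereas a hand-written file with open-ended specifiers would not.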
Requirements.txt files ensure predictable builds, but they don’t eliminate dependency problems completely.
Watch out for New Vulnerabilities
Even with version specifiers, requirements.txt files, and a package approval workflow, it’s possible that a new vulnerability will be discovered in a dependency. This vulnerability could then impact any applications that use it. So, what can you do?
You could manually inspect the requirements.txt file of all your applications to see which ones are using the unwanted dependency, but it’s a lot of work. Alternatively, you could use ProGet’s Package Consumers feature to do this automatically.
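For a handful of applications, the manual inspection can at least be scripted. The sketch below assumes a hypothetical folder layout in which each application has its own directory containing a pinned requirements.txt; the function name and layout are illustrative assumptions, and the comparison is simplified (exact, case-insensitive match on name==version).

```python
from pathlib import Path


def apps_using(root, package, version):
    """Return names of app folders whose requirements.txt pins package==version.

    Assumes a hypothetical layout: <root>/<app name>/requirements.txt
    """
    wanted = f"{package}=={version}".lower()
    matches = []
    for requirements_file in sorted(Path(root).glob("*/requirements.txt")):
        for line in requirements_file.read_text().splitlines():
            pin = line.split("#", 1)[0].strip().lower()  # ignore comments
            if pin == wanted:
                matches.append(requirements_file.parent.name)
                break
    return matches


if __name__ == "__main__":
    # Scans the current directory; prints an empty list if nothing matches.
    print(apps_using(".", "urllib3", "1.26.4"))
```

This only works when every application's requirements file is up to date, pinned, and reachable from one place, which is why the manual route gets harder as the number of applications grows.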
Best Practice: Use Package Consumers to Track Dependencies
ProGet’s Package Consumers feature shows all the applications that are “consuming” or using a specific package. To illustrate how this feature works let’s look at package urllib3 again.
urllib3 is a very popular, powerful, and user-friendly HTTP client for Python. As of the time of writing, 4780 packages depend on urllib3. In 2021, a denial-of-service vulnerability was discovered in version 1.26.4. Any application that used a package depending on urllib3, and that automatically updated to version 1.26.4, would then unknowingly have a vulnerable package introduced into its code!
This is a perfect use case for ProGet’s Package Consumers. After building your application, Package Consumers uses pgscan to scan the build output, search for the specific package versions consumed by the application, and publish that data to ProGet along with your application’s name and version.
Thanks to the Package Consumers feature, we can see that urllib3 1.26.4 is a dependency in applications ThatOtherApp, MyApp, and OtherApp. Now that we know which applications are affected by urllib3 1.26.4, we can quickly make the relevant changes!
When a vulnerable package is discovered, finding out which applications are using this package as quickly as possible will be vital. This is exactly why we created Package Consumers 😀.
Python in the Enterprise
Utilizing requirements.txt files, a package approval workflow, and ProGet’s Package Consumers will help ensure predictable builds and keep unwanted packages out of them.
However, Python can be a complex language to work with and there is still a lot more to learn if you want to effectively use Python in the enterprise. Read our guide to level up your Python skills and master Python in the enterprise!