How to Vet and Approve PyPI Packages
Open-source packages are a minefield. For Python alone, nearly 150 vulnerabilities have been found in over 40% of packages in PyPI. Some of these vulnerabilities are mild and logistically impossible to exploit. Others could let an attacker run bash commands and work their way deeper into your systems. Without a barrier between your organization and the minefield of PyPI, it’s easy for these packages to sneak into your codebase, possibly wreaking havoc.
A lot of organizations have discovered that setting up a simple package review and approval process has reduced the amount of time they spend debugging and rewriting. Creating a simple package approval pipeline isn’t just easy, it’s your codebase’s biggest defense against vulnerabilities, unacceptable licenses, and low-quality PyPI packages.
But while the review itself is easy to train people on, how do you actually set up a package approval workflow? After a team member receives a package request, how do they let the team know the package is good to use? Post in the group chat? Start a forum? Just pass by and give them a thumbs up?
This article will outline how to create and operate a package approval workflow for the PyPI packages your team plans to use.
What Is a Package Approval Workflow?
A package approval workflow is just like a code review but for your open-source packages from PyPI.
Basically, when someone asks to use a package, it’s reviewed by a trained “Approver.” Once the package is approved, it’s placed in a repository like ProGet so all Python users can easily access the packages they need. This ensures that a trained set of eyes lands on every single PyPI package before it makes its way to Python coders, and eventually to production.
Many organizations find that setting up a package approval workflow is their most effective method of stopping vulnerabilities and unacceptable licenses before they ever find their way to production.
Since both Python coders and build servers access your repositories, it’s important to create an easy-to-follow standard process for both. When creating your PyPI package approval workflow, make requests easy to submit and approved packages easy to use, so Python coders will actually follow the process rather than go around it.
How to Set Up Your PyPI Package Approval Workflow
To create a PyPI package approval workflow, you first need a minimum of 2 feeds (ideally 3): one for unapproved packages and one for approved packages. Of course, the necessary number of feeds varies by organization. If you have multiple teams/groups that require separate feeds with separate permissions, you may want one feed per team/group (e.g. cleveland-python, chicago-python). While this lets you restrict teams and manage packages better, it makes the approval workflow much more complex.
But generally, 3 is enough for small teams: “Unapproved”, “Approved”, and “Internal + Approved”.
|Feed||Permissions||Contents|
|Unapproved: Packages from PyPI||Read-Only For Devs||Raw packages from PyPI.|
|Approved: Packages that have been reviewed||Read-Only For Devs; Approvers can Promote Here||Only approved PyPI packages.|
|Internal + Approved Packages||Read-Only For Devs; Read-Write For Build Servers||All available packages.|
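The layout in the table can also be written down as plain data, which is handy for documentation or a sanity-check script. This is only an illustrative sketch: the feed names `unapproved-python` and `approved-python`, the role names, and the `can` helper are hypothetical and are not part of ProGet.

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    name: str
    contents: str
    # Maps a role to the set of actions that role may perform on this feed.
    permissions: dict = field(default_factory=dict)


# Hypothetical feed and role names; only "mycompany-python" comes from the article.
FEEDS = {
    feed.name: feed
    for feed in [
        Feed("unapproved-python", "Raw packages from PyPI",
             {"developer": {"read"}, "approver": {"read", "promote"}}),
        Feed("approved-python", "Only approved PyPI packages",
             {"developer": {"read"}, "approver": {"read"}}),
        Feed("mycompany-python", "All available packages",
             {"developer": {"read"}, "build-server": {"read", "publish"}}),
    ]
}


def can(role: str, action: str, feed_name: str) -> bool:
    """Return True if the role may perform the action on the named feed."""
    return action in FEEDS[feed_name].permissions.get(role, set())
```

Here `promote` on the unapproved feed stands for promoting a package out of it and into the approved feed.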
You can use any package repository you’d like, but for this example, we’ll create a feed using ProGet.
A feed in ProGet is used to store Python packages, and Python coders can connect to different feeds to see what packages are available to use. Feeds can also “proxy” packages from another feed using connectors.
Setting up a Python feed in ProGet is simple. Navigate to “Feeds” in the ribbon and select “Create New Feed.” Then click on Python and fill in the feed details, including name, description, and feed usage.
Unapproved Third-Party Python Packages
This feed is where an approver will review requested packages. Only approvers should have access to edit/promote packages from this feed to the “Approved” feed.
This is the only feed that should have a connector to PyPI.
Approved Third-Party Python Packages
This feed holds packages that have been approved and promoted from the “Unapproved” feed. It should still be read-only for most devs, since approved packages are connected to and available in the Internal + Approved Packages feed.
Internal + Approved Third-Party Packages
This feed should contain both pre-release and stable internal packages.
In this example, we’ve created a feed clearly labeled mycompany-python that all devs should have access to. Having only 1 feed makes it convenient and easy for devs and your build servers to access all the packages they need/are permitted to use.
Now that your feeds are set up, it’s important to make sure the right team members have access to the right feeds. In ProGet, this is done through “permissions” and “package promotion.”
Package promotion is the process of…well, promoting packages between feeds to ensure that only approved and verified packages are used in the right environments. We recommend that only Python users who have been trained to verify and approve packages be able to promote packages from “Unapproved” to “Approved.”
Python Users/Coders (Read-Only)
Python users should generally only have access to the Internal + Approved Packages feed. Providing Python users with a single feed with all packages they need (e.g. mycompany-python) allows them to work without worrying whether a package is “safe” or not.
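Pointing pip at that single feed is usually a one-line setting in `pip.conf`. The sketch below renders that file's contents with Python's `configparser`; the host name `proget.example.com` and the endpoint path are assumptions, so copy the real URL from your feed's page in ProGet.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Return pip.conf text that makes index_url pip's only package index."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Assumed URL shape for the mycompany-python feed; verify it against your server.
FEED_URL = "https://proget.example.com/pypi/mycompany-python/simple"
print(render_pip_conf(FEED_URL))
```

Because `index-url` replaces the default index rather than adding to it, pip no longer falls back to PyPI directly.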
Approvers (Promote to Approved)
Approvers (i.e. Python users who’ve been trained on how to verify PyPI packages) should be the only ones that have promote-access in ProGet. Approvers are responsible for reviewing packages and promoting them from “Unapproved” to “Approved.”
Before a Python user becomes an “approver”, there are three things they should be trained to review:
- Licensing: does this package have an appropriate license?
- Vulnerabilities: is this package generally safe to use?
- Quality: is this a high-quality, enterprise-ready library?
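One way to keep reviews consistent is to record the three checks for every request and only promote when all of them pass. The record below is a hypothetical sketch; the class, its field names, and the example package are illustrative and do not correspond to anything in ProGet.

```python
from dataclasses import dataclass


@dataclass
class PackageReview:
    package: str
    version: str
    license_ok: bool
    vulnerabilities_ok: bool
    quality_ok: bool
    notes: str = ""

    def failed_checks(self) -> list:
        """Return the names of the checks that did not pass."""
        checks = {
            "licensing": self.license_ok,
            "vulnerabilities": self.vulnerabilities_ok,
            "quality": self.quality_ok,
        }
        return [name for name, passed in checks.items() if not passed]

    def approved(self) -> bool:
        """A package is ready for promotion only if every check passed."""
        return not self.failed_checks()


# Hypothetical package name and version, used only for illustration.
review = PackageReview("example-package", "1.0.0",
                       license_ok=True, vulnerabilities_ok=False, quality_ok=True,
                       notes="Open advisory against this version.")
print(review.approved(), review.failed_checks())
```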
Build Server (Publish to Internal)
Just like your Python users, Build Servers access your feeds. So, it makes sense to create rules and standards that your build servers should follow as well.
Build Servers should use an API Key that has permission to consume from and publish to the Internal + Approved Third-Party Packages feed (mycompany-python) and should only be able to access that feed. In order to publish, the build server will have to receive special access to that feed.
This way, builds will fail if unapproved third-party packages are used, giving Python users immediate feedback. When that happens, a Python user can recognize that the package is not approved and submit it for approval. Following CI/CD best practices, only the build server should create builds, not Python users.
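In this setup the build fails simply because the feed does not serve the package. If you want a clearer message before the install step, a pre-flight check along the following lines could run first. It is a minimal sketch: the function names are hypothetical, the parsing covers only simple `requirements.txt` lines, and the approved list is assumed to be supplied by you.

```python
import re

_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text: str) -> list:
    """Extract normalized project names from simple requirements.txt content."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if match:
            names.append(normalize(match.group(1)))
    return names


def unapproved(requirements_text: str, approved) -> list:
    """Return the required packages that are not in the approved set."""
    approved_names = {normalize(name) for name in approved}
    return [n for n in requirement_names(requirements_text) if n not in approved_names]
```

A build script could call `unapproved(...)` and exit with a non-zero status when the list is not empty.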
Connectors allow ProGet feeds to include packages from an external source, whether it’s another ProGet feed, a public gallery like PyPI, or another third-party package source like Azure DevOps Packages or Artifactory.
Your “Unapproved” feed should be the only feed with an external connector. Packages are then promoted by your approvers from “Unapproved” to “Approved,” which is linked to “Internal + Approved Third-Party Packages” through an internal connector.
Setting up connectors in ProGet is straightforward. After you create your feed, ProGet will prompt you to create a connector. After you fill in the popup, your unapproved feed should be connected to your public Python gallery of choice.
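The rule that only one feed reaches outside your organization is easy to check mechanically if you keep a record of your connectors. The mapping below is a hypothetical description of the topology, not something exported from ProGet, and it assumes the combined feed is the one that holds the connector to the approved feed.

```python
# Hypothetical topology: feed name -> list of sources its connectors point at.
# A source that is not itself a feed name is treated as external.
CONNECTORS = {
    "unapproved-python": ["https://pypi.org/simple"],
    "approved-python": [],
    "mycompany-python": ["approved-python"],
}


def external_connector_feeds(connectors: dict) -> list:
    """Return the feeds that have at least one connector to an outside source."""
    internal = set(connectors)
    return sorted(
        feed
        for feed, sources in connectors.items()
        if any(source not in internal for source in sources)
    )


print(external_connector_feeds(CONNECTORS))
```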
You can easily automate part of the approval process, allowing “pre-approved” packages through and blocking “pre-rejected” packages through the use of connector filters. Connector Filters allow you to implicitly allow/block:
- All Versions of a Specific Package
- All Versions With a Specific Name
Setting Up a Connector Filter in ProGet
Navigate to your Python feed in ProGet and click on “Manage Feed” > “Connectors & Replication” > and click on your connector.
By default, ALL packages are allowed. So, in order to allow only the packages you want, you must first create a rule that blocks all packages. To do this, click “Add Filter.”
From here, you can simply add all the packages you want “pre-approved.”
It’s also possible to configure what it means when a package is “blocked” by clicking “configure filtering.” In all cases, a blocked package cannot be downloaded. But you have the option to partially hide (i.e. you can navigate to the package in the UI directly by URL), or totally hide (you can’t even browse to it).
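To reason about a block-all rule combined with a short allow list, it can help to model the evaluation yourself. The sketch below assumes that rules are checked in order, that the first matching rule wins, and that a package matching no rule is allowed. Those semantics are an assumption made for illustration and may differ from how ProGet evaluates filters, so confirm against its documentation.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (action, glob pattern), checked top to bottom.
RULES = [
    ("allow", "constructs"),
    ("block", "*"),
]


def is_allowed(package: str, rules) -> bool:
    """Apply the first rule whose pattern matches; allow if nothing matches."""
    name = package.lower()
    for action, pattern in rules:
        if fnmatchcase(name, pattern.lower()):
            return action == "allow"
    return True


print(is_allowed("constructs", RULES), is_allowed("some-other-package", RULES))
```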
What To Allow/Block
Don’t Add Rules for Infrequently Updated Packages (e.g. requests)
requests is a package your Python users need access to. However, since it’s only updated once or twice a year, it’s much more reasonable to have it manually reviewed.
Allow Integral Packages That are Frequently Updated (e.g. constructs)
constructs is an integral package Python users need access to and is updated up to ten times a week. That would be a lot of manual approval. Using a Connector Filter is the perfect way to balance package verification and an approver’s schedule (and sanity.) And since constructs is a package by AWS, you can trust that whatever they release will most likely be of acceptable quality.
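Release cadence is one input you can measure rather than guess. The sketch below counts releases in a recent window and suggests how to handle the package; the 90-day window and the one-release-per-week threshold are arbitrary assumptions, and whether you trust the publisher remains a human decision.

```python
from datetime import date, timedelta


def releases_per_week(release_dates, today: date, window_days: int = 90) -> float:
    """Average number of releases per week over the last window_days."""
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in release_dates if cutoff <= d <= today]
    return len(recent) / (window_days / 7)


def suggest_handling(release_dates, today: date, weekly_threshold: float = 1.0) -> str:
    """Suggest a connector filter for busy packages, manual review otherwise."""
    rate = releases_per_week(release_dates, today)
    return "connector filter" if rate >= weekly_threshold else "manual review"
```

For example, a package with a release every three days would be suggested for a connector filter, while one with two releases this year would stay on manual review.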
Block Packages with “Trusted” Prefixes (e.g. mycompany.*)
This one might seem counter-intuitive, but let me explain. Yes, you should use your company’s name as a prefix. But internal packages should be kept in your internal repositories. There’s almost no reason you should be sharing your proprietary packages on PyPI, so a package on PyPI that carries your prefix most likely isn’t yours. Downloading and using such packages opens your organization up to dependency confusion attacks.
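The same prefix test can be applied to any list of names that came from an external source, for instance when auditing what has already been pulled through a connector. This is a minimal sketch; the `mycompany` prefix follows the article's example, and the function names are hypothetical.

```python
import re

INTERNAL_PREFIXES = ("mycompany",)


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def looks_internal(name: str, prefixes=INTERNAL_PREFIXES) -> bool:
    """True if the name is an internal prefix or starts with one plus a separator."""
    normalized = normalize(name)
    return any(normalized == p or normalized.startswith(p + "-") for p in prefixes)


def suspicious_external_packages(external_names, prefixes=INTERNAL_PREFIXES) -> list:
    """Return externally sourced names that carry an internal prefix."""
    return [name for name in external_names if looks_internal(name, prefixes)]
```

Normalizing first means `mycompany.utils`, `mycompany_utils`, and `MyCompany-Utils` are all treated as the same name.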
Block Commonly Misspelt Packages
Typosquatting is a type of attack that targets Python users who mistype a package name. Attackers give malicious packages names similar to those of existing legitimate ones, for example prompt-toolkit and prompt-tool-kit.
Which one is the malicious package? A quick manual check is necessary. While prompt-toolkit seems to be alive and well, prompt-tool-kit seems to have only 1 update…fishy…
It’s often difficult (if not impossible) to determine if a package is typosquatting inline. When doing your verifications and approvals, check to see any possible misspellings and add them to your “Block” list as you go along.
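A similarity check will not prove that a package is malicious, but it can flag candidates for the manual check described above. The sketch below uses `difflib` from the standard library to compare a requested name against names you have already approved; the 0.85 cutoff is an arbitrary assumption that you would tune.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def possible_typosquats(candidate: str, approved, cutoff: float = 0.85) -> list:
    """Return approved names that the candidate closely resembles.

    An exact match returns an empty list, because the candidate is then
    the approved package itself rather than a lookalike.
    """
    name = normalize(candidate)
    known = {normalize(n) for n in approved}
    if name in known:
        return []
    return difflib.get_close_matches(name, sorted(known), n=3, cutoff=cutoff)


APPROVED = ["prompt-toolkit", "requests"]
print(possible_typosquats("prompt-tool-kit", APPROVED))
```

Any name this returns a match for is a reasonable addition to your “Block” list once you have confirmed it is not a legitimate package.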
Python coders shouldn’t decide on which packages to use nor when to upgrade packages in use. If Python users need to use a new package or upgrade an existing package, they need to ask an approver for a package review. Moreover, workstations (Visual Studio, etc.) should connect to ProGet and only ProGet.
This is just the tip of the iceberg when it comes to learning Python. Becoming a Python master typically takes years, but we’re working on a guide full of tips and tricks to help you improve your Python practices and codebase without having to go back to school.