Python Package Managers Explained
by Crista Perlton, on Jan 22, 2022 10:14:27 PM
Python has become one of the most popular programming languages, thanks to its ease of use and extreme versatility. Its extensive standard library follows a "batteries included" philosophy, which makes Python a powerful tool for all kinds of users. From data scientists to network engineers, there's a Python library for everyone.
In this article, I'll explain where all these great packages can be found, how Python's standard package manager works, and some challenges and solutions to be aware of when using Python.
PyPI: The Package Index
Like NuGet.org for .NET and npmjs.org for JavaScript, Python has its own official third-party software repository. The Python Package Index (PyPI) hosts an extensive collection of Python packages, development frameworks, tools, and libraries.
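Besides the website, PyPI exposes a JSON API at https://pypi.org/pypi/<project>/json that returns a project's metadata. The sketch below is a minimal, standard-library-only illustration of reading it. The helper names (pypi_json_url, summarize, fetch_metadata) are hypothetical, and only the info.name, info.version and info.summary fields are used.

```python
import json
import urllib.request


def pypi_json_url(project: str) -> str:
    """URL of PyPI's JSON metadata endpoint for a project."""
    return f"https://pypi.org/pypi/{project}/json"


def fetch_metadata(project: str, timeout: float = 10.0) -> dict:
    """Download a project's metadata from PyPI (requires network access)."""
    with urllib.request.urlopen(pypi_json_url(project), timeout=timeout) as resp:
        return json.load(resp)


def summarize(metadata: dict) -> str:
    """Pull the name, latest version and one-line summary out of the payload."""
    info = metadata["info"]
    return f"{info['name']} {info['version']}: {info['summary']}"
```

With network access, summarize(fetch_metadata("numpy")) prints NumPy's latest release and its summary line. Keeping the parsing separate from the download makes the parsing easy to test offline.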
PyPI packages let developers share and reuse code rather than reinventing the wheel. As PyPI grew, the need for a standard installer became apparent, and pip eventually became Python's standard package manager.
Pip: The Standard Package Manager
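Pip has been bundled with Python since version 3.4 (some Linux distributions package it separately). It can install packages from many different sources, such as version-control repositories, local directories, and private package indexes, but PyPI.org is its primary and default source.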
By default, pip installs packages into the global environment of the Python interpreter that runs it, so every project using that interpreter sees the same packages. This is a problem because packages often depend on specific versions of other packages. Since everything shares one environment, it's easy to run into a dependency conflict that prevents your application from installing or running.
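To see which shared environment that is, you can ask the interpreter where pure-Python packages go and what is already installed there. The sketch below assumes pip's usual target is the interpreter's purelib directory. Pip can fall back to a per-user directory when that location is not writable, or when --user is passed. The function names are hypothetical.

```python
import sys
import sysconfig
from importlib import metadata


def install_target() -> str:
    """Site-packages directory for pure-Python packages of this interpreter."""
    return sysconfig.get_paths()["purelib"]


def installed_distributions() -> dict:
    """Map each installed distribution's name to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or incomplete metadata
            found[name] = dist.version
    return found


print("Interpreter:", sys.executable)
print("Packages install into:", install_target())
for name, version in sorted(installed_distributions().items())[:10]:
    print(f"  {name}=={version}")
```

Every project run with this interpreter sees this same list, which is why one project's upgrade can break another.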
Thankfully, pip automates much of this work by resolving dependencies before installing the requested packages. However, no resolver can make two projects that need incompatible versions of the same package share one environment, so the standard way to prevent such conflicts is to create a separate Python environment for each project.
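Pip also includes a pip check subcommand that reports installed packages whose declared requirements are missing or installed at incompatible versions. The minimal sketch below runs it for the current interpreter. The wrapper function is hypothetical. Invoking pip as python -m pip ensures that pip checks the environment belonging to that interpreter.

```python
import subprocess
import sys


def check_dependencies() -> tuple:
    """Run `pip check` for this interpreter; return (ok, pip's output)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # pip check exits with 0 when no broken requirements are found.
    return result.returncode == 0, (result.stdout + result.stderr).strip()


ok, report = check_dependencies()
print("No broken requirements" if ok else report)
```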
Virtual Environments & Virtualenv
In the Python world, a virtual environment is a directory containing its own Python interpreter (or a link to one) along with the packages and other dependencies a project needs. The purpose of these environments is to keep projects separate and prevent dependency, version, and permission conflicts.
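A running script can tell whether it is inside such an environment. In environments created by venv or virtualenv 20+, sys.prefix points at the environment's directory, while sys.base_prefix still points at the installation the environment was created from. Legacy virtualenv versions instead set sys.real_prefix. The sketch below checks both. The function name is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """True when this interpreter runs inside a venv/virtualenv environment."""
    # Old virtualenv (< 20) set sys.real_prefix; venv-style tools change sys.prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("virtual environment" if in_virtual_environment() else "global environment")
print("sys.prefix:", sys.prefix)
```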
Imagine one script relies on version 1.10 of NumPy, but another requires version 1.20, and a breaking change was introduced in between (say, in 1.19). If you install everything into a single global Python environment (the default pip behavior), one of these scripts won't work.
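The reasoning can be made concrete: a single environment holds exactly one NumPy version, so the sketch below tries a few candidate versions against both scripts' requirements. The script names, the version ranges and the 1.19 breaking change are hypothetical, following the example above. The parse helper only handles plain numeric versions. Real tools use full version parsing, for example the packaging library.

```python
from __future__ import annotations

# Hypothetical requirements for two scripts, as [minimum, maximum) ranges.
REQUIREMENTS = {
    "script_a.py": ("1.10", "1.19"),  # relies on behavior changed in 1.19
    "script_b.py": ("1.20", None),    # needs 1.20 or newer, no upper bound
}


def parse(version: str) -> tuple[int, ...]:
    """Turn a plain release string such as '1.18.5' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, minimum: str, maximum: str | None) -> bool:
    v = parse(installed)
    return v >= parse(minimum) and (maximum is None or v < parse(maximum))


def broken_scripts(installed: str) -> list[str]:
    """Scripts whose requirement the single installed version fails to meet."""
    return [
        script
        for script, (minimum, maximum) in REQUIREMENTS.items()
        if not satisfies(installed, minimum, maximum)
    ]


# One shared environment can hold only one NumPy version at a time.
for candidate in ("1.10.4", "1.18.5", "1.20.3"):
    print(candidate, "breaks", broken_scripts(candidate))
# 1.10.4 breaks ['script_b.py']
# 1.18.5 breaks ['script_b.py']
# 1.20.3 breaks ['script_a.py']
```

Whichever version is installed, one script loses. Giving each script its own environment removes the conflict.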
Virtualenv is a tool for creating named virtual environments in which packages are installed in isolation. Each environment has its own installation directories and, by default, doesn't share libraries with other virtual environments or with globally installed libraries.
For example, you could create one environment for web development and another for data science, each with its own set of libraries.
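That example can be sketched with the standard library's venv module, which has provided a subset of virtualenv's functionality since Python 3.3. This sketch does not use the separate virtualenv tool. It creates two hypothetically named environments in a temporary directory and asks each environment's own interpreter whether it is isolated. Pip is not installed into them (with_pip=False), which keeps the sketch fast and offline.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create an isolated environment and return the path to its interpreter."""
    # Symlinking the interpreter matches the `python -m venv` default on POSIX.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_dir / exe


def is_isolated(python: Path) -> bool:
    """Ask the environment's own interpreter whether it runs inside a venv."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


with tempfile.TemporaryDirectory() as root:
    envs = {name: create_env(Path(root) / name) for name in ("web", "data-science")}
    for name, python in envs.items():
        print(name, "isolated:", is_isolated(python))
```

Packages installed with one environment's interpreter (for example, by running its python with -m pip after creating it with pip) land only in that environment's own site-packages.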
Pip Alternatives (Pipenv & Poetry)
Pip is the “original” Python package manager, and other tools have tried to improve on it. Pipenv and Poetry are two popular package managers that have done so successfully.
Pipenv is a package management tool that “aims to bring the best of all packaging worlds” to Python. It is similar in spirit to Node.js’s npm and Ruby’s Bundler. It’s popular in the Python community because it merges virtual environment and package management into a single tool: it creates an environment for each project, records the project’s dependencies in a Pipfile, and pins exact versions in Pipfile.lock. Pip alone is often sufficient for personal use, but Pipenv is commonly recommended for collaborative projects because it’s a higher-level tool that simplifies dependency management for common use cases.
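A Pipfile is plain TOML, so its structure is easy to inspect. The sketch below parses a hand-written Pipfile for a hypothetical project. In practice, pipenv install requests and pipenv install --dev pytest would add entries like these. It assumes Python 3.11+ for tomllib, or the third-party tomli package on older versions.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: requires `pip install tomli`
    import tomli as tomllib

# A hand-written Pipfile for a hypothetical project.
PIPFILE = """
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.11"
"""

pipfile = tomllib.loads(PIPFILE)
print("runtime:", sorted(pipfile["packages"]))          # runtime: ['requests']
print("development:", sorted(pipfile["dev-packages"]))  # development: ['pytest']
print("index:", pipfile["source"][0]["url"])
```

Runtime and development dependencies live in separate tables, so a deployment can install only the [packages] section.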
Poetry prides itself on making Python packaging and dependency management “easy”. Besides package management, it can build distributions of your project and publish them to PyPI. It lets you declare the libraries your project depends on (in pyproject.toml) and installs or updates them while resolving any conflicting package requirements. Poetry also keeps development dependencies separate from production dependencies (as dependency groups), so development-only tools can be left out of a production install.
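The sketch below shows how that separation looks in pyproject.toml for a hypothetical project using the [tool.poetry] layout (Poetry 1.2+). In that layout, poetry add requests and poetry add --group dev pytest would record entries like these, and poetry install --without dev would skip the dev group. Poetry 2 also accepts the standard [project] table, which this sketch ignores. The dependency_groups helper is hypothetical, and the sketch needs tomllib (Python 3.11+) or the third-party tomli package.

```python
from __future__ import annotations

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: requires `pip install tomli`
    import tomli as tomllib

# A hypothetical Poetry project using the [tool.poetry] layout.
PYPROJECT = """
[tool.poetry]
name = "example-app"
version = "0.1.0"
description = "Hypothetical project used for illustration"
authors = ["Example Author <author@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
requests = "^2.31"

[tool.poetry.group.dev.dependencies]
pytest = "^8.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
"""


def dependency_groups(pyproject_text: str) -> dict[str, list[str]]:
    """List package names per dependency group ("main" = runtime)."""
    poetry = tomllib.loads(pyproject_text)["tool"]["poetry"]
    # "python" constrains the interpreter version; it is not an installable package.
    runtime = sorted(n for n in poetry.get("dependencies", {}) if n != "python")
    groups = {"main": runtime}
    for group, table in poetry.get("group", {}).items():
        groups[group] = sorted(table.get("dependencies", {}))
    return groups


print(dependency_groups(PYPROJECT))  # {'main': ['requests'], 'dev': ['pytest']}
```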
Conda: Alternative Package Management
Conda is a multi-purpose package management tool. It manages package dependencies, creates virtual environments for applications, installs compatible Python distributions, and packages applications for deployment to production. It originated with Anaconda, a Python distribution aimed at data science. Conda installs packages from Anaconda's repository (and other conda channels) rather than from PyPI, and it can manage packages for multiple programming languages.
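Because conda environments are separate from venv-style ones, it can help to know which kind you are running in. The sketch below is a best-effort heuristic. It assumes the fact that conda keeps its per-environment package records in a conda-meta directory under the environment's prefix, and that conda activate sets the CONDA_DEFAULT_ENV variable. The function name is hypothetical.

```python
import os
import sys
from pathlib import Path


def is_conda_environment(prefix: str = sys.prefix) -> bool:
    """Heuristic: conda environments contain a conda-meta/ directory."""
    return (Path(prefix) / "conda-meta").is_dir()


print("conda environment:", is_conda_environment())
print("active conda env:", os.environ.get("CONDA_DEFAULT_ENV", "none"))
```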
Compared to PyPI, the package selection is much smaller, but what Conda lacks in quantity it makes up for in quality. Anyone can publish to PyPI, but only packages curated by Anaconda are published in its repository. Commercial use of Anaconda's repository may require a paid subscription, which grants access to thousands of curated packages and includes support. That makes Conda a good fit for those who are willing to pay to avoid worrying about license, quality, and vulnerability issues in third-party and open-source packages.
Getting Started with Python Package Managers
Pip is the ideal starting place. It comes with Python, is easy to understand, and has an abundance of related resources. However, if you’re working on anything more than a personal project, you will likely need to create virtual environments. For that, Pipenv and Poetry are more convenient options than using pip and Virtualenv together.
Alternatively, Conda can serve as a Swiss Army knife of package management. It has everything you need in one tool and gives you access to packages curated by Anaconda. However, Anaconda's repository may require a paid subscription, offers significantly fewer packages than PyPI, and has fewer related resources available than PyPI and its package managers.
Managing Python packages is only the tip of the iceberg when it comes to using Python in the enterprise.