Best Authoring Practices for Creating Python Packages
by Crista Perlton, on Jan 21, 2022 12:00:28 AM
What’s the easiest way to create a Python package that makes code reusability across multiple teams and projects a cinch?
(It’s a trick question.)
If developers in private organizations want to share their proprietary modules or code libraries, they should rely on Python packages. But there’s a noticeable lack of information on this use-case online.
How can teams use Python packages within a private business when everyone codes in their own unique way?
You can stop googling. We’ve compiled a brief outline and four best practices for you to get started using Python packages effectively.
Python Package Types
Most tutorials online, like the official Python “Packaging Python Projects” guide, explain how to package your Python module and upload it to a site like PyPI, or a private repository like Conda. The problem is that these tutorials can be three, four, or even six years old and out of date.
Python has been around since 1991, and pip was only introduced in 2008, 17 years after the language debuted! Packaging has been evolving constantly ever since; the evidence is clear in the three package formats available today:
- Source Packages (.tar.gz): a snapshot of the source code with a manifest file that includes metadata like Name, Version, Summary, Author, etc.
- Egg Packages (.egg): added standardization, file structure, and dependencies to source packages.
- Wheel Packages (.whl): an improvement on the Egg format and now the recommended format.
.egg is deprecated (pip won’t install them by default), but there are hundreds of older, popular packages on PyPI still using this format.
Python recommends creating new packages in a .whl format since they make installations faster and more efficient. In theory, a wheel could distribute any type of software, but they’re usually used for just Python.
Creating a Python Package
Creating a Python package that will be distributed around a team or organization starts with an import package (typically referred to as just 'package', with a name like kramerica_package). From there, you configure the metadata, add some text files like a README, and then generate the distribution package.
A “distribution package” - the final package that will be sent out - is the import package's archive (i.e. zip file) that contains a library of reusable modules (i.e. .py files) and metadata about the library (version, license, etc).
Making a Python package is like making a zip file with metadata. There are multiple tools available to make a package: the most common approach is to build with setuptools and upload with twine, but ultimately it depends on the developer’s preferences.
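To make the "zip file with metadata" idea concrete, here is a minimal sketch that builds a toy archive in memory and reads its metadata back. It is not a valid, installable wheel (a real one also carries WHEEL and RECORD files), and the package name, version, and helper functions are assumptions made up for illustration.

```python
import io
import zipfile
from email.parser import Parser

# Hypothetical metadata directory name, used only for this illustration.
DIST_INFO = "kramerica_package-1.0.0.dist-info"


def build_toy_archive() -> bytes:
    """Create an in-memory zip holding one module and a METADATA file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("kramerica_package/__init__.py", "GREETING = 'hello'\n")
        archive.writestr(
            f"{DIST_INFO}/METADATA",
            "Metadata-Version: 2.1\nName: kramerica_package\nVersion: 1.0.0\n",
        )
    return buffer.getvalue()


def read_metadata(data: bytes) -> dict:
    """Read the Name and Version fields back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read(f"{DIST_INFO}/METADATA").decode("utf-8")
    # Core metadata uses an email-header style "Key: value" layout.
    message = Parser().parsestr(text)
    return {"name": message["Name"], "version": message["Version"]}


toy = build_toy_archive()
metadata = read_metadata(toy)
print(metadata)
```

The point is only that the library and the description of the library travel together in one file.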
Anyone familiar with Python or coding in general can create a Python package:
- lay out files in a directory in a certain way
- create pyproject.toml to instruct setuptools to create a package
- create setup.cfg file
- run python3 -m build to create a .whl file
- upload the created .whl file using twine
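The steps above can be sketched as a small script that writes a minimal project skeleton into a temporary directory. Treat it as a starting point under assumptions, not a definitive layout: the file contents, the kramerica_package name, and the scaffold helper are all illustrative.

```python
import tempfile
from pathlib import Path

# pyproject.toml tells the build front end to use setuptools as the backend.
PYPROJECT = """\
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"
"""

# Minimal static metadata; name and version are placeholder values.
SETUP_CFG = """\
[metadata]
name = kramerica_package
version = 1.0.0

[options]
packages = find:
"""


def scaffold(root: Path) -> list:
    """Write a minimal project layout under root and return the relative paths."""
    files = {
        "pyproject.toml": PYPROJECT,
        "setup.cfg": SETUP_CFG,
        "README.md": "# kramerica_package\n",
        "kramerica_package/__init__.py": "",
    }
    for relative, content in files.items():
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))
    print(created)

# From a real project root you would then run:
#   python3 -m build
#   twine upload dist/*
```

The last two commands are left as comments because they need the build and twine tools installed and, for the upload, a repository to publish to.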
As mentioned, .whl is currently the most common format since .egg has been deprecated (pip won’t install them by default), but there are still hundreds of older, popular .egg packages on PyPI.
Using a Wheel helps distribute packages across teams or projects because it allows users to bypass the build stage. A Wheel is a "built distribution" format, so it can be moved and installed quickly compared to a "source distribution" format, which requires a build stage.
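Part of what makes a Wheel quick to install is that its file name already states which Python, ABI, and platform it was built for, so an installer can pick a match without building anything. The sketch below splits a wheel file name into those parts; the parse_wheel_filename helper and the example file name are assumptions for illustration, and the parsing is deliberately simplified.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of segments in: {filename}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


wheel = parse_wheel_filename("kramerica_utils-3.4.0-py3-none-any.whl")
print(wheel)
```

A pure-Python library will usually end in py3-none-any, meaning it has no ABI or platform requirements.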
We recommend using the following four best practices when creating Python packages. Since Python users are so diverse, writing in their own style, setting these standards can help make distribution of packages more efficient.
Best Practices for Python Packages
Use One Repository & Wiki Per Library
Keep each Python package as a single project, with its own source code (git) repository and issue tracker, even if it’s a small library.
The same goes for an internal wiki page. A wiki page is important even if it’s just a README file inside your repository. It helps other developers use the library and build upon the documentation you've started.
It should, at least, have a brief description of the library, how to use/maintain the project, and have links to the issue tracker. As you continue to make major new versions of the package, you can add upgrade notes to the wiki page.
Keep Metadata Simple
This best practice applies when creating a package via static metadata (setup.cfg) as opposed to dynamic metadata (setup.py). Static is the recommended metadata since it is guaranteed to be the same every time and is easier to read.
Keep the setup.cfg file as simple as possible, following the core metadata specifications.
The names of fields in setuptools (called “Keys”) differ from the names of fields in the core metadata specifications. Keep things simple to avoid confusion and use your internal wiki to capture other information.
Here are the recommended metadata fields to input. All others are optional.
| Core metadata field | setuptools key | Recommendation |
|---|---|---|
| name | name | Prefix using a company name like "kramerica_"; try to keep the name the same as your "import" package |
| version | version | Use a SemVer approximation |
| home-page | url | Use instead of Project-url; do not use download-url |
| requires-python | python_requires | Good to use, especially if you maintain compatibility for older versions of Python |
| requires-dist | install_requires | See "Best Practices: Versioning and Distribution" |
NOTE: it is possible to use a "dynamic" setup.cfg file, but that's quite complex and we don't recommend it.
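One way to keep metadata simple across many libraries is to check each setup.cfg against the short list of recommended fields. The sketch below does that with the standard library; the sample file, the example.invalid URL, the section layout, and the missing_fields helper are assumptions for illustration rather than a required format.

```python
import configparser

# Sample static metadata using placeholder values.
SETUP_CFG = """\
[metadata]
name = kramerica_utils
version = 3.4.0
url = https://example.invalid/kramerica_utils

[options]
python_requires = >=3.8
install_requires =
    docutils ~= 0.15
    kramerica_datamodel ~= 1.1
"""

# Recommended setuptools keys, grouped by the setup.cfg section they live in.
RECOMMENDED = {
    "metadata": ["name", "version", "url"],
    "options": ["python_requires", "install_requires"],
}


def missing_fields(text: str) -> list:
    """Return the recommended 'section.key' entries absent from a setup.cfg."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    missing = []
    for section, keys in RECOMMENDED.items():
        for key in keys:
            if not parser.has_option(section, key):
                missing.append(f"{section}.{key}")
    return missing


print(missing_fields(SETUP_CFG))
print(missing_fields("[metadata]\nname = kramerica_utils\n"))
```

A check like this could run in a pre-commit hook or CI job so that incomplete metadata is caught before a package is built.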
Use SemVer to help with Versioning and Dependency Tracking
Up until 2009, there were no standards on how distribution packages could be versioned. So when packages used ad hoc version strings, there was no way to know what the “latest” version of that package was, let alone whether it was pre-release or stable.
Although Python doesn’t fully support SemVer, you can create three-part versions in the same manner.
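A quick sketch shows why a consistent three-part numeric version matters when you need to find the "latest" release. Comparing versions as plain text gives the wrong answer once a segment reaches two digits, while comparing them as numbers does not. The parse_version helper is an illustrative assumption and only handles plain MAJOR.MINOR.PATCH strings.

```python
def parse_version(text: str) -> tuple:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of integers for comparison."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


versions = ["3.4.2", "3.4.3", "3.4.112"]

# Text comparison goes character by character, so "3" beats "1" in the last segment.
latest_by_text = max(versions)

# Numeric comparison treats 112 as larger than 3.
latest_by_number = max(versions, key=parse_version)

print(latest_by_text)    # 3.4.3
print(latest_by_number)  # 3.4.112
```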
Be careful with third-party libraries. They may not follow SemVer so you’ll need to look at each library individually and confirm contents and version numbers.
Using SemVer helps you use requires-dist. If your package requires other packages, this field is how you specify the dependencies. Follow SemVer practices and be sure to specify a compatible range while creating the package. For example, ~=3.2 means any version from 3.2 up to, but not including, 4.0.
If you were writing in the dependencies of a package from multiple source libraries, it would look like this:
docutils ~= 0.15
kramerica_utils ~= 3.4
kramerica_datamodel ~= 1.1
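To see what a compatible-release range like the ones above accepts, here is a simplified sketch of the ~= rule for purely numeric versions. It is an approximation written for illustration, not pip's actual resolver, and it ignores pre-release segments; the is_compatible helper is hypothetical.

```python
def to_tuple(version: str) -> tuple:
    """Turn a dotted numeric version like '3.4.1' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, spec: str) -> bool:
    """Approximate 'candidate ~= spec' for purely numeric versions."""
    base = to_tuple(spec)
    if len(base) < 2:
        raise ValueError("a compatible-release spec needs at least two segments")
    found = to_tuple(candidate)
    # Pad with zeros so that 3.2 and 3.2.0 compare as equal.
    width = max(len(base), len(found))
    base_padded = base + (0,) * (width - len(base))
    found_padded = found + (0,) * (width - len(found))
    # Everything except the last segment of the spec must match exactly.
    prefix = base[:-1]
    return found_padded >= base_padded and found_padded[: len(prefix)] == prefix


print(is_compatible("3.9.1", "3.2"))   # True: at least 3.2 and still in the 3.x series
print(is_compatible("4.0", "3.2"))     # False: a new major version
print(is_compatible("0.15.2", "0.15")) # True
```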
For more details about specifying dependencies in Python packages, check out the setuptools documentation and read our article on Versioning Python packages.
Use Wheels for Built Distribution
Distribute your Python package as a .whl file only. There's no need to ship a source package alongside it.
Source Packages aren't deprecated yet, but they're practically obsolete for internal distribution. They're created by the build tool by default, but pip prefers a Wheel whenever one is available.
Like Wheels, Eggs are a built distribution format and don't require a build stage, but the Python packaging documentation says Eggs have been replaced by Wheels.
Wheels, as we've described above, are the recommended distribution format. They make it much easier to adopt CI/CD practices and repackaging, so stick with .whl files.
CI/CD for Python Packages
Python is an interpreted language, so its “build” mainly revolves around test execution and creating a package (.whl) file. However, every commit creates a new package, which leaves teams with two major headaches.
- A torrent of Python packages that will most likely never be used.
- Most of these packages will be “unstable” and not ready for production use.
The CI/CD and Python Disconnect
This means that a Python package is basically unusable until it’s at the last stage of the pipeline and ready to publish. The disconnect your team is facing comes from these three best practice principles your team is following:
- Packages are Immutable (Read-only). Once published, a package file cannot be modified. You can't "edit" a version number of a package, or change its status, because the version number is part of the metadata embedded in the file.
- Untested code shouldn't be deployed. Rebuilding a Python package can produce different software due to wildcard version dependencies, and that means you need to test code you've just built before you can deploy it.
- Deploy only stable (non-prerelease) packages. By the name alone, it doesn't make sense to deploy prerelease packages to a production environment. Only stable versions should ever be released for use.
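The third principle can be enforced mechanically. Below is a minimal sketch of a deployment gate that lets through only plain three-part versions and rejects anything carrying a pre-release segment. The is_stable and deployable helpers are hypothetical, and the pattern assumes the MAJOR.MINOR.PATCH style of version used in this article.

```python
import re

# A stable version here means exactly three numeric parts and nothing else.
STABLE_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def is_stable(version: str) -> bool:
    """Return True only for versions with no pre-release segment."""
    return STABLE_PATTERN.fullmatch(version) is not None


def deployable(versions: list) -> list:
    """Filter a list of versions down to those allowed into production."""
    return [version for version in versions if is_stable(version)]


candidates = ["1.0.0-ci.8", "1.0.0-rc.8", "1.0.0"]
allowed = deployable(candidates)
print(allowed)
```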
And that’s not a great feeling: following best practices and being punished for it. Because of this, many teams look at working around CI/CD completely and applying one of these four "non-solutions."
Four Non-options for Python CI/CD
1.❌ Use New Version Numbers at Build Time
Every time you make a new build, you create a new version, using either three-part versioning (e.g. 3.4.2, 3.4.3, 3.4.112) or four-part versioning for every new unstable package version.
While it’s clearly communicated which version is the latest, neither of these communicates which package is stable.
2.❌ Overwrite Packages When You Publish
Download your package (e.g. 3.4.2), overwrite every time you build, and then re-upload it.
Although this solves the problem of an overwhelming number of new packages and builds, it breaks the immutability rule. With this method, there could be many "versions" of "version 3.4.2", and it’s impossible to know which one is stable. On top of this, overwriting creates issues with caching. Development tools and CI servers generally won't download a package version they have already downloaded, so team members will have to clear their package caches to use the most recent v3.4.2.
3.❌ Deploy Prerelease Packages
Use pre-release segments (e.g. beta1, beta2, beta11). A package is tested in a staging environment and, when it passes, is ready to be released to production.
Using this method, the package's quality is clearly labeled and you can apply all your standard CI/CD best practices to this pipeline. However, once you have a "stable" version of your package, you now have to create a whole new "stable version" of the package and send it through the pipeline all over again.
The result is a frustrated team, inundated with an overwhelming number of packages and forced to waste time creating multiple packages that will never be consumed.
4.❌ Ignore Versioning Completely
When creating a Python package, tell it to pull the most recent version of any dependencies.
While this simplifies the entire process and expedites the publishing of packages, things will eventually break as the number of packages and dependencies increases. While Python doesn’t require specific versions, your packages do.
The Secret to CI/CD for Python
"But wait, I thought you said Python doesn't fully support SemVer!?" you might say. You're right, it doesn't fully support SemVer. But you can approximate using Pre-release segments! Read our article on Python package versioning for more details on how to use SemVer in Python.
Avoid Other Versioning Features
Due to the complexity, it’s advised that teams avoid other versioning features.
It's best to strictly stick to a 3-part version with Pre-release Segments forgoing post/development releases. This, unfortunately, is as close to SemVer as Python users can get.
While it may seem difficult to reconcile package immutability, Semantic Versioning, and Continuous Integration for your Python packages, using a technique called "repackaging" will let you use these best practices.
Repackaging creates a new package from an existing package, using exactly the same content but changing the version number. For example, Build 8 yields 1.0.0-ci.8 for testing; once approved, it is repackaged as a release candidate (1.0.0-rc.8). Finally, the release candidate is repackaged as the stable version, 1.0.0, for deployment to production.
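The version side of repackaging can be sketched as a small promotion function that follows the ci, rc, stable progression from the example above. This only computes the new version string. An actual repackaging tool would also have to rewrite the version stored inside the package file, which this sketch does not attempt. The repackage_version helper and the accepted labels are assumptions for illustration.

```python
import re

# Matches versions shaped like 1.0.0-ci.8: a three-part core, a label, a build number.
PRERELEASE_PATTERN = re.compile(
    r"(?P<core>\d+\.\d+\.\d+)-(?P<label>[a-z]+)\.(?P<build>\d+)"
)


def repackage_version(version: str, target: str) -> str:
    """Return the version a package would carry after being repackaged.

    target is either a new pre-release label such as "rc", or "stable".
    """
    match = PRERELEASE_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"not a pre-release version: {version}")
    if target == "stable":
        # Promotion to stable drops the pre-release segment entirely.
        return match.group("core")
    # Otherwise keep the core and build number, and swap the label.
    return f"{match.group('core')}-{target}.{match.group('build')}"


release_candidate = repackage_version("1.0.0-ci.8", "rc")
stable = repackage_version(release_candidate, "stable")
print(release_candidate)
print(stable)
```

Because the package contents never change between stages, the code that reaches production is the same code that was tested.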
Master Python Packages
The low barrier to entry for Python can be deceiving. While many people are comfortable writing in Python, fewer are familiar with creating Python packages, and with how to do it well.
The sheer volume of information about Python can also be confusing. Although there is an official guide to creating a Python package, it is not entirely relevant to use-cases like private, internal packages.
Following best practices will help your team communicate across the organization with whoever may be using their Python packages.