Anatomy of a Python Package

Published 2014/01/28 12:21

2014-01-27 23:00

Over the months and years I have been coding in Python, I’ve created quite a few Python packages, both open source and for private projects. Even though their most important part was always the code, there are numerous additional files that are necessary for a package to correctly serve its purpose. Rather than being part of the Python language, they are more closely related to the Python platform.

But if you look for any definite, systematic information about them, you will at best find some scattered pieces of knowledge in various unrelated places. At worst, the only guidance would come in the form of a multitude of existing Python package sources, available on GitHub and similar sites. Parroting them is certainly an option, although I believe it’s much more advantageous to acquire a firm understanding of how those different cogs fit together. Without it, following modern Python’s best development practices – which are all hugely beneficial – is largely impossible.

So, I want to fill this void by outlining the structure of a Python package, as completely as possible. You can follow it as a step-by-step guide when creating your next project. Or just skim through it to see what you’re missing, and whether it’d be worthwhile to address such gaps. Any additional element or file will usually provide some tangible benefit, but of course not every project requires all the bells and whistles.

Without further ado, let’s see what’s necessary for a complete Python software bundle.

  0. LICENSE file

Very early, possibly even before you write a single line of code, configuration or documentation, I’d like you first to take a moment and decide how you’re going to distribute your work. It’s fine to defer that decision if the project is not intended to be public, but for any kind of open source project, establishing a license is of paramount importance. In fact, I wrote about this very topic some time ago.

As for practical matters, generating the actual license text turns out to be extremely easy if you use the incredibly useful, small program called lice. I recommend installing it directly into your global Python interpreter space:

    $ sudo pip install lice

Now you can just navigate to your project’s directory and whip out a license file with one simple command:

    $ lice bsd2 >LICENSE

Variable parts of the license text, such as the project’s and author’s name, are filled in automatically. Refer to $ lice --help for more details, and a list of available licenses.

  1. Main Python file

Now, with a clear conscience, you can commence coding. But wait! There is a small piece of boilerplate that you may find tremendously useful to put right on top of your main Python file. “Main” means, of course, either the sole .py module, or __init__.py of the top level package.

Also, it’s not really a boilerplate. It’s more like… an introduction:

    """
    dogelchemist :: Turns spam emails into Dogecoins
    """
    __version__ = "0.0.1"
    __author__ = "John Doe"
    __license__ = "WTFPL"

It happens that the version number, the author’s name and the program’s license are often needed in at least a few parts of the code, and beyond. Examples include:

- setup.py file (see below)
- usage/--help documentation, for command line programs
- content of the About dialog, for GUI programs
- text in the footer, for web applications
- User Agent (e.g. for urllib.urlopen) and other kinds of client identification strings

Exposing the version in code also means that dependent packages can adjust to different releases of your bundle in a somewhat systematic way. Should you deprive them of this possibility, they’ll inevitably resort to much nastier hacks.
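As a rough illustration of how a dependent package might adjust to different releases, the sketch below turns a version string into a tuple of integers that compares naturally. The helper name version_tuple and the sample version strings are made up for this example, and it assumes purely numeric, dot-separated versions.

```python
def version_tuple(version):
    """Convert a dotted version string like "0.10.1" into (0, 10, 1).

    Assumes every component is a plain integer; pre-release tags
    such as "1.0rc1" are not handled by this sketch.
    """
    return tuple(int(part) for part in version.split("."))


# Hypothetical stand-in for dogelchemist.__version__
installed = "0.10.1"

# Tuples compare element by element, so 0.10.1 is correctly newer than 0.9.
has_new_api = version_tuple(installed) >= version_tuple("0.9")

# Plain string comparison gets this wrong, because "1" sorts before "9".
naive_result = installed >= "0.9"
```

Comparing tuples rather than raw strings is the important part; without an exposed version there would be nothing reliable to compare at all.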

  2. requirements-test.txt

Most projects rely on existing packages to handle some of their necessary functionality; those are referred to as dependencies or requirements. By no means is this a necessity, though. Since Python’s standard library is rich and powerful, it’s sometimes possible to build valuable packages based entirely upon standard modules.

The tests, however, are pretty much guaranteed to require some assistance. There is little reason not to use a third-party test runner, for example. Indeed, something like nose or py.test makes running tests a breeze and simplifies writing them, too. Likewise for the various mocking libraries, not to mention more specialized tools for e.g. benchmarking or load testing. Testing can be a complicated affair sometimes, especially in a language such as Python where it’s absolutely crucial.

Therefore the recommendation is to put all dependencies required strictly by tests – and tests only! – into a separate requirements-test.txt file. Here’s an example:

    -e .

    pytest==2.5.0
    mocktest==0.7

Like the traditional requirements.txt described later, this file is in the standard format understood by Pip. (The test runner is published on PyPI under the name pytest; py.test is the command it installs.) With a properly configured Python package, it is also complete enough to allow a single command:

    $ pip install -r requirements-test.txt

to install everything that’s necessary for tests to run – and hopefully pass! Being capable of such a level of automation can unlock quite powerful rewards later on, as it’s a prerequisite for any kind of continuous integration.

  3. setup.py

Python has no “package description” file per se, akin to package.json from Node.js or *.cabal files for Haskell. Instead, it takes an approach not entirely dissimilar to Ruby’s, where the bundle’s “specification” is also executable code. But Python’s setup.py is not just that: it’s an actual installation script.

Although not strictly necessary for executable programs, having a setup.py is still recommended for all Python packages. For libraries, it is absolutely mandatory. In either case, setup.py is the file which enables a package to be installed into the interpreter, and hence imported by any other Python code it runs.

Most of the content of setup.py is typically an invocation of the setuptools.setup function. The majority of its parameters are in fact fields in the package’s résumé: name, description, author, etc. The rest are more substantial: they describe which Python files the package consists of, and what its installation requirements are.

Let’s have a look at a “real world” example:

    #!/usr/bin/env python
    """
    dogelchemist

    Turns spam emails into Dogecoins
    """
    from setuptools import setup, find_packages

    import dogelchemist


    setup(
        name="dogelchemist",
        version=dogelchemist.__version__,
        description="Turns spam emails into Dogecoins",
        long_description=__doc__,
        author=dogelchemist.__author__,
        url="http://example.com/dogelchemist",
        license=dogelchemist.__license__,

        classifiers=[
            "Intended Audience :: End Users/Desktop",
            "License :: Freely Distributable",
            "Operating System :: OS Independent",
            "Programming Language :: Python",
            "Programming Language :: Python :: 2.6",
            "Programming Language :: Python :: 2.7",
            "Topic :: Office/Business :: Financial",
            "Topic :: Security :: Cryptography",
        ],

        packages=find_packages(exclude=['tests']),
        install_requires=[
            # external packages we depend on go here
        ],

        entry_points={
            'console_scripts': ['dogelchemist=dogelchemist.main:main'],
        },
    )

While this by no means exhausts the breadth of setup parameters, it demonstrates some of the more common arguments. This includes classifiers=, a complete list of which can be found on the PyPI website; and entry_points= with executable commands.

But more importantly, we have the packages= argument, which is almost always used in conjunction with the find_packages function. (Should you only have loose modules, you’d use py_modules= instead.) We typically exclude tests from the packages to be installed, as they are not relevant for end users who just want to use our code. Finally, install_requires= lists all the external packages we depend on. For short dependency lists, it’s fine to enumerate them like that, though the usual practice nowadays is to use a dedicated requirements.txt file.
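One caveat with import dogelchemist inside setup.py: the import executes the package’s code, which can fail if its dependencies are not installed yet. A commonly used alternative, sketched here as an assumption rather than as part of the setup above, is to read the metadata out of the source file as plain text. The function name read_metadata and the sample source string are hypothetical.

```python
import re


def read_metadata(source, name):
    """Return the value of a ``__name__ = "value"`` assignment found in source."""
    pattern = r'^__{0}__\s*=\s*["\']([^"\']*)["\']'.format(re.escape(name))
    match = re.search(pattern, source, re.MULTILINE)
    if match is None:
        raise ValueError("no __{0}__ assignment found".format(name))
    return match.group(1)


# In a real setup.py the text would come from something like
# open("dogelchemist/__init__.py").read(); a literal keeps the sketch
# self-contained.
SAMPLE_SOURCE = '''"""
dogelchemist :: Turns spam emails into Dogecoins
"""
__version__ = "0.0.1"
__author__ = "John Doe"
__license__ = "WTFPL"
'''

version = read_metadata(SAMPLE_SOURCE, "version")
author = read_metadata(SAMPLE_SOURCE, "author")
```

The trade-off is that this only works for simple string literals, whereas importing the package handles any expression.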

  4. MANIFEST.in

Among the purposes of setup.py is to tell where the .py files are. It’s not as good for pointing to all the other necessary files – basically those we’re talking about here.

To maintain tighter control over what goes into the final distribution package, we should use a manifest file called MANIFEST.in. It acts as a fine grained filter, allowing you to include or exclude directories, wildcard groups, or individual files:

    include LICENSE
    include README.rst
    include requirements-test.txt
    recursive-exclude * *.pyc

The manifest file is critically important, if only for the first line of the above example. It ensures that even when installing from PyPI, the user still receives a copy of the license.

  5. requirements.txt (optional)

As mentioned previously, it’s a common practice to extract the dependency list into a separate file named requirements.txt. Sometimes it’s even an obligatory, pardon the pun, requirement; that’s how the Heroku cloud platform recognizes Python apps, for example.

The file itself is rather straightforward:

    -e .

    requests>=1.0
    quantum-gravity>=1.0
    unobtanium>=0.4

What’s less obvious is finding a way to tie it back to setup.py, replacing the literal content of the install_requires= argument. For that, I find the following function quite useful:

    import os


    def read_requirements(filename='requirements.txt'):
        """Reads the list of requirements from given file.

        :param filename: Filename to read the requirements from.
                         Uses ``'requirements.txt'`` by default.

        :return: Requirements as list of strings.
        """
        # allow for some leeway with the argument
        if not filename.startswith('requirements'):
            filename = 'requirements-' + filename
        if not os.path.splitext(filename)[1]:
            filename += '.txt'  # no extension, add default

        def valid_line(line):
            # skip blank lines, comments and Pip flags such as -e or -r
            line = line.strip()
            return bool(line) and not line.startswith(('#', '-'))

        def extract_requirement(line):
            # for URL requirements, keep only the name given after #egg=
            egg_eq = '#egg='
            if egg_eq in line:
                _, requirement = line.split(egg_eq, 1)
                return requirement
            return line

        with open(filename) as f:
            # strip each line so trailing newlines don't leak into the result
            lines = [line.strip() for line in f]
        return [extract_requirement(line) for line in lines if valid_line(line)]

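To make the filtering rules concrete, here is a self-contained sketch that applies the same kind of logic to in-memory lines instead of a file. The function name parse_requirement_lines and the sample lines (including the example.com URL) are invented for the illustration.

```python
def parse_requirement_lines(lines):
    """Keep only real requirements from an iterable of requirements-file lines."""
    requirements = []
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments and Pip flags such as -e or -r.
        if not line or line.startswith(("#", "-")):
            continue
        # For URL requirements, keep only the name given after #egg=.
        if "#egg=" in line:
            line = line.split("#egg=", 1)[1]
        requirements.append(line)
    return requirements


sample = [
    "-e .\n",
    "# HTTP client\n",
    "requests>=1.0\n",
    "\n",
    "git+https://example.com/unobtanium.git#egg=unobtanium\n",
]
parsed = parse_requirement_lines(sample)
```

Only the plain requirement and the name taken from the #egg= fragment end up in parsed; the flag, the comment and the blank line are dropped.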

With read_requirements in place, the setup call becomes very straightforward:

    setup(
        ...
        install_requires=read_requirements(),
        ...
        tests_require=read_requirements('test'),
    )

Of course, you need to embed the function verbatim in your setup.py first, which may diminish its appeal somewhat. On the plus side, your requirements files may now contain -X flags for Pip (like -e or -r) and comments, which is useful for bigger projects but impossible with a simple:

    install_requires=open('requirements.txt').readlines(),

  6. tox.ini (optional)

How many Python versions does your package support? The answer is obvious for standalone applications: the only one it’s currently running on. But when writing a library, we need to treat that issue with a little more gravity. As of now, there are still multiple versions of the language powering thousands of working, production apps. Not just 2.7, but perhaps even 2.6 is not going anywhere anytime soon. Meanwhile, 3.x is an increasingly viable option. What do we target?

Whatever we do, it’s important to be explicit and honest about it. Explicit means stating it upfront in the README or elsewhere, so that potential users are not left wondering or set up for disappointment. Honest, on the other hand, means fulfilling the promises and actually testing against all these different versions (and implementations) of Python.

This is where wonders of test automation technology come into play. The de facto standard solution for cross-interpreter testing in Python is a tool called tox. Projects that employ it include a tox.ini configuration file, where they list what Python environments they are expected to work in. “Work”, of course, is defined as having the test suite run without any failure.

The simplest tox.ini might look like this:

    [tox]
    minversion=1.4
    envlist=py26,py27,pypy,py33

    [testenv]
    deps=-rrequirements-test.txt
    commands=py.test

I believe it is pretty self-explanatory. This is also where it clearly shows how extracting requirements-test.txt is a worthwhile endeavor. tox will use it to install test dependencies in virtualenvs for each Python version we’ve specified. Then, it will run the test command (here, py.test) and accumulate results for all environments.
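Since tox.ini is a regular INI file, the list of environments can also be read programmatically, for instance by a small helper script. The sketch below uses Python 3’s standard configparser module on the same content; embedding the file as a string is done only to keep the example self-contained, and the variable names are arbitrary.

```python
import configparser

TOX_INI = """
[tox]
minversion=1.4
envlist=py26,py27,pypy,py33

[testenv]
deps=-rrequirements-test.txt
commands=py.test
"""

parser = configparser.ConfigParser()
parser.read_string(TOX_INI)

# envlist is a comma-separated string; split it into individual names.
environments = [env.strip() for env in parser["tox"]["envlist"].split(",")]
test_command = parser["testenv"]["commands"]
```

This is not how tox itself processes the file, merely a way to inspect the same settings from your own tooling.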

  7. .travis.yml (optional)

Running tox is the practical equivalent of “building” a Python project from the continuous integration point of view. I’m pretty sure, though, that no one would fancy keeping up a CI server just for their little open source pet project. Fortunately, this doesn’t mean you need to forgo the benefits of CI anymore.

If you haven’t heard of Travis, it’s a free, hosted continuous integration platform for open source projects. As long as your package is publicly available on one of the code hosting providers (like GitHub or Bitbucket), you can configure a hook that’ll make Travis build it after every code push, and notify you immediately should a failure occur.

There’s some configuration involved, of course, but it’s limited to providing a fairly basic .travis.yml file:

    language: python
    python:
      - "2.6"
      - "2.7"
      - "pypy"
      - "3.3"

    install:
      - pip install -r requirements-test.txt --use-mirrors
    script:
      - py.test

Its content is eerily similar to that of tox.ini, which is no coincidence. In practice, you can treat running tox as a local substitute for a Travis CI build. If the former goes all right, it’s almost certain you can safely push your code upstream, and the latter will build just fine, too.

  8. requirements-dev.txt (optional)

Almost all of the other files described here, both the mandatory ones and those quite optional, are provided for the benefit of some (more or less) automated process. I say that, for a change, we should finish off with something that’ll be helpful for humans instead.

The file – which I propose to call requirements-dev.txt – is yet another listing of packages in Pip-compatible format. But they are not any sort of actual dependencies; neither should the project need them to work correctly, nor its test suite require them to pass. What I suggest putting into requirements-dev.txt are packages necessary for the development process itself. The goal is to streamline the initial setup for new contributors to our project. Ideally, all they have to do before starting to code would consist of:

- cloning the project’s repository
- creating a virtualenv
- running $ pip install -r requirements-dev.txt

Not every project would be complex enough to require some auxiliary tools. For those that don’t, requirements-dev.txt would be reduced to simple delegation:

    -e .
    -r requirements-test.txt

which hardly justifies its existence. But in reality, software tends to quickly spread its tendrils far and wide, whilst developers are eager to automate any task that appears cumbersome or mundane. Soon enough, those handcrafted helpers start to surround the core project like growth rings.

In essence, requirements-dev.txt is mostly there to support them. The exact packages that are worthy of putting there will vary from project to project, but common examples would include:

- configuration and deployment tools, such as Fabric
- database migration utilities, e.g. alembic
- debuggers and other development aids, like IPython and ipdb
- test environment managers, including tox that was presented before
- tools for measuring test coverage
- log analyzers
- …and so on

Automation?…

OK, I think I know what you’re thinking now. You didn’t sign up for this! You just wanted to write some Python code. Why do you need to bother with all these asides? And since we spoke of automation, why can’t they be taken care of… automatically?

Alas, there exist only some incomplete attempts to tackle this issue. Picnic.py, for example, is currently making the rounds, but it focuses more on documentation and version control than on Python-specific artifacts. When it comes to procuring the latter, we are largely on our own. I therefore recommend carefully weighing the benefits they may provide against the effort required to create and maintain them.

