Studying the pip-pop source code #1

hongmaoxiao · 2017-12-23T23:56:11Z

My main learning goal this year is to copy out the source code of excellent frameworks by hand!

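I have recently wanted to scrape some data for fun, so I plan to learn Python properly, and I will start by copying small projects. pip-pop is a library written by Kenneth Reitz, a star of the Python world, that provides command-line tools for working with Python dependency files (requirements.txt). Here is an introduction to Kenneth Reitz in Chinese.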

I typed the project out commit by commit, following his GitHub repository. My study code for this project lives in pip-pop-source.

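The code structure I ended up with is shown below. It is not identical to the author's: his library is an installable third-party Python package, so I skipped some of the configuration files and the README content. Implementing the core functionality is enough: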

~/source/backend/python/pip-pop master
❯ tree
.
├── bin
│   ├── pip-diff
│   └── pip-grep
├── README.md
├── requirements.txt
├── setup.py
├── tests
│   ├── test-requirements2.txt
│   └── test-requirements.txt
└── tox.ini

2 directories, 8 files

Let's first read the author's README to see what this library can do.

pip-pop

Working with lots of requirements.txt files can be a bit annoying. Have no fear, pip-pop is here!

(work in progress)

Planned Commands

$ pip-diff [--fresh | --stale] <reqfile> <reqfile>

Generates a diff between two given requirements files. Lists either stale or fresh packages.

$ pip-grep <reqfile> <package>...

Takes a requirements file, and searches for the specified package (or packages) within it. Essential when working with included files.

Possible Future Commands

Install with blacklisting support (wsgiref, distribute, setuptools).
Development

To run the tests:

1. pip install -r requirements.txt
2. tox

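There are two main commands. pip-diff shows the differences between the dependencies in two requirements.txt files: with --fresh it lists newly added dependencies, and with --stale it lists dependencies that are no longer used.
pip-grep checks whether a requirements.txt file contains specific packages.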

Now let's go straight to the source code. First, pip-diff:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Usage:
  pip-diff (--fresh | --stale) <reqfile1> <reqfile2> [--exclude <package>...]
  pip-diff (-h | --help)

Options:
  -h --help     Show this screen.
  --fresh       List newly added packages.
  --stale       List removed packages.
"""
import os
from docopt import docopt
from pip.req import parse_requirements
from pip.index import PackageFinder
from pip._vendor.requests import session

requests = session()


class Requirements(object):
    def __init__(self, reqfile=None):
        super(Requirements, self).__init__()
        self.path = reqfile
        self.requirements = []

        if reqfile:
            self.load(reqfile)

    def __repr__(self):
        return '<Requirements \'{}\'>'.format(self.path)

    def load(self, reqfile):
        if not os.path.exists(reqfile):
            raise ValueError('The given requirements file does not exist.')

        finder = PackageFinder([], [], session=requests)
        for requirement in parse_requirements(reqfile, finder=finder, session=requests):
            if requirement.req:
                if not getattr(requirement.req, 'name', None):
                    # Prior to pip 8.1.2 the attribute `name` did not exist.
                    requirement.req.name = requirement.req.project_name
                self.requirements.append(requirement.req)


    def diff(self, requirements, ignore_versions=False, excludes=None):
        r1 = self
        r2 = requirements
        results = {'fresh': [], 'stale': []}

        # Generate fresh packages.
        other_reqs = (
            [r.name for r in r1.requirements]
            if ignore_versions else r1.requirements
        )

        for req in r2.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['fresh'].append(req)

        # Generate stale packages.
        other_reqs = (
            [r.name for r in r2.requirements]
            if ignore_versions else r2.requirements
        )

        for req in r1.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['stale'].append(req)

        return results





def diff(r1, r2, include_fresh=False, include_stale=False, excludes=None):

    include_versions = True if include_stale else False
    excludes = excludes if len(excludes) else []

    try:
        r1 = Requirements(r1)
        r2 = Requirements(r2)
    except ValueError:
        print('There was a problem loading the given requirements files.')
        exit(os.EX_NOINPUT)

    results = r1.diff(r2, ignore_versions=True, excludes=excludes)
    if include_fresh:
        for line in results['fresh']:
            print(line.name if include_versions else line)

    if include_stale:
        for line in results['stale']:
            print(line.name if include_versions else line)




def main():
    args = docopt(__doc__, version='pip-diff')

    kwargs = {
        'r1': args['<reqfile1>'],
        'r2': args['<reqfile2>'],
        'include_fresh': args['--fresh'],
        'include_stale': args['--stale'],
        'excludes': args['<package>']
    }

    diff(**kwargs)



if __name__ == '__main__':
    main()

#!/usr/bin/env python
# -*- coding: utf-8 -*-

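The first line tells the system to look up python through env and run the script with whatever interpreter it finds. Python is often installed at /usr/bin/python, but if the user installed it somewhere else, a hard-coded path could fail because the interpreter cannot be found; /usr/bin/env python solves this in a portable way. The second line declares that the source encoding is UTF-8. As for the leading #! and # -*-: to the Python interpreter both lines are ordinary comments, so they do not interfere with the code. The #! line is read by the operating system when the file is executed directly, and the -*- coding: ... -*- form is the convention Python recognizes for an encoding declaration. In Python 3 the default source encoding is already UTF-8, so the second line is only really needed on Python 2 when the file contains non-ASCII characters.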

"""Usage:
  pip-diff (--fresh | --stale) <reqfile1> <reqfile2> [--exclude <package>...]
  pip-diff (-h | --help)

Options:
  -h --help     Show this screen.
  --fresh       List newly added packages.
  --stale       List removed packages.
"""

This block is the friendly usage message printed on the command line. For comparison, here is what pip's own help looks like:

~/source/backend/python/pip-pop master
❯ pip --help

Usage:   
  pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and
                              CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
  --trusted-host <hostname>   Mark this host as trusted, even though it does not have valid or any HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied
                              with --no-index.

Of course, getting this behavior out of a docstring relies on the docopt library.
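For comparison, here is a minimal sketch of the same command-line interface written with the standard library's argparse instead of docopt. This is not how pip-pop does it; build_parser and the sample file names are made up for illustration. It shows what docopt derives from the usage text: two mutually exclusive flags, two positional files, and an optional list of excluded packages.

```python
import argparse


def build_parser():
    # Hypothetical argparse equivalent of the pip-diff usage string.
    parser = argparse.ArgumentParser(prog='pip-diff')

    # (--fresh | --stale): exactly one of the two flags must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument('--fresh', action='store_true',
                      help='List newly added packages.')
    mode.add_argument('--stale', action='store_true',
                      help='List removed packages.')

    # <reqfile1> <reqfile2>
    parser.add_argument('reqfile1')
    parser.add_argument('reqfile2')

    # [--exclude <package>...]
    parser.add_argument('--exclude', nargs='+', default=[], metavar='package')
    return parser


args = build_parser().parse_args(
    ['--fresh', 'a.txt', 'b.txt', '--exclude', 'pip', 'wheel'])
print(args.fresh, args.stale, args.reqfile1, args.reqfile2, args.exclude)
```

With docopt the usage text is the single source of truth, while with argparse the help text is generated from the parser definition.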

import os
from docopt import docopt
from pip.req import parse_requirements
from pip.index import PackageFinder
from pip._vendor.requests import session

First, import the libraries needed later.

requests = session()

Let's see what this session is by printing its __doc__ in IPython:

In [9]: from pip._vendor.requests import session

In [10]: print(session.__doc__)

    Returns a :class:`Session` for context-management.

    :rtype: Session

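This says that it returns a Session object for context management. That alone does not tell us much, so let's go straight to the source. I took a look and, good grief, it is 757 lines long, which takes real effort for someone who has not even gotten started with Python. For now I will just read the __doc__ of the Session class to find out what it is for: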

A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('http://httpbin.org/get')
      <Response [200]>

    Or as a context manager::

      >>> with requests.Session() as s:
      >>>     s.get('http://httpbin.org/get')
      <Response [200]>

So this is a Requests session that provides cookie persistence, connection pooling, and configuration.

Next, a Requirements class is defined. Let's look at it in detail:

def __init__(self, reqfile=None):
    super(Requirements, self).__init__()
    self.path = reqfile
    self.requirements = []

    if reqfile:
        self.load(reqfile)

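Start with __init__ in this class. What on earth is it? At first glance it looks like an initialization function, so I searched Stack Overflow for questions about __init__ (when looking things up, it is best to search in English; the official documentation comes first, then Google and Stack Overflow). The top-voted answer says: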

In this code:

class A(object):
    def __init__(self):
        self.x = 'Hello'

    def method_a(self, foo):
        print self.x + ' ' + foo
... the self variable represents the instance of the object itself. Most object-oriented languages pass this as a hidden parameter to the methods defined on an object; Python does not. You have to declare it explicitly. When you create an instance of the A class and call its methods, it will be passed automatically, as in ...

a = A()               # We do not pass any argument to the __init__ method
a.method_a('Sailor!') # We only pass a single argument
The __init__ method is roughly what represents a constructor in Python. When you call A() Python creates an object for you, and passes it as the first parameter to the __init__ method. Any additional parameters (e.g., A(24, 'Hello')) will also get passed as arguments--in this case causing an exception to be raised, since the constructor isn't expecting them.

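The top-voted answer says that self represents the instance of the object itself. Most object-oriented languages pass this to an object's methods as a hidden parameter, but Python is different: you have to declare it explicitly. When you create an instance of class A and call its methods, self is passed in automatically. For example, after a = A(), calling a.method_a('Sailor!') prints 'Hello Sailor!'. In other words, __init__ runs automatically when A is instantiated, so print(a.x) at that point gives 'Hello'. When method_a is called, the instance is passed in as self, the method reads self.x (that is, a.x), and the final result is 'Hello Sailor!'.

__init__ is roughly what plays the role of a constructor in Python (although strictly speaking it is not the constructor). When A() is called, Python creates an instance of A and passes it as the first argument of __init__, which is self. So the first parameter of __init__ is always the instance, named self by convention. __init__ can of course take more parameters after the first, but it has to declare them itself; otherwise passing them raises an error.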

~/learn/lpython
❯ cat test1.py 
#!/usr/bin/env python
# encoding: utf-8

class A(object):
    def __init__(self, a, b, c):
        self.x = 'Hello'
        self.a = a
        self.b = b
        self.c = c

    def method_a(self, foo):
        print(self.x + ' ' + foo)

a = A(1,2,3)
print('x:', a.x)
print('a:', a.a)
print('b:', a.b)
print('c:', a.c)

Run it and look at the result:

~/learn/lpython
❯ python test1.py 
x: Hello
a: 1
b: 2
c: 3

If we change it to this:

~/learn/lpython
❯ cat test1.py 
#!/usr/bin/env python
# encoding: utf-8

class A(object):
    def __init__(self):
        self.x = 'Hello'

    def method_a(self, foo):
        print(self.x + ' ' + foo)

a = A(1,2,3)
print('x:', a.x)

the result is:

~/learn/lpython
❯ python test1.py 
Traceback (most recent call last):
  File "test1.py", line 11, in <module>
    a = A(1,2,3)
TypeError: __init__() takes 1 positional argument but 4 were given

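The interpreter says that only one argument is expected but four were given. We actually passed three; self also counts as an argument when __init__ is called, which makes four.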
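To make the role of self concrete, here is a small sketch based on the class A from the answer above. The greeting parameter and the variable names are my own additions for illustration. It shows that calling a method on an instance is the same as calling the function on the class and passing the instance by hand, and that arguments __init__ does not declare raise a TypeError.

```python
class A(object):
    def __init__(self, greeting='Hello'):
        # `self` is the freshly created instance.
        self.x = greeting

    def method_a(self, foo):
        return self.x + ' ' + foo


a = A()

# The usual call: Python passes `a` as `self` automatically.
bound_result = a.method_a('Sailor!')

# The same call with the instance passed explicitly.
explicit_result = A.method_a(a, 'Sailor!')

# Arguments that __init__ does not declare raise TypeError.
try:
    A(1, 2, 3)
    extra_args_rejected = False
except TypeError:
    extra_args_rejected = True

print(bound_result)
print(bound_result == explicit_result)
print(extra_args_rejected)
```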

super(Requirements, self).__init__()

Next, what is this super for? Again, let's look at the Stack Overflow question about super.

super() lets you avoid referring to the base class explicitly, which can be nice. But the main advantage comes with multiple inheritance, where all sorts of fun stuff can happen. See the standard docs on super if you haven't already.

Note that the syntax changed in Python 3.0: you can just say super().__init__() instead of super(ChildB, self).__init__() which IMO is quite a bit nicer.

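The benefit of super() is that you avoid naming the parent class explicitly. Here one could in fact also write object.__init__(self), which is an explicit call. For example: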

class A(object):
  def __init__(self):
   print("A")

class B(A):
  def __init__(self):
   super(B, self).__init__()
   print("B")

has the same effect as the following version:

class A(object):
  def __init__(self):
   print("A")

class B(A):
  def __init__(self):
   A.__init__(self)
   print("B")

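Both versions work, and either is fine when the inheritance hierarchy is small and simple. With multiple inheritance, where several subclasses share the same parent, the second version can end up calling the parent's __init__ more than once, among other problems. The deeper logic is hard to explain in a few words; the key point is that super() follows the MRO (Method Resolution Order), which defines the order in which classes are searched along the inheritance chain. For this topic you really should read Raymond Hettinger's classic article "Python's super() considered super!". If the English is hard going, start with the Chinese article 理解 Python super (Understanding Python super). The top-voted answer also recommends "Things to Know About Python Super" and the official documentation for super.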

My own understanding of this part is not very thorough yet; I will come back and sort it out later.
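The double-call problem mentioned above can be seen in a small diamond hierarchy. The following sketch uses made-up class names (Base, Left, Right, Child and their E-prefixed twins) and records each __init__ call in a list, once with super() and once with explicit parent calls.

```python
super_calls = []
explicit_calls = []


# Cooperative version: every __init__ uses super().
class Base(object):
    def __init__(self):
        super_calls.append('Base')


class Left(Base):
    def __init__(self):
        super(Left, self).__init__()
        super_calls.append('Left')


class Right(Base):
    def __init__(self):
        super(Right, self).__init__()
        super_calls.append('Right')


class Child(Left, Right):
    def __init__(self):
        super(Child, self).__init__()
        super_calls.append('Child')


# Explicit version: every __init__ names its parents directly.
class EBase(object):
    def __init__(self):
        explicit_calls.append('Base')


class ELeft(EBase):
    def __init__(self):
        EBase.__init__(self)
        explicit_calls.append('Left')


class ERight(EBase):
    def __init__(self):
        EBase.__init__(self)
        explicit_calls.append('Right')


class EChild(ELeft, ERight):
    def __init__(self):
        ELeft.__init__(self)
        ERight.__init__(self)
        explicit_calls.append('Child')


Child()
EChild()

# The MRO is the order super() walks through.
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)
print(super_calls)     # Base appears once
print(explicit_calls)  # Base appears twice
```

Note that with super(), Left's call reaches Right before Base, because the next class is taken from the MRO of the instance's class (Child), not from Left's own parent list.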

self.path = reqfile
self.requirements = []

if reqfile:
    self.load(reqfile)

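This stores the reqfile argument in the instance's path attribute and initializes the instance's requirements attribute to an empty list. If reqfile is truthy (neither None nor an empty string), the instance's load method is called.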

def __repr__(self):
    return '<Requirements \'{}\'>'.format(self.path)

This defines a __repr__ method. Here is what the official documentation says about __repr__:

Called by the repr() built-in function and by string conversions (reverse quotes) to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required.

This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

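As I understand it, __repr__ is called by the built-in repr() function or by string conversions to produce the "official" string that describes an object. If at all possible, the string should look like a valid Python expression that could be used to recreate an object with the same value, given an appropriate environment (I am not sure my translation is accurate). If that is not possible, the method should return a string of the form '<...some useful description...>'. The return value must be a string object. If a class defines __repr__() but not __str__(), then str() and the corresponding string conversions fall back to the result of __repr__. __repr__ is typically used for debugging, so it is important that the representation is information-rich and unambiguous.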

Stack Overflow also has a comparison of __repr__ and __str__, which I will not go into here. Theory alone is not much use, so here is an example:

~/learn/lpython
❯ ipython
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: class A(object):
   ...:     def __init__(self):
   ...:         self.x = 'Hello'
   ...: 
   ...:     def method_a(self, foo):
   ...:         print(self.x + ' ' + foo)
   ...:         

In [2]: a = A()

In [3]: a
Out[3]: <__main__.A at 0x7f1e7db5f5f8>

In [4]: print(a)
<__main__.A object at 0x7f1e7db5f5f8>

In [5]: "%s" % a
Out[5]: '<__main__.A object at 0x7f1e7db5f5f8>'

In [6]: "%r" % a
Out[6]: '<__main__.A object at 0x7f1e7db5f5f8>'

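Open IPython, define a class named A, and create an instance a. Whether we echo a directly, call print(a), or use "%s" % a or "%r" % a, the result is essentially the same. Because neither __repr__ nor __str__ is defined, we get the default representation inherited from object, which has the '<...some useful description...>' form mentioned above. So what changes once they are defined?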

In [7]: class A(object):
   ...:     def __init__(self):
   ...:         self.x = 'Hello'
   ...:     def __str__(self):
   ...:         return 'A __str__'
   ...:     def __repr__(self):
   ...:         return 'A __repr__'
   ...:     def method_a(self, foo):
   ...:         print(self.x + ' ' + foo)
   ...:         
   ...:         

In [8]: a = A()

In [9]: "%r" % a
Out[9]: 'A __repr__'

In [10]: "%s" % a
Out[10]: 'A __str__'

In [11]: print(a)
A __str__

In [12]: a
Out[12]: A __repr__

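Clearly, echoing a at the prompt and "%r" % a both call __repr__(), while print(a) and "%s" % a call __str__(). The two methods serve much the same purpose, except that __str__ is mainly about readability, and what it returns may be looser and less precise. __repr__ and __str__ are like "formal" and "informal". Here is the official documentation for __str__ as well.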

Called by the str() built-in function and by the print statement to compute the “informal” string representation of an object. This differs from __repr__() in that it does not have to be a valid Python expression: a more convenient or concise representation may be used instead. The return value must be a string object.

I will not translate it in full. The main point is that str() and print go through __str__(). There is one more case, where only __repr__ is defined and __str__ is not:

In [18]: print(a)
A __repr__

In [19]: class A(object):
    ...:     def __init__(self):
    ...:         self.x = 'Hello'
    ...:     def __repr__(self):
    ...:         return 'A __repr__'
    ...:     def method_a(self, foo):
    ...:         print(self.x + ' ' + foo)
    ...:         
    ...:         
    ...:         

In [20]: a = A()

In [21]: a
Out[21]: A __repr__

In [22]: print(a)
A __repr__

In [23]: "%s" % a
Out[23]: 'A __repr__'

In [24]: "%r" % a
Out[24]: 'A __repr__'

Everything goes through __repr__, which matches the theory above.
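The documentation's advice that __repr__ should, where possible, look like an expression that recreates the object can be illustrated with a small sketch. The Point class here is a made-up example, not part of pip-pop.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # "Official" form: a valid expression that rebuilds the object.
        return 'Point({!r}, {!r})'.format(self.x, self.y)

    def __str__(self):
        # "Informal" form: short and readable.
        return '({}, {})'.format(self.x, self.y)

    def __eq__(self, other):
        return (isinstance(other, Point)
                and (self.x, self.y) == (other.x, other.y))


p = Point(1, 2)

official = repr(p)
informal = str(p)

# Evaluating the repr gives back an equal object.
rebuilt = eval(official, {'Point': Point})

# Containers use the repr of their elements, even under str().
in_list = str([p])

print(official)
print(informal)
print(rebuilt == p)
print(in_list)
```

By contrast, the Requirements class in pip-diff returns the angle-bracket description form, which is the documented fallback when a recreating expression is not practical.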

Next, the load method:

def load(self, reqfile):
    if not os.path.exists(reqfile):
        raise ValueError('The given requirements file does not exist.')

    finder = PackageFinder([], [], session=requests)
    for requirement in parse_requirements(reqfile, finder=finder, session=requests):
        if requirement.req:
            if not getattr(requirement.req, 'name', None):
                # Prior to pip 8.1.2 the attribute `name` did not exist.
                requirement.req.name = requirement.req.project_name
            self.requirements.append(requirement.req)

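It first uses os.path.exists to check whether the requirements file exists at the given path. If the file does not exist (or cannot be accessed), it raises a ValueError. Next comes the PackageFinder class; let's check its __doc__ in IPython: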

In [7]: from pip.index import PackageFinder

In [8]: print(PackageFinder.__doc__)
This finds packages.

    This is meant to match easy_install's technique for looking for
    packages, by reading pages and looking for appropriate links.

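This says that the class is meant to match easy_install's technique for finding packages, by reading pages and looking for suitable links (I am not sure my translation is right). In short, its job is to find packages. That still does not explain much, does it? I would have to read the source, which is another 1093 lines. I think I will write a separate post on the pip source once I have gotten started with Python and have the time...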

Next, the diff method of the Requirements class:

def diff(self, requirements, ignore_versions=False, excludes=None):
        r1 = self
        r2 = requirements
        results = {'fresh': [], 'stale': []}

        # Generate fresh packages.
        other_reqs = (
            [r.name for r in r1.requirements]
            if ignore_versions else r1.requirements
        )

        for req in r2.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['fresh'].append(req)

        # Generate stale packages.
        other_reqs = (
            [r.name for r in r2.requirements]
            if ignore_versions else r2.requirements
        )

        for req in r1.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['stale'].append(req)

        return results

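Besides self, the method takes three parameters: requirements, ignore_versions, and excludes. It assigns self to r1, so r1 is the instance itself, and assigns requirements to r2. results is the final return value: a dictionary with two keys, fresh (newly added) and stale (removed), each initialized to an empty list.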

Now the other_reqs variable:

other_reqs = (
    [r.name for r in r1.requirements]
    if ignore_versions else r1.requirements
)

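The right-hand side is a single expression that evaluates to a list. I first took it for functional-style code; it is in fact a conditional expression (X if condition else Y) whose first branch is a list comprehension. It checks whether ignore_versions is true. If not, it uses r1.requirements directly, which is already a list. If so, it builds a list of the name attribute of every item in r1.requirements. The list is built by the [... for ... in ...] expression, and the surrounding [] cannot be dropped, because that is what makes the result a list. Judging by its name, ignore_versions decides whether versions are ignored. Looking at the whole script, though, it is always passed as True, and the Usage text has no option for it, so in practice it does not change anything here. When you see it, just read it as True.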
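Here is a minimal sketch of that expression in isolation. Req is a made-up stand-in for pip's requirement objects (a namedtuple with name and spec fields), and comparable is a hypothetical helper that wraps the same conditional expression.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed requirement.
Req = namedtuple('Req', ['name', 'spec'])

reqs = [Req('Django', '==1.9.6'), Req('gunicorn', '')]


def comparable(requirements, ignore_versions):
    # Conditional expression: list comprehension if the flag is set,
    # otherwise the original list unchanged.
    return (
        [r.name for r in requirements]
        if ignore_versions else requirements
    )


names_only = comparable(reqs, True)
unchanged = comparable(reqs, False)

print(names_only)
print(unchanged is reqs)
```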

Now the next loop:

for req in r2.requirements:
    r = req.name if ignore_versions else req

    if r not in other_reqs and r not in excludes:
        results['fresh'].append(req)

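The first half works the same way as other_reqs, so I will not repeat it. The only difference is that each item is handled inside a for loop instead of building a list in a single expression. The important part is the condition: if r is in neither other_reqs nor excludes (the excluded packages), req is appended to the fresh list in results. Note why a separate variable r is needed: the value compared against other_reqs must be produced the same way as the items in other_reqs, so that both have the same form and the in test is meaningful. What goes into results, however, is not r but the original item from r2.requirements.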

The stale case is handled in a similar way:

other_reqs = (
    [r.name for r in r2.requirements]
    if ignore_versions else r2.requirements
)

for req in r1.requirements:
    r = req.name if ignore_versions else req

    if r not in other_reqs and r not in excludes:
        results['stale'].append(req)

This time other_reqs is built from r2 and the loop runs over r1. If r is in neither other_reqs nor excludes, req is appended to the stale list in results.

Finally, return results hands back the result, and there is nothing more to say about it. Still not clear? No problem, an example will help.

~/source/backend/python/pip-pop/tests master
❯ cat test-requirements.txt 
cffi
django
requests[security]

~/source/backend/python/pip-pop/tests master
❯ cat test-requirements2.txt 
Django==1.9.6
gunicorn
requests[security]

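There are two files, test-requirements.txt and test-requirements2.txt. If test-requirements.txt is the baseline, then test-requirements2.txt is the future dependency file being compared against it. The fresh list that diff stores in results contains what test-requirements2.txt has and test-requirements.txt lacks, namely gunicorn and Django==1.9.6. Both files actually contain django; my guess is that ignore_versions being True causes them to be treated as two different dependencies. There is no rush, since the debugging further down will verify this. Conversely, stale means removed or no longer used, so cffi above goes into the stale list in results. This should be fairly clear by now, at least at a high level. Both files are in fact processed by load first, and with my current skills I cannot dig into those details yet...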
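The membership logic can be tried out without pip at all. The sketch below is a simplified stand-in, not the project's code: Req is a made-up namedtuple, the two lists are typed in by hand to mirror the two test files (with requests[security] reduced to the name requests), and simple_diff is a hypothetical function that follows the same "not in the other list and not excluded" rule.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed requirement.
Req = namedtuple('Req', ['name', 'spec'])

# Hand-written equivalents of the two test files.
old = [Req('cffi', ''), Req('django', ''), Req('requests', '')]
new = [Req('Django', '==1.9.6'), Req('gunicorn', ''), Req('requests', '')]


def simple_diff(r1, r2, ignore_versions=False, excludes=()):
    # Compare by name only, or by the whole requirement.
    def key(req):
        return req.name if ignore_versions else req

    r1_keys = [key(r) for r in r1]
    r2_keys = [key(r) for r in r2]

    return {
        # In r2 but not in r1.
        'fresh': [r for r in r2
                  if key(r) not in r1_keys and key(r) not in excludes],
        # In r1 but not in r2.
        'stale': [r for r in r1
                  if key(r) not in r2_keys and key(r) not in excludes],
    }


results = simple_diff(old, new, ignore_versions=True)
fresh_names = [r.name for r in results['fresh']]
stale_names = [r.name for r in results['stale']]

print(fresh_names)
print(stale_names)
```

Passing excludes=['gunicorn'] drops gunicorn from the fresh list, which is what the --exclude option is for.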

Next, the standalone diff function:

def diff(r1, r2, include_fresh=False, include_stale=False, excludes=None):

    include_versions = True if include_stale else False
    excludes = excludes if len(excludes) else []

    try:
        r1 = Requirements(r1)
        r2 = Requirements(r2)
    except ValueError:
        print('There was a problem loading the given requirements files.')
        exit(os.EX_NOINPUT)

    results = r1.diff(r2, ignore_versions=True, excludes=excludes)
    if include_fresh:
        for line in results['fresh']:
            print(line.name if include_versions else line)

    if include_stale:
        for line in results['stale']:
            print(line.name if include_versions else line)

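The parameters r1 and r2 are the two requirements files. include_fresh and include_stale decide whether the fresh and stale lists are shown, and excludes lists the packages to leave out of the comparison. include_versions is True when include_stale is True and False otherwise. If excludes has at least one element it is kept as is; otherwise it becomes an empty list. Then r1 and r2 are instantiated. If that fails, Requirements raises a ValueError, which is caught here: the function prints a message and exits with status os.EX_NOINPUT, meaning that an input file does not exist or is not readable. Next it calls the diff method of the Requirements instance, with r1 as the baseline and r2 as the file compared against it, and stores the return value in results. If include_fresh is True, it loops over results['fresh'] and prints each entry: the entry's name attribute if include_versions is True, otherwise the entry itself. If include_stale is True, it does the same for results['stale'].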
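One detail worth noting: the signature defaults excludes to None, yet the body calls len(excludes). That works when diff is called from main, because as far as I know docopt supplies a list (possibly empty) for a repeating argument such as <package>..., but calling diff directly without excludes would fail on len(None). A small sketch of a more forgiving normalization follows; normalize_excludes is a hypothetical helper, not part of pip-pop.

```python
def normalize_excludes(excludes=None):
    # `excludes or []` covers None as well as an empty list.
    return list(excludes or [])


# len(None) is what the original expression would hit with the default.
try:
    len(None)
    len_of_none_fails = False
except TypeError:
    len_of_none_fails = True

print(normalize_excludes())
print(normalize_excludes(['pip']))
print(len_of_none_fails)
```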

Now the Usage text and the main function:

"""Usage:
  pip-diff (--fresh | --stale) <reqfile1> <reqfile2> [--exclude <package>...]
  pip-diff (-h | --help)
Options:
  -h --help     Show this screen.
  --fresh       List newly added packages.
  --stale       List removed packages.
"""

def main():
    args = docopt(__doc__, version='pip-diff')

    kwargs = {
        'r1': args['<reqfile1>'],
        'r2': args['<reqfile2>'],
        'include_fresh': args['--fresh'],
        'include_stale': args['--stale'],
        'excludes': args['<package>']
    }

    diff(**kwargs)

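This is where docopt finally comes into play. __doc__ refers to the usage text in the triple-quoted string at the top of the file, and version is presumably the version of the command, which can be checked with pip-diff --version. In the kwargs dictionary, args['<reqfile1>'] is the value typed at the reqfile1 position on the command line, and it becomes the value of the key r1. The other entries work the same way. Finally, diff is called with the kwargs dictionary passed in as its arguments.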

diff(**kwargs)

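The ** (two asterisks) passes the dictionary in as keyword arguments, and the interpreter unpacks it automatically. If the arguments are in a list, * (one asterisk) does the same for positional arguments. Looking back at the parameters of diff now, everything falls into place and the whole program fits together.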

def diff(r1, r2, include_fresh=False, include_stale=False, excludes=None)
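A quick sketch of both unpacking forms, using a dummy function with the same signature. describe and the sample values are made up for illustration; the function simply returns what it received.

```python
def describe(r1, r2, include_fresh=False, include_stale=False, excludes=None):
    # Dummy with the same signature as diff; returns its arguments.
    return (r1, r2, include_fresh, include_stale, excludes)


kwargs = {
    'r1': 'a.txt',
    'r2': 'b.txt',
    'include_fresh': True,
    'include_stale': False,
    'excludes': [],
}

# ** unpacks a dict into keyword arguments.
from_dict = describe(**kwargs)

# * unpacks a list (or tuple) into positional arguments.
positional = ['a.txt', 'b.txt', True]
from_list = describe(*positional)

print(from_dict)
print(from_list)
```

The dictionary keys must match the parameter names exactly; an unknown key raises a TypeError.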

And at the very end:

if __name__ == '__main__':
    main()

See the Stack Overflow explanation of if __name__ == '__main__':. As an example, take two files, one named a.py:

# a.py
import b

and the other b.py:

# b.py
print("Hello World from %s!" % __name__)

if __name__ == '__main__':
    print("Hello World again from %s!" % __name__)

If we run python a.py on the command line:

$ python a.py
Hello World from b!

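Here b.py is imported, and in that case __name__ is simply the name of the module, b, which corresponds to the file name without the .py extension. So the output is Hello World from b!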

If we run python b.py on the command line:

$ python b.py
Hello World from __main__!
Hello World again from __main__!

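When a file is executed directly instead of being imported from another file, Python sets __name__ to '__main__'. So in addition to the first print, the code inside the if __name__ == '__main__': block also runs.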

It should be clear now: if the pip-diff file is run directly, __name__ equals '__main__' and main() is executed right away. If another file imports pip-diff, that branch is skipped.
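Both situations can be reproduced from a single script with the standard library's runpy module, which runs a file under a chosen __name__. This is only a simulation of "run directly" versus "imported"; the file content, the role variable, and the run_as helper are made up for illustration.

```python
import os
import runpy
import tempfile

# A tiny module that records how it was run.
SOURCE = (
    "if __name__ == '__main__':\n"
    "    role = 'script'\n"
    "else:\n"
    "    role = 'module ' + __name__\n"
)


def run_as(run_name):
    # Write the source to a temporary b.py and execute it with the
    # given value of __name__; run_path returns the module globals.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'b.py')
        with open(path, 'w') as handle:
            handle.write(SOURCE)
        return runpy.run_path(path, run_name=run_name)['role']


as_script = run_as('__main__')  # like: python b.py
as_import = run_as('b')         # like: import b

print(as_script)
print(as_import)
```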

Now for a simple test using tox, the tool the author uses in this project. First, tox must be listed in the requirements.txt file in the project root:

#requirements.txt

~/source/backend/python/pip-pop master*
❯ cat requirements.txt
docopt==0.6.2
tox==2.3.1

The tox version is 2.3.1. A tox.ini file is also needed in the project root:

#tox.ini

~/source/backend/python/pip-pop master*
❯ cat tox.ini 
[tox]
# To reduce the size of the test matrix, tests the following pip versions:
#  * all of the latest major version releases (8.x)
#  * only the first and last point release of each older major version
#  * plus the latest distro version available via LTS Ubuntu package managers:
#    (1.5.4 for Trusty, 8.1.1 for Xenial).
envlist = pip{901}

[testenv]
# Speeds up pip install and reduces log spam, for compatible versions of pip.
setenv = PIP_DISABLE_PIP_VERSION_CHECK=1

deps =
    pip901: pip==9.0.1

# TODO: Replace with something like https://scripttest.readthedocs.io or else
# rework the pip-grep and pip-diff scripts so they can more easily be unit-tested.
commands =
    pip-diff --fresh tests/test-requirements.txt tests/test-requirements2.txt

To make testing easier I modified the author's original file and kept only pip version 9.0.1.

Then add some print calls to the pip-diff file for the values we want to see:

def load(self, reqfile):
    if not os.path.exists(reqfile):
        raise ValueError('The given requirements file does not exist.')
    print("requests:", requests)
    finder = PackageFinder([], [], session=requests)
    print("finder:", finder)
    i = 0
    for requirement in parse_requirements(reqfile, finder=finder, session=requests):
        print(i, ": ", requirement)
        i = i + 1
        if requirement.req:
            if not getattr(requirement.req, 'name', None):
                # Prior to pip 8.1.2 the attribute `name` did not exist.
                requirement.req.name = requirement.req.project_name
            self.requirements.append(requirement.req)

What I really want to see is requests, finder, and each requirement in the for loop, because the details of load are not clear to me.

Finally, run the test by typing tox in the project root:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='320611245'
pip901 runtests: commands[0] | pip-diff --fresh tests/test-requirements.txt tests/test-requirements2.txt
requests: <pip._vendor.requests.sessions.Session object at 0x7fc418d86ac8>
finder: <pip.index.PackageFinder object at 0x7fc41424dfd0>
0 :  cffi (from -r tests/test-requirements.txt (line 1))
1 :  django (from -r tests/test-requirements.txt (line 2))
2 :  requests[security] (from -r tests/test-requirements.txt (line 3))
requests: <pip._vendor.requests.sessions.Session object at 0x7fc418d86ac8>
finder: <pip.index.PackageFinder object at 0x7fc4141f2ac8>
0 :  Django==1.9.6 (from -r tests/test-requirements2.txt (line 1))
1 :  gunicorn (from -r tests/test-requirements2.txt (line 2))
2 :  requests[security] (from -r tests/test-requirements2.txt (line 3))
Django==1.9.6
gunicorn
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

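The test passes. requests and finder are both printed as instance objects, which is not very friendly for someone used to JavaScript, and this output does not help much with debugging. I do not know of a friendlier output format yet, but judging by what I learned about __repr__() and __str__() earlier, these classes presumably do not define either method themselves. Never mind; what we mainly care about is the result of parse_requirements.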

0 :  cffi (from -r tests/test-requirements.txt (line 1))
1 :  django (from -r tests/test-requirements.txt (line 2))
2 :  requests[security] (from -r tests/test-requirements.txt (line 3))

0 :  Django==1.9.6 (from -r tests/test-requirements2.txt (line 1))
1 :  gunicorn (from -r tests/test-requirements2.txt (line 2))
2 :  requests[security] (from -r tests/test-requirements2.txt (line 3))

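This is the result of parsing tests/test-requirements.txt and tests/test-requirements2.txt. Each entry is printed with its name, together with the file and line it was parsed from. That is good enough, since pip-diff only needs the names.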
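To get a feel for what "only the names" means, here is a deliberately simplified sketch that splits a requirement line into name, extras, and version specifier with a regular expression. It is an assumption-laden toy, not pip's parser: REQ_LINE and split_requirement are made up, and real requirement syntax (URLs, environment markers, options such as -r) is not handled.

```python
import re

# Simplified pattern: name, optional [extras], then the rest as spec.
REQ_LINE = re.compile(
    r'^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*?)\s*$'
)


def split_requirement(line):
    match = REQ_LINE.match(line)
    if match is None:
        raise ValueError('Not a simple requirement line: {!r}'.format(line))
    name, extras, spec = match.groups()
    return name, extras or '', spec


lines = ['cffi', 'Django==1.9.6', 'requests[security]']
parsed = [split_requirement(line) for line in lines]

for name, extras, spec in parsed:
    print(name, extras, spec)
```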

Now for the last two lines of the output:

Django==1.9.6
gunicorn

Let's debug some more. Remove the debugging code from load and add the following debugging output to the diff function:

def diff(r1, r2, include_fresh=False, include_stale=False, excludes=None):

    include_versions = True if include_stale else False
    excludes = excludes if len(excludes) else []

    try:
        r1 = Requirements(r1)
        r2 = Requirements(r2)
    except ValueError:
        print('There was a problem loading the given requirements files.')
        exit(os.EX_NOINPUT)

    results = r1.diff(r2, ignore_versions=True, excludes=excludes)
    print("results: ", results)
    if include_fresh:
        for line in results['fresh']:
            print("include_versions: ", include_versions)
            print("line: ", line)
            print("line.name: ", line.name)
            print(line.name if include_versions else line)

    if include_stale:
        for line in results['stale']:
            print(line.name if include_versions else line)

I want to look at results and include_versions, and at the difference between line and line.name. Here is the debugging output:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='365264289'
pip901 runtests: commands[0] | pip-diff --fresh tests/test-requirements.txt tests/test-requirements2.txt
results:  {'fresh': [<Requirement('Django==1.9.6')>, <Requirement('gunicorn')>], 'stale': [<Requirement('cffi')>, <Requirement('django')>]}
include_versions:  False
line:  Django==1.9.6
line.name:  Django
Django==1.9.6
include_versions:  False
line:  gunicorn
line.name:  gunicorn
gunicorn
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

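It is obvious why include_versions is False: include_stale is not True. The difference between line and line.name is also clear: line carries the version number and line.name does not, which makes it very convenient to compare while ignoring versions. The following code finally makes sense too: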

def diff(self, requirements, ignore_versions=False, excludes=None):
        r1 = self
        r2 = requirements
        results = {'fresh': [], 'stale': []}

        # Generate fresh packages.
        other_reqs = (
            [r.name for r in r1.requirements]
            if ignore_versions else r1.requirements
        )

        for req in r2.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['fresh'].append(req)

        # Generate stale packages.
        other_reqs = (
            [r.name for r in r2.requirements]
            if ignore_versions else r2.requirements
        )

        for req in r1.requirements:
            r = req.name if ignore_versions else req

            if r not in other_reqs and r not in excludes:
                results['stale'].append(req)

        return results

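I did not quite understand how ignore_versions takes effect before, but now it is clear. In the other_reqs expression, if ignore_versions is True only the names are compared and version numbers are ignored; otherwise the whole requirement is compared. The test results also show that the name comparison is case-sensitive. Suppose I change Django==1.9.6 in tests/test-requirements2.txt to django==1.9.6, like this: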

#tests/test-requirements.txt

cffi
django
requests[security]

#tests/test-requirements2.txt

django==1.9.6
gunicorn
requests[security]

Run the test:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='2016573932'
pip901 runtests: commands[0] | pip-diff --fresh tests/test-requirements.txt tests/test-requirements2.txt
gunicorn
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

Only gunicorn is fresh (newly added), whereas the earlier run gave Django==1.9.6 and gunicorn. Now change tox.ini to test stale:

# tox.ini

# before
commands =
    pip-diff --fresh tests/test-requirements.txt tests/test-requirements2.txt

# after
commands =
    pip-diff --stale tests/test-requirements.txt tests/test-requirements2.txt

Look at the result:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='381521045'
pip901 runtests: commands[0] | pip-diff --stale tests/test-requirements.txt tests/test-requirements2.txt
cffi
django
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

The stale packages are cffi and django, as expected. The logic is essentially the same as for fresh, so I won't repeat it.
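
The effect of ignore_versions and the case-sensitive name comparison can be reproduced without pip. The following is a minimal sketch, not pip-pop's own code: Req is a hypothetical stand-in for pip's parsed requirement objects, diff follows the same structure as the function above with the excludes handling left out, and the data mirrors the original test files (with Django capitalised in the second one).

```python
from collections import namedtuple

# Hypothetical stand-in for pip's parsed requirement objects.
Req = namedtuple('Req', ['name', 'spec'])


def diff(reqs1, reqs2, ignore_versions=False):
    """Return fresh (only in reqs2) and stale (only in reqs1) requirements."""
    results = {'fresh': [], 'stale': []}

    # Fresh: present in reqs2 but not in reqs1.
    other = [r.name for r in reqs1] if ignore_versions else reqs1
    for req in reqs2:
        key = req.name if ignore_versions else req
        if key not in other:
            results['fresh'].append(req)

    # Stale: present in reqs1 but not in reqs2.
    other = [r.name for r in reqs2] if ignore_versions else reqs2
    for req in reqs1:
        key = req.name if ignore_versions else req
        if key not in other:
            results['stale'].append(req)

    return results


old = [Req('cffi', ''), Req('django', ''), Req('requests', '')]
new = [Req('Django', '==1.9.6'), Req('gunicorn', ''), Req('requests', '')]

result = diff(old, new, ignore_versions=True)
print([r.name for r in result['fresh']])  # ['Django', 'gunicorn']
print([r.name for r in result['stale']])  # ['cffi', 'django']

# Lower-casing the name makes it match, so only gunicorn stays fresh.
new_lower = [Req('django', '==1.9.6')] + new[1:]
lower_result = diff(old, new_lower, ignore_versions=True)
print([r.name for r in lower_result['fresh']])  # ['gunicorn']
```

Because the names are compared as plain strings, 'Django' and 'django' count as different packages; normalising both sides with lower() before comparing would be one way to change that, though the code shown here does not do so.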

That wraps up pip-diff. Next is pip-grep, which is somewhat simpler than pip-diff. Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Usage:
  pip-grep [-s] <reqfile> <package>...

Options:
  -h --help     Show this screen.
"""
import os
from docopt import docopt
from pip.req import parse_requirements
from pip.index import PackageFinder
from pip._vendor.requests import session

requests = session()

class Requirements(object):
    def __init__(self, reqfile=None):
        super(Requirements, self).__init__()
        self.path = reqfile
        self.requirements = []

        if reqfile:
            self.load(reqfile)

    def __repr__(self):
        return '<Requirements \'{}\'>'.format(self.path)

    def load(self, reqfile):

        if not os.path.exists(reqfile):
            raise ValueError('The given requirements file does not exist.')

        finder = PackageFinder([], [], session=requests)
        for requirement in parse_requirements(reqfile, finder=finder, session=requests):
            if requirement.req:
                if not getattr(requirement.req, 'name', None):
                    # Prior to pip 8.1.2 the attribute `name` did not exist.
                    requirement.req.name = requirement.req.project_name
                self.requirements.append(requirement.req)




def grep(reqfile, packages, silent=False):

    try:
        r = Requirements(reqfile)
    except ValueError:

        if not silent:
            print('There was a problem loading the given requirement file.')

        exit(os.EX_NOINPUT)

    for req in r.requirements:

        if req.name in packages:

            if not silent:
                print('Package {} found!'.format(req.name))
            exit(0)

    if not silent:
        print('Not found.')

    exit(1)


def main():
    args = docopt(__doc__, version='pip-grep')

    kwargs = {'reqfile': args['<reqfile>'], 'packages': args['<package>'], 'silent': args['-s']}


    grep(**kwargs)



if __name__ == '__main__':
    main()

Much of this code is the same as before, so I'll only cover the part that differs:

def grep(reqfile, packages, silent=False):

    try:
        r = Requirements(reqfile)
    except ValueError:

        if not silent:
            print('There was a problem loading the given requirement file.')

        exit(os.EX_NOINPUT)

    for req in r.requirements:

        if req.name in packages:

            if not silent:
                print('Package {} found!'.format(req.name))
            exit(0)

    if not silent:
        print('Not found.')

    exit(1)

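The grep function takes three parameters: reqfile (the requirements file), packages (the package names to look for), and silent (which, judging by how it is used, turns on a quiet mode). It first instantiates Requirements(reqfile) as r. If loading fails, Requirements raises a ValueError, which grep catches: in non-silent mode it prints a message saying the requirements file could not be loaded, and in either mode it then exits with os.EX_NOINPUT, the exit code for an input file that does not exist or cannot be read. Next it loops over r.requirements. As soon as a requirement's name is among the packages being searched for, it prints 'Package xxx found!' (unless silent) and calls exit(0), which signals success. If the loop finishes without a match, it prints 'Not found.' (again, unless silent) and calls exit(1), a non-zero status that signals failure. The silent flag only suppresses output; the exit code is the same either way. That is the whole flow, and it is quite simple. Now let's test it directly: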

# tox.ini

# Before
commands =
    pip-diff --stale tests/test-requirements.txt tests/test-requirements2.txt

# After
commands =
    pip-grep tests/test-requirements.txt cffi

Test result:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='4036656098'
pip901 runtests: commands[0] | pip-grep tests/test-requirements.txt cffi
Package cffi found!
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

The result shows that cffi is present in tests/test-requirements.txt. What if silent is True?

# tox.ini

# Before
commands =
    pip-grep tests/test-requirements.txt cffi

# After
commands =
    pip-grep tests/test-requirements.txt cffi -s

Run the tests:

~/source/backend/python/pip-pop master*
❯ tox
GLOB sdist-make: /home/mao/source/backend/python/pip-pop/setup.py
pip901 inst-nodeps: /home/mao/source/backend/python/pip-pop/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='3814042974'
pip901 runtests: commands[0] | pip-grep tests/test-requirements.txt cffi -s
__________________________________________________________________ summary __________________________________________________________________
  pip901: commands succeeded
  congratulations :)

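The lookup succeeds, and because silent mode is on, nothing is printed. For completeness, we should also test the case where the package being searched for is absent: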

# tox.ini

# Before
commands =
    pip-grep tests/test-requirements.txt cffi

# After
commands =
    pip-grep tests/test-requirements.txt flask

The result:

# fxm @ bogon in ~/source/backend/python/pip-pop-source on git:master x [13:16:17]
$ tox
GLOB sdist-make: /Users/fxm/source/backend/python/pip-pop-source/setup.py
pip901 create: /Users/fxm/source/backend/python/pip-pop-source/.tox/pip901
pip901 installdeps: pip==9.0.1
pip901 inst: /Users/fxm/source/backend/python/pip-pop-source/.tox/dist/pip-pop-0.1.0.zip
pip901 installed: docopt==0.6.2,pip-pop==0.1.0
pip901 runtests: PYTHONHASHSEED='3632824289'
pip901 runtests: commands[0] | pip-grep tests/test-requirements.txt flask
Not found.
ERROR: InvocationError: '/Users/fxm/source/backend/python/pip-pop-source/.tox/pip901/bin/pip-grep tests/test-requirements.txt flask'
_______________________________________________________________________________ summary ________________________________________________________________________________
ERROR:   pip901: commands failed

Because flask is not in the file, pip-grep prints Not found. and exits with status 1, so tox reports the command as failed.
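
The link between exit statuses and a failed command can be sketched on its own. This is a minimal illustration rather than pip-pop's code: grep_status is a hypothetical helper that mirrors the branches of grep but returns the status instead of exiting, and the subprocess call shows the non-zero return code a calling process sees when its child exits with 1.

```python
import os
import subprocess
import sys

# os.EX_NOINPUT only exists on Unix-like systems; 66 is its usual value there.
EX_NOINPUT = getattr(os, 'EX_NOINPUT', 66)


def grep_status(names, packages, file_exists=True):
    """Hypothetical helper: return the exit status grep() would use.

    names       -- requirement names parsed from the file
    packages    -- package names being searched for
    file_exists -- stands in for the os.path.exists() check in load()
    """
    if not file_exists:
        return EX_NOINPUT
    for name in names:
        if name in packages:
            return 0  # found: success
    return 1  # not found: failure


names = ['cffi', 'django', 'requests']
print(grep_status(names, ['cffi']))   # 0
print(grep_status(names, ['flask']))  # 1
print(grep_status(names, ['cffi'], file_exists=False) == EX_NOINPUT)  # True

# A child process that exits with 1 hands that status back to its caller.
child = subprocess.run([sys.executable, '-c', 'import sys; sys.exit(1)'])
print(child.returncode)  # 1
```

Returning the status instead of calling exit() inside the helper is only a convenience for this sketch, since it lets the logic be checked without terminating the interpreter.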

With that, we have gone through this small project from start to finish.

Summary

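Writing an article is genuinely hard and takes a lot of time. But the payoff is real: for me, the process of writing gave me a deeper understanding of code I had previously only copied.
This is my first article about Python. Many things, such as __init__, __repr__, and __name__, were still unfamiliar to me, so I spent extra words on them, which makes the article feel a bit long-winded. Some of that also comes down to my own writing ability, which I hope to improve with more writing and practice.
My order of preference when looking things up is: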

Official documentation > Google > Stack Overflow > Zhihu > SegmentFault > Juejin > Cnblogs == CSDN... > personal sites in China

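Better not to use Baidu; don't ask me why. Many personal sites in China are not original: they copy from one another, contain many errors, and the translations are poor, so they are best avoided too. In fact, the first three sources will solve almost any problem, provided your English is good enough. Also, read Ruan Yifeng's blog with caution; what I personally recommend are his 《ES6 标准入门》 and his translation of Hackers & Painters. The rest is a matter of opinion.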
