Writing a CLI in Python: Argument Parsing

January 29, 2024. Source: 匿蟒

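For a CLI program, argument parsing is usually the first problem to solve.

There are exceptions, of course.

Scripts with no arguments

Many common commands do what we expect without any arguments at all, ls for example.

Python's Hello world is exactly such a no-argument script.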

print('Hello world!')

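The script has one clear job: print Hello world! to sys.stdout, which by default is echoed in the terminal. It needs no arguments.

A no-argument script is convenient to run but not very general. It has no arguments either because what it does is tightly bound to its environment, or because values that could have been arguments are hard-coded. Such scripts tend to be throwaways, or the early prototype of a tool that will later be used regularly.

Scripts with a single argument

Passing in a single argument is also fairly simple.

For example, building on Hello world, we add one argument and have the script print whatever is passed in. We will call the script echo.py.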

import sys

print(sys.argv[1])

Running python echo.py hello prints hello.

sys.argv is a list holding the command-line arguments, and the element at index [1], the second one, is the argument hello that we typed.

sys.argv
The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string ‘-c’. If no script name was passed to the Python interpreter, argv[0] is the empty string.

Changing the script to print(sys.argv), so that it prints the whole list, makes its meaning clearer.

$ python echo.py hello
['echo.py', 'hello']

$ ./echo.py hello world
['./echo.py', 'hello', 'world']

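So it looks like this can handle multiple command-line arguments too? Not so fast: the earlier script still has a bug!

What happens if I pass no arguments at all?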

$ python echo.py
Traceback (most recent call last):
  File "echo.py", line 3, in <module>
    print(sys.argv[1])
IndexError: list index out of range

Right: the length has to be checked before printing, so echo.py needs a change.

import sys

if len(sys.argv) > 1:
    print(sys.argv[1])

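Now the single-argument script finally works. As for multiple arguments, forget it.

This way of reading arguments is very primitive, much like $1 in shell. It is hard to support multiple arguments without hidden pitfalls, and harder still to do any complex parsing.

Think about how a command like cp would do it: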

$ cp file0 file1
$ cp -r dir0 dir1
$ cp dir1 dir2 -r
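
To see why this gets awkward, here is a minimal sketch of parsing those three cp invocations by hand. The function name parse_cp_args and its error messages are made up for illustration; it only understands -r and exactly two paths, and every further option would need another branch.

```python
def parse_cp_args(argv):
    """Split cp-style arguments into (recursive, source, target) by hand."""
    recursive = False
    paths = []
    for arg in argv:
        if arg == "-r":
            recursive = True          # the flag may appear anywhere
        elif arg.startswith("-"):
            raise ValueError("unknown option: %s" % arg)
        else:
            paths.append(arg)         # everything else is a path
    if len(paths) != 2:
        raise ValueError("expected exactly 2 paths, got %d" % len(paths))
    return recursive, paths[0], paths[1]


# In a real script, argv would be sys.argv[1:].
print(parse_cp_args(["file0", "file1"]))       # (False, 'file0', 'file1')
print(parse_cp_args(["-r", "dir0", "dir1"]))   # (True, 'dir0', 'dir1')
print(parse_cp_args(["dir1", "dir2", "-r"]))   # (True, 'dir1', 'dir2')
```

Help text, usage errors, and option values would all have to be written by hand in the same style.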

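Parsing multiple arguments

Many Python scripts still parse their arguments with optparse. I suggest not using it in new scripts, because the official documentation says the same.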

Python 2:

New in version 2.3, Deprecated since version 2.7

Python 3:

Deprecated since version 3.2: The optparse module is deprecated and will not be developed further; development will continue with the argparse module.

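Compared with argparse, optparse is somewhat less capable, and it is no longer being developed.

There are also older scripts that use the C-style getopt. It has not been marked as deprecated, but it is not recommended for new projects or new users either.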

Note:
The getopt module is a parser for command line options whose API is designed to be familiar to users of the C getopt() function. Users who are unfamiliar with the C getopt() function or who would like to write less code and get better help and error messages should consider using the argparse module instead.

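From sys.argv to getopt, then to optparse, and finally to argparse, argument parsing has made three leaps. The first made fuzzy parsing explicit and turned isolated arguments into structured ones. The second made tedious parsing simple and kept the help text together with the argument definitions. The third generates help text and error messages automatically, and supports git-style subcommands.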
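
As a rough illustration of the two ends of that progression, the sketch below parses the same cp-style argument list once with getopt and once with argparse. The argument list and the prog name are illustrative choices, not taken from a real tool.

```python
import argparse
import getopt

argv = ["-r", "dir0", "dir1"]

# getopt returns raw (option, value) pairs plus the leftover arguments;
# interpreting them is still up to the caller.
opts, rest = getopt.getopt(argv, "r")
recursive = ("-r", "") in opts
print(recursive, rest)          # True ['dir0', 'dir1']

# argparse declares each argument once and returns a structured object.
parser = argparse.ArgumentParser(prog="cp.py")
parser.add_argument("-r", action="store_true",
                    help="copy directories recursively")
parser.add_argument("source")
parser.add_argument("target")
args = parser.parse_args(argv)
print(args.r, args.source, args.target)   # True dir0 dir1
```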

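Note: argparse ships with the standard library only in Python 2.7+ and Python 3.2+.

The following uses argparse to walk through the various forms of argument parsing.

No arguments

A parser with no arguments at all is probably the easiest way to understand how the module is used.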

import argparse


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.parse_args()
    print("Hello world!")

Run this helloworld.py file and look at the result.

$ python helloworld.py 
Hello world!

Nothing seems to have happened. So, let's try adding -h.

$ python helloworld.py -h
usage: helloworld.py [-h]

optional arguments:
  -h, --help  show this help message and exit

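Wow! A help message that offers no help at all, generated automatically.

-h and --help are reserved by default: they show the help message and exit. Notice that Hello world! is not printed before or after the help text. (On Python 3.10 and later, the section header reads "options:" rather than "optional arguments:".)

Real argument parsing is just a matter of configuring the argparse.ArgumentParser() before calling parse_args().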
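
A small sketch of why Hello world! never appears next to the help text: parse_args() accepts an explicit argument list (handy for experiments), and -h ends the program by raising SystemExit before any later line runs. The prog value is set by hand here only so the usage line looks like the one above.

```python
import argparse

parser = argparse.ArgumentParser(prog="helloworld.py")

# An empty list stands in for "no command-line arguments".
args = parser.parse_args([])
print(args)                      # Namespace()

# -h prints the help text, then raises SystemExit with code 0,
# so nothing after parse_args() would run in a real script.
try:
    parser.parse_args(["-h"])
except SystemExit as exc:
    help_exit_code = exc.code
    print("exit code:", help_exit_code)   # exit code: 0
```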

Positional arguments

To demonstrate positional arguments, let's write a cp.py that implements simple file copying.

import argparse
import shutil


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "source",
        help="specify the file to be copied",
    )
    parser.add_argument(
        "target",
        help="specify a path to copy to",
    )
    return parser.parse_args()


if __name__ == '__main__':
    args = _parse_args()
    shutil.copy(src=args.source, dst=args.target)

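After the cp.py command, the first argument is recognized as source and the second as target, and then the copy is performed. After parse_args(), the argument list in sys.argv has become the structured args.

(The type of args is <class 'argparse.Namespace'>.)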
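
To look at that Namespace without copying any files, the following sketch rebuilds the same two positional arguments and parses an explicit list instead of sys.argv.

```python
import argparse

parser = argparse.ArgumentParser(prog="cp.py")
parser.add_argument("source")
parser.add_argument("target")

# Passing a list to parse_args() replaces sys.argv[1:].
args = parser.parse_args(["cp.py", "cp2.py"])

print(type(args))                 # <class 'argparse.Namespace'>
print(args.source, args.target)   # cp.py cp2.py
print(vars(args))                 # {'source': 'cp.py', 'target': 'cp2.py'}
```

vars(args) gives a plain dict view of the parsed values, which is convenient for logging or for passing them on as keyword arguments.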

Running python cp.py cp.py cp2.py prints nothing and performs the copy successfully.

What if there is one argument too many or too few?

$ python cp.py cp.py
usage: cp.py [-h] source target
cp.py: error: too few arguments
$ python cp.py cp.py cp2.py cp3.py
usage: cp.py [-h] source target
cp.py: error: unrecognized arguments: cp3.py

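This is far more reliable than using sys.argv directly. (The too few arguments message is what Python 2.7 prints; Python 3 reports the following arguments are required: target instead.)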
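
Those errors also set the process exit status. As a sketch, the helper below (try_parse is a made-up name) catches the SystemExit that argparse raises on a usage error and returns its code, which is 2 in both cases.

```python
import argparse

parser = argparse.ArgumentParser(prog="cp.py")
parser.add_argument("source")
parser.add_argument("target")


def try_parse(argv):
    """Return the parsed Namespace, or the exit code on a usage error."""
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        return exc.code


print(try_parse(["cp.py"]))                       # 2 (too few arguments)
print(try_parse(["cp.py", "cp2.py", "cp3.py"]))   # 2 (unrecognized argument)
print(try_parse(["cp.py", "cp2.py"]).target)      # cp2.py
```

The usage and error text is written to stderr, so a calling shell script can test the exit status without parsing any output.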

Help messages

Let's look at the help message of the previous script:

$ python cp.py -h
usage: cp.py [-h] source target

positional arguments:
  source      specify the file to be copied
  target      specify a path to copy to

optional arguments:
  -h, --help  show this help message and exit

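We only wrote two help='...' strings, yet we get help output this well organized! Doesn't it fill you with emotion, a feeling of living in the 21st century?

Optional arguments

When there are too many positional arguments, their meaning tends to become unclear. A CLI program with more complex arguments can use several optional arguments to specify them.

For example, let's write an enhanced echo.py that supports a --by argument to name the speaker.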

import argparse


def _read_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'words',
        nargs='*',
        help='the words to be print',
    )
    parser.add_argument(
        '-b', '--by',
        default=None,
        help='who says the words',
        metavar='speaker',
    )
    parser.add_argument(
        '-v', '--version',
        action='version',
        version='%(prog)s 1.0.0',
    )
    return parser.parse_args()


if __name__ == '__main__':
    args = _read_args()

    words = ' '.join(args.words)
    if args.by is not None:
        words = '%s: %s' % (args.by, words)
    print(words)

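The -b and --by options are accessed after parsing as args.by. Using args.b raises an error (AttributeError), because when both a short and a long option are given, the long one determines the attribute name; args.b exists only when there is just the short option.

Long options of the form --long-name are also supported. After parsing they are accessed as args.long_name, with the hyphens - replaced by underscores _.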
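
The naming rules can be checked with a short sketch. The options --long-name and -x are invented here purely to exercise the three cases.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-b", "--by")      # long option wins: args.by
parser.add_argument("--long-name")     # hyphen becomes underscore: args.long_name
parser.add_argument("-x")              # short option only: args.x

args = parser.parse_args(["-b", "me", "--long-name", "v", "-x", "1"])

print(args.by, args.long_name, args.x)   # me v 1
print(hasattr(args, "b"))                # False
```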

Here are the commands and their output.

$ python echo.py -h
usage: echo.py [-h] [-v] [-b speaker] [words [words ...]]

positional arguments:
  words                 the words to be print

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -b speaker, --by speaker
                        who says the words
$ python echo.py -v
echo.py 1.0.0
$ python echo.py How are you?
How are you?
$ python echo.py I am fine, thank you. --by me
me: I am fine, thank you.

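Optional arguments are the best way for a complex CLI program to organize its input. They can be given in any order, and they make the intended usage more obvious.

Some parameters of add_argument()

In the echo.py code above, add_argument() is called with parameters such as nargs, default, and help. All of these are optional.

nargs='*' lets words accept a variable number of arguments, collected into a list.
help='...' sets the help text.
default=None means that if the argument is not given, the default value None is used.
metavar='speaker' sets the name shown in the help message; otherwise it defaults to the long option name in upper case, as in -b BY, --by BY who says the words.
action='...' is a more involved option; see action in the documentation.
Here, version='%(prog)s 1.0.0' goes together with action='version' and displays a formatted version string.
%(prog)s is a built-in format specifier whose default value is the program name; see prog in the documentation.

More options are listed under add_argument in the official documentation.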

name or flags – Either a name or a list of option strings, e.g. foo or -f, --foo.
action – The basic type of action to be taken when this argument is encountered at the command line.
nargs – The number of command-line arguments that should be consumed.
const – A constant value required by some action and nargs selections.
default – The value produced if the argument is absent from the command line.
type – The type to which the command-line argument should be converted.
choices – A container of the allowable values for the argument.
required – Whether or not the command-line option may be omitted (optionals only).
help – A brief description of what the argument does.
metavar – A name for the argument in usage messages.
dest – The name of the attribute to be added to the object returned by parse_args().
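
A few of the parameters not used in echo.py can be tried out in a short sketch. The program name demo.py and the options --count, --color, and --quiet are invented for this example.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo.py")
# type converts the raw string; default applies when the option is absent.
parser.add_argument("-n", "--count", type=int, default=1)
# choices restricts the allowed values; required makes the option mandatory.
parser.add_argument("--color", choices=["red", "green"], required=True)
# dest overrides the attribute name; store_true needs no value.
parser.add_argument("-q", "--quiet", action="store_true", dest="silent")

args = parser.parse_args(["--color", "red", "-n", "3"])
print(args.count, args.color, args.silent)   # 3 red False

# A value outside choices is a usage error (exit status 2).
try:
    parser.parse_args(["--color", "blue"])
except SystemExit as exc:
    bad_choice_code = exc.code
    print("exit code:", bad_choice_code)     # exit code: 2
```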

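Subcommands

If a CLI program has several independent features that still need to live together, subcommands are the way to go. The classic example of subcommands is Git.

Below is a knock-off git.py script.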

import argparse

import clone
import init


def _init_subparsers(parent):
    subparsers = parent.add_subparsers(title='sub commands')
    parser_clone = subparsers.add_parser(
        'clone',
        help='Clone a repository into a new directory'
    )
    clone.init_parser(parser_clone)  # add_argument() in module clone
    parser_clone.set_defaults(func=clone.main)
    parser_init = subparsers.add_parser(
        'init',
        help='Create an empty Git repository or reinitialize an existing one'
    )
    init.init_parser(parser_init)  # add_argument() in module init
    parser_init.set_defaults(func=init.main)


def _parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-v', '--version',
        action='version',
        version='%(prog)s x.x.x',
    )

    _init_subparsers(parser)

    return parser.parse_args()


if __name__ == '__main__':
    args = _parse_args()
    args.func(args)

Let's display the version and the help message.

$ python git.py -v
git.py x.x.x
$ python git.py -h
usage: git.py [-h] [-v] {clone,init} ...

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

sub commands:
  {clone,init}
    clone        Clone a repository into a new directory
    init         Create an empty Git repository or reinitialize an existing
                 one

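add_subparsers() returns a <class 'argparse._SubParsersAction'>. Calling add_parser on it adds subcommands, as many as needed.

Each subcommand is a <class 'argparse.ArgumentParser'>, so it supports positional arguments, optional arguments, subcommands, and so on in the same way.

clone.init_parser(parser_clone) is the subcommand's parser setup, omitted here. Like _parse_args() in the current file, it configures an argparse.ArgumentParser.

Here, parser.set_defaults(func=module.main) sets func to the entry point of a different module for each subcommand (the main function in this case). Once parsing is done, args.func(args) calls the function registered by the chosen subcommand. Passing args itself as the argument gives that function access to the parsed, structured arguments.

For example, python git.py clone runs clone.main(args), while python git.py init runs init.main(args).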
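
Since the clone and init modules are not shown, here is a single-file sketch of the same dispatch pattern. The functions clone_main and init_main, the url argument, and the run helper are stand-ins invented for illustration. The sketch also covers one case the script above does not: on Python 3, running with no subcommand leaves args without a func attribute, so it is looked up with getattr.

```python
import argparse


def clone_main(args):
    # Stand-in for clone.main(args).
    return "clone %s" % args.url


def init_main(args):
    # Stand-in for init.main(args).
    return "init"


def build_parser():
    parser = argparse.ArgumentParser(prog="git.py")
    subparsers = parser.add_subparsers(title="sub commands")
    parser_clone = subparsers.add_parser("clone", help="Clone a repository")
    parser_clone.add_argument("url")
    parser_clone.set_defaults(func=clone_main)
    parser_init = subparsers.add_parser("init", help="Create a repository")
    parser_init.set_defaults(func=init_main)
    return parser


def run(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    func = getattr(args, "func", None)
    if func is None:
        # No subcommand given: show usage instead of crashing.
        parser.print_usage()
        return None
    return func(args)


print(run(["clone", "https://example.com/repo.git"]))
print(run(["init"]))
```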

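(Another way to use this is args.func(**vars(args)). The target func can then declare the parsed arguments directly in its signature, but any extra arguments have to be filtered out.)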
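
A minimal sketch of that keyword-argument style follows. The greet subcommand and its parameters are made up; the filtering step here is simply removing func itself from the dict before the call.

```python
import argparse


def greet(name, shout):
    """Target function that declares the parsed arguments directly."""
    text = "hello, %s" % name
    return text.upper() if shout else text


parser = argparse.ArgumentParser(prog="tool.py")
subparsers = parser.add_subparsers()
parser_greet = subparsers.add_parser("greet")
parser_greet.add_argument("name")
parser_greet.add_argument("--shout", action="store_true")
parser_greet.set_defaults(func=greet)

args = parser.parse_args(["greet", "world", "--shout"])

# Copy the parsed values and drop the entry the function does not accept.
kwargs = dict(vars(args))
func = kwargs.pop("func")
result = func(**kwargs)
print(result)   # HELLO, WORLD
```

If the top-level parser defines options of its own, they would end up in kwargs as well and need the same treatment.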

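Subcommand parsing can also be delegated by passing subparsers straight into another module, which defines its own init_parser_in(subparsers) and performs the three steps add_parser, add_argument, and set_defaults. The current file then acts as the single entry point while each subcommand lives in its own module, which gives a reasonable degree of modularity.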
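
One possible shape for that layout is sketched below in a single file. The functions prefixed clone_ and init_ stand for what would be init_parser_in() and main() in separate, hypothetical clone.py and init.py modules; the --bare option is likewise invented.

```python
import argparse


# --- would live in a hypothetical clone.py ---
def clone_main(args):
    return "cloning %s" % args.url


def clone_init_parser_in(subparsers):
    parser = subparsers.add_parser("clone", help="Clone a repository")
    parser.add_argument("url")
    parser.set_defaults(func=clone_main)


# --- would live in a hypothetical init.py ---
def init_main(args):
    return "init (bare=%s)" % args.bare


def init_init_parser_in(subparsers):
    parser = subparsers.add_parser("init", help="Create an empty repository")
    parser.add_argument("--bare", action="store_true")
    parser.set_defaults(func=init_main)


# --- the entry point only wires the modules together ---
def build_parser():
    parser = argparse.ArgumentParser(prog="git.py")
    subparsers = parser.add_subparsers(title="sub commands")
    for register in (clone_init_parser_in, init_init_parser_in):
        register(subparsers)
    return parser


args = build_parser().parse_args(["init", "--bare"])
print(args.func(args))   # init (bare=True)
```

Adding a subcommand then means writing one new module and adding one entry to the tuple in build_parser().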

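Perhaps the biggest benefit of subcommands is that the help message no longer scrolls across several screens and scares the user.

Summary

With argument parsing in place, Python code graduates from an ordinary script to a CLI program.

For more detail, see the official argparse documentation or its tutorial.

This is argument parsing technology from the first decade of the 21st century, and it beats everything left over from the last century. As part of the Python standard library it is widely applicable, versatile in what it can parse, and stable. If your argument parsing is still stuck in the last century, I suggest learning to use it.

In the second decade of the 21st century, three other popular argument parsing libraries have appeared, each more convenient, more concise, or more fun. More on those when there is time.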

    Original author: 匿蟒
    Original URL: https://www.jianshu.com/p/409bfa57608e