python – Scrapy shell against a local file

Before Scrapy 1.0, I could run the Scrapy shell against a local file quite simply:

$ scrapy shell index.html

After upgrading to 1.0.3, it started throwing an error:

$ scrapy shell index.html
2015-10-12 15:32:59 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-10-12 15:32:59 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-10-12 15:32:59 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
Traceback (most recent call last):
  File "/Users/user/.virtualenvs/so/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/commands/shell.py", line 50, in run
    spidercls = spidercls_for_request(spider_loader, Request(url),
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 24, in __init__
    self._set_url(url)
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 59, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: index.html 

Is this behavior intended, or is it a bug in the Scrapy shell?

As a workaround, I can pass the file's absolute path using the "file" URL scheme:

$ scrapy shell file:///absolute/path/to/index.html

This is clearly less convenient than passing a relative file name.
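The tedious part of the workaround is typing the absolute file:// URL by hand. Below is a minimal sketch, assuming Python 3.6+ (for `pathlib` with non-strict `resolve()`; note that the traceback above comes from Python 2.7, where `pathlib` is not in the standard library), that turns a relative path into such a URL. The helper name `local_file_url` is hypothetical. It also shows why the shell complains: a bare file name parses with an empty scheme.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_file_url(path: str) -> str:
    """Hypothetical helper: turn a filesystem path into an absolute file:// URL."""
    # resolve() makes the path absolute (as_uri() raises ValueError on a
    # relative path); as_uri() also percent-encodes characters such as spaces.
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    # A bare file name has no URL scheme, which is what the shell's
    # Request rejects with "Missing scheme in request url".
    print(repr(urlparse("index.html").scheme))  # ''
    # Prints an absolute file:/// URL for index.html in the current directory.
    print(local_file_url("index.html"))
```

The printed URL can then be passed to `scrapy shell` in place of a hand-typed absolute path.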

Best answer:

Update: As of Scrapy >= 1.1, this is a built-in feature, so you can do:

scrapy shell file:///path/to/file.html

Old answer:

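According to the discussion in Running scrapy shell against a local file, the relevant change was introduced by this commit. A pull request was created to make the Scrapy shell open local files again, and it is planned to be part of Scrapy 1.1.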
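To illustrate what "opening local files again" involves, here is a hypothetical sketch of how a command could decide whether its argument is a URL or a local path before building a request. It is not the code from the Scrapy pull request, the function name `guess_url` is made up, and it assumes POSIX-style paths (a Windows path such as `C:\x` would parse as having scheme `c`). Scrapy's actual rules may differ, for example in how a bare name like `index.html` is treated.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def guess_url(arg: str) -> str:
    """Hypothetical: map a shell argument to a URL that has a scheme.

    Illustration only; not Scrapy's implementation.
    """
    if urlparse(arg).scheme:
        # Already a URL with a scheme (http://, https://, file://): keep it.
        return arg
    if os.path.exists(arg):
        # An existing local path: make it absolute and use the file scheme.
        return Path(arg).resolve().as_uri()
    # Anything else is treated as a host name typed without a scheme.
    return "http://" + arg


if __name__ == "__main__":
    for arg in ["http://example.com/", ".", "example.com"]:
        print(arg, "->", guess_url(arg))
```

Relying on `os.path.exists` is a design trade-off: a name that is both a plausible host and an existing file would silently be read from disk, which is one reason a real tool might instead require an explicit `./` or `/` prefix for local paths.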
