2017-11-9 Installing Scrapy - Python study notes

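First, follow steps 1-7 of the CSDN blog post "[Python]网络爬虫(11):亮剑!爬虫框架小抓抓Scrapy闪亮登场!" ("[Python] Web crawler (11): Scrapy, the crawler framework, makes its debut!"). The versions I installed were: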

1. python-3.4.3

2. lxml-3.3.1.win32-py3.4

3. setuptools: no need to install it separately; it comes with Python

4. zope.interface-4.3.3.win32-py3.4

5. Twisted: the links in the article above had no suitable Twisted version, so I downloaded twisted-17.9.0-cp34-cp34m-win32.whl from https://www.lfd.uci.edu/~gohlke/pythonlibs/, renamed it to twisted-17.9.0-cp34-none-win32.whl, and installed it with pip install twisted-17.9.0-cp34-none-win32.whl.

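Note: without the rename, pip reports "is not a supported wheel on this platform". Running import pip; print(pip.pep425tags.get_supported()) in the Python shell lists the filename tags (Python version, ABI, platform) that this pip accepts: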

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import pip; print(pip.pep425tags.get_supported())
[('cp34', 'none', 'win32'), ('cp34', 'none', 'any'), ('cp3', 'none', 'any'), ('cp33', 'none', 'any'), ('cp32', 'none', 'any'), ('cp31', 'none', 'any'), ('cp30', 'none', 'any'), ('py34', 'none', 'any'), ('py3', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]
>>>
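pip accepts a wheel only if one of the tag triples encoded in its filename appears in that list. The sketch below uses hypothetical helpers (wheel_tags, is_supported, written for illustration, not pip's own code). It parses a filename using the PEP 427 naming scheme {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl and checks it against the list printed above, which shows why the cp34m file was rejected and the renamed file was accepted. Renaming only changes the label pip checks, not the compiled code inside the wheel. Also, pip.pep425tags is an internal module that newer pip releases no longer expose; on current pip, pip debug --verbose prints the compatible tags instead.

```python
def wheel_tags(filename):
    """Return the set of (python, abi, platform) tag triples a wheel filename declares.

    PEP 427 naming: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    where each tag field may hold several values joined by '.'.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    py_tags, abi_tags, plat_tags = parts[-3:]
    return {(py, abi, plat)
            for py in py_tags.split(".")
            for abi in abi_tags.split(".")
            for plat in plat_tags.split(".")}


def is_supported(filename, supported):
    """True if any tag triple of the wheel appears in pip's supported list."""
    return bool(wheel_tags(filename) & set(supported))


# The pip.pep425tags.get_supported() output from the session above.
SUPPORTED = [
    ("cp34", "none", "win32"), ("cp34", "none", "any"), ("cp3", "none", "any"),
    ("cp33", "none", "any"), ("cp32", "none", "any"), ("cp31", "none", "any"),
    ("cp30", "none", "any"), ("py34", "none", "any"), ("py3", "none", "any"),
    ("py33", "none", "any"), ("py32", "none", "any"), ("py31", "none", "any"),
    ("py30", "none", "any"),
]

if __name__ == "__main__":
    for name in ("twisted-17.9.0-cp34-cp34m-win32.whl",
                 "twisted-17.9.0-cp34-none-win32.whl",
                 "Scrapy-1.3.2-py3-none-any.whl"):
        print(name, "->", is_supported(name, SUPPORTED))
```

Running it prints False for the original cp34m filename and True for the renamed Twisted wheel and for the Scrapy wheel.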

After installation, you can check the Twisted version from the Python shell:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import twisted
>>> twisted.version
Version('Twisted', 17, 9, 0)

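6. pyOpenSSL-0.11.winxp32-py3.2. Note: this version turned out not to match the Scrapy release I finally installed, which caused quite a bit of trouble; more on that below.

7. pywin32-221.win32-py3.4

8. Finally, install Scrapy itself.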

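First, a word on how Python packages are installed, depending on their format:

a. Packaged as an .exe: just run it.

b. Packaged as an .msi: also just run it.

c. A .whl file: run pip install <file>.whl.

d. A .tar.gz archive: extract it, open a cmd window in the extracted directory, and run python setup.py install.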
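As a rough sketch of those four cases, the hypothetical helper install_command below maps a downloaded file to the command you would run. It only builds the command and executes nothing; it uses msiexec /i for .msi files and python -m pip so that the pip of the current interpreter is used. For case d it assumes the archive extracts to a folder with the same name minus .tar.gz, which is typical but not guaranteed.

```python
import os
import sys


def install_command(path):
    """Return (command, working_dir) for installing a downloaded package file.

    Covers the four cases listed above; it only builds the command, nothing is run.
    """
    name = os.path.basename(path).lower()
    if name.endswith(".exe"):
        # a. Installer executable: just run it.
        return [path], None
    if name.endswith(".msi"):
        # b. MSI package: "running" it means handing it to msiexec.
        return ["msiexec", "/i", path], None
    if name.endswith(".whl"):
        # c. Wheel: install with the pip belonging to this Python interpreter.
        return [sys.executable, "-m", "pip", "install", path], None
    if name.endswith(".tar.gz"):
        # d. Source archive: after extracting it, run setup.py from the extracted
        #    folder (assumed to be the archive name without ".tar.gz").
        return [sys.executable, "setup.py", "install"], path[:-len(".tar.gz")]
    raise ValueError("unsupported package file: %s" % path)
```

To actually run it: cmd, cwd = install_command(path), then subprocess.check_call(cmd, cwd=cwd), after extracting the archive first in case d. (pip install can also take a .tar.gz file directly.)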

=======================================

The actual installation

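Running pip install D:\python\Scrapy-1.3.2-py3-none-any.whl failed with the error below. Roughly: Twisted was already installed, but the queuelib package was missing, and connecting to download it failed (I go online through a proxy server).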

Requirement already satisfied (use --upgrade to upgrade): Twisted>=13.1.0 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Collecting queuelib (from Scrapy==1.3.2)
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Unauthorized',))': /simple/queuelib/
...(omitted)
Could not find any downloads that satisfy the requirement queuelib (from Scrapy==1.3.2)
No distributions at all found for queuelib (from Scrapy==1.3.2)

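Looking through guides, I learned that during a normal Scrapy install pip connects to http://pypi.python.org/simple by default to download the dependencies, and the -i option points pip at a different index, e.g. -i http://mirrors.aliyun.com/pypi/simple. That didn't help in my case, though; it kept reporting that it couldn't connect.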
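The 407 in these logs is HTTP status 407 (Proxy Authentication Required): the proxy itself wanted credentials, which presumably is why switching to a mirror with -i did not help, since that traffic still went through the same proxy. pip's --proxy option accepts the form [user:passwd@]proxy.server:port, so passing credentials there may be worth trying; this is untested against the setup described here. The hypothetical helper pip_install_args below only assembles such a command line (the proxy address and credentials in any example are placeholders). It also adds --trusted-host for a plain-HTTP mirror, because recent pip versions ignore non-HTTPS indexes otherwise.

```python
import sys


def pip_install_args(target, index_url=None, proxy=None):
    """Build (but do not run) a pip install command line.

    index_url -- alternative index passed with -i, e.g. a mirror
    proxy     -- value for pip's --proxy option: [user:passwd@]proxy.server:port
    """
    args = [sys.executable, "-m", "pip", "install"]
    if index_url:
        args += ["-i", index_url]
        if index_url.startswith("http://"):
            # Recent pip versions ignore plain-HTTP indexes unless the host is trusted.
            args += ["--trusted-host", index_url.split("/")[2]]
    if proxy:
        args += ["--proxy", proxy]
    args.append(target)
    return args
```

The resulting list can be passed straight to subprocess.check_call.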

No way around it, then: I'd do it by hand.

First I went to http://mirrors.aliyun.com/pypi/simple/queuelib, downloaded queuelib-1.3.0.tar.gz and installed it. The result:

D:\python\dist\queuelib-1.3.0>python setup.py install
running install
running bdist_egg
running egg_info
writing queuelib.egg-info\PKG-INFO
...(omitted)
Installed c:\python34\lib\site-packages\queuelib-1.3.0-py3.4.egg
Processing dependencies for queuelib==1.3.0
Finished processing dependencies for queuelib==1.3.0
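The manual extract-and-install routine for a source archive such as queuelib-1.3.0.tar.gz can be scripted with the standard library. A minimal sketch with hypothetical helper names (unpack_sdist, setup_py_install). It assumes the archive contains one top-level folder, and it uses tarfile's "data" extraction filter where the running Python provides one, since extracting an archive can otherwise write files outside the target directory.

```python
import os
import subprocess
import sys
import tarfile


def unpack_sdist(archive, dest):
    """Extract a .tar.gz source package into dest and return its top-level folder."""
    with tarfile.open(archive, "r:gz") as tar:
        # Assumes a single top-level folder, e.g. queuelib-1.3.0/.
        top = tar.getnames()[0].split("/")[0]
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe member paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return os.path.join(dest, top)


def setup_py_install(src_dir):
    """Run `python setup.py install` inside src_dir, as in the log above."""
    subprocess.check_call([sys.executable, "setup.py", "install"], cwd=src_dir)
```

For example, assuming the archive was saved to D:\python\dist: setup_py_install(unpack_sdist(r"D:\python\dist\queuelib-1.3.0.tar.gz", r"D:\python\dist")).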

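Running pip install D:\python\Scrapy-1.3.2-py3-none-any.whl again still failed because of another missing package. Following the error messages, I kept downloading each missing package by hand and installing it.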
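That "run pip, read the error, fetch the named package" loop can be partly automated by pulling the package name out of the error text. A minimal sketch with a hypothetical helper, missing_requirement; the message patterns are copied from the pip 6 and easy_install output in this post and may well differ in other versions.

```python
import re

# Message patterns taken from the pip 6 / easy_install output in this post.
_PATTERNS = [
    re.compile(r"No distributions at all found for (\S+)"),
    re.compile(r"Could not find any downloads that satisfy the requirement (\S+)"),
    re.compile(r"No local packages or working download links found for (\S+)"),
]


def missing_requirement(output):
    """Return a requirement that pip/easy_install reported as unavailable, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(output)
        if match:
            return match.group(1)
    return None
```

The returned name (for example queuelib or pyopenssl>=0.12) tells you which package to download next.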

One problem came up while installing service_identity-14.0.0. The error at the time was:

D:\python\dist\service_identity-14.0.0>python setup.py install
C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'extra_requires'
  warnings.warn(msg)
running install
running bdist_egg
running egg_info
...(omitted)
Installed c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Processing dependencies for service-identity==14.0.0
Searching for pyopenssl>=0.12
Reading Links for pyopenssl
Download error on https://pypi.python.org/simple/pyopenssl/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
Couldn't find index page for 'pyopenssl' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
Download error on https://pypi.python.org/simple/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
No local packages or working download links found for pyopenssl>=0.12
error: Could not find suitable distribution for Requirement.parse('pyopenssl>=0.12')
D:\python\dist\service_identity-14.0.0>

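To explain: service_identity-14.0.0 requires pyOpenSSL 0.12 or later (pyopenssl>=0.12), but I had originally installed 0.11, so the requirement wasn't met. I found pyOpenSSL-0.12.winxp32-py3.2 online, downloaded and installed it, and then service_identity-14.0.0 installed successfully as well.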
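The requirement that failed was pyopenssl>=0.12. A simplified sketch of that check (hypothetical helpers parse_version and satisfies_minimum) shows why 0.11 was rejected while 0.12 and the later 17.3.0 pass. It handles only plain dotted numbers, not the full PEP 440 rules that pip and setuptools implement. Versions have to be compared as numbers rather than strings: as text, "0.9" sorts after "0.12".

```python
def parse_version(text):
    """Turn a plain dotted version like '0.12' or '17.3.0' into a tuple of ints.

    Simplified: pre-release tags such as '1.0rc1' are not handled; real tools
    follow PEP 440.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """True if `installed` meets a `>=minimum` requirement."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

With this, satisfies_minimum("0.11", "0.12") is False and satisfies_minimum("0.12", "0.12") is True.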

Then I ran pip install D:\python\Scrapy-1.3.2-py3-none-any.whl again: more errors about missing packages, so I kept downloading and installing them by hand as the messages indicated.

At last there were no more errors and the installation finished (output below). Very exciting.

D:\>pip install D:\python\Scrapy-1.3.2-py3-none-any.whl
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Processing d:\python\scrapy-1.3.2-py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): service-identity in c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): Twisted>=13.1.0 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): cssselect>=0.9 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): queuelib in c:\python34\lib\site-packages\queuelib-1.3.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): PyDispatcher>=2.0.5 in c:\python34\lib\site-packages\pydispatcher-2.0.5-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): w3lib>=1.15.0 in c:\python34\lib\site-packages\w3lib-1.15.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): lxml in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyOpenSSL in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5.2 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): parsel>=1.1 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): characteristic>=14.0.0 in c:\python34\lib\site-packages\characteristic-14.0.0-py3.4.egg (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyasn1 in c:\python34\lib\site-packages (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyasn1-modules in c:\python34\lib\site-packages (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): constantly>=15.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): incremental>=16.10.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): Automat>=0.3.0 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): hyperlink>=17.1.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): zope.interface>=4.0.2 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): attrs in c:\python34\lib\site-packages (from Automat>=0.3.0->Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): setuptools in c:\python34\lib\site-packages (from zope.interface>=4.0.2->Twisted>=13.1.0->Scrapy==1.3.2)
Installing collected packages: Scrapy
Successfully installed Scrapy-1.3.2
D:\>

Looks like it's installed. But when I opened a cmd window and ran the scrapy command from an arbitrary directory, I got this:

D:\>scrapy
Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\OpenSSL\__init__.py", line 15, in <module>
    orig = sys.getdlopenflags()
AttributeError: 'module' object has no attribute 'getdlopenflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python34\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python34\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 121, in execute
    cmds = _get_commands_dict(settings, inproject)
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 45, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 28, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 19, in _iter_command_classes
    for module in walk_modules(module_name):
  File "C:\Python34\lib\site-packages\scrapy\utils\misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "C:\Python34\lib\importlib\__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
  File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
  File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "C:\Python34\lib\site-packages\scrapy\commands\version.py", line 6, in <module>
    import OpenSSL
  File "C:\Python34\lib\site-packages\OpenSSL\__init__.py", line 17, in <module>
    from OpenSSL import crypto
ImportError: DLL load failed: The specified module could not be found.

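Judging by the traceback, the problem is pyOpenSSL. (The first AttributeError is incidental, since sys.getdlopenflags only exists on Unix; the fatal error is the DLL load failure when pyOpenSSL imports its compiled OpenSSL.crypto module.) In Control Panel I found two versions of pyOpenSSL installed. I uninstalled 0.11 and ran scrapy again, but got the same error. What now...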

I searched online for ages but couldn't find a command to check which pyOpenSSL version was currently installed.
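For the record, two ways that should work: pip show pyOpenSSL prints the installed distribution's version without importing it, and import OpenSSL; print(OpenSSL.__version__, OpenSSL.__file__) additionally shows which copy Python actually loads, which matters when two versions sit side by side. The hypothetical helper module_report below wraps the second approach for several packages at once. It catches any exception during import, because a broken binary build can fail with errors other than ImportError (such as the "DLL load failed" above).

```python
import importlib


def module_report(name):
    """Import `name` and describe it as (version, file location, error text).

    version is the module's __version__ (None if it has none); if the import
    itself fails, only the error text is filled in.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # broken builds may raise more than ImportError
        return None, None, str(exc)
    return getattr(module, "__version__", None), getattr(module, "__file__", None), None


if __name__ == "__main__":
    # pyOpenSSL is imported under the name "OpenSSL".
    for name in ("OpenSSL", "twisted", "scrapy"):
        version, location, error = module_report(name)
        if error:
            print("%-8s import failed: %s" % (name, error))
        else:
            print("%-8s %s  (%s)" % (name, version, location))
```

If the reported location is not the copy you expect, an older install is shadowing the newer one.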

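Out of options, I tried reinstalling the service_identity package once more by running python setup.py install in D:\python\dist\service_identity-14.0.0. While it was running I happened to open some random web page, and a small miracle occurred: the pyOpenSSL download succeeded. The version it fetched automatically turned out to be pyOpenSSL 17.3.0: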

Processing dependencies for service-identity==14.0.0
Searching for pyopenssl>=0.12
Reading https://pypi.python.org/simple/pyopenssl
Downloading https://pypi.python.org/packages/ee/6a/cd78737dd990297205943cc4dcad3d3c502807fd2c5b18c5f33dc90ca214/pyOpenSSL-17.3.0.tar.gz#md5=09dcd307b8d2068f9dd5aaa3a3a88992
Best match: pyOpenSSL 17.3.0
Processing pyOpenSSL-17.3.0.tar.gz
Writing C:\Users\aaa\AppData\Local\Temp\easy_install-3bi4uj87\pyOpenSSL-17.3.0\setup.cfg
Running pyOpenSSL-17.3.0\setup.py -q bdist_egg --dist-dir C:\Users\aaa\AppData\Local\Temp\easy_install-3bi4uj87\pyOpenSSL-17.3.0\egg-dist-tmp-ubew1tgt
warning: no previously-included files found matching 'leakcheck'
warning: no previously-included files matching '*.py' found under directory 'leakcheck'
warning: no previously-included files matching '*.pem' found under directory 'leakcheck'
warning: no previously-included files matching '*.cert' found under directory 'examples\simple'
warning: no previously-included files matching '*.pkey' found under directory 'examples\simple'
no previously-included directories found matching 'doc\_build'
no previously-included directories found matching '.travis'
no previously-included directories found matching '.mention-bot'
zip_safe flag not set; analyzing archive contents...
Copying pyopenssl-17.3.0-py3.4.egg to c:\python34\lib\site-packages
Adding pyopenssl 17.3.0 to easy-install.pth file
Installed c:\python34\lib\site-packages\pyopenssl-17.3.0-py3.4.egg
Searching for cryptography>=1.9
Reading Links for cryptography
Download error on https://pypi.python.org/simple/cryptography/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
Couldn't find index page for 'cryptography' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
Download error on https://pypi.python.org/simple/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
No local packages or working download links found for cryptography>=1.9
error: Could not find suitable distribution for Requirement.parse('cryptography>=1.9')

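Once I'd found the trick, the rest was easy. The run above installed pyOpenSSL, but cryptography still failed to download. So I re-ran python setup.py install in D:\python\dist\service_identity-14.0.0 while opening some web page at the same time; in my experience the download went through about half the time. When it still didn't, I downloaded the missing package named in the error message by hand and installed it separately.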
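The "rerun until the download goes through" routine can be automated with a bounded retry loop. A minimal sketch: install_with_retries, the attempt count and the delay are hypothetical choices, and the run parameter exists only so the loop can be tested without a real install. It does nothing about the underlying proxy problem; it just saves retyping the command.

```python
import subprocess
import sys
import time


def install_with_retries(src_dir, attempts=5, delay=10.0, run=subprocess.call):
    """Re-run `python setup.py install` in src_dir until it exits with status 0.

    Returns the number of the attempt that succeeded, or None if all failed.
    """
    cmd = [sys.executable, "setup.py", "install"]
    for attempt in range(1, attempts + 1):
        if run(cmd, cwd=src_dir) == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give the flaky proxy a moment before retrying
    return None
```

For example, install_with_retries(r"D:\python\dist\service_identity-14.0.0") retries the same command used above.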

At last the installation completed, with the following output:

D:\python\dist\service_identity-14.0.0>python setup.py install
C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'extra_requires'
  warnings.warn(msg)
running install
running bdist_egg
running egg_info
writing top-level names to service_identity.egg-info\top_level.txt
writing requirements to service_identity.egg-info\requires.txt
writing dependency_links to service_identity.egg-info\dependency_links.txt
writing service_identity.egg-info\PKG-INFO
reading manifest file 'service_identity.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'service_identity.egg-info\SOURCES.txt'
installing library code to build\bdist.win32\egg
running install_lib
running build_py
creating build\bdist.win32\egg
creating build\bdist.win32\egg\service_identity
copying build\lib\service_identity\exceptions.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\pyopenssl.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\_common.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\_compat.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\__init__.py -> build\bdist.win32\egg\service_identity
byte-compiling build\bdist.win32\egg\service_identity\exceptions.py to exceptions.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\pyopenssl.py to pyopenssl.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\_common.py to _common.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\_compat.py to _compat.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\__init__.py to __init__.cpython-34.pyc
creating build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\PKG-INFO -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\SOURCES.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\dependency_links.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\requires.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\top_level.txt -> build\bdist.win32\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist\service_identity-14.0.0-py3.4.egg' and adding 'build\bdist.win32\egg' to it
removing 'build\bdist.win32\egg' (and everything under it)
Processing service_identity-14.0.0-py3.4.egg
Removing c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Copying service_identity-14.0.0-py3.4.egg to c:\python34\lib\site-packages
service-identity 14.0.0 is already the active version in easy-install.pth
Installed c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Processing dependencies for service-identity==14.0.0
Searching for pyopenssl==17.3.0
Best match: pyopenssl 17.3.0
Processing pyopenssl-17.3.0-py3.4.egg
pyopenssl 17.3.0 is already the active version in easy-install.pth
Using c:\python34\lib\site-packages\pyopenssl-17.3.0-py3.4.egg
Searching for pyasn1-modules==0.1.2
Best match: pyasn1-modules 0.1.2
Adding pyasn1-modules 0.1.2 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for pyasn1==0.3.5
Best match: pyasn1 0.3.5
Adding pyasn1 0.3.5 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for characteristic==14.0.0
Best match: characteristic 14.0.0
Processing characteristic-14.0.0-py3.4.egg
characteristic 14.0.0 is already the active version in easy-install.pth
Using c:\python34\lib\site-packages\characteristic-14.0.0-py3.4.egg
Searching for six==1.11.0
Best match: six 1.11.0
Adding six 1.11.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for cryptography==1.9
Best match: cryptography 1.9
Adding cryptography 1.9 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for asn1crypto==0.23.0
Best match: asn1crypto 0.23.0
Adding asn1crypto 0.23.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for cffi==1.7.0
Best match: cffi 1.7.0
Adding cffi 1.7.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for idna==2.6
Best match: idna 2.6
Adding idna 2.6 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for pycparser==2.18
Best match: pycparser 2.18
Adding pycparser 2.18 to easy-install.pth file
Using c:\python34\lib\site-packages
Finished processing dependencies for service-identity==14.0.0

Finally, I opened a cmd window and ran scrapy from an arbitrary directory. It printed the following: success.

D:\>scrapy
Scrapy 1.3.2 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command
D:\>

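Personally, I think that with a direct internet connection rather than a proxy server, installing Scrapy shouldn't be nearly this much trouble. Still, all this fiddling at least made me familiar with the steps for installing packages by hand.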

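    Original author: 没人不认识我
    Original URL: https://zhuanlan.zhihu.com/p/30883557
    This article was reposted from the web solely to share knowledge. In case of infringement, please contact the blogger to have it removed.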