First, follow steps 1-7 of the installation guide in the CSDN post "[Python]网络爬虫(11):亮剑!爬虫框架小抓抓Scrapy闪亮登场!". The versions I installed are:
1.python-3.4.3
2.lxml-3.3.1.win32-py3.4
3.setuptools needs no separate install; it ships with Python
4.zope.interface-4.3.3.win32-py3.4
5.The links in the guide above had no suitable Twisted build, so I downloaded twisted-17.9.0-cp34-cp34m-win32.whl from https://www.lfd.uci.edu/~gohlke/pythonlibs/. I then renamed the file to twisted-17.9.0-cp34-none-win32.whl and ran pip install twisted-17.9.0-cp34-none-win32.whl to complete the install.
Note: without the rename, pip complains that the file "is not a supported wheel on this platform". Entering import pip; print(pip.pep425tags.get_supported()) in the Python shell lists the filename tags pip supports, shown below:
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import pip; print(pip.pep425tags.get_supported())
[('cp34', 'none', 'win32'),
('cp34', 'none', 'any'), ('cp3', 'none', 'any'), ('cp33', 'none', 'any'),
('cp32', 'none', 'any'), ('cp31', 'none', 'any'), ('cp30', 'none', 'any'),
('py34', 'none', 'any'), ('py3', 'none', 'any'), ('py33', 'none', 'any'),
('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]
>>>
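Why the rename works: pip compares the tag triple embedded in the wheel filename against the supported list printed above, and cp34/cp34m/win32 is not in that list while cp34/none/win32 is. A minimal sketch of that check; parse_wheel_tags is my own simplified helper (it ignores optional build tags), not a pip function, and pip.pep425tags itself no longer exists in modern pip, which uses packaging.tags instead:

```python
# Sketch of how pip decides whether a wheel filename is "supported".
# parse_wheel_tags is a hypothetical, simplified helper, not part of pip.

def parse_wheel_tags(filename):
    """Split name-version-pytag-abitag-platform.whl into its tag triple."""
    stem = filename[:-len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return (py_tag, abi_tag, plat_tag)

# Abridged version of the tag list printed by pip.pep425tags.get_supported():
supported = [("cp34", "none", "win32"), ("cp34", "none", "any"), ("py34", "none", "any")]

print(parse_wheel_tags("twisted-17.9.0-cp34-cp34m-win32.whl") in supported)  # False: rejected
print(parse_wheel_tags("twisted-17.9.0-cp34-none-win32.whl") in supported)   # True: accepted
```

Note that the rename only changes the tag pip sees; the compiled extension inside is still the cp34m build, which happened to work here but is not guaranteed in general.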
After the install, running the following in the Python shell shows the Twisted version:
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import twisted
>>> twisted.version
Version('Twisted', 17, 9, 0)
6.pyOpenSSL-0.11.winxp32-py3.2. Note: this version turned out not to match the Scrapy I eventually installed and caused no small amount of trouble; details below.
7.pywin32-221.win32-py3.4
8.Finally, install Scrapy itself.
First, a quick rundown of how Python-related packages are installed:
a. Packaged as an .exe: run it directly.
b. Packaged as an .msi: also run it directly.
c. A .whl file: run pip install <file>.whl.
d. A .gz archive: unpack it first, then open a cmd window in the unpacked directory and run python setup.py install.
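The four cases above can be summarized in a small helper; install_command is purely my own illustration, not a standard tool:

```python
# Map a downloaded package file to the matching install command.
# install_command is a hypothetical helper for illustration only.
import os

def install_command(filename):
    name = os.path.basename(filename)
    if name.endswith((".exe", ".msi")):
        return name                       # installers: just run them
    if name.endswith(".whl"):
        return "pip install " + name      # wheels go through pip
    if name.endswith((".tar.gz", ".zip")):
        # unpack first, then run this from inside the unpacked directory
        return "python setup.py install"
    raise ValueError("unrecognized package format: " + name)

print(install_command("Scrapy-1.3.2-py3-none-any.whl"))  # pip install Scrapy-1.3.2-py3-none-any.whl
```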
=======================================
The actual installation begins here.
Running pip install D:\python\Scrapy-1.3.2-py3-none-any.whl fails with the output below. Roughly: the Twisted package is already installed, but the queuelib package is missing, and the connection then fails (I access the internet through a proxy server).
Requirement already satisfied (use --upgrade to upgrade): Twisted>=13.1.0 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Collecting queuelib (from Scrapy==1.3.2)
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Unauthorized',))': /simple/queuelib/
... (omitted)
Could not find any downloads that satisfy the requirement queuelib (from Scrapy==1.3.2)
No distributions at all found for queuelib (from Scrapy==1.3.2)
A bit of research showed that in a normal Scrapy installation, running the pip command makes it connect to http://pypi.python.org/simple by default to download dependent packages; the index can be switched with the -i option, e.g. -i http://mirrors.aliyun.com/pypi/simple. That approach still did not work for me: it kept reporting connection failures.
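For proxies that require authentication (the source of my 407 errors), pip also accepts an explicit --proxy option, which in hindsight might have avoided much of the manual work. A sketch of the command line; the mirror URL is the one above, while user:password@proxyhost:8080 is a placeholder, not my actual setup:

```python
# Build a pip command line with an alternative index and explicit proxy
# credentials. The proxy address is a placeholder for illustration.
import subprocess

cmd = [
    "pip", "install", r"D:\python\Scrapy-1.3.2-py3-none-any.whl",
    "-i", "http://mirrors.aliyun.com/pypi/simple/",    # alternative index (-i / --index-url)
    "--proxy", "http://user:password@proxyhost:8080",  # authenticated proxy
]
print(" ".join(cmd))
# subprocess.check_call(cmd)  # uncomment to actually run the install
```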
No other way around it: install the dependencies by hand.
First, download queuelib-1.3.0.tar.gz from http://mirrors.aliyun.com/pypi/simple/queuelib. The installation output:
D:\python\dist\queuelib-1.3.0>python setup.py install
running install
running bdist_egg
running egg_info
writing queuelib.egg-info\PKG-INFO
... (omitted)
Installed c:\python34\lib\site-packages\queuelib-1.3.0-py3.4.egg
Processing dependencies for queuelib==1.3.0
Finished processing dependencies for queuelib==1.3.0
Running pip install D:\python\Scrapy-1.3.2-py3-none-any.whl again fails: still missing packages. Following each error message, I kept downloading the missing packages by hand and installing them.
One of them, service_identity-14.0.0, ran into a problem; at the time the error read:
D:\python\dist\service_identity-14.0.0>python setup.py install
C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'extra_requires'
warnings.warn(msg)
running install
running bdist_egg
running egg_info
... (omitted)
Installed c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Processing dependencies for service-identity==14.0.0
Searching for pyopenssl>=0.12
Reading Links for pyopenssl
Download error on https://pypi.python.org/simple/pyopenssl/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
Couldn't find index page for 'pyopenssl' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
Download error on https://pypi.python.org/simple/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
No local packages or working download links found for pyopenssl>=0.12
error: Could not find suitable distribution for Requirement.parse('pyopenssl>=0.12')
D:\python\dist\service_identity-14.0.0>
To explain: service_identity-14.0.0 requires pyOpenSSL 0.12 or newer, but I had initially installed version 0.11, hence the mismatch. I found pyOpenSSL-0.12.winxp32-py3.2 online, downloaded and installed it, and service_identity-14.0.0 then installed successfully.
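The mismatch is easy to reproduce with setuptools' own requirement parser, the same Requirement.parse that appears in the error above (pkg_resources ships with setuptools, though it is deprecated in recent versions):

```python
# Reproduce the version check behind "Could not find suitable distribution
# for Requirement.parse('pyopenssl>=0.12')".
import pkg_resources

req = pkg_resources.Requirement.parse("pyopenssl>=0.12")
print("0.11" in req)  # False: the originally installed 0.11 does not satisfy it
print("0.12" in req)  # True: 0.12 or newer does
```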
Running pip install D:\python\Scrapy-1.3.2-py3-none-any.whl again still reported missing packages, so I continued downloading and installing them by hand as prompted.
Finally, no more errors: the installation completed (output below). Exciting.
D:\>pip install D:\python\Scrapy-1.3.2-py3-none-any.whl
You are using pip version 6.0.8, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Processing d:\python\scrapy-1.3.2-py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): service-identity in c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): Twisted>=13.1.0 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): cssselect>=0.9 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): queuelib in c:\python34\lib\site-packages\queuelib-1.3.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): PyDispatcher>=2.0.5 in c:\python34\lib\site-packages\pydispatcher-2.0.5-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): w3lib>=1.15.0 in c:\python34\lib\site-packages\w3lib-1.15.0-py3.4.egg (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): lxml in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyOpenSSL in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): six>=1.5.2 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): parsel>=1.1 in c:\python34\lib\site-packages (from Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): characteristic>=14.0.0 in c:\python34\lib\site-packages\characteristic-14.0.0-py3.4.egg (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyasn1 in c:\python34\lib\site-packages (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): pyasn1-modules in c:\python34\lib\site-packages (from service-identity->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): constantly>=15.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): incremental>=16.10.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): Automat>=0.3.0 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): hyperlink>=17.1.1 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): zope.interface>=4.0.2 in c:\python34\lib\site-packages (from Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): attrs in c:\python34\lib\site-packages (from Automat>=0.3.0->Twisted>=13.1.0->Scrapy==1.3.2)
Requirement already satisfied (use --upgrade to upgrade): setuptools in c:\python34\lib\site-packages (from zope.interface>=4.0.2->Twisted>=13.1.0->Scrapy==1.3.2)
Installing collected packages: Scrapy
Successfully installed Scrapy-1.3.2
D:\>
By the look of it, the installation was done. But opening a cmd window and running the scrapy command from an arbitrary directory produced this instead:
D:\>scrapy
Traceback (most recent call last):
  File "C:\Python34\lib\site-packages\OpenSSL\__init__.py", line 15, in <module>
    orig = sys.getdlopenflags()
AttributeError: 'module' object has no attribute 'getdlopenflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python34\lib\runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Python34\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Python34\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 121, in execute
    cmds = _get_commands_dict(settings, inproject)
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 45, in _get_commands_dict
    cmds = _get_commands_from_module('scrapy.commands', inproject)
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 28, in _get_commands_from_module
    for cmd in _iter_command_classes(module):
  File "C:\Python34\lib\site-packages\scrapy\cmdline.py", line 19, in _iter_command_classes
    for module in walk_modules(module_name):
  File "C:\Python34\lib\site-packages\scrapy\utils\misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "C:\Python34\lib\importlib\__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
  File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
  File "<frozen importlib._bootstrap>", line 2226, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1200, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1129, in _exec
  File "<frozen importlib._bootstrap>", line 1471, in exec_module
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "C:\Python34\lib\site-packages\scrapy\commands\version.py", line 6, in <module>
    import OpenSSL
  File "C:\Python34\lib\site-packages\OpenSSL\__init__.py", line 17, in <module>
    from OpenSSL import crypto
ImportError: DLL load failed: The specified module could not be found.
Judging from the traceback, this is an OpenSSL problem. The Control Panel listed two installed OpenSSL versions; I uninstalled 0.11 and ran scrapy again, only to hit the same error. Now what...
After a long search online, I could not find a command that reports the currently installed pyOpenSSL version.
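In hindsight, there is no standalone command, but the version can be read from Python itself: pyOpenSSL exposes a __version__ attribute. A small sketch, guarded so it also runs when pyOpenSSL is absent or, as in my case, broken:

```python
# Print whichever pyOpenSSL version Python actually imports.
try:
    import OpenSSL
    print("pyOpenSSL", OpenSSL.__version__)
except ImportError as err:
    print("pyOpenSSL not importable:", err)
```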
As a last resort, I tried reinstalling the service_identity package with D:\python\dist\service_identity-14.0.0>python setup.py install. While it ran, I happened to open a random web page, and a small miracle occurred: the pyOpenSSL download succeeded. The automatically downloaded version turned out to be pyOpenSSL 17.3.0:
Processing dependencies for service-identity==14.0.0
Searching for pyopenssl>=0.12
Reading https://pypi.python.org/simple/pyopenssl
Downloading https://pypi.python.org/packages/ee/6a/cd78737dd990297205943cc4dcad3d3c502807fd2c5b18c5f33dc90ca214/pyOpenSSL-17.3.0.tar.gz#md5=09dcd307b8d2068f9dd5aaa3a3a88992
Best match: pyOpenSSL 17.3.0
Processing pyOpenSSL-17.3.0.tar.gz
Writing C:\Users\aaa\AppData\Local\Temp\easy_install-3bi4uj87\pyOpenSSL-17.3.0\setup.cfg
Running pyOpenSSL-17.3.0\setup.py -q bdist_egg --dist-dir C:\Users\aaa\AppData\Local\Temp\easy_install-3bi4uj87\pyOpenSSL-17.3.0\egg-dist-tmp-ubew1tgt
warning: no previously-included files found matching 'leakcheck'
warning: no previously-included files matching '*.py' found under directory 'leakcheck'
warning: no previously-included files matching '*.pem' found under directory 'leakcheck'
warning: no previously-included files matching '*.cert' found under directory 'examples\simple'
warning: no previously-included files matching '*.pkey' found under directory 'examples\simple'
no previously-included directories found matching 'doc\_build'
no previously-included directories found matching '.travis'
no previously-included directories found matching '.mention-bot'
zip_safe flag not set; analyzing archive contents...
Copying pyopenssl-17.3.0-py3.4.egg to c:\python34\lib\site-packages
Adding pyopenssl 17.3.0 to easy-install.pth file
Installed c:\python34\lib\site-packages\pyopenssl-17.3.0-py3.4.egg
Searching for cryptography>=1.9
Reading Links for cryptography
Download error on https://pypi.python.org/simple/cryptography/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
Couldn't find index page for 'cryptography' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
Download error on https://pypi.python.org/simple/: Tunnel connection failed: 407 Unauthorized -- Some packages may not be found!
No local packages or working download links found for cryptography>=1.9
error: Could not find suitable distribution for Requirement.parse('cryptography>=1.9')
Once that trick was discovered, the rest was straightforward. The run above installed the pyOpenSSL package but failed to download cryptography, so I kept re-running D:\python\dist\service_identity-14.0.0>python setup.py install while opening an arbitrary web page at the same time; roughly half the time the download then went through. When it did not, I downloaded the missing package by hand and installed it separately, following the error messages.
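Re-running the same command by hand until the flaky proxy lets a download through can also be automated. A small sketch; retry_install is my own helper, not part of pip or setuptools:

```python
# Retry a command a few times, sleeping between attempts, instead of
# manually re-running "python setup.py install" until the download succeeds.
import subprocess
import sys
import time

def retry_install(cmd, attempts=5, delay=30):
    """Run cmd repeatedly until it exits 0; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        if subprocess.call(cmd) == 0:
            return True
        print("attempt %d failed, retrying..." % attempt)
        time.sleep(delay)
    return False

# Harmless demonstration with a command that always succeeds:
print(retry_install([sys.executable, "-c", "pass"], attempts=2, delay=1))  # True
```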
The installation finally completed, with the following output:
D:\python\dist\service_identity-14.0.0>python setup.py install
C:\Python34\lib\distutils\dist.py:260: UserWarning: Unknown distribution option: 'extra_requires'
warnings.warn(msg)
running install
running bdist_egg
running egg_info
writing top-level names to service_identity.egg-info\top_level.txt
writing requirements to service_identity.egg-info\requires.txt
writing dependency_links to service_identity.egg-info\dependency_links.txt
writing service_identity.egg-info\PKG-INFO
reading manifest file 'service_identity.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'service_identity.egg-info\SOURCES.txt'
installing library code to build\bdist.win32\egg
running install_lib
running build_py
creating build\bdist.win32\egg
creating build\bdist.win32\egg\service_identity
copying build\lib\service_identity\exceptions.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\pyopenssl.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\_common.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\_compat.py -> build\bdist.win32\egg\service_identity
copying build\lib\service_identity\__init__.py -> build\bdist.win32\egg\service_identity
byte-compiling build\bdist.win32\egg\service_identity\exceptions.py to exceptions.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\pyopenssl.py to pyopenssl.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\_common.py to _common.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\_compat.py to _compat.cpython-34.pyc
byte-compiling build\bdist.win32\egg\service_identity\__init__.py to __init__.cpython-34.pyc
creating build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\PKG-INFO -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\SOURCES.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\dependency_links.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\requires.txt -> build\bdist.win32\egg\EGG-INFO
copying service_identity.egg-info\top_level.txt -> build\bdist.win32\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist\service_identity-14.0.0-py3.4.egg' and adding 'build\bdist.win32\egg' to it
removing 'build\bdist.win32\egg' (and everything under it)
Processing service_identity-14.0.0-py3.4.egg
Removing c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Copying service_identity-14.0.0-py3.4.egg to c:\python34\lib\site-packages
service-identity 14.0.0 is already the active version in easy-install.pth
Installed c:\python34\lib\site-packages\service_identity-14.0.0-py3.4.egg
Processing dependencies for service-identity==14.0.0
Searching for pyopenssl==17.3.0
Best match: pyopenssl 17.3.0
Processing pyopenssl-17.3.0-py3.4.egg
pyopenssl 17.3.0 is already the active version in easy-install.pth
Using c:\python34\lib\site-packages\pyopenssl-17.3.0-py3.4.egg
Searching for pyasn1-modules==0.1.2
Best match: pyasn1-modules 0.1.2
Adding pyasn1-modules 0.1.2 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for pyasn1==0.3.5
Best match: pyasn1 0.3.5
Adding pyasn1 0.3.5 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for characteristic==14.0.0
Best match: characteristic 14.0.0
Processing characteristic-14.0.0-py3.4.egg
characteristic 14.0.0 is already the active version in easy-install.pth
Using c:\python34\lib\site-packages\characteristic-14.0.0-py3.4.egg
Searching for six==1.11.0
Best match: six 1.11.0
Adding six 1.11.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for cryptography==1.9
Best match: cryptography 1.9
Adding cryptography 1.9 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for asn1crypto==0.23.0
Best match: asn1crypto 0.23.0
Adding asn1crypto 0.23.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for cffi==1.7.0
Best match: cffi 1.7.0
Adding cffi 1.7.0 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for idna==2.6
Best match: idna 2.6
Adding idna 2.6 to easy-install.pth file
Using c:\python34\lib\site-packages
Searching for pycparser==2.18
Best match: pycparser 2.18
Adding pycparser 2.18 to easy-install.pth file
Using c:\python34\lib\site-packages
Finished processing dependencies for service-identity==14.0.0
Finally, opening a cmd window and running the scrapy command from an arbitrary directory produced the output below. Success.
D:\>scrapy
Scrapy 1.3.2 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

D:\>
Personally, I suspect that with a direct internet connection, not going through a proxy server, installing Scrapy would involve far less hassle. Still, all this fiddling was a decent way to get familiar with the manual installation steps.