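Preface
Requirement: record every URL whose response status is not 200 and save it to a local file.
Approach: add a downloader middleware to the Scrapy project's middlewares module that collects the URL of every response whose response.status is not 200.
Middleware implementation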
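import time

class GetFailedUrl(object):
    # Downloader middleware: record every URL whose response status is not 200.
    def process_response(self, request, response, spider):
        # Scrapy passes the arguments in the order (request, response, spider).
        if response.status != 200:
            # One file per minute; '-' instead of ':' keeps the name valid on Windows.
            name = time.strftime('%Y-%m-%d %H-%M', time.localtime())
            # Append so that several failures within the same minute are all kept.
            with open(name, 'a') as file:
                file.write(response.url + '\n')
        # Always pass the response on to the next middleware.
        return response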
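
The same idea can be sketched as a variant that appends every failed URL to one log file together with a timestamp and the status code, which is easier to grep than one file per minute. This is only a minimal sketch: the class name FailedUrlLogger, the default path failed_urls.log, and the demo helper are assumptions made for illustration and do not come from the original post. The demo uses plain stand-in objects instead of real Scrapy responses so that it runs without a crawl.

```python
import time
from types import SimpleNamespace


class FailedUrlLogger:
    """Downloader-middleware sketch: append non-200 URLs to a single log file."""

    def __init__(self, log_path="failed_urls.log"):
        # log_path is a hypothetical default, not taken from the original post.
        self.log_path = log_path

    def process_response(self, request, response, spider):
        if response.status != 200:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            # Append mode keeps earlier entries; one tab-separated line per failure.
            with open(self.log_path, "a", encoding="utf-8") as fh:
                fh.write(f"{stamp}\t{response.status}\t{response.url}\n")
        # Pass the response on unchanged.
        return response


def demo(log_path):
    """Feed two stand-in responses through the middleware and return the logged lines."""
    mw = FailedUrlLogger(log_path)
    ok = SimpleNamespace(status=200, url="https://example.com/ok")
    missing = SimpleNamespace(status=404, url="https://example.com/missing")
    for resp in (ok, missing):
        mw.process_response(request=None, response=resp, spider=None)
    with open(log_path, encoding="utf-8") as fh:
        return fh.read().splitlines()
```

To take effect, a middleware like this has to be registered in the DOWNLOADER_MIDDLEWARES setting in settings.py, for example under a path such as myproject.middlewares.FailedUrlLogger (a hypothetical project name) with an order value such as 543. The order value likely matters here: Scrapy's built-in retry and redirect middlewares may handle some non-200 responses before this one sees them, so the log may not contain every intermediate failure.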