Problem:
The Scrapy spider looks like this:
# -*- coding=utf-8 -*-
import scrapy
import logging
import json


class Www51jobSpider(scrapy.Spider):
    name = 'www51job'
    allowed_domains = ['51job.com']
    start_urls = [
        'http://search.51job.com/jobsearch/search_result.php?fromJs=1&jobarea=020000&keyword=python&keywordtype=2&lang=c&stype=2&postchannel=0000&fromType=1&confirmdate=9'
    ]

    def parse(self, response):
        for item in response.xpath('//*[@id="resultList"]/div[@class="el"]'):
            # Earlier attempt, not needed: the \u escaping happens in the feed exporter, not here.
            # data = item.xpath('p/span/a/text()').extract_first().strip()
            # newdata = json.dumps(data, ensure_ascii=False)
            # default='' avoids calling .strip() on None when a row has no job title link.
            yield dict(job_name=item.xpath('p/span/a/text()').extract_first(default='').strip(),
                       company=item.xpath('span[@class="t2"]/a/text()').extract_first(),
                       location=item.xpath('span[@class="t3"]/text()').extract_first(),
                       salary=item.xpath('span[@class="t4"]/text()').extract_first(),
                       pub_date=item.xpath('span[@class="t5"]/text()').extract_first())
Run the following command:
scrapy crawl www51job -o ../out/www51job.jl
The output shows the problem:
{"job_name": "\u6570\u636e\u5206\u6790\u5e08", "company": ......}
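Scrapy did read the Chinese text correctly; the \u escapes come from the feed export step. When FEED_EXPORT_ENCODING is not set, Scrapy writes JSON and JSON Lines feeds with safe numeric \uXXXX escapes for every non-ASCII character (other feed formats default to UTF-8). The data itself is intact: \u6570\u636e\u5206\u6790\u5e08 is just the escaped form of 数据分析师 ("data analyst").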
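To see that only the serialization differs, here is a minimal sketch using Python's standard json module. It assumes the JSON Lines exporter behaves like json.dumps with ensure_ascii=True when FEED_EXPORT_ENCODING is unset, which matches the escaped output above; the record is a trimmed version of that output.

```python
import json

# A trimmed record like the one in the feed output above.
record = {"job_name": "数据分析师"}

# Default json behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)
print(escaped)  # {"job_name": "\u6570\u636e\u5206\u6790\u5e08"}

# With ensure_ascii=False the characters are written as-is,
# which is what a UTF-8 feed encoding produces.
readable = json.dumps(record, ensure_ascii=False)

# Both strings decode to the same data: nothing was lost, only the text form differs.
assert json.loads(escaped) == json.loads(readable) == record
```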
Solution:
Add the following line to settings.py:
FEED_EXPORT_ENCODING = 'utf-8'
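The setting only affects feeds exported after it is added. For a .jl file that was already written with \u escapes, a small standard-library script can rewrite it as readable UTF-8. This is a sketch: the helper name reencode_jsonlines and the output path are hypothetical, and each non-blank line is assumed to hold one JSON object, as in JSON Lines.

```python
import json


def reencode_jsonlines(src_path: str, dst_path: str) -> int:
    """Hypothetical helper: rewrite a JSON Lines file with non-ASCII text stored as UTF-8.

    Returns the number of records written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # json.loads turns \uXXXX escapes back into real characters.
            item = json.loads(line)
            # ensure_ascii=False writes those characters out unescaped.
            dst.write(json.dumps(item, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, reencode_jsonlines("../out/www51job.jl", "../out/www51job.utf8.jl") would write a readable copy next to the original feed.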