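Notes on a Fun Python Web Scraper

Analyzing JD.com shopping data with a Python scraper

Analyzing purchase data for women's bras

1. First, fetch the review data
url = "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"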
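The two %d slots in the URL are filled with a score value and a page number. A minimal sketch of that substitution, assuming the template above; build_url is a hypothetical helper name, not something from the original code:

```python
# Template copied from the article; the two %d slots are score and page.
BASE_URL = (
    "https://sclub.jd.com/comment/productPageComments.action"
    "?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"
)


def build_url(score, page):
    """Fill in the score and page placeholders (hypothetical helper)."""
    return BASE_URL % (score, page)


example_url = build_url(5, 1)
print(example_url)
```

Keeping the template in one place means the product ID or page size only has to be changed once.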
2. Next, examine the format of the returned data

{
"id":11577732545,
"topped":0,
"guid":"6f8c009f-bb28-404a-8bbb-8e2d2c1b67b0",
"content":"是夏天穿的薄款,就是感觉聚拢效果没说的那么好",
"creationTime":"2018-06-10 21:12:45",
"isTop":false,
"referenceId":"11565382120",
"referenceImage":"jfs/t21367/46/2125692555/126045/5500b502/5b480df9Nc703f5cd.jpg",
"referenceName":"都市丽人文胸大码内衣性感聚拢无痕无钢圈深v透气薄款洞洞杯调整上托胸罩 2B7513 紫灰 75B/34",
"referenceTime":"2018-06-01 15:57:58",
"referenceType":"Product",
"referenceTypeId":0,
"firstCategory":1315,
"secondCategory":1345,
"thirdCategory":1364,
"replies":[],
"replyCount":1,
"replyCount2":1,
"score":5,
"status":1,
"title":"",
"usefulVoteCount":0,
"uselessVoteCount":0,
"userImage":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userImageUrl":"misc.360buyimg.com/user/myjd-2015/css/i/peisong.jpg",
"userLevelId":"56",
"userProvince":"",
"viewCount":0,
"orderId":0,
"isReplyGrade":false,
"nickname":"Z***t",
"productColor":"黑色",
"productSize":"75B/34",
"userClientShow":"来自京东Android客户端",
"userLevelName":"铜牌会员",
"userClient":4,
"images":[]
}
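Only a few of these fields matter for the analysis. The sketch below pulls them out of a trimmed copy of the record, wrapped in the "comments" list that the core code reads; extract_fields is a hypothetical helper, and using .get with an empty default is my own choice so that a missing field does not raise:

```python
import json

# A trimmed copy of the record above, inside a "comments" list.
sample_text = """
{"comments": [{"id": 11577732545,
               "score": 5,
               "productColor": "黑色",
               "productSize": "75B/34",
               "userClientShow": "来自京东Android客户端"}]}
"""


def extract_fields(comment):
    """Keep only the fields used later in the analysis (hypothetical helper)."""
    return {
        "color": comment.get("productColor", ""),
        "size": comment.get("productSize", ""),
        "client": comment.get("userClientShow", ""),
    }


records = [extract_fields(c) for c in json.loads(sample_text)["comments"]]
print(records)
```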

Then analyze the product color, client, size and other fields in the purchase data.
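One simple way to do this kind of tally is collections.Counter. The rows below are made up for illustration and are not results from the crawl; tally is a hypothetical helper:

```python
from collections import Counter

# Made-up rows in the shape the scraper collects; not real results.
rows = [
    {"color": "黑色", "size": "75B/34", "userClientShow": "来自京东Android客户端"},
    {"color": "肤色", "size": "80B/36", "userClientShow": "来自京东iPhone客户端"},
    {"color": "黑色", "size": "75B/34", "userClientShow": "来自京东iPhone客户端"},
]


def tally(rows, key):
    """Count how often each non-empty value of `key` occurs (hypothetical helper)."""
    return Counter(row[key] for row in rows if row.get(key))


color_counts = tally(rows, "color")
size_counts = tally(rows, "size")
print(color_counts.most_common())
print(size_counts.most_common())
```

most_common() returns the values sorted by count, which is the order a ranking chart needs.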

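After that, we visualize the data with Python's charting tools.
Purchases by color
[Chart: purchases by color]
The chart shows that skin-tone and black sell best, presumably because they show through light summer clothing the least. Oddly, nobody buys pink. Is cute not a good look?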

Size ranking
[Chart: purchases by size]
Sadly, the movies lie: sizes cluster around 75B and 80B (I have no idea how big that actually is).

Phone vs. size ranking
[Chart: size by phone type]
This chart compares phone type with size. Women using iPhones appear to buy somewhat smaller sizes than women using Android phones.
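A comparison like this is easier to read as shares within each phone group than as raw counts, since the two groups may differ in size. A rough sketch with made-up rows (not the crawl results); client_kind and size_share_by_client are hypothetical helpers, and the substring matching on userClientShow is an assumption about how the client strings look:

```python
from collections import Counter, defaultdict


def client_kind(client_show):
    """Map a userClientShow string to a coarse label (hypothetical helper)."""
    text = client_show.lower()
    if "iphone" in text:
        return "iPhone"
    if "android" in text:
        return "Android"
    return "other"


def size_share_by_client(rows):
    """Return {client: {size: share of that client's rows}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[client_kind(row["userClientShow"])][row["size"]] += 1
    shares = {}
    for client, sizes in counts.items():
        total = sum(sizes.values())
        shares[client] = {size: n / total for size, n in sizes.items()}
    return shares


# Made-up rows for illustration only.
rows = [
    {"size": "75B/34", "userClientShow": "来自京东Android客户端"},
    {"size": "80B/36", "userClientShow": "来自京东Android客户端"},
    {"size": "75B/34", "userClientShow": "来自京东iPhone客户端"},
    {"size": "75B/34", "userClientShow": "来自京东iPhone客户端"},
]

shares = size_share_by_client(rows)
print(shares)
```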

Review word cloud
[Chart: word cloud of review text]

Core code

import json

import requests


class BraSpider(object):

    # Review endpoint; the two %d slots are filled with score and page.
    base_url = "https://sclub.jd.com/comment/productPageComments.action?productId=11565382115&score=%d&sortType=5&page=%d&pageSize=10"

    def parse_comment(self, response, ret):
        """Append the color, size and client of each comment in the response to ret."""
        content = json.loads(response.text)
        comments = content['comments']
        for comment in comments:
            item = {}
            #item['content'] = comment['content']
            #item['guid'] = comment['guid']
            #item['id'] = comment['id']
            #item['time'] = comment['referenceTime']
            item['color'] = comment['productColor']
            item['size'] = comment['productSize']
            item['userClientShow'] = comment['userClientShow']
            ret.append(item)

    def start_requests(self):
        """Fetch every score/page combination and return the collected items."""
        comments_ret = []
        hot_tag_ret = {}  # stays empty in this excerpt
        ret = {}
        for page in range(1, 150):      # pages 1..149
            for i in range(0, 6):       # score values 0..5
                url = self.base_url % (i, page)
                response = requests.get(url)
                # Skip pages that do not come back with HTTP 200.
                if response.status_code == 200:
                    self.parse_comment(response, comments_ret)

        ret['comments'] = comments_ret
        ret['tag'] = hot_tag_ret
        return ret
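
The loop above always requests all 149 pages for every score. A possible variant, sketched here rather than taken from the original repository, stops paging a score once a page comes back empty and takes the fetch function as a parameter so it can be tried offline. The crawl and fake_fetch names are hypothetical, and treating an empty "comments" list as the end of the data is an assumption about the endpoint:

```python
import json
import time


def crawl(fetch, scores=range(0, 6), pages=range(1, 150), delay=0.0):
    """Collect comments via fetch(score, page), which returns response text or None."""
    collected = []
    for score in scores:
        for page in pages:
            text = fetch(score, page)
            if not text:
                break  # request failed; give up on this score
            comments = json.loads(text).get("comments", [])
            if not comments:
                break  # assumed to mean there are no more pages
            collected.extend(comments)
            time.sleep(delay)  # pause between requests
    return collected


def fake_fetch(score, page):
    """Offline stand-in for the network call: two pages of data for score 5."""
    if score == 5 and page <= 2:
        return json.dumps({"comments": [{"productColor": "黑色", "page": page}]})
    return json.dumps({"comments": []})


result = crawl(fake_fetch)
print(len(result))
```

Note that this version iterates over scores in the outer loop, the reverse of the original, so that an empty page can end the paging for one score without affecting the others.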

GitHub link

https://github.com/libinbin-1014/python-study/blob/master/bar/bar-scrapy.py
    Original author: libinbin_1014
    Original article: https://blog.csdn.net/libinbin_1014/article/details/81588435