Python Scraping Practice: Scraping Juzimi (句子迷)

Motivation

Finding Mr. Right 2: Book of Love (《北京遇上西雅图2不二情书》) came out quite a while ago, but I only recently found time to pull it off the net and watch it (forgive me, our shabby little town has no such thing as a cinema). The lines in it turned out to be pretty good, so I wanted to grab them for a closer read. As it happens, Python is well suited to this (ps: it's also about the only thing I know).

Environment

Windows, Python 2.x, requests, BeautifulSoup
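requests and BeautifulSoup are third-party packages, and the script below asks BeautifulSoup for the "lxml" parser, so that needs installing too. A typical pip invocation (command name may differ on your system):

```shell
pip install requests beautifulsoup4 lxml
```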

Code

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Fetch the classic quotes

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0',}

def get_html(url):
    r = requests.get(url, headers=headers)
    html = r.content
    return html

def get_juzi(html):
    soup = BeautifulSoup(html, "lxml")
    juzilist = soup.find_all('a', class_="xlistju")
    for x in juzilist:
        print x.get_text().encode('utf-8')
        print

def get_title(html):
    soup = BeautifulSoup(html, "lxml")
    print soup.title.get_text().encode('utf-8').replace('_句子迷','')

if __name__ == '__main__':
    # URL pattern: http://www.juzimi.com/article/316132?page=0
    for item in range(8):  # page count hard-coded by hand ^_^
        url = 'http://www.juzimi.com/article/316132?page=%s' % item
        html = get_html(url)
        if item == 0:
            get_title(html)
        get_juzi(html)
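The heart of the script is BeautifulSoup's find_all with a class filter, plus the title cleanup in get_title. Here is a self-contained sketch of those two steps run against a made-up HTML fragment (the markup below is illustrative and only mimics the "xlistju" class the real page uses; it runs under Python 3 with the built-in "html.parser", so no lxml is needed):

```python
# Minimal demo of the find_all(class_=...) extraction and title cleanup.
from bs4 import BeautifulSoup

# Made-up fragment that imitates the page structure the script scrapes.
html = """
<html><head><title>Sample_句子迷</title></head><body>
<a class="xlistju">Quote one.</a>
<a class="xlistju">Quote two.</a>
<a class="other">Not a quote.</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Only anchors carrying the quote class are collected.
quotes = [a.get_text() for a in soup.find_all('a', class_="xlistju")]
# Strip the site suffix from the page title, as get_title does.
title = soup.title.get_text().replace('_句子迷', '')
print(quotes)  # ['Quote one.', 'Quote two.']
print(title)   # Sample
```

The class filter is what keeps the third anchor out of the results, so the loop in get_juzi only ever sees actual quotes.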

Closing Words

If you like this, feel free to follow and bookmark. Thanks!

    Original author: 爱要趁早
    Original link: https://www.jianshu.com/p/aa4c74ab5d0e
    This article is reposted from the web for knowledge-sharing purposes only; if it infringes any rights, please contact the blogger for removal.