Python Scraping Practice: Scraping Juzimi (句子迷)

Motivation

Finding Mr. Right 2: Book of Love (《北京遇上西雅图2不二情书》) came out quite a while ago, but I only recently found time to pull it off the net and watch it (forgive me, our shabby little town has no such thing as a cinema). The lines in it turned out to be pretty good, so I wanted to grab them for a closer read. As it happens, Python is well suited to this (ps: it's also about the only thing I know).

Environment

Windows, Python 2.x, requests, BeautifulSoup
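requests and BeautifulSoup are third-party packages, and the script below asks BeautifulSoup for the "lxml" parser, so that needs installing too. A typical pip invocation (command name may differ on your system):

```shell
pip install requests beautifulsoup4 lxml
```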

Code

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Fetch the classic quotes

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0',}

def get_html(url):
    r = requests.get(url, headers=headers)
    html = r.content
    return html

def get_juzi(html):
    soup = BeautifulSoup(html, "lxml")
    juzilist = soup.find_all('a', class_="xlistju")
    for x in juzilist:
        print x.get_text().encode('utf-8')
        print

def get_title(html):
    soup = BeautifulSoup(html, "lxml")
    print soup.title.get_text().encode('utf-8').replace('_句子迷','')

if __name__ == '__main__':
    # URL pattern: http://www.juzimi.com/article/316132?page=0
    for item in range(8):  # page count hard-coded by hand ^_^
        url = 'http://www.juzimi.com/article/316132?page=%s' % item
        html = get_html(url)
        if item == 0:
            get_title(html)
        get_juzi(html)
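The heart of the script is BeautifulSoup's find_all with a class filter, plus the title cleanup in get_title. Here is a self-contained sketch of those two steps run against a made-up HTML fragment (the markup below is illustrative and only mimics the "xlistju" class the real page uses; it runs under Python 3 with the built-in "html.parser", so no lxml is needed):

```python
# Minimal demo of the find_all(class_=...) extraction and title cleanup.
from bs4 import BeautifulSoup

# Made-up fragment that imitates the page structure the script scrapes.
html = """
<html><head><title>Sample_句子迷</title></head><body>
<a class="xlistju">Quote one.</a>
<a class="xlistju">Quote two.</a>
<a class="other">Not a quote.</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Only anchors carrying the quote class are collected.
quotes = [a.get_text() for a in soup.find_all('a', class_="xlistju")]
# Strip the site suffix from the page title, as get_title does.
title = soup.title.get_text().replace('_句子迷', '')
print(quotes)  # ['Quote one.', 'Quote two.']
print(title)   # Sample
```

The class filter is what keeps the third anchor out of the results, so the loop in get_juzi only ever sees actual quotes.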

Closing Words

If you like this, feel free to follow and bookmark. Thanks!

    Original author: 爱要趁早
    Original link: https://www.jianshu.com/p/aa4c74ab5d0e
    This article is reposted from the web for knowledge-sharing purposes only; if it infringes any rights, please contact the blogger for removal.