A Super Simple Python 3 Web Scraper Example

June 16, 2019. Source: fanren224

Site entry point: http://wise.xmu.edu.cn/people/faculty
Information to scrape: names and homepage URLs
Python version: 3.5

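import requests
from bs4 import BeautifulSoup

# Download the faculty listing page and stop early on an HTTP error.
r = requests.get('http://www.wise.xmu.edu.cn/people/faculty')
r.raise_for_status()
html = r.content

# The page is HTML, so parse it with an HTML parser rather than 'xml'.
soup = BeautifulSoup(html, 'html.parser')

# The faculty links sit inside <div class="people_list">.
div_people_list = soup.find('div', attrs={'class': 'people_list'})
a_s = div_people_list.find_all('a', attrs={'target': '_blank'})

# Print each person's name and the link to their homepage.
for a in a_s:
    url = a['href']
    name = a.get_text()
    print(name, url)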

Output:

                         敖萌幪 /people/faculty/494d4f1c-0470-4f53-8b7c-d3594241876b.html

                        Bowers, Roslyn /people/faculty/d01fe119-7980-4238-a3ec-abb9b66ec706.html

                        Brown, Katherine /people/faculty/36c6b263-2cc2-4682-9975-02b75e6505f7.html

                        鲍小佳 /people/faculty/bdc3fd77-84de-4020-846d-344e02f110e9.html

                        Chang, Seong Yeon /people/faculty/0534965d-6393-4e22-a6bb-6ac3b11fe431.html

                        蔡熙乾 /people/faculty/95d97944-beb6-4a47-af85-a0778e1788b2.html
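
The names in the output above carry leading whitespace, and the links are site-relative paths rather than full URLs. A minimal sketch of one way to tidy each record is shown below. The helper name clean_record and the BASE_URL constant are assumptions introduced for illustration and are not part of the original script; only the standard library is used.

```python
from urllib.parse import urljoin

# Assumption: the page URL requested in the script serves as the base for relative links.
BASE_URL = 'http://www.wise.xmu.edu.cn/people/faculty'


def clean_record(name, href, base_url=BASE_URL):
    """Collapse whitespace in the name and turn the href into an absolute URL."""
    tidy_name = ' '.join(name.split())   # drops newlines and runs of spaces
    full_url = urljoin(base_url, href)   # resolves '/people/...' against the site root
    return tidy_name, full_url


# Sample pairs copied from the output above, with the leading whitespace kept.
samples = [
    ('\n                         敖萌幪',
     '/people/faculty/494d4f1c-0470-4f53-8b7c-d3594241876b.html'),
    ('\n                        Bowers, Roslyn',
     '/people/faculty/d01fe119-7980-4238-a3ec-abb9b66ec706.html'),
]

for raw_name, raw_href in samples:
    print(*clean_record(raw_name, raw_href))
```

In the scraping loop, the same helper could be applied as clean_record(a.get_text(), a['href']) before printing.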

Original article: https://zhuanlan.zhihu.com/p/21377121

    Original author: fanren224
    Original URL: https://blog.csdn.net/fanren224/article/details/72817028