Learning Python: web scraper error UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 16338: ordinal not in range(128)

The original code:

from bs4 import BeautifulSoup

with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r') as wb_data:
    Soup = BeautifulSoup(wb_data, 'lxml')
    #image = Soup.select('body > div:nth-of-type(2) > div > div.col-md-9 > div:nth-of-type(2) > div:nth-of-type(1) > div > img')
    print (Soup)

Running it produces this error:

/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py
Traceback (most recent call last):
  File "/Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py", line 4, in <module>
    Soup = BeautifulSoup(wb_data, 'lxml')
  File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 191, in __init__
    markup = markup.read()
  File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 16338: ordinal not in range(128)

Process finished with exit code 1
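The traceback passes through encodings/ascii.py, which suggests that open() fell back to ASCII because no encoding was given. When the encoding argument is omitted, text-mode open() uses the locale's preferred encoding. A minimal sketch for checking that default on your own machine (the variable name default_encoding is only illustrative):

```python
import locale

# Text-mode open() without an explicit encoding uses the locale's preferred
# encoding. In an environment with no locale configured this can be ASCII.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If this prints something like US-ASCII or ANSI_X3.4-1968, any file containing non-ASCII bytes will fail to read unless an encoding is passed explicitly.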

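Read the error message closely. ASCII is the American Standard Code for Information Interchange, and it only defines byte values 0 to 127. The message says that the 'ascii' codec (the encoder/decoder) cannot decode the byte 0xc2 found at position 16338 of the file, because its ordinal (numeric value) is not in range(128).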
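The same error can be reproduced in a few lines without any file. This is only a sketch: the sample string is made up for illustration and is not taken from the original index.html.

```python
# Hypothetical sample: the copyright sign U+00A9 is encoded in UTF-8 as the
# two bytes 0xC2 0xA9, both of which are outside the ASCII range 0..127.
data = "price \u00a9".encode("utf-8")

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)      # same wording as the traceback above
    position = err.start    # index of the first byte that failed
    print(message)
```

The six ASCII characters of "price " decode fine, so the failure is reported at the first non-ASCII byte that follows them.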

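Some searching turned up the cause: the file contains bytes outside the ASCII range (the page probably contains Chinese or other non-ASCII characters), but open() was decoding it as ASCII because no encoding was specified. The fix is to pass the file's actual encoding to open(). In my case, adding encoding="gb2312" to the open() call was enough; if the page is saved as UTF-8, use encoding="utf-8" instead. The corrected line: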

with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r', encoding="gb2312") as wb_data:
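If you are not sure which encoding a page was saved in, one option is to try a short list of candidates in order. The sketch below is an assumption-laden illustration, not part of the original fix: read_text is a hypothetical helper, the candidate list is arbitrary, and the demo writes its own temporary file rather than using the original index.html.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "gb2312")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper; the candidate list is only an example.
    """
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo: write a small GB2312 file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    with open(path, "w", encoding="gb2312") as f:
        f.write("<html><body>中文</body></html>")
    text, used = read_text(path)
    print(used)
```

A clean decode does not prove the guess is right, since some encodings accept almost any byte sequence, so it is still worth checking the charset declared in the page's meta tag. Another option is to open the file in binary mode ('rb') and pass the handle to BeautifulSoup, which then attempts to detect the encoding itself.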

With that change, the problem was solved.

    Original author: 时间之友
    Original URL: https://www.jianshu.com/p/2479b530313a