原代码如下:
from bs4 import BeautifulSoup
with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r') as wb_data:
Soup = BeautifulSoup(wb_data, 'lxml')
#image = Soup.select('body > div:nth-of-type(2) > div > div.col-md-9 > div:nth-of-type(2) > div:nth-of-type(1) > div > img')
print (Soup)
运行时出现报错:
/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py
Traceback (most recent call last):
File "/Users/jkxuan/Desktop/1_2answer_of_homework/1.2.py", line 4, in <module>
Soup = BeautifulSoup(wb_data, 'lxml')
File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 191, in __init__
markup = markup.read()
File "/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 16338: ordinal not in range(128)
Process finished with exit code 1
仔细看报错内容,ascii 是美国信息互换标准代码'ascii' codec编码解释器 can't decode解释代码 byte字节 0xc2 in position 16338: ordinal序列 not in range范围内(128)
通过查询得知,这个报错原因是内部代码里面的编码乱码,未按照ascii标准,可能是网页中存在中文字符,这时候,只需要修改第二行代码,添加encoding="gb2312"
即可,下方是正确代码:
with open('/Users/jkxuan/Desktop/1_2answer_of_homework/1_2_homework_required/index.html', 'r', encoding="gb2312") as wb_data:
好了,问题解决了。参考网址