Python: scraping with BeautifulSoup to CSV

February 2, 2024 · 231 views

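I am trying to scrape some simple dictionary information from an HTML page. So far I can print all the words I need in my IDE. My next step is to put the words into an array, and my final step is to save the array as a CSV file. When I run my code, it seems to stop retrieving information after the 1309th or 1311th word, although I believe there are over 1 million on the web page. I am stuck and would greatly appreciate any help. Thanks.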

from bs4 import BeautifulSoup
from urllib import urlopen
import csv

html = urlopen('http://www.mso.anu.edu.au/~ralph/OPTED/v003/wb1913_a.html').read()

soup = BeautifulSoup(html,"lxml")

words = []

for section in soup.findAll('b'):
    words.append(section.renderContents())

print ('success')
print (len(words))

myfile = open('A.csv', 'wb')
wr = csv.writer(myfile)
wr.writerow(words)

Best answer: I cannot reproduce the problem (I always get 11616 items), but I suspect you have an outdated version of beautifulsoup4 or lxml installed. Upgrade:

pip install --upgrade beautifulsoup4
pip install --upgrade lxml
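
Before and after upgrading, it can help to confirm which versions are actually installed in the interpreter that runs the script. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); `installed_version` is a hypothetical helper name, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used on the pip command line.
    for name in ("beautifulsoup4", "lxml"):
        print(name, installed_version(name) or "not installed")
```

If the versions printed here differ from what pip reports, the script is probably running under a different Python installation than the one pip upgraded.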

Of course, this is only a theory.
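
One way to test the theory is to count the `<b>` entries without beautifulsoup4 or lxml at all and compare the result with `len(words)` from the question. The sketch below is not the asker's method: it swaps in the standard library's `html.parser` as an independent cross-check, and the names `BoldTextCollector`, `extract_bold_words`, and `write_words` are hypothetical. It assumes Python 3 and uses a small inline sample instead of the real page. It also writes one word per row, since `wr.writerow(words)` in the question puts every word on a single row.

```python
import csv
import io
from html.parser import HTMLParser


class BoldTextCollector(HTMLParser):
    """Collect the text inside every <b>...</b> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._depth = 0    # greater than 0 while inside a <b> element
        self._buffer = []  # text pieces of the <b> element being read

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.words.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_bold_words(html_text):
    """Return the text of each <b> element in document order."""
    parser = BoldTextCollector()
    parser.feed(html_text)
    parser.close()
    return parser.words


def write_words(words, stream):
    """Write one word per CSV row to an open text stream."""
    writer = csv.writer(stream)
    for word in words:
        writer.writerow([word])


# Inline sample standing in for the downloaded page (an assumption for illustration).
sample = (
    "<p><b>A</b> (<i>n.</i>) The first letter.</p>"
    "<p><b>Aam</b> (<i>n.</i>) A Dutch measure.</p>"
)
words = extract_bold_words(sample)
print(len(words), words)

out = io.StringIO()
write_words(words, out)
```

To run this against the real page, the HTML would be fetched with `urllib.request.urlopen` (where `urlopen` lives in Python 3) and decoded to text first; the page's encoding is not stated in the question, so that step is left out. When writing to disk, opening the file with `open('A.csv', 'w', newline='')` inside a `with` block ensures the data is flushed and the file closed. If this count matches the answer's 11616 while the bs4 count stays near 1309, the installed parser is the likely culprit.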