I am trying to scrape stock prices and market capitalization data from a Korean website.
Here is my code:
import requests
from bs4 import BeautifulSoup

response = requests.get('http://finance.naver.com/sise/sise_market_sum.nhn?sosok=0&page=1')
html = response.text
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', { 'class': 'type_2' })
data = []
for tr in table.find_all('tr'):
    tds = list(tr.find_all('td'))
    for td in tds:
        if td.find('a'):
            company_name = td.find('a').text
            price_now = tds[2].text
            market_cap = tds[5].text
            data.append([company_name, price_now, market_cap])
print(*data, sep = "\n")
This is the result I get. (Sorry for the Korean characters.)
['삼성전자', '43,650', '100']
['', '43,650', '100']
['SK하이닉스', '69,800', '5,000']
['', '69,800', '5,000']
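The second and fourth lines of the result should not be there. I only want the first and third lines. Where do the second and fourth lines come from, and how do I get rid of them?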
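Best answer: Dear friend, I think the fix is to check whether td.find('a').text actually has a value! Judging from your output, each row seems to contain a second cell whose <a> tag has empty text, and that cell produces the extra entries with an empty name.
So I changed your code as follows, and it works: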
import requests
from bs4 import BeautifulSoup

response = requests.get(
    'http://finance.naver.com/sise/sise_market_sum.nhn?sosok=0&page=1')
html = response.text
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'type_2'})
data = []
for tr in table.find_all('tr'):
    tds = list(tr.find_all('td'))
    for td in tds:
        # where the magic happens: skip cells whose link has empty text
        if td.find('a') and td.find('a').text:
            company_name = td.find('a').text
            price_now = tds[2].text
            market_cap = tds[5].text
            data.append([company_name, price_now, market_cap])
print(*data, sep="\n")
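As a possible alternative, here is a minimal sketch that handles each row once instead of looping over every cell, so a row can never produce more than one entry. The function name extract_rows and the SAMPLE_HTML markup are hypothetical: the sample only mimics the structure implied by the output in the question (a named link plus a second link with empty text in each row), and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page):
# each data row has a link with the company name and a second, empty link.
SAMPLE_HTML = """
<table class="type_2">
  <tr><th>c0</th><th>c1</th><th>c2</th><th>c3</th><th>c4</th><th>c5</th><th>c6</th></tr>
  <tr>
    <td>1</td><td><a href="/item/1">삼성전자</a></td><td>43,650</td>
    <td>0</td><td>0%</td><td>100</td><td><a href="/board/1"></a></td>
  </tr>
  <tr><td colspan="7"></td></tr>
  <tr>
    <td>2</td><td><a href="/item/2">SK하이닉스</a></td><td>69,800</td>
    <td>0</td><td>0%</td><td>5,000</td><td><a href="/board/2"></a></td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return one [name, cell 2, cell 5] entry per data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "type_2"})
    if table is None:
        return []
    data = []
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        # Skip header and spacer rows that lack the cells we index below.
        if len(tds) < 6:
            continue
        # Take the first link in the row whose text is not empty.
        names = [a.get_text(strip=True) for a in tr.find_all("a")]
        name = next((n for n in names if n), None)
        if name is None:
            continue
        data.append([name,
                     tds[2].get_text(strip=True),
                     tds[5].get_text(strip=True)])
    return data


rows = extract_rows(SAMPLE_HTML)
print(*rows, sep="\n")
```

To run it against the live page, pass response.text from the requests.get call above to extract_rows. One thing that may be worth double-checking: the values 100 and 5,000 in the question's output look more like par values than market capitalizations, so index 5 might not be the column you want; comparing it with the table's header row would confirm that.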