How to get the size of a file linked from a web page with BeautifulSoup

August 3, 2019, 203 views

I am using BeautifulSoup in Python.

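I want to get the size of a downloadable file from a web page. For example, this page has a link that downloads a txt file (by clicking "Save"). How can I get the size of that file in bytes, preferably without downloading it?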

If BeautifulSoup has no option for this, please suggest other options, inside or outside Python.

Best answer:

With the requests package, you can send a HEAD request to the URL that serves the text file and check the Content-Length header of the response:

>>> import requests
>>> url = "http://cancer.jpl.nasa.gov/fmprod/data?refIndex=0&productID=02965767-873d-11e5-a4ea-252aa26bb9af"
>>> res = requests.head(url)
>>> res.headers
{'content-length': '944', 'content-disposition': 'attachment; filename="Lab001_A_R03.txt"', 'server': 'Apache-Coyote/1.1', 'connection': 'close', 'date': 'Thu, 19 May 2016 05:04:45 GMT', 'content-type': 'text/plain; charset=UTF-8'}
>>> int(res.headers['content-length'])
944

As you can see, the size matches the one shown on the page.
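
Not every server reports Content-Length for a HEAD request (for example, when the body is sent with chunked transfer encoding), so it may be worth handling a missing header. Below is a minimal sketch of the same HEAD technique that uses only the standard library's urllib.request instead of requests. The helper names content_length_from_headers and remote_file_size are hypothetical, chosen here for illustration, and the 10-second timeout is an assumption.

```python
import urllib.request


def content_length_from_headers(headers):
    """Return Content-Length as an int, or None if absent or malformed.

    `headers` is any mapping with a .get() method. The response headers
    returned by urllib look names up case-insensitively; a plain dict
    does not, so use the exact key "Content-Length" with a dict.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def remote_file_size(url, timeout=10):
    """Send a HEAD request to `url` and return the reported size in bytes.

    Returns None when the server does not report a usable Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return content_length_from_headers(response.headers)
```

Calling remote_file_size(url) with the URL from the answer would issue the same kind of HEAD request as requests.head(url). If it returns None, the size cannot be learned from the headers alone and the file would have to be downloaded (or streamed) to measure it.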