Downloading PDF Files with a Python Crawler

Using the requests library

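import requests

# homepage (the site's base URL) is a global defined elsewhere in the original crawler
def get_file_content(date, files):
    time = date[0:4] + date[5:7]                    # YYYYMM from a date such as "2018-01-22"
    file_name = files[0][1]                         # local file name for the first entry
    suburl = homepage + time + '/' + files[0][0]    # build the full URL of the file
    r = requests.get(suburl)
    r.raise_for_status()                            # raise on 4xx/5xx instead of saving an error page
    with open(file_name, 'wb') as fo:               # use 'wb' (b = binary), not 'w'
        fo.write(r.content)                         # r.content: the response body as bytes, needed for non-text content such as PDFs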
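Because `r.content` is just raw bytes, a server that cannot find the file may still answer with an HTML error page, and that page would be saved under a `.pdf` name. A minimal sketch, assuming the target is always a PDF: check for the `%PDF-` signature that PDF files begin with before writing anything to disk. The helper name `save_pdf_bytes` is hypothetical and not part of the original code.

```python
from pathlib import Path


def save_pdf_bytes(content: bytes, file_name: str) -> bool:
    """Hypothetical helper: write content to file_name only if it looks like a PDF.

    PDF files start with the ASCII signature b"%PDF-", so anything else
    (for example an HTML error page) is skipped and False is returned.
    """
    if not content.startswith(b"%PDF-"):
        return False
    # write_bytes opens the file in binary mode ('wb'), so the bytes are stored unchanged
    Path(file_name).write_bytes(content)
    return True
```

Inside `get_file_content`, the lines that open and write the file could then become `save_pdf_bytes(r.content, file_name)`, with the return value telling the crawler whether a real PDF was saved.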

Using urllib

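import urllib.request

# suburl and file_name are built as in get_file_content above
with urllib.request.urlopen(suburl) as u, open(file_name, 'wb') as f:
    block_sz = 8192                 # read 8 KB at a time instead of the whole file at once
    while True:
        buffer = u.read(block_sz)
        if not buffer:              # empty bytes: end of the response
            break
        f.write(buffer)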
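The read-a-block, write-a-block loop above does essentially what the standard library's `shutil.copyfileobj(fsrc, fdst, length)` does. A sketch of the same download written with it, wrapped in a hypothetical `download_file` function; the copying step is split into a hypothetical `copy_stream` helper so it can be tried on in-memory streams as well as on a real response.

```python
import shutil
import urllib.request
from typing import BinaryIO


def copy_stream(src: BinaryIO, dst: BinaryIO, block_sz: int = 8192) -> None:
    """Copy src to dst in block_sz-byte chunks, like the manual while loop."""
    shutil.copyfileobj(src, dst, block_sz)


def download_file(suburl: str, file_name: str) -> None:
    """Hypothetical wrapper: stream suburl into file_name without holding it all in memory."""
    with urllib.request.urlopen(suburl) as u, open(file_name, "wb") as f:
        copy_stream(u, f)
```

Keeping the chunk size as a parameter makes it easy to trade memory use against the number of read calls; 8192 matches the value in the original loop.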
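Original author: 努力敲代码的竹子
Original post: https://blog.csdn.net/sinat_38944746/article/details/79126124
This article was reposted from the web purely to share knowledge. If it infringes any rights, please contact the blogger to have it removed.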