Downloading the full set of ICCV 2017 papers with Python

May 11, 2023 · Source: Best July

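Earlier post: Downloading the full set of CVPR 2017 papers with Python

It turns out that the code I wrote earlier for downloading the CVPR papers runs as-is once CVPR is replaced with ICCV. The code itself is explained in that earlier post.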

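By default the code downloads into the folder 'E:\study\papers\ICCV2017\', and each file is named after the paper's full title. Change the localDir variable in the code to suit your needs.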
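
The naming scheme described above can be sketched as a small helper. This is only an illustration under a few assumptions: it is written for Python 3, the name `title_to_path` is hypothetical, and it replaces a slightly wider set of characters than the script does (everything Windows rejects in file names), which may or may not be what you want.

```python
import os
import re

# Characters that Windows does not allow in file names.
# Assumption: this set is wider than the four replacements used in the script.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def title_to_path(title, local_dir):
    """Build the target .pdf path for a paper title (hypothetical helper)."""
    safe_title = _FORBIDDEN.sub('_', title).strip()
    # os.path.join picks the right separator, so local_dir needs no trailing slash
    return os.path.join(local_dir, safe_title + '.pdf')


print(title_to_path('An Example Paper: Part 1/2', 'papers'))
```

Using `os.path.join` instead of string concatenation means `localDir` works with or without a trailing separator.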

The code below targets Python 2.7:

# coding:utf-8
import re
import requests
import urllib
import os
# fetch the content of the listing page
r = requests.get('http://openaccess.thecvf.com/ICCV2017.py')
data = r.text
# find all pdf links
link_list =re.findall(r"(?<=href=\").+?pdf\">pdf|(?<=href=\').+?pdf\">pdf" ,data)
name_list =re.findall(r"(?<=href=\").+?2017_paper.html\">.+?</a>" ,data)
cnt = 0
totalnum = len(link_list)
# your local path to download pdf files
localDir = 'E:\\study\\papers\\ICCV2017\\'
if not os.path.exists(localDir):
    os.makedirs(localDir)
# for url in link_list:
while cnt < totalnum:
    url = link_list[cnt]
    url = url[0:-5]
    # separate the paper title from the link markup
    file_name = name_list[cnt].split('<')[0].split('>')[1]
    file_name = file_name.replace(':','_')
    file_name = file_name.replace('\"','_')
    file_name = file_name.replace('?','_')
    file_name = file_name.replace('/','_')
    file_path = localDir + file_name + '.pdf'
    print file_name
    # download pdf files
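    try:
        urllib.urlretrieve('http://openaccess.thecvf.com/'+url,file_path)
        # os.system('wget '+url+' -O '+file_path)
        print "downloading:"+url+" -> "+file_path
        print "Downloading %s/%s" % (cnt + 1, totalnum)
    except Exception as e:
        # report the failure and fall through to the increment below;
        # a bare 'continue' here would retry the same paper forever
        print "failed:"+url+" ("+str(e)+")"
    cnt = cnt + 1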
print "all download finished"
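
For readers on Python 3, the link and title extraction above could look roughly like the sketch below. It is not the original author's code: the function names `extract_papers` and `download` are made up here, the regular expressions are a simplified variant of the ones in the script, and the HTML fragment is invented for illustration rather than copied from the real page. It also assumes, as the script does, that pdf links and titles appear in the same order.

```python
import re
import urllib.request

BASE_URL = 'http://openaccess.thecvf.com/'


def extract_papers(html):
    """Return (relative_pdf_url, title) pairs found in the listing page."""
    # pdf links look like href="....pdf">pdf
    links = re.findall(r'href=["\']([^"\']+?\.pdf)["\']>pdf', html)
    # titles are the link text of the ..._2017_paper.html anchors
    titles = re.findall(r'href=["\'][^"\']+?2017_paper\.html["\']>(.+?)</a>', html)
    # pair them by position, as the Python 2.7 script does with cnt
    return list(zip(links, titles))


def download(rel_url, file_path):
    """Fetch one pdf; in Python 3 urlretrieve lives in urllib.request."""
    urllib.request.urlretrieve(BASE_URL + rel_url, file_path)


# Invented HTML fragment, for illustration only.
sample = '''
<dt><a href="content_iccv_2017/html/Doe_Example_Paper_ICCV_2017_paper.html">An Example Paper: Part 1</a></dt>
<dd>[<a href="content_ICCV_2017/papers/Doe_Example_Paper_ICCV_2017_paper.pdf">pdf</a>]</dd>
'''

for rel_url, title in extract_papers(sample):
    print(BASE_URL + rel_url, '->', title)
```

`download` is defined but not called here, so running the sketch does not touch the network.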

To make downloading easier, I also uploaded a copy to Baidu Cloud. Link: http://pan.baidu.com/s/1eRLved8 Password: ktum.

---

In the comments, @想飞的石头 pointed out that a single wget command can download everything. I tried it and it works. Here is the command:

wget --recursive --level=1 --no-directories --no-host-directories --accept pdf http://openaccess.thecvf.com/ICCV2017.py

Files downloaded this way are named as: first author + first 3 words of the title.
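
The different naming comes from wget saving each file under the last segment of its URL when --no-directories is used, rather than under the title shown on the page. A minimal Python 3 sketch of that mapping follows; the helper name `saved_name` and the example URL are hypothetical, and it assumes the URL has no query string.

```python
import posixpath
from urllib.parse import urlparse


def saved_name(url):
    """Last path segment of a URL, i.e. the name wget keeps with --no-directories."""
    return posixpath.basename(urlparse(url).path)


# Hypothetical URL, for illustration only.
print(saved_name('http://openaccess.thecvf.com/content_ICCV_2017/papers/'
                 'Doe_Example_Paper_ICCV_2017_paper.pdf'))
```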

    Original author: Best July
    Original URL: https://zhuanlan.zhihu.com/p/30420402