Writing a Python script to download and unzip the MNIST dataset (1)

[UpdateTime: 20170611]


I. Purpose of this post

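MNIST is to machine learning and deep learning what cout << "hello world" is to programming (as the TensorFlow tutorial puts it). I have only recently started studying deep learning, alongside machine learning, and have come across many MNIST-based examples. The main contribution of this post is a Python program that automatically downloads and unzips MNIST. I am organizing and sharing it here, and will keep updating the post as my studies progress.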


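The packages this post depends on are listed in the import lines at the top of the script. Several deep learning frameworks were already installed on my machine before this experiment, so the relevant packages were already present. If you run into problems, install whatever the error message reports as missing (pip install xxx). Of the imports, gzip, subprocess and os are part of the Python standard library, so in practice only numpy and six may need to be installed.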
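Before running the script, you can check up front which of its imports are missing instead of waiting for an ImportError. The following is a minimal Python 3 sketch; `missing_modules` and `REQUIRED` are hypothetical names, not part of the original script. It uses `importlib.util.find_spec`, which locates a top-level module without importing it and returns None when the module is not installed.

```python
import importlib.util

# Top-level modules imported by get_mnist.py (hypothetical list, taken from its import lines).
REQUIRED = ["gzip", "subprocess", "os", "numpy", "six"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on this interpreter."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules, try: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```

For numpy and six the pip package name matches the module name, so the printed pip command can be used as-is.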


The idea is simple. Each file is downloaded with the following code (using urllib):

filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
statinfo = os.stat(filepath)
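The download step can be extended with a sanity check. The Python 3 sketch below is one way to do it under stated assumptions: `download_checked` and `GZIP_MAGIC` are hypothetical names, not from the original script. It calls `urllib.request.urlretrieve` as above, then verifies that the saved file starts with the gzip magic bytes `1f 8b`, so that an HTML page saved under a .gz name is reported (and deleted) instead of being handed to the unzip step. `source_url` stays a parameter, so a different base URL can be passed in if needed.

```python
import os
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip file


def download_checked(filename, data_dir, source_url):
    """Download source_url + filename into data_dir unless it is already there.

    Returns (filepath, size_in_bytes). Raises ValueError, after deleting
    the file, if what was saved is not a gzip file.
    """
    os.makedirs(data_dir, exist_ok=True)
    filepath = os.path.join(data_dir, filename)
    if not os.path.exists(filepath):
        # Same call as in the original script: fetch the URL into filepath.
        urllib.request.urlretrieve(source_url + filename, filepath)
    with open(filepath, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    if not is_gzip:
        os.remove(filepath)  # remove the bad file so the next run downloads again
        raise ValueError(filepath + " is not a gzip file")
    return filepath, os.stat(filepath).st_size
```

For example, `download_checked('train-labels-idx1-ubyte.gz', './mnist', 'http://yann.lecun.com/exdb/mnist/')` fetches one file into ./mnist and returns its path and size.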

Each file is then unzipped with the following code, which calls the gzip command through subprocess:

cmd = ['gzip', '-d', target_path]
print('Unzip ', target_path)
subprocess.call(cmd)
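The script imports the `gzip` module but never uses it; unzipping goes through the external `gzip` command, which a stock Windows installation does not provide. A minimal pure-Python alternative, assuming Python 3 (`gunzip` is a hypothetical helper name), streams each archive through `gzip.open` and `shutil.copyfileobj`:

```python
import gzip
import os
import shutil


def gunzip(gz_path, remove_gz=True):
    """Decompress gz_path (e.g. 'mnist/train-labels-idx1-ubyte.gz') next to itself.

    Needs no external gzip program. Returns the path of the uncompressed file.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: " + gz_path)
    out_path = gz_path[:-3]  # strip the .gz suffix
    # Copy in chunks so the ~47 MB training images are never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_gz:
        os.remove(gz_path)  # mimic gzip -d, which deletes the archive
    return out_path
```

Calling `gunzip(os.path.join('./mnist', 'train-images-idx3-ubyte.gz'))` would do the same job as `uzip_data` for that file; pass `remove_gz=False` to keep the archive.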

II. Environment

1. Ubuntu environment: http://blog.csdn.net/houchaoqun_xmu/article/details/72453187

2. Anaconda2: http://blog.csdn.net/houchaoqun_xmu/article/details/72461592


III. Code

# Copyright 20170611 . All Rights Reserved.
# Prerequisites:
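# Python 2.7 (the __future__ and six imports also let it run on Python 3)
# gzip, subprocess, numpy, six
# A Unix-like system with the gzip and rm commands (called via subprocess)
# ==============================================================================
"""Functions for downloading and unzipping MNIST data."""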
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gzip
import subprocess
import os
import numpy
from six.moves import urllib

def maybe_download(filename, data_dir, SOURCE_URL):
	"""Download the data from Yann's website, unless it's already here."""
	filepath = os.path.join(data_dir, filename)
	if not os.path.exists(filepath):
		filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
		statinfo = os.stat(filepath)
		print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
	# Return the local path so callers can use it directly.
	return filepath


def check_file(data_dir):
	"""Return True if data_dir already exists; otherwise create it and return False."""
	if os.path.exists(data_dir):
		return True
	else:
		os.mkdir(data_dir)
		return False

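def uzip_data(target_path):
	# Decompress a gzip archive with the external gzip command;
	# gzip -d replaces the archive with the uncompressed file.
	cmd = ['gzip', '-d', target_path]
	print('Unzip', target_path)
	ret = subprocess.call(cmd)
	if ret != 0:
		print('[warning...] gzip exited with code', ret)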

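def read_data_sets(data_dir):
	# Start from a clean directory: if data_dir already exists it is deleted
	# and recreated, so every run downloads the four files again.
	if check_file(data_dir):
		print('Directory', data_dir, 'already exists.')

		# Delete the existing directory (Unix rm) and recreate it empty.
		cmd = ['rm', '-rf', data_dir]
		print('Deleting directory', data_dir)
		subprocess.call(cmd)
		os.mkdir(data_dir)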

	
	SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
	data_keys = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']
	for key in data_keys:
		if os.path.isfile(os.path.join(data_dir, key)):
			print("[warning...]", key, "already exists.")
		else:
			maybe_download(key, data_dir, SOURCE_URL)

	# Unzip the MNIST data: gzip -d turns each <name>.gz into <name>.
	uziped_data_keys = ['train-images-idx3-ubyte', 'train-labels-idx1-ubyte', 't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte']
	for key in uziped_data_keys:
		if os.path.isfile(os.path.join(data_dir, key)):
			print("[warning...]", key, "already exists.")
		else:
			# Point gzip at the downloaded .gz archive explicitly.
			target_path = os.path.join(data_dir, key + '.gz')
			uzip_data(target_path)
		

if __name__ == '__main__':
	print("===== running - input_data() script =====")
	read_data_sets("./mnist")
	print("=============   =============")

Save the script as get_mnist.py, then open a terminal and run:

python get_mnist.py

The output looks like this:

[Screenshot: terminal output of python get_mnist.py]
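Besides reading the terminal output, one way to confirm that the four files were unpacked completely is to inspect their IDX headers. This Python 3 sketch assumes the IDX layout MNIST uses: a big-endian 4-byte magic number whose third byte is the data type (0x08, unsigned byte) and whose fourth byte is the number of dimensions, followed by one 4-byte size per dimension and then the raw bytes. Image files have magic number 2051 and label files 2049, with 60,000 training and 10,000 test samples of 28×28 pixels. `read_idx_header`, `idx_size`, `check_mnist` and `EXPECTED` are hypothetical names, not part of the original script.

```python
import os
import struct

# Expected (magic number, dimensions) of the four unzipped MNIST files.
EXPECTED = {
    "train-images-idx3-ubyte": (2051, [60000, 28, 28]),
    "train-labels-idx1-ubyte": (2049, [60000]),
    "t10k-images-idx3-ubyte": (2051, [10000, 28, 28]),
    "t10k-labels-idx1-ubyte": (2049, [10000]),
}


def read_idx_header(path):
    """Return (magic, dims) from the big-endian header of an IDX file."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack(">I", f.read(4))
        ndim = magic & 0xFF  # lowest byte of the magic number = number of dimensions
        dims = list(struct.unpack(">" + "I" * ndim, f.read(4 * ndim)))
    return magic, dims


def idx_size(dims):
    """Expected size in bytes of an unsigned-byte IDX file with these dimensions."""
    count = 1
    for d in dims:
        count *= d
    return 4 + 4 * len(dims) + count  # magic + one int32 per dimension + data


def check_mnist(data_dir):
    """Return a list of problems in data_dir; an empty list means all four files look right."""
    problems = []
    for name, (magic, dims) in EXPECTED.items():
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            problems.append(name + ": missing")
            continue
        try:
            header = read_idx_header(path)
        except struct.error:  # file shorter than its header claims
            problems.append(name + ": truncated header")
            continue
        if header != (magic, dims):
            problems.append(name + ": unexpected header " + repr(header))
        elif os.path.getsize(path) != idx_size(dims):
            problems.append(name + ": unexpected size")
    return problems


if __name__ == "__main__":
    print(check_mnist("./mnist") or "All four MNIST files look complete.")
```

The size check matters because a partially unpacked file can still have a valid header; for the training images the full size is 16 + 60000 × 784 = 47,040,016 bytes.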

Code download: http://download.csdn.net/detail/houchaoqun_xmu/9867456

IV. References

Activation-Visualization-Histogram: https://github.com/shaohua0116/Activation-Visualization-Histogram

MNIST for ML beginners (Chinese TensorFlow tutorial): http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_beginners.html

Reading MNIST with Python: http://blog.csdn.net/mmmwhy/article/details/62891092

Python code for downloading the MNIST handwritten digit dataset with TensorFlow: http://download.csdn.net/detail/yhhyhhyhhyhh/9738704

MNIST code with batch processing (tensorflow_GPU): http://download.csdn.net/detail/houchaoqun_xmu/9851221


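    Original author: Houchaoqun_XMU
    Original URL: https://blog.csdn.net/Houchaoqun_XMU/article/details/73057257
    This article was reposted from the web solely to share knowledge; if it infringes any rights, please contact the blogger to have it removed.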