Since I found that most TensorFlow RNN tutorials online are either too simplistic or too complicated, I am trying to write RNN code in TF step by step, from simple to deep. This article mainly references the code used in "TensorFlow人工智能引擎入门教程之九 RNN/LSTM循环神经网络长短期记忆网络使用", but because that code targets an old TF version it now raises errors, so I modified and re-implemented it with reference to "解读tensorflow之rnn". This first version implements the simplest possible RNN model.
For the principles behind RNNs, see the references.
Since this experiment was done in Jupyter, some of the images and outputs do not transfer well into Zhihu; a better-formatted version is available at RNNStudy/simpleRNN.ipynb.
The steps are recorded as follows:
Import the required packages
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
import tensorflow as tf
#from tensorflow.nn import rnn, rnn_cell
import numpy as np
First, let's look at the input data. The input used here is the MNIST dataset, which looks like this:
print 'Input data:'
print mnist.train.images
print 'Shape of the input data:'
print mnist.train.images.shape
We can see that the second dimension, 784, corresponds to a 28×28-pixel image. Converting one example back into an image and plotting it gives the figure below:
%pylab inline
%matplotlib inline
import pylab
im = mnist.train.images[1]
im = im.reshape(-1,28)
pylab.imshow(im)
pylab.show()
If we want to train on this data with an RNN, we should choose the structure n_input = 28, n_steps = 28. The small experiment below shows how reshape and transpose rearrange data (we will apply the same trick to the image batches later):
a= np.asarray(range(20))
b = a.reshape(-1,2,2)
print 'Generate a sequence of data'
print a
print 'Effect of the reshape function'
print b
c = np.transpose(b,[1,0,2])
d = c.reshape(-1,2)
print '--------c-----------'
print c
print '--------d-----------'
print d
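For reference, the experiment above produces roughly the following (summarised by hand rather than pasted from the notebook, so the formatting is approximate):
# a: the flat sequence [0, 1, 2, ..., 19]
# b = a.reshape(-1, 2, 2): shape (5, 2, 2), i.e. five 2x2 blocks
# c = np.transpose(b, [1, 0, 2]): shape (2, 5, 2); all the first rows of the blocks
#    come first ([0 1], [4 5], [8 9], [12 13], [16 17]), then all the second rows
# d = c.reshape(-1, 2): shape (10, 2); the same ordering flattened back to 2 columns
This is the same kind of transformation we will apply to the image batches: transpose to time-major order, then flatten the batch and feature dimensions together.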
Define some model parameters
''' To classify images using a recurrent neural network, we consider every image row as a sequence of pixels. Because MNIST image shape is 28*28px, we will then handle 28 sequences of 28 steps for every sample. '''
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 100
# Network Parameters
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)
The functions used to build the RNN can be found in the official Neural Network docs. To start, we create two placeholders; for their basic usage see the official documentation: 基本使用 – TensorFlow 官方文档中文版.
# tf Graph input
x = tf.placeholder("float32", [None, n_steps, n_input])
# Tensorflow LSTM cell requires 2x n_hidden length (state & cell)
y = tf.placeholder("float32", [None, n_classes])
# Define weights
weights = {
'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])), # Hidden layer weights
'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
'hidden': tf.Variable(tf.random_normal([n_hidden])),
'out': tf.Variable(tf.random_normal([n_classes]))
}
First, create a cell. The parameter required here is the number of hidden units, n_hidden. After creating the cell, we initialize its state.
This causes a bug that will be discussed later.
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden, forget_bias=0.0, state_is_tuple=True)
_state = lstm_cell.zero_state(batch_size,tf.float32)
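For reference (a note based on how BasicLSTMCell behaves with state_is_tuple=True, not output captured from the notebook):
# _state is an LSTMStateTuple (c, h); each part has shape (batch_size, n_hidden),
# i.e. (128, 128) here. Hard-coding batch_size into the initial state is what
# later forces the test batch to also contain exactly batch_size samples.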
To make the raw input data match what the model expects, we apply a series of transformations to it; the results are shown below. The small experiment above illustrates how the data changes.
a1 = tf.transpose(x, [1, 0, 2])
a2 = tf.reshape(a1, [-1, n_input])
a3 = tf.matmul(a2, weights['hidden']) + biases['hidden']
a4 = tf.split(0, n_steps, a3)
print '-----------------------'
print 'a1:'
print a1
print '-----------------------'
print 'a2:'
print a2
print '-----------------------'
print 'a3:'
print a3
print '-----------------------'
print 'a4:'
print a4
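These prints only show Tensor objects, so the interesting part is the shapes (the batch dimension is unknown at graph-construction time and shows up as ?):
# a1: shape (28, ?, 28)  -- time-major: (n_steps, batch, n_input)
# a2: shape (?, 28)      -- all 28-pixel rows of all images stacked together
# a3: shape (?, 128)     -- each row projected into the n_hidden-dimensional space
# a4: a Python list of n_steps = 28 tensors, each of shape (?, 128), one per time step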
These transformations are mainly done to match the input format expected by the tf.nn.rnn function; see the official Neural Network documentation, or the earlier article 解读tensorflow之rnn, for details.
outputs, states = tf.nn.rnn(lstm_cell, a4, initial_state = _state)
print 'outputs[-1]'
print outputs[-1]
print '-----------------------'
a5 = tf.matmul(outputs[-1], weights['out']) + biases['out']
print 'a5:'
print a5
print '-----------------------'
Define the cost and optimize it with gradient descent.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(a5, y))
#AdamOptimizer
#optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) # Adam Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost) # Gradient Descent Optimizer
correct_pred = tf.equal(tf.argmax(a5,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
init = tf.initialize_all_variables()
Now train the model. Note that because I am working in Jupyter, an interactive environment, I use sess = tf.InteractiveSession(); in a plain .py script this line may not be appropriate, and you should change it to tf.Session() yourself.
sess = tf.InteractiveSession()
sess.run(init)
step = 1
# Keep training until reach max iterations
while step * batch_size < training_iters:
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    # Reshape data to get 28 seq of 28 elements
    batch_xs = batch_xs.reshape((batch_size, n_steps, n_input))
    # Fit training using batch data
    sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
    if step % display_step == 0:
        # Calculate batch accuracy
        acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys})
        # Calculate batch loss
        loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})
        print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) + ", Training Accuracy= " + "{:.5f}".format(acc)
    step += 1
print "Optimization Finished!"
Test the model's accuracy
test_len = batch_size
test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
test_label = mnist.test.labels[:test_len]
# Evaluate model
correct_pred = tf.equal(tf.argmax(a5,1), tf.argmax(y,1))
print "Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label})
There is a bug when testing the accuracy here: test_len must equal batch_size. This is because batch_size was baked in as a parameter when the model state was initialized earlier, so a5 is always a matrix with batch_size rows; if test_len and batch_size are not equal, the accuracy computation raises an error. I have not yet thought of a simple fix, so this is left for next time.
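One possible workaround, sketched here but not verified in this notebook (it assumes the old tf.nn.rnn signature that accepts a dtype argument in place of an explicit initial_state), is to let the RNN create a zero initial state matching whatever batch is actually fed in, instead of fixing it to batch_size:
# Instead of:
#   _state = lstm_cell.zero_state(batch_size, tf.float32)
#   outputs, states = tf.nn.rnn(lstm_cell, a4, initial_state=_state)
# let tf.nn.rnn build the zero state itself from the runtime batch size:
outputs, states = tf.nn.rnn(lstm_cell, a4, dtype=tf.float32)
# With this, a5 would no longer have a hard-coded batch_size rows,
# so test_len would not need to equal batch_size.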
Python references: