A First Look at TensorFlow: Recognizing MNIST with an MLP at 98% Accuracy

May 27, 2024 · Source: weixin_38286298


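I have been working through a TensorFlow course. One lesson used an MLP to recognize MNIST and reached a final accuracy of 91%, and the instructor assigned homework to push it to 95%. I got there mainly through the following optimizations: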

Optimizations

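Add a hidden layer and use an activation function.
Parameter initialization: try initializations other than all zeros. I found that all-zeros or all-ones initialization gave higher accuracy than a normal distribution, and initializing the first-layer w to 0 while drawing all other parameters from a normal distribution was more accurate still. One problem: when both w and b of the first hidden layer are initialized to 0, the network cannot train. Accuracy stays stuck at 0.13, and the first layer's w and b remain 0 and never update.
Switching from mean squared error to a cross-entropy loss improves accuracy.
Tune batch_size: making it too large actually lowered accuracy, while making it smaller raised it somewhat; 64 worked well.
Change the optimizer: instead of plain gradient descent, use Adam or momentum (common optimizers: tf.train.GradientDescentOptimizer, tf.train.MomentumOptimizer, tf.train.AdamOptimizer).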
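One plausible explanation for the stuck first layer, assuming the ReLU hidden layer used in the listing: with W1 = 0 and b1 = 0, every hidden pre-activation is exactly 0, so ReLU outputs 0 and, under TensorFlow's convention that the ReLU gradient at 0 is 0, passes no gradient back. The first layer's w and b then never move, W2 receives no gradient either (its gradient is the hidden output times the output error, and the hidden output is all zeros), and only the output bias keeps learning, which would leave the model predicting roughly the same class for every input. The pure-Python sketch below checks this on a toy 3→4→2 network; the function names and sizes are illustrative, and the b1 = 0.1 case mirrors the `tf.zeros(...)+0.1` bias used in the follow-up listing.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # Derivative of ReLU; at exactly 0 we use 0, the same convention as TensorFlow
    return 1.0 if z > 0 else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def backward(x, y, W1, b1, W2, b2):
    """One forward/backward pass of a 1-hidden-layer ReLU MLP with softmax
    cross-entropy on a single example. Returns (dW1, db1, dW2, db2)."""
    n_in, n_hid, n_out = len(x), len(b1), len(b2)
    z1 = [sum(x[i] * W1[i][j] for i in range(n_in)) + b1[j] for j in range(n_hid)]
    h = [relu(v) for v in z1]
    logits = [sum(h[j] * W2[j][k] for j in range(n_hid)) + b2[k] for k in range(n_out)]
    p = softmax(logits)
    d_logits = [p[k] - y[k] for k in range(n_out)]  # gradient of CE w.r.t. logits
    dW2 = [[h[j] * d_logits[k] for k in range(n_out)] for j in range(n_hid)]
    db2 = d_logits[:]
    dh = [sum(W2[j][k] * d_logits[k] for k in range(n_out)) for j in range(n_hid)]
    dz1 = [dh[j] * relu_grad(z1[j]) for j in range(n_hid)]
    dW1 = [[x[i] * dz1[j] for j in range(n_hid)] for i in range(n_in)]
    db1 = dz1[:]
    return dW1, db1, dW2, db2

random.seed(0)
x = [0.2, 0.7, 0.1]
y = [0.0, 1.0]
W1 = [[0.0] * 4 for _ in range(3)]  # first-layer weights all zero
b1 = [0.0] * 4                      # first-layer biases all zero
W2 = [[random.gauss(0, 1) for _ in range(2)] for _ in range(4)]
b2 = [random.gauss(0, 1) for _ in range(2)]

dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)
print("dW1 all zero:", all(g == 0 for row in dW1 for g in row))  # True
print("db1 all zero:", all(g == 0 for g in db1))                 # True
print("dW2 all zero:", all(g == 0 for row in dW2 for g in row))  # True
print("db2:", db2)  # nonzero: only the output bias gets a gradient

# Same network, but with a small positive first-layer bias b1 = 0.1
dW1_bias, _, _, _ = backward(x, y, W1, [0.1] * 4, W2, b2)
print("dW1 all zero with b1=0.1:", all(g == 0 for row in dW1_bias for g in row))  # False
```

A small positive bias (or random weights) puts the ReLU units in their active region, so gradients reach the first layer again.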

Results

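In the end, we added a hidden layer of 100 neurons with tanh as its activation function, set batch_size=64, and switched to the cross-entropy loss, which raised accuracy to 97% within 20 epochs.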
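For a sense of scale, the fully connected 784→100→10 network described here (784 = 28×28 input pixels, 10 output classes) has the following number of trainable weights and biases. The helper name is hypothetical; the 300-unit network from the follow-up section and a single softmax layer with no hidden layer are included for comparison.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases in a fully connected MLP.
    layer_sizes lists the width of each layer, input layer first."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

print(mlp_param_count([784, 100, 10]))  # 79510
print(mlp_param_count([784, 300, 10]))  # 238510
print(mlp_param_count([784, 10]))       # 7850 (no hidden layer)
```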

Implementation

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
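mnist = input_data.read_data_sets("MNIST",one_hot=True)
# Size of each batch
batch_size = 64
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size
# Placeholders for the flattened 28x28 images and the one-hot labels
x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])

# Hidden layer: initialize w and b, use ReLU as the activation (swap in tf.nn.tanh or tf.nn.sigmoid for the other runs)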
W_L1 = tf.Variable(tf.zeros([784,100]))
b_L1 = tf.Variable(tf.random.normal([100]))
WL1_plus_b = tf.matmul(x,W_L1) + b_L1
L1 = tf.nn.relu(WL1_plus_b)

# Output layer: raw logits; softmax is applied inside the cross-entropy loss below
#prediction = tf.nn.softmax(tf.matmul(x,W)+b)
W_L2 = tf.Variable(tf.random.normal([100,10]))
b_L2 = tf.Variable(tf.random.normal([10]))
WL2_plus_b = tf.matmul(L1,W_L2) + b_L2

prediction = WL2_plus_b


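# Loss: cross-entropy (the commented-out line is the original mean squared error)
#loss = tf.reduce_mean(tf.square(y-prediction))
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=prediction))
# Plain gradient descent with learning rate 0.2
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# Initialize variables
init = tf.global_variables_initializer()
# Boolean per test example: does the predicted class match the label?
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Accuracy: fraction of correct predictions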
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step,feed_dict={ x:batch_xs,y:batch_ys})
        # print(sess.run(WL2_plus_b,feed_dict={x:mnist.test.images}))
        acc = sess.run(accuracy,feed_dict={ x:mnist.test.images,y:mnist.test.labels})
        print("Iter "+str(epoch) + ",testring acc" +str(acc))
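The listing feeds the raw logits `WL2_plus_b` into `tf.nn.softmax_cross_entropy_with_logits`, which applies softmax internally, so `prediction` is deliberately not passed through `tf.nn.softmax` first. The pure-Python sketch below mirrors, for a tiny batch, what that loss and the `accuracy` op compute: the per-example cross-entropy against one-hot labels, and the fraction of rows whose argmax matches the label's argmax. The function names and toy numbers are illustrative, not TensorFlow's implementation.

```python
import math

def softmax_cross_entropy_with_logits(labels, logits):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    losses = []
    for y, z in zip(labels, logits):
        m = max(z)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        # -sum(y * log_softmax(z)), where log_softmax(z) = z - logsumexp(z)
        losses.append(-sum(yi * (zi - log_sum_exp) for yi, zi in zip(y, z)))
    return losses

def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])

def accuracy(labels, logits):
    correct = [argmax(y) == argmax(z) for y, z in zip(labels, logits)]
    return sum(correct) / len(correct)

labels = [[0, 1, 0], [1, 0, 0]]
logits = [[1.0, 2.0, 0.5], [1.0, 3.0, 0.0]]
losses = softmax_cross_entropy_with_logits(labels, logits)
print(sum(losses) / len(losses))  # batch mean, like tf.reduce_mean(...)
print(accuracy(labels, logits))   # 0.5: first row correct, second row wrong
```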

Final results

With ReLU as the activation function:

Iter 0,testring acc0.9082
Iter 1,testring acc0.9283
Iter 2,testring acc0.936
Iter 3,testring acc0.9372
Iter 4,testring acc0.9399
Iter 5,testring acc0.9402
Iter 6,testring acc0.9367
Iter 7,testring acc0.9438
Iter 8,testring acc0.9461
Iter 9,testring acc0.9498
Iter 10,testring acc0.9483
Iter 11,testring acc0.9474
Iter 12,testring acc0.9479
Iter 13,testring acc0.9495
Iter 14,testring acc0.9503
Iter 15,testring acc0.9529
Iter 16,testring acc0.9538
Iter 17,testring acc0.9543
Iter 18,testring acc0.9541
Iter 19,testring acc0.9505
Iter 20,testring acc0.9501

Accuracy reached 95% by epoch 14.
Switching the activation function to tanh or sigmoid gives:

Iter 0,testring acc0.9275
Iter 1,testring acc0.9437
Iter 2,testring acc0.9527
Iter 3,testring acc0.9574
Iter 4,testring acc0.961
Iter 5,testring acc0.9637
Iter 6,testring acc0.9666
Iter 7,testring acc0.9691
Iter 8,testring acc0.9704
Iter 9,testring acc0.9705
Iter 10,testring acc0.9699
Iter 11,testring acc0.9704
Iter 12,testring acc0.9712
Iter 13,testring acc0.973
Iter 14,testring acc0.9705
Iter 15,testring acc0.9732
Iter 16,testring acc0.9737
Iter 17,testring acc0.9731
Iter 18,testring acc0.9732
Iter 19,testring acc0.9733
Iter 20,testring acc0.9736

This is clearly better; tanh and sigmoid performed about the same.

Summary

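Parameter initialization matters, and the appropriate initialization depends on the optimization method chosen. What puzzled me, though, is the case above: with the first hidden layer's w and b both initialized to all zeros, the network could not learn at all. I haven't figured out why. I worked out the derivatives by hand and traced backpropagation, and saw nothing that should stop learning. If anyone knows, please enlighten me. A common way to initialize w is tf.truncated_normal, a truncated normal distribution.
Different activation functions can lead to different results; in this experiment, sigmoid and tanh outperformed ReLU.
batch_size affects the results:
1. Within a reasonable range, what are the benefits of increasing Batch_Size?
  Memory utilization improves, and large matrix multiplications parallelize more efficiently.
  Fewer iterations are needed to finish one epoch (one pass over the full dataset), so the same amount of data is processed faster.
  Within a certain range, a larger Batch_Size generally gives a more accurate descent direction and causes less oscillation during training.
2. What are the downsides of blindly increasing Batch_Size?
  Memory utilization improves, but memory capacity may no longer be enough.
  Fewer iterations are needed per epoch, so reaching the same accuracy takes much more time, and the parameters are corrected correspondingly more slowly.
  Beyond a certain point, increasing Batch_Size barely changes the descent direction it determines.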
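The "fewer iterations per epoch" point is simple integer arithmetic. Assuming the 55,000-image training split that `input_data.read_data_sets` produces by default (it holds out 5,000 of the 60,000 training images for validation), the listings' formula `n_batch = mnist.train.num_examples // batch_size` gives the following numbers of parameter updates; batch sizes other than 64 and 100 are included only for comparison, and the helper name is hypothetical.

```python
NUM_TRAIN = 55000  # assumed size of mnist.train with the default validation split
EPOCHS = 20

def updates_per_epoch(num_examples, batch_size):
    # Same formula as n_batch in the listings: whole batches per pass over the data
    return num_examples // batch_size

for batch_size in (32, 64, 100, 256, 1024):
    n = updates_per_epoch(NUM_TRAIN, batch_size)
    print(f"batch_size={batch_size:<5} updates/epoch={n:<5} "
          f"updates in {EPOCHS} epochs={n * EPOCHS}")
```

With the number of epochs fixed, going from batch_size 64 (859 updates per epoch) to 1024 (53 updates per epoch) cuts the number of parameter updates roughly 16-fold.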

Follow-up

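After changing how w and b are initialized, enlarging the hidden layer to 300 neurons, and switching the optimizer to Adam, accuracy can reach 98%. This version also uses batch_size = 100 and a learning rate that decays by 5% per epoch.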
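For reference, Adam keeps running averages of the gradient and of its square, and scales each step by bias-corrected versions of them. The scalar sketch below follows the update rule from the Adam paper with the default hyperparameters of `tf.train.AdamOptimizer` (beta1=0.9, beta2=0.999, epsilon=1e-8) on a toy quadratic. It is an illustration, not TensorFlow's code: TensorFlow folds the bias correction into the learning rate, so epsilon enters slightly differently. The class name and toy problem are hypothetical.

```python
import math

class ScalarAdam:
    """Minimal Adam for a single scalar parameter (illustrative only)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # step counter used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)  # undo the bias toward 0
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
opt = ScalarAdam(lr=0.1)
w = 0.0
first_step = opt.step(w, 2 * (w - 3))
w = first_step
for _ in range(999):
    w = opt.step(w, 2 * (w - 3))
print(first_step)  # about 0.1: the first step has size ~lr whatever the gradient scale
print(w)           # close to 3
```

Because each step is normalized by the gradient's running magnitude, Adam is less sensitive than plain gradient descent to the raw scale of the gradients, which is one reason a small learning rate like 0.001 works in the listing.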

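import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST",one_hot=True)
batch_size = 100
# Number of batches per epoch
n_batch = mnist.train.num_examples // batch_size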

x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])

# Build a simple network: one hidden layer of 300 ReLU units
W_L1 = tf.Variable(tf.truncated_normal([784,300],stddev=0.1))

b_L1 = tf.Variable(tf.zeros([300])+0.1)
WL1_plus_b = tf.matmul(x,W_L1) + b_L1
L1 = tf.nn.relu(WL1_plus_b)

W_L2 = tf.Variable(tf.truncated_normal([300,10]))
b_L2 = tf.Variable(tf.zeros([10])+0.1)
WL2_plus_b = tf.matmul(L1,W_L2) + b_L2

prediction = WL2_plus_b

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=prediction))

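# Learning rate kept in a Variable so it can be decayed each epoch
lr = tf.Variable(0.001,dtype=tf.float32)
train_step = tf.train.AdamOptimizer(lr).minimize(loss)

# Initialize variables
init = tf.global_variables_initializer()
# Boolean per test example: does the predicted class match the label?
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Accuracy: fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))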

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(51):
        sess.run(tf.assign(lr,0.001*(0.95**epoch)))
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step,feed_dict={ x:batch_xs,y:batch_ys})
        # print(sess.run(WL2_plus_b,feed_dict={x:mnist.test.images}))
        acc = sess.run(accuracy,feed_dict={ x:mnist.test.images,y:mnist.test.labels})
        print("Iter "+str(epoch) + ",testring acc" +str(acc))
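The training loop above resets the learning rate at the start of every epoch with `tf.assign(lr, 0.001*(0.95**epoch))`. A quick pure-Python check of that exponential-decay schedule (no TensorFlow needed) shows how far it falls over the 51 epochs; the function name is hypothetical.

```python
def exp_decay_lr(epoch, base_lr=0.001, decay=0.95):
    # Same expression the training loop assigns to lr at the start of each epoch
    return base_lr * decay ** epoch

schedule = [exp_decay_lr(e) for e in range(51)]
for e in (0, 10, 20, 50):
    print(f"epoch {e:2d}: lr = {schedule[e]:.3e}")
```

The rate halves roughly every 13.5 epochs, and by the last epoch it is under 8% of its starting value (0.95^50 ≈ 0.077), so late epochs make only small refinements.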

Reference [1]: https://www.cnblogs.com/alexanderkun/p/8099450.html
Course materials [2]: https://www.bilibili.com/video/av20542427/?p=10

    Original author: weixin_38286298
    Original URL: https://blog.csdn.net/weixin_38286298/article/details/90317149