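Debugging code is tedious and challenging. Still, you have to get to know your own code and track down what is going wrong. There are usually debugging guides to follow, and for many languages and frameworks the debugging process is well documented.
With TensorFlow, however, the way the framework works creates some new difficulties for debugging.
The official TensorFlow documentation states:
A TensorFlow Core program consists of two discrete sections:
- Building the computational graph (a tf.Graph)
- Running the computational graph (using a tf.Session)
The actual computation is done with session.run(), which means we need a way to inspect values inside this function.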
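To make the two phases concrete, here is a minimal sketch that is not part of the reference repository. It assumes `tensorflow.compat.v1` is available, so that TF 1.x-style code also runs on TensorFlow 2.x; on TF 1.x a plain `import tensorflow as tf` should behave the same. The names `a`, `b` and `c` are purely illustrative.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: force TF 1.x graph semantics

# Phase 1: build the graph. Nothing is computed yet.
a = tf.constant(2.0, name="a")
b = tf.constant(3.0, name="b")
c = tf.multiply(a, b, name="c")
print(c)  # shows a symbolic Tensor description, not a number

# Phase 2: run the graph in a session to obtain a concrete value.
with tf.Session() as sess:
    c_value = sess.run(c)
print(c_value)
```

Printing `c` before the session exists only describes the node, which is exactly why ordinary print-debugging does not work on graph code.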
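Reference code
As a reference, we use the corresponding code from the accompanying GitHub repository.
We will use a basic neural network to classify handwritten digits from the MNIST dataset, using:
- tf.nn.softmax_cross_entropy_with_logits_v2 as the TF classification operation that defines the loss
- tf.train.GradientDescentOptimizer to minimize the loss
Running this small neural network shows that it reaches an accuracy of about 92%.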
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)
# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is a 10-element one-hot vector holding the true digit class (0-9) of each image
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")
# Define the logits y (the raw model output). Do not apply softmax here; the loss op applies it in the next step
y = tf.matmul(x, W) + b
# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")
# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()
# ------ Set Session or InteractiveSession
sess = tf.Session()
sess.run(init)
# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
test_accuracy = sess.run(
    accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}
)
print("Test Accuracy: {}%".format(test_accuracy * 100.0))
sess.close()
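Debugging process
Now let's start debugging. There are essentially five ways to go about it.
1. Fetch and print values within Session.run
This is probably the fastest and easiest way to get the information you need.
- easy and fast
- any evaluation can be fetched from anywhere
Essentially, you run the session inside a print statement and feed it the dictionary, like this:
print(f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}")
If the code gets more complex, the session's partial_run execution can be used. Since this is an experimental feature, it is not demonstrated here.
Also, don't forget the .eval() method, which is meant specifically for evaluating tensors.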
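As a small illustration of how `.eval()` relates to `session.run()`, the sketch below evaluates the same tensor both ways. It is an assumption-laden toy and not taken from the reference code: it imports `tensorflow.compat.v1` so it can also run on TensorFlow 2.x, and the tensor `doubled` and the tiny input batch are made up for the example.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: TF 1.x graph semantics

x = tf.placeholder(tf.float32, shape=[None, 2], name="x_placeholder")
doubled = tf.multiply(x, 2.0, name="doubled")  # illustrative op
batch = [[1.0, 2.0]]

# The `with` block installs `sess` as the default session,
# which is what Tensor.eval() relies on.
with tf.Session() as sess:
    via_run = sess.run(doubled, feed_dict={x: batch})
    via_eval = doubled.eval(feed_dict={x: batch})  # same result, shorter call

print(via_run, via_eval)
```

Outside a `with` block, `.eval()` needs either an explicit `session=sess` argument or a tf.InteractiveSession, which registers itself as the default session.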
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)
# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is a 10-element one-hot vector holding the true digit class (0-9) of each image
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")
# Check the static shapes, then define the logits y. Do not apply softmax here; the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)
# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")
# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()
# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)
# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
print("=====================================")
print(
    f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
print(
    f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)
sess.close()
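2. Use the tf.Print operation
tf.Print comes in handy during run-time evaluation, when we don't want to fetch values explicitly with session.run(). It is an identity op that prints data when it is evaluated.
- it lets us see how values develop during evaluation
- it offers limited configuration, so it can easily clog the terminal
Yufeng Guo of the Google Cloud AI team has written a very good article on how to use tf.Print statements.
He points out that it is essential to actually use the node that tf.Print returns: if nothing depends on it, the node is left dangling in the graph and nothing gets printed.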
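The point about using the returned node can be seen in a few lines. The following is only a sketch under assumptions: it uses `tensorflow.compat.v1` so it can run on TensorFlow 2.x as well, and the tensors `x`, `total` and `total_printed` are invented for the illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: TF 1.x graph semantics

x = tf.constant([1.0, 2.0, 3.0], name="x")
total = tf.reduce_sum(x, name="total")

# tf.Print returns a NEW node. The printing is a side effect of evaluating
# that node, so it has to be on the path of whatever is run.
total_printed = tf.Print(total, [total], message="total = ")

with tf.Session() as sess:
    quiet = sess.run(total)         # Print node not involved: nothing is printed
    loud = sess.run(total_printed)  # Print node evaluated: value goes to stderr
```

Both calls return the same number; only the second one produces output, because only it evaluates the node that tf.Print returned.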
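In the code for this section, a print statement that fetches the value within the session is added as well, to show how the two approaches differ during execution.
With run-time evaluation also comes the possibility of run-time assertion, using tf.Assert.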
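A run-time assertion could look roughly like the sketch below. It is not part of the reference code: the placeholder `probs`, the non-negativity condition and the names `check` and `checked_probs` are assumptions chosen for the illustration, and `tensorflow.compat.v1` is imported so the snippet can also run on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: TF 1.x graph semantics

probs = tf.placeholder(tf.float32, shape=[None], name="probs")

# Run-time check: every fed value must be non-negative.
check = tf.Assert(tf.reduce_all(probs >= 0.0), [probs], name="non_negative_check")

# Like tf.Print, the assertion only fires if something depends on it,
# so the checked tensor is created under a control dependency.
with tf.control_dependencies([check]):
    checked_probs = tf.identity(probs, name="checked_probs")

with tf.Session() as sess:
    ok = sess.run(checked_probs, feed_dict={probs: [0.2, 0.8]})
    try:
        sess.run(checked_probs, feed_dict={probs: [-1.0, 0.5]})
        failed = False
    except tf.errors.InvalidArgumentError:
        failed = True  # the assertion aborted the run

print(ok, failed)
```

In contrast to the plain Python `assert` statements in the listings, which check static shapes while the graph is built, this check runs on the actual values every time the graph is executed.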
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)
# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is a 10-element one-hot vector holding the true digit class (0-9) of each image
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")
# Check the static shapes, then define the logits y. Do not apply softmax here; the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)
# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")
# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()
# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)
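# Create the tf.Print node once, outside the loop. Re-wrapping loss on every
# pass would add a new node to the graph each time and stack the print ops
loss_printed = tf.Print(loss, [loss], message="loss: ")
# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 20 == 0:
        # Evaluating the tf.Print node prints the loss as a side effect
        loss_printed.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        # Fetching the plain loss through the session, for comparison
        print(
            f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
        )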
# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
sess.close()
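3. Use TensorBoard visualization for monitoring
Before diving into this debugging method, note that TensorBoard and the TensorBoard debugger are two different things.
The TensorFlow website offers a great tutorial on how to set up and use TensorBoard.
The key to its use is the serialization of data. TensorFlow provides summary operations, which let you export condensed information about the model. They act like anchors that tell the visualization board what to plot.
a) Clean up the graph with proper names and name scopes
First, we need to organize all variables and operations with the scope methods that TensorFlow provides.
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
b) Add tf.summaries
For example:
with tf.name_scope("weights_scope"):
    W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
    tf.summary.histogram("weight_histogram", W)
c) Add a tf.summary.FileWriter to create the log files
Tip: make sure to create a subfolder for each log, so that the graphs do not pile up on top of each other.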
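One way to follow this tip is to derive a fresh folder name for every run. The helper below is a hypothetical sketch (the name `make_run_logdir` and the temporary base folder are assumptions, not part of the reference code); the path it returns is what would be handed to tf.summary.FileWriter.

```python
import os
import tempfile
from datetime import datetime


def make_run_logdir(base_dir):
    """Create and return a fresh, timestamped subfolder of base_dir (hypothetical helper)."""
    run_name = datetime.now().strftime("%Y%m%d-%H%M%S")
    # mkdtemp appends a random suffix, so two runs started within the
    # same second still end up in different folders.
    return tempfile.mkdtemp(prefix=run_name + "-", dir=base_dir)


base = tempfile.mkdtemp(prefix="tfb_logs_")  # stand-in for ./tfb_logs/
first_run = make_run_logdir(base)
second_run = make_run_logdir(base)
print(sorted(os.listdir(base)))
```

Pointing `tensorboard --logdir` at the base folder then shows every subfolder as a separate run, which keeps the graphs and curves of different executions apart.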
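d) Start the TensorBoard server from your terminal
For example:
tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1
Navigating to the TensorBoard server (here http://127.0.0.1:8090) opens the TensorBoard dashboard, where the graph and the recorded summaries can be explored.
Now the real power of TensorBoard becomes clear: it makes it much easier to spot errors in a machine learning model. Our code example is a simple one; imagine how tedious debugging would be for a complex model with many layers and parameters without this capability.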
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from datetime import datetime
# Create a subfolder for each log
subFolder = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = f"./tfb_logs/{subFolder}/"
# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)
# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is a 10-element one-hot vector holding the true digit class (0-9) of each image
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
    tf.summary.image("image_input", tf.reshape(x, [-1, 28, 28, 1]), 3)
with tf.name_scope("weights_scope"):
    W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
    tf.summary.histogram("weight_histogram", W)
with tf.name_scope("bias_scope"):
    b = tf.Variable(tf.zeros([10]), name="bias_variable")
    tf.summary.histogram("bias_histogram", b)
# Check the static shapes, then define the logits y. Do not apply softmax here; the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
with tf.name_scope("yReal_scope"):
    y = tf.add(tf.matmul(x, W), b, name="y_calculated")
    tf.summary.histogram("yReal_histogram", y)
assert y.get_shape().as_list() == [None, 10]
# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )
with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )
tf.summary.histogram("loss_histogram", loss)
tf.summary.scalar("loss_scalar", loss)
# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
    # Give the two summaries distinct, correctly spelled names
    tf.summary.histogram("accuracy_histogram", accuracy)
    tf.summary.scalar("accuracy_scalar", accuracy)
# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()
# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)
# TensorBoard - Write the default graph out so we can view its structure
merged_summary_op = tf.summary.merge_all()
tbWriter = tf.summary.FileWriter(logdir)
tbWriter.add_graph(sess.graph)
# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    if i % 5 == 0:
        # Record the merged summaries every 5th step
        summary = sess.run(merged_summary_op, feed_dict={x: batch_xs, y_: batch_ys})
        tbWriter.add_summary(summary, i)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
# Flush pending summaries to disk before shutting down
tbWriter.close()
sess.close()
# Use this in the terminal to start the tensorboard server
# tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1
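4. Use the TensorBoard debugger
This debugger is built into TensorFlow and is very practical; see the corresponding GitHub repository to learn more.
To use this method, three things need to be added to the previous code:
- the import from tensorflow.python import debug as tf_debug
- wrap your session with tf_debug.TensorBoardDebugWrapperSession
- add the debugger_port to your TensorBoard server
Now you can debug the whole visualized model much as you would with any other debugger, except that you work on a clear graphical map of the model. You can select particular nodes and inspect them, control execution with the "step" and "continue" buttons, and visualize tensors and their values.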
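The first two of these additions might look like the sketch below. Treat it as an assumption-heavy outline rather than the reference implementation: the helper name `make_debug_session` is hypothetical, the address `localhost:8080` is assumed to match a TensorBoard server started with `--debugger_port 8080`, and the wrapper class belongs to the TF 1.x debugging tools, so it may not be usable with newer TensorFlow or TensorBoard releases.

```python
import tensorflow.compat.v1 as tf
from tensorflow.python import debug as tf_debug

tf.disable_eager_execution()  # assumption: TF 1.x graph semantics

# Assumption: TensorBoard was started with --debugger_port 8080.
DEBUGGER_ADDRESS = "localhost:8080"


def make_debug_session(address=DEBUGGER_ADDRESS):
    """Return a session whose run() calls are reported to the TensorBoard debugger."""
    sess = tf.Session()
    return tf_debug.TensorBoardDebugWrapperSession(sess, address)


# Usage, once the TensorBoard server with the debugger port is running:
# sess = make_debug_session()
# sess.run(init)
```

Everything after the wrapping stays the same as in the TensorBoard listing; each `sess.run` call then pauses in the browser until it is stepped or continued.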
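5. Use the TensorFlow debugger
The last method, and an equally powerful one, is the CLI TensorFlow debugger.
This debugger focuses on the command-line interface (CLI) of tfdbg, as opposed to tfdbg's graphical user interface (GUI).
Simply wrap the session with tf_debug.LocalCLIDebugWrapperSession(sess) and start debugging by executing the file.
It basically lets you run and step through the execution of the model while providing evaluation metrics.
The key feature here is the invoke_stepper command; after issuing it, press s to step through each operation. This is basic debugger functionality, only in the CLI. The full script looks like this: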
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python import debug as tf_debug
# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)
# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is a 10-element one-hot vector holding the true digit class (0-9) of each image
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
with tf.name_scope("weights_scope"):
    W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
with tf.name_scope("bias_scope"):
    b = tf.Variable(tf.zeros([10]), name="bias_variable")
# Check the static shapes, then define the logits y. Do not apply softmax here; the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
with tf.name_scope("yReal_scope"):
    y = tf.add(tf.matmul(x, W), b, name="y_calculated")
assert y.get_shape().as_list() == [None, 10]
# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )
with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )
# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()
# ------ Set Session or InteractiveSession
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.run(init)
# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
sess.close()
# The CLI debugger needs no server. The command below is only needed for the
# TensorBoard debugger (method 4), which connects through --debugger_port:
# tensorboard --logdir=./tfb_logs/ --port=8090 --debugger_port 8080 --host=127.0.0.1
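Conclusion
As shown above, there are many ways to debug a TensorFlow application. Each method has its own strengths and weaknesses.
Some advice for debugging TensorFlow:
- name your tensors properly
- log messages
- use exceptions properly
- keep your modules and code well organized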
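To illustrate why proper naming pays off, the sketch below shows how names and name scopes end up in the graph and how a tensor can be looked up by its name later. It is an illustrative assumption, not part of the reference code: it builds a throwaway graph, uses `tensorflow.compat.v1` so it can run on TensorFlow 2.x, and the doubling op merely stands in for a real layer.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # assumption: TF 1.x graph semantics

graph = tf.Graph()  # throwaway graph so the names start from a clean slate
with graph.as_default():
    with tf.name_scope("variables_scope"):
        x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    with tf.name_scope("yReal_scope"):
        y = tf.multiply(x, 2.0, name="y_calculated")  # stand-in for a real layer

# The scope becomes a prefix of the tensor name.
print(x.name)
print(y.name)

# A well-named tensor can be retrieved from the graph without holding a reference.
found = graph.get_tensor_by_name("yReal_scope/y_calculated:0")
```

The same names show up in TensorBoard, in tfdbg and in TensorFlow error messages, so a descriptive name usually points straight at the failing part of the model.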
Reference:
https://medium.freecodecamp.org/debugging-tensorflow-a-starter-e6668ce72617