A Must-Read for TF Boys: A Beginner's Guide to Debugging TensorFlow

July 14, 2019. Source: 景略集智

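Debugging is a tedious and challenging task. Still, you have to work through your own code and find the problem. Usually there are guides to follow, and the debugging process is well documented for many languages and frameworks.

With TensorFlow, however, the way the framework works introduces some new difficulties for debugging.

As the official TensorFlow documentation puts it:

A TensorFlow Core program consists of two discrete sections:

Building the computational graph (a tf.Graph)
Running the computational graph (using a tf.Session)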


The actual computation is done with session.run(), which means we need a way to inspect values from inside that call.
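The two phases can be sketched in a few lines, assuming TensorFlow 1.x as in the rest of this article. The function names build_graph, run_graph and main are illustrative only. The point is that printing a tensor after the build phase shows a symbolic node, and a concrete value only exists once a session runs it.

```python
def build_graph():
    """Phase 1: describe the computation. Nothing is calculated here."""
    # Imported lazily so that this sketch can be loaded without TensorFlow.
    import tensorflow as tf  # TensorFlow 1.x assumed

    graph = tf.Graph()
    with graph.as_default():
        a = tf.constant(2.0, name="a")
        b = tf.constant(3.0, name="b")
        total = tf.add(a, b, name="total")
    return graph, total


def run_graph(graph, fetch):
    """Phase 2: execute the graph in a session and return a concrete value."""
    import tensorflow as tf  # TensorFlow 1.x assumed

    with tf.Session(graph=graph) as sess:
        return sess.run(fetch)


def main():
    graph, total = build_graph()
    # A symbolic Tensor: it carries a name, shape and dtype, but no value yet.
    print(total)
    # Only running the node in a session produces the number.
    print(run_graph(graph, total))
```

Calling main() prints the symbolic tensor first and the computed value second, which is why an ordinary print placed in the graph-building code cannot show intermediate results.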

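Reference code

As a reference, we use the corresponding code from the GitHub repository.

We will classify handwritten digits from the MNIST dataset with a basic neural network, using:

tf.nn.softmax_cross_entropy_with_logits_v2 as the TF classification operation that defines the loss
tf.train.GradientDescentOptimizer to minimize the loss

Running this small neural network shows that it already reaches an accuracy of about 92%.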

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the true labels: a one-hot vector with 10 elements, one per digit class (0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")

# Define the model output y (the logits, i.e. the prediction). Do not apply softmax here, as the loss op applies it in the next step
y = tf.matmul(x, W) + b

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Create the op that initializes all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.Session()
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

test_accuracy = sess.run(
    accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}
)
print("Test Accuracy: {}%".format(test_accuracy * 100.0))

sess.close()

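The debugging process

Now let's start debugging. There are basically five ways to go about it.

1. Fetch and print values within Session.run

This is probably the fastest and easiest way to get the information you need.

Easy and fast
Any evaluation can be fetched from anywhere

Essentially, you run the session inside a print statement and feed it the dictionary, like this:

print(f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}")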

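If the code gets more complex, the session's partial_run execution can be used. As this is an experimental feature, it is not demonstrated here.

Also, don't forget the .eval() method, which exists specifically for evaluating tensors.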
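As a minimal sketch of how .eval() relates to sess.run(), assuming TensorFlow 1.x: Tensor.eval() needs a session, either the default one or one passed explicitly. The function name eval_vs_run and the tiny "doubled" graph are made up for illustration.

```python
def eval_vs_run():
    """Evaluate the same tensor through sess.run() and through Tensor.eval()."""
    # Imported lazily so that this sketch can be loaded without TensorFlow.
    import tensorflow as tf  # TensorFlow 1.x assumed

    with tf.Graph().as_default():
        x = tf.placeholder(tf.float32, shape=[None], name="x")
        doubled = tf.multiply(x, 2.0, name="doubled")
        feed = {x: [1.0, 2.0, 3.0]}

        # Entering the `with` block registers `sess` as the default session,
        # which is what Tensor.eval() looks up when no session is passed.
        with tf.Session() as sess:
            via_run = sess.run(doubled, feed_dict=feed)
            via_eval = doubled.eval(feed_dict=feed)
            # The same call with the session passed explicitly.
            via_eval_explicit = doubled.eval(feed_dict=feed, session=sess)

    return via_run, via_eval, via_eval_explicit
```

All three results hold the same values. A tf.InteractiveSession installs itself as the default session when it is created, so .eval() also works with it outside a with block.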

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the true labels: a one-hot vector with 10 elements, one per digit class (0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")


# Sanity-check the static shapes, then define the model output y (the logits). Do not apply softmax here, as the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Create the op that initializes all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

print("=====================================")
print(
    f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
print(
    f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)

sess.close()

The full code is available in the GitHub repository referenced above.

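2. Use the tf.Print operation

The tf.Print method comes in handy during run-time evaluation, when we don't want to fetch a value explicitly with session.run(). It is an identity operation that prints data when it is evaluated.

It lets us watch how values develop during evaluation
It has few configuration options, so it can easily clog the terminal

Yufeng Guo, a member of the Google Cloud AI team, has written a very good article on how to use tf.Print statements. He points out:

It is really important that you actually use this returned node, because if you don't, it will be dangling.

In other words, a tf.Print node that nothing downstream depends on is never executed, so it never prints.

In our code, a print statement that fetches the value inside the session is added as well, to show how the two approaches differ during execution.

Run-time evaluation also opens up the possibility of run-time assertions with tf.Assert.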
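The difference between a dangling and a used tf.Print node, together with a run-time tf.Assert, can be sketched as follows. This assumes TensorFlow 1.x; the function name print_and_assert_demo and the constant tensor are illustrative and not taken from the reference code.

```python
def print_and_assert_demo():
    """Show a dangling vs. a used tf.Print node, plus a run-time tf.Assert."""
    # Imported lazily so that this sketch can be loaded without TensorFlow.
    import tensorflow as tf  # TensorFlow 1.x assumed

    with tf.Graph().as_default():
        values = tf.constant([1.0, 2.0, 3.0], name="values")

        # tf.Print returns a new node that passes `values` through unchanged
        # and logs the listed tensors (to stderr) whenever it is executed.
        values_printed = tf.Print(values, [values], message="values: ")

        # Built from the original tensor: the Print node is left dangling,
        # so running this fetch logs nothing.
        silent_sum = tf.reduce_sum(values, name="silent_sum")

        # Built from the returned node: running this fetch triggers the log.
        logged_sum = tf.reduce_sum(values_printed, name="logged_sum")

        # tf.Assert fails at run time if the condition evaluates to False.
        # Like tf.Print, it must be wired into the graph to have any effect,
        # here through a control dependency.
        assert_positive = tf.Assert(tf.reduce_all(values > 0), [values])
        with tf.control_dependencies([assert_positive]):
            checked_sum = tf.identity(logged_sum, name="checked_sum")

        with tf.Session() as sess:
            return sess.run([silent_sum, checked_sum])
```

Both fetches return the same sum, but only the one built on the node returned by tf.Print produces log output.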

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the true labels: a one-hot vector with 10 elements, one per digit class (0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")


# Sanity-check the static shapes, then define the model output y (the logits). Do not apply softmax here, as the loss op applies it in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Create the op that initializes all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

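# Wrap the loss in a tf.Print identity op once, outside the loop.
# Re-wrapping it on every iteration would keep adding nested Print nodes to the graph.
loss_print = tf.Print(loss, [loss], message="loss: ")

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 20 == 0:
        # Evaluating the Print node logs the loss as a side effect
        loss_print.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        # For comparison: fetch the value explicitly and print it from Python
        print(
            f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
        )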


# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

sess.close()

The full code is available in the GitHub repository referenced above.

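3. Use TensorBoard visualization for monitoring

Before diving into this debugging method, note that TensorBoard and the TensorBoard debugger are two different things.

The TensorFlow website has a great tutorial on how to implement and use TensorBoard.

The key to using it is the serialization of the data. TensorFlow provides summary operations, which let you export condensed information about the model. They act like anchors that tell the visualization board what to plot.

a) Clean the graph with proper names and name scopes

First, we need to organize all variables and operations with the scope methods that TensorFlow provides.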

with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

b) Add tf.summaries

For example:

with tf.name_scope("weights_scope"):
    W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
    tf.summary.histogram("weight_histogram", W)

c) Add a tf.summary.FileWriter to create the log files

Tip: make sure to create a subfolder for each log, to avoid piling up graphs.
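One way to follow this tip is sketched below, assuming TensorFlow 1.x. The helper names make_run_logdir and open_writer are illustrative and not part of TensorFlow. Each run gets its own timestamped subfolder, so TensorBoard lists the runs separately instead of overlaying their graphs.

```python
import os
from datetime import datetime


def make_run_logdir(base_dir="./tfb_logs", now=None):
    """Return a per-run log directory such as ./tfb_logs/20190714-093000."""
    now = now or datetime.now()
    return os.path.join(base_dir, now.strftime("%Y%m%d-%H%M%S"))


def open_writer(logdir, graph):
    """Create a FileWriter that stores the graph and later summaries in logdir."""
    # Imported lazily so that the path helper works without TensorFlow.
    import tensorflow as tf  # TensorFlow 1.x assumed

    return tf.summary.FileWriter(logdir, graph)


# Usage inside a training script (sess is an existing tf.Session):
#   writer = open_writer(make_run_logdir(), sess.graph)
#   writer.add_summary(summary, step)  # inside the training loop
#   writer.close()                     # flush pending events when done
```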

d) Start the TensorBoard server from your terminal

For example:

tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1

Navigating to the TensorBoard server (here http://127.0.0.1:8090) shows the following:

[Screenshot: TensorBoard dashboard]

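Now the power of TensorBoard becomes clear. It makes it very easy to spot errors in your machine learning model. Our code example is a simple one. Imagine how hard it would be to debug a complex model with many layers and parameters without it.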

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from datetime import datetime

# Create a subfolder for each log
subFolder = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = f"./tfb_logs/{subFolder}/"

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the true labels: a one-hot vector with 10 elements, one per digit class (0-9)
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
    tf.summary.image("image_input", tf.reshape(x, [-1, 28, 28, 1]), 3)

    with tf.name_scope("weights_scope"):
        W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
        tf.summary.histogram("weight_histogram", W)

    with tf.name_scope("bias_scope"):
        b = tf.Variable(tf.zeros([10]), name="bias_variable")
        tf.summary.histogram("bias_histogram", b)

    # Sanity-check the static shapes, then define the model output y (the logits). Do not apply softmax here, as the loss op applies it in the next step
    assert x.get_shape().as_list() == [None, 784]
    assert y_.get_shape().as_list() == [None, 10]
    assert W.get_shape().as_list() == [784, 10]
    assert b.get_shape().as_list() == [10]

    with tf.name_scope("yReal_scope"):
        y = tf.add(tf.matmul(x, W), b, name="y_calculated")
        tf.summary.histogram("yReal_histogram", y)

    assert y.get_shape().as_list() == [None, 10]

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )

with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )
    tf.summary.histogram("loss_histogram", loss)
    tf.summary.scalar("loss_scalar", loss)


# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
    tf.summary.histogram("accuracy_histogram", accuracy)
    tf.summary.scalar("accuracy_scalar", accuracy)

# Create the op that initializes all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

# TensorBoard - Write the default graph out so we can view its structure
merged_summary_op = tf.summary.merge_all()
tbWriter = tf.summary.FileWriter(logdir)
tbWriter.add_graph(sess.graph)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    if i % 5 == 0:
        summary = sess.run(merged_summary_op, feed_dict={x: batch_xs, y_: batch_ys})
        tbWriter.add_summary(summary, i)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

tbWriter.close()  # flush any pending summaries to disk
sess.close()

# Use this in the terminal to start the tensorboard server
# tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1

The full code is available in the GitHub repository referenced above.

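4. Use the TensorBoard debugger

This debugging feature built into TensorFlow is very practical. See the corresponding GitHub repository for more details.

To use this method, three things need to be added to the previous code:

Import from tensorflow.python import debug as tf_debug
Wrap your session with tf_debug.TensorBoardDebugWrapperSession
Add the debugger_port to your TensorBoard server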
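The three additions could look roughly like this, assuming TensorFlow 1.x with the TensorBoard debugger plugin available. The helper names debugger_address and wrap_for_tensorboard_debugger are illustrative, and port 8080 is simply the value used in the tensorboard command at the end of this article.

```python
def debugger_address(host="localhost", port=8080):
    """Build the "host:port" address of the TensorBoard debugger server."""
    return f"{host}:{port}"


def wrap_for_tensorboard_debugger(sess, address):
    """Wrap an existing tf.Session so that run() calls report to TensorBoard."""
    # Step 1: the import. Done lazily so this sketch loads without TensorFlow.
    from tensorflow.python import debug as tf_debug

    # Step 2: wrap the session. The port in `address` has to match the
    # --debugger_port passed to TensorBoard in step 3.
    return tf_debug.TensorBoardDebugWrapperSession(sess, address)


# Step 3 (terminal): start TensorBoard with the debugger port enabled, e.g.
#   tensorboard --logdir=./tfb_logs/ --port=8090 --debugger_port 8080 --host=127.0.0.1
#
# Usage in the training script (sess is an ordinary tf.Session):
#   sess = wrap_for_tensorboard_debugger(sess, debugger_address())
```

With this setup it is probably safest to start TensorBoard first, so the wrapped session has a server to connect to when the script runs.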

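You can now debug the whole visualized model much as you would with any other debugger, but on a nicely rendered graph. You can select particular nodes and inspect them, control the execution with the "step" and "continue" buttons, and visualize tensors and their values.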

[Screenshot: TensorBoard debugger]

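5. Use the TensorFlow debugger

The last method, equally powerful, is the CLI TensorFlow debugger.

This debugger focuses on the command-line interface of tfdbg, as opposed to its graphical user interface.

Simply wrap the session with tf_debug.LocalCLIDebugWrapperSession(sess) and start debugging by executing the file.

It basically lets you run the model and step through its execution while providing evaluation metrics.

The key feature here is the invoke_stepper command, followed by pressing s to step through each operation. This is basic debugger functionality, only in a CLI. It looks like this: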

[Screenshot: the tfdbg command-line interface]

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python import debug as tf_debug

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the true labels: a one-hot vector with 10 elements, one per digit class (0-9)
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

    with tf.name_scope("weights_scope"):
        W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")

    with tf.name_scope("bias_scope"):
        b = tf.Variable(tf.zeros([10]), name="bias_variable")

    # Sanity-check the static shapes, then define the model output y (the logits). Do not apply softmax here, as the loss op applies it in the next step
    assert x.get_shape().as_list() == [None, 784]
    assert y_.get_shape().as_list() == [None, 10]
    assert W.get_shape().as_list() == [784, 10]
    assert b.get_shape().as_list() == [10]

    with tf.name_scope("yReal_scope"):
        y = tf.add(tf.matmul(x, W), b, name="y_calculated")

    assert y.get_shape().as_list() == [None, 10]

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )

with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )

# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

# Create the op that initializes all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

sess.close()

# For the TensorBoard debugger (method 4), start the TensorBoard server from the terminal with a debugger port:
# tensorboard --logdir=./tfb_logs/ --port=8090 --debugger_port 8080 --host=127.0.0.1

The full code is available in the GitHub repository referenced above.

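Conclusion

As shown above, there are many ways to debug a TensorFlow application. Each method has its own strengths and weaknesses.

Some advice on debugging TensorFlow:

Name your tensors properly
Log what happens
Use exceptions correctly
Keep your modules and code well organized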

Reference:
https://medium.freecodecamp.org/debugging-tensorflow-a-starter-e6668ce72617


    Original author: 景略集智
    Original article: https://zhuanlan.zhihu.com/p/51714932