A Must-Read for TF Boys: A Starter Guide to Debugging TensorFlow

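Debugging code is tedious and challenging work. Even so, you have to know your own code well enough to track down its problems. Debugging guides usually exist, and for many languages and frameworks the debugging workflow is well documented.

TensorFlow, however, works in a way that brings some new difficulties to debugging.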

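The official TensorFlow documentation states:

A TensorFlow Core program consists of two discrete sections:

  • Building the computational graph (a tf.Graph)
  • Running the computational graph (using a tf.Session)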


The actual computation is carried out by session.run(), which means we need a way to inspect values inside that call.
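To make the build/run split concrete, here is a minimal sketch that is separate from the MNIST reference code. It assumes TensorFlow 1.14 or later and uses the tensorflow.compat.v1 shim so that the TF 1.x graph API used in this article also works on a TensorFlow 2.x install. The constants a and b and the node name total are made up for illustration.

```python
import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

# Part 1: build the graph. Nothing is computed here.
graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0, name="a")
    b = tf.constant(3.0, name="b")
    total = tf.add(a, b, name="total")

# Printing the node shows its symbolic description (name, shape, dtype),
# not the number it will produce
print(total)

# Part 2: run the graph. Only now does a value exist.
with tf.Session(graph=graph) as sess:
    total_value = sess.run(total)

print(total_value)
```

This gap between the symbolic node and its runtime value is why a plain Python print or an ordinary debugger breakpoint placed at graph-construction time does not show the numbers you are looking for.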

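Reference code

As a reference, we use the corresponding code from this GitHub repository.

We will use a basic neural network to classify handwritten digits from the MNIST dataset, using:

  • tf.nn.softmax_cross_entropy_with_logits_v2 as the TF classification operation that defines the loss
  • tf.train.GradientDescentOptimizer to minimize the loss

Running this small neural network shows that it reaches an accuracy of about 92%.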

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the one-hot encoded true label of each image (digits 0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")

# Define the logits y (the unnormalized predictions). Do not apply softmax here, as it is applied inside the loss op in the next step
y = tf.matmul(x, W) + b

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Initialize all variables

# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.Session()
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

test_accuracy = sess.run(
    accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}
)
print("Test Accuracy: {}%".format(test_accuracy * 100.0))

sess.close()

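The debugging process

Now we start debugging the code. There are essentially 5 ways to go about it.

1. Fetch and print values within Session.run

This is probably the fastest and easiest way to get the information you need.

  • Simple and quick
  • Any evaluation can be fetched from anywhere

Essentially, you run the session inside a print statement and feed it the dictionary, like this: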

print(f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}")

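If the code gets more complex, the session's partial_run execution can be used. Since it is an experimental feature, it is not demonstrated any further here.

Also, don't forget the .eval() method, which exists specifically for evaluating tensors.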
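As a small illustration of .eval(), the sketch below evaluates the same node once through Tensor.eval() and once through sess.run(). It is not part of the article's MNIST code: the placeholder and the doubled node are made up for the example, and the tensorflow.compat.v1 import is an assumption so that the snippet also runs on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 2], name="x_placeholder")
    doubled = tf.multiply(x, 2.0, name="doubled")

sess = tf.Session(graph=graph)

# Tensor.eval() needs a session: either pass one explicitly, as here, or rely
# on a default session (tf.InteractiveSession installs itself as the default)
via_eval = doubled.eval(feed_dict={x: [[1.0, 2.0]]}, session=sess)

# The equivalent call through the session itself
via_run = sess.run(doubled, feed_dict={x: [[1.0, 2.0]]})

sess.close()
```

With tf.InteractiveSession, as used in the listing that follows, the session argument can be left out because that session registers itself as the default.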

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the one-hot encoded true label of each image (digits 0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")


# Check the static shapes, then define the logits y. Do not apply softmax here, as it is applied inside the loss op in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Initialize all variables

# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

print("=====================================")
print(
    f"The bias parameter is: {sess.run(b, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)
print(
    f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
)

sess.close()

See the full code here.

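2. Use the tf.Print operation

The tf.Print method comes in handy during runtime evaluation, when we don't want to fetch values explicitly with session.run(). It is an identity operation that prints data when it is evaluated.

  • It lets us watch how values develop during evaluation
  • It has limited configuration options, so it can easily clog the terminal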

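Yufeng Guo of the Google Cloud AI team wrote a great article on how to use tf.Print statements. He points out:

It is vitally important that you actually use the returned node, because if you don't, it will be left dangling.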
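The point about the returned node can be sketched as follows. This is an illustrative example under assumptions, not the code from the article referred to above: the node names total, total_printed and doubled are invented, and tensorflow.compat.v1 is used so that tf.Print is also available on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="x_placeholder")
    total = tf.reduce_sum(x, name="total")

    # tf.Print returns a new identity node; the message is emitted (to stderr)
    # only when that returned node is evaluated
    total_printed = tf.Print(total, [total], message="total = ")

    # Build downstream ops on the returned node so the print stays in the path
    doubled = tf.multiply(total_printed, 2.0, name="doubled")

with tf.Session(graph=graph) as sess:
    feed = {x: [1.0, 2.0, 3.0]}
    silent = sess.run(total, feed_dict=feed)    # bypasses the tf.Print node
    logged = sess.run(doubled, feed_dict=feed)  # passes through tf.Print
```

Fetching total never touches the print node, so nothing is logged. Fetching doubled does, because doubled was built on top of the node that tf.Print returned.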

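In the full listing for this method, a print statement that fetches the value within the session is added as well, to show how the two approaches differ during execution.

With runtime evaluation we can also make runtime assertions using tf.Assert.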
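A runtime assertion with tf.Assert could look like the sketch below. It is a standalone example under assumptions rather than part of the MNIST listing: the non-negativity check, the node names and the tensorflow.compat.v1 import are all chosen for illustration.

```python
import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None], name="x_placeholder")

    # The assertion is a graph op: it is checked when the graph runs,
    # not when this line executes
    assert_op = tf.Assert(tf.reduce_all(x >= 0.0), [x], name="non_negative_check")

    # Like tf.Print, the check only fires if something depends on it
    with tf.control_dependencies([assert_op]):
        root = tf.sqrt(x, name="root")

with tf.Session(graph=graph) as sess:
    ok = sess.run(root, feed_dict={x: [4.0, 9.0]})
    try:
        sess.run(root, feed_dict={x: [-1.0]})
        failed = False
    except tf.errors.InvalidArgumentError:
        # Raised at run time because the fed value violates the assertion
        failed = True
```

The control dependency plays the same role as using the node returned by tf.Print: an assertion that nothing depends on is never executed.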

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the one-hot encoded true label of each image (digits 0-9)
# Define the weights and biases (always keep the dimensions in mind)
x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
b = tf.Variable(tf.zeros([10]), name="bias_variable")


# Check the static shapes, then define the logits y. Do not apply softmax here, as it is applied inside the loss op in the next step
assert x.get_shape().as_list() == [None, 784]
assert y_.get_shape().as_list() == [None, 10]
assert W.get_shape().as_list() == [784, 10]
assert b.get_shape().as_list() == [10]
y = tf.add(tf.matmul(x, W), b)

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_, logits=y, name="lossFunction"
    ),
    name="loss",
)
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss, name="gradDescent")

# Initialize all variables

# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

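# Wrap the loss in tf.Print once, outside the loop, so that the graph does not
# grow by one extra print node every 20 steps
loss_printed = tf.Print(loss, [loss], message="loss")

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 20 == 0:
        # Evaluating the returned node triggers the tf.Print side effect
        loss_printed.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        # Fetching the original node through the session prints from Python only
        print(
            f"Loss of the model is: {sess.run(loss, feed_dict={x: mnist.test.images, y_: mnist.test.labels})}"
        )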


# Evaluate the accuracy of the model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

sess.close()

See the full code here.

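3. Use TensorBoard visualization for monitoring

Before diving into this debugging method, note that TensorBoard and the TensorBoard debugger are two different things.

The TensorFlow website has a great tutorial on how to set up and use TensorBoard.

The key to using it is the serialization of data. TensorFlow provides summary operations that let you export condensed information about the model. They act like anchors that tell the visualization board what to plot.

a) Clean up the graph with proper names and name scopes

First we need to organize all the variables and operations with the scoping methods that TensorFlow provides.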

with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

b) Add tf.summaries

For example:

with tf.name_scope("weights_scope"):
    W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
    tf.summary.histogram("weight_histogram", W)

c) Add a tf.summary.FileWriter to create the log files

Tip: always create a subfolder for each log, so that graphs from different runs do not pile up on one another
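A minimal sketch of this step, under a few assumptions: it writes into a temporary directory that stands in for ./tfb_logs/, it logs a made-up scalar instead of the MNIST model, and it imports tensorflow.compat.v1 so that the TF 1.x summary API also works on TensorFlow 2.x.

```python
import os
import tempfile
from datetime import datetime

import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

# One timestamped subfolder per run; the temp dir is a stand-in for ./tfb_logs/
base_dir = tempfile.mkdtemp(prefix="tfb_logs_")
run_dir = os.path.join(base_dir, datetime.now().strftime("%Y%m%d-%H%M%S"))

graph = tf.Graph()
with graph.as_default():
    value = tf.placeholder(tf.float32, shape=[], name="value")
    tf.summary.scalar("value_scalar", value)
    merged = tf.summary.merge_all()

with tf.Session(graph=graph) as sess:
    # Passing the graph makes it show up in TensorBoard's Graphs tab
    writer = tf.summary.FileWriter(run_dir, sess.graph)
    for step in range(3):
        summary = sess.run(merged, feed_dict={value: float(step)})
        writer.add_summary(summary, step)
    writer.close()

# The run's subfolder now holds an events file that TensorBoard can read
event_files = os.listdir(run_dir)
```

Pointing tensorboard --logdir at the parent folder then lists each timestamped subfolder as a separate run.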

d) Start the TensorBoard server from your terminal

For example:

tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1

Navigate to the TensorBoard server (here http://127.0.0.1:8090) to view the results.

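Now the power of TensorBoard becomes obvious. It makes it much easier to spot errors in a machine learning model. Our example is fairly simple; imagine how troublesome it would be to debug a complex model with many layers and parameters without this feature.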

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from datetime import datetime

# Create a subfolder for each log
subFolder = datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = f"./tfb_logs/{subFolder}/"

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the one-hot encoded true label of each image (digits 0-9)
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")
    tf.summary.image("image_input", tf.reshape(x, [-1, 28, 28, 1]), 3)

    with tf.name_scope("weights_scope"):
        W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")
        tf.summary.histogram("weight_histogram", W)

    with tf.name_scope("bias_scope"):
        b = tf.Variable(tf.zeros([10]), name="bias_variable")
        tf.summary.histogram("bias_histogram", b)

    # Check the static shapes, then define the logits y. Do not apply softmax here, as it is applied inside the loss op in the next step
    assert x.get_shape().as_list() == [None, 784]
    assert y_.get_shape().as_list() == [None, 10]
    assert W.get_shape().as_list() == [784, 10]
    assert b.get_shape().as_list() == [10]

    with tf.name_scope("yReal_scope"):
        y = tf.add(tf.matmul(x, W), b, name="y_calculated")
        tf.summary.histogram("yReal_histogram", y)

    assert y.get_shape().as_list() == [None, 10]

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )

with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )
    tf.summary.histogram("loss_histogram", loss)
    tf.summary.scalar("loss_scalar", loss)


# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")
    tf.summary.histogram("accuracy_histogram", accuracy)
    tf.summary.scalar("accuracy_scalar", accuracy)

# Initialize all variables

# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.InteractiveSession()
sess.run(init)

# TensorBoard - Write the default graph out so we can view its structure
merged_summary_op = tf.summary.merge_all()
tbWriter = tf.summary.FileWriter(logdir)
tbWriter.add_graph(sess.graph)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    if i % 5 == 0:
        summary = sess.run(merged_summary_op, feed_dict={x: batch_xs, y_: batch_ys})
        tbWriter.add_summary(summary, i)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

sess.close()

# Use this in the terminal to start the tensorboard server
# tensorboard --logdir=./tfb_logs/ --port=8090 --host=127.0.0.1

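See the full code here.

4. Use the TensorBoard debugger

This debugging feature built into TensorFlow is very practical; have a look at the corresponding GitHub repository to learn more.

To use this method, three things have to be added to the previous code:

  • Import: from tensorflow.python import debug as tf_debug
  • Wrap your session with tf_debug.TensorBoardDebugWrapperSession
  • Add the debugger_port to your TensorBoard server

Now we have the option of debugging the whole visualized model as with any other debugger, but on a nicely rendered graph. You can select particular nodes and inspect them, control execution with the "step" and "continue" buttons, and visualize tensors and their values.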


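5. Use the TensorFlow debugger

The last method, and an equally powerful one, is the CLI TensorFlow debugger.

This debugger focuses on the command-line interface of tfdbg, as opposed to tfdbg's graphical user interface.

Simply wrap the session with tf_debug.LocalCLIDebugWrapperSession(sess), then execute the file to start debugging.

It basically lets you run the model and step through its execution, and it provides evaluation metrics.

The key feature here is the invoke_stepper command, after which pressing s steps through each operation. This is basic debugger functionality, only inside a CLI.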

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python import debug as tf_debug

# Only log errors (to prevent unnecessary cluttering of the console)
tf.logging.set_verbosity(tf.logging.ERROR)

# We use the TF helper function to pull down the data from the MNIST site
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# x is the placeholder for the flattened 28 x 28 image data (the input)
# y_ is the placeholder for the one-hot encoded true label of each image (digits 0-9)
# Define the weights and biases (always keep the dimensions in mind)
with tf.name_scope("variables_scope"):
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x_placeholder")
    y_ = tf.placeholder(tf.float32, shape=[None, 10], name="y_placeholder")

    with tf.name_scope("weights_scope"):
        W = tf.Variable(tf.zeros([784, 10]), name="weights_variable")

    with tf.name_scope("bias_scope"):
        b = tf.Variable(tf.zeros([10]), name="bias_variable")

    # Check the static shapes, then define the logits y. Do not apply softmax here, as it is applied inside the loss op in the next step
    assert x.get_shape().as_list() == [None, 784]
    assert y_.get_shape().as_list() == [None, 10]
    assert W.get_shape().as_list() == [784, 10]
    assert b.get_shape().as_list() == [10]

    with tf.name_scope("yReal_scope"):
        y = tf.add(tf.matmul(x, W), b, name="y_calculated")

    assert y.get_shape().as_list() == [None, 10]

# Loss is defined as cross entropy between the prediction and the real value
# Each training step in gradient descent we want to minimize the loss
with tf.name_scope("loss_scope"):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=y_, logits=y, name="lossFunction"
        ),
        name="loss",
    )

with tf.name_scope("training_scope"):
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(
        loss, name="gradDescent"
    )

# Evaluate the accuracy of the model
with tf.name_scope("accuracy_scope"):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="accuracy")

# Initialize all variables
# Perform the initialization which is only the initialization of all global variables
init = tf.global_variables_initializer()

# ------ Set Session or InteractiveSession
sess = tf.Session()
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.run(init)

# Perform 1000 training steps
# Feed the next batch and run the training
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

print("============================================")
print(
    f"Accuracy of the model is: {sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})*100}%"
)

sess.close()

# For the TensorBoard debugger (method 4), start the tensorboard server with a debugger port:
# tensorboard --logdir=./tfb_logs/ --port=8090 --debugger_port 8080 --host=127.0.0.1

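See the full code here.

Conclusion

As shown above, there are many ways to debug a TensorFlow application. Each method has its own strengths and weaknesses.

Some advice on debugging TensorFlow:

  • Name your tensors properly
  • Log
  • Use exceptions properly
  • Keep your modules and code well organized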
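Two of these tips, naming tensors and handling exceptions, can be sketched together. The example below is illustrative only: the scope and node names are borrowed from the listings above, the row-sum computation is a placeholder for a real model, and the tensorflow.compat.v1 import is an assumption so that the snippet also runs on TensorFlow 2.x.

```python
import tensorflow.compat.v1 as tf

# Assumption: keep TF 1.x graph semantics even on a TensorFlow 2.x install
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    with tf.name_scope("yReal_scope"):
        x = tf.placeholder(tf.float32, shape=[None, 3], name="x_placeholder")
        y = tf.reduce_sum(x, axis=1, name="y_calculated")

# A properly named tensor can be found again as "<scope>/<name>:<output index>"
found = graph.get_tensor_by_name("yReal_scope/y_calculated:0")

with tf.Session(graph=graph) as sess:
    row_sums = sess.run(found, feed_dict={x: [[1.0, 2.0, 3.0]]})
    try:
        # Two columns instead of three: the feed is rejected before any op runs
        sess.run(found, feed_dict={x: [[1.0, 2.0]]})
        error_message = None
    except ValueError as err:
        # Catch the specific error and report it with context
        error_message = str(err)
        print("Bad input for x_placeholder: " + error_message)
```

Explicit names also make TensorBoard graphs and tfdbg node lists far easier to read than auto-generated names.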

References:

https://medium.freecodecamp.org/debugging-tensorflow-a-starter-e6668ce72617


    Original author: 景略集智
    Original URL: https://zhuanlan.zhihu.com/p/51714932