A First Look at TensorFlow Operations

August 22, 2023. Source: lonlon ago

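The previous post mentioned TensorFlow nodes, also called OPs (operations). What exactly are they?

Let's start with the concise description in the official documentation: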

Represents a graph node that performs computation on tensors.

An Operation is a node in a TensorFlow Graph that takes zero or more Tensor objects as input, and produces zero or more Tensor objects as output. Objects of type Operation are created by calling a Python op constructor (such as tf.matmul) or tf.Graph.create_op.

For example c = tf.matmul(a, b) creates an Operation of type “MatMul” that takes tensors a and b as input, and produces c as output.

After the graph has been launched in a session, an Operation can be executed by passing it to tf.Session.run. op.run() is a shortcut for calling tf.get_default_session().run(op).

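In other words, an OP is a node in the computation graph that performs a computation on tensors. It takes zero or more tensors as input and produces zero or more tensors as output. Calling a Python op constructor (or tf.Graph.create_op) creates an Operation object. For example, c = tf.matmul(a, b) creates an OP of type MatMul whose inputs are a and b and whose output is c.

Once the graph has been launched in a session, an OP is executed by passing it to session.run.

The following code gives a good feel for what an OP is: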
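
To make the description above concrete, here is a toy model in plain Python. It is not TensorFlow code: the class ToyOp and the function run are names made up for illustration. It only sketches the idea that a node takes zero or more inputs, and that nothing is computed until the node is handed to a run function.

```python
# Toy model of a dataflow node; ToyOp and run are hypothetical names,
# not part of TensorFlow.
import operator


class ToyOp(object):
    def __init__(self, op_type, inputs=(), value=None, name=None):
        self.type = op_type          # e.g. "Const", "Mul"
        self.inputs = tuple(inputs)  # zero or more upstream nodes
        self.value = value           # only used by "Const" nodes
        self.name = name or op_type


# Maps an op type to the Python function that computes it.
_COMPUTE = {"Mul": operator.mul, "RealDiv": operator.truediv}


def run(op):
    """Evaluate op by first evaluating its inputs (loosely like session.run)."""
    if op.type == "Const":
        return op.value
    args = [run(i) for i in op.inputs]
    return _COMPUTE[op.type](*args)


c = ToyOp("Const", value=30.0, name="c1")   # a constant has zero inputs
d = ToyOp("Const", value=30.0)
e = ToyOp("Mul", [c, d])                    # building the node computes nothing
f = ToyOp("RealDiv", [c, d], name="divi")

print(run(e))  # 900.0
print(run(f))  # 1.0
```

In this sketch a constant is simply a node with no inputs, which matches the "zero or more" wording of the documentation.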

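import tensorflow as tf  # TensorFlow 1.x graph-mode API

a = tf.constant(30.0, name='a1')  # created in the global default graph
g = tf.Graph()
with g.as_default():  # g is the default graph inside this block
    c = tf.constant(30.0, name='c1')
    d = tf.constant(30.0)
    e = c * d
    f = tf.divide(c, d, name='divi')


print(a.graph is g)
# False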

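So a is not in graph g. At first this seems to contradict how TensorFlow works: isn't every newly created Tensor automatically placed in the default graph?

After some reading and testing, the explanation is that TensorFlow's default graph is the global default graph. The g used here is a separate, additional graph, and every operation created inside its context manager is placed in g instead: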

print(a.graph is g)
# False

print(a.graph is tf.get_default_graph())
# True

print(g is tf.get_default_graph())
# False

c2 = tf.constant(4.0, name='c_2')
print(c2.graph is tf.get_default_graph())
# True

print(c.graph is tf.get_default_graph())
# False

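As these checks show, g.as_default() does not make g the default graph permanently, which was my earlier misunderstanding. It makes g the default only inside the with block. The separate graph itself was created by tf.Graph(), and once the block exits the global default graph is the default again.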

print(a.graph)
# <tensorflow.python.framework.ops.Graph object at 0x11c8b9f50>


print(g.get_operations())
# [<tf.Operation 'c1' type=Const>, <tf.Operation 'Const' type=Const>,
#  <tf.Operation 'mul' type=Mul>, <tf.Operation 'divi' type=RealDiv>]

Every constant and every computation is an OP, and the constant a is not among g's OPs. Next, let's look at the OPs behind a variable:

a_1 = tf.Variable(40.0, dtype=tf.float32, name='a1')

# OPs created for this variable:
# <tf.Operation 'a1/initial_value' type=Const>,
# <tf.Operation 'a1' type=VariableV2>,
# <tf.Operation 'a1/Assign' type=Assign>,
# <tf.Operation 'a1/read' type=Identity>

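A single variable consists of four OPs: its initial value (Const), the variable itself (VariableV2), the assignment of the initial value (Assign), and the read (Identity).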

print(g.get_operation_by_name('divi'))
# name: "divi"
# op: "RealDiv"
# input: "c1"
# input: "Const"
# attr {
#   key: "T"
#   value {
#     type: DT_FLOAT
#   }
# }

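An OP is looked up by its name in the graph (the name= argument, here 'divi'), not by the name of the Python variable (here f).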

print(a.op)
# name: "a1_1"
# op: "Const"
# attr {
#   key: "dtype"
#   value {
#     type: DT_FLOAT
#   }
# }
# attr {
#   key: "value"
#   value {
#     tensor {
#       dtype: DT_FLOAT
#       tensor_shape {
#       }
#       float_val: 30.0
#     }
#   }
# }

The OP's information includes its name, its operation type, and its attributes. A constant has two attributes: its data type and its value.
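
Note that the OP printed above is named a1_1 although name='a1' was passed. A likely explanation, which the original does not state, is that the name a1 had already been used in the same graph, and a graph keeps OP names unique by appending a numeric suffix. The following plain-Python sketch illustrates that idea only; ToyNamer is a hypothetical helper, not TensorFlow's code.

```python
# Toy name uniquifier; ToyNamer is a hypothetical helper, not TensorFlow's code.
class ToyNamer(object):
    def __init__(self):
        self._used = {}  # base name -> how many times it has been requested

    def unique_name(self, name):
        count = self._used.get(name, 0)
        self._used[name] = count + 1
        # The first request keeps the name; later ones get _1, _2, ...
        return name if count == 0 else "%s_%d" % (name, count)


namer = ToyNamer()
print(namer.unique_name("a1"))  # a1
print(namer.unique_name("a1"))  # a1_1
print(namer.unique_name("a1"))  # a1_2
```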

opf = f.op
print(opf.node_def)
# name: "divi"
# op: "RealDiv"
# input: "c1"
# input: "Const"
# attr {
#   key: "T"
#   value {
#     type: DT_FLOAT
#   }
# }

The division OP additionally lists its inputs.

print(opf.inputs)
# <tensorflow.python.framework.ops._InputList object at 0x104107d50>

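Printing inputs shows only the object's address, because it is a list-like wrapper object. Wrapping it as list(opf.inputs) should show the input tensors themselves.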

print(opf.outputs)
# [<tf.Tensor 'divi:0' shape=() dtype=float32>]
print(opf.values())  # the same tensors, returned as a tuple

The output information shows each output tensor's name, shape, and dtype.

print(opf.get_attr('T'))
# <dtype: 'float32'>

print(a.op.get_attr('value'))
# dtype: DT_FLOAT
# tensor_shape {
# }
# float_val: 30.0

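get_attr returns the value of the named attribute of the OP.

Looking at OPs from the level of TensorFlow's overall architecture, we can also see that an OP is actually executed by a kernel: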

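The Worker Service dispatches OPs to local devices and runs their kernel implementations. It makes the most of multiple CPUs/GPUs by running kernels concurrently wherever possible.

The TensorFlow runtime contains more than 200 standard OPs, covering numerical computation, multi-dimensional array manipulation, control flow, state management, and more. Each OP can have a kernel implementation optimized for each device type. At run time, the runtime selects the kernel that matches the local device type to carry out the OP's computation.

Most kernels are implemented with Eigen::Tensor, which uses C++ templates to generate efficient parallel code for multi-core CPUs and GPUs. TensorFlow is nevertheless free to use cuDNN directly where that gives a more efficient kernel.

In addition, TensorFlow implements quantization, which enables faster inference on mobile devices and in high-throughput datacenter applications.

If a subcomputation is hard to express as a composition of OPs, or runs inefficiently that way, TensorFlow also lets you register a more efficient kernel implementation, which makes it highly extensible.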
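
The idea of choosing a kernel by OP type and device type can be sketched as a small registry in plain Python. This is only an illustration under assumptions: register_kernel, dispatch and the two division functions are hypothetical names, and the real runtime does this in C++ with far more machinery.

```python
# Toy kernel registry; register_kernel and dispatch are hypothetical names.
_KERNELS = {}  # (op_type, device_type) -> implementation


def register_kernel(op_type, device_type):
    def decorator(fn):
        _KERNELS[(op_type, device_type)] = fn
        return fn
    return decorator


@register_kernel("RealDiv", "CPU")
def realdiv_cpu(x, y):
    return x / y


@register_kernel("RealDiv", "GPU")
def realdiv_gpu(x, y):
    # Stand-in only: a real GPU kernel would launch device code.
    return x / y


def dispatch(op_type, device_type, *args):
    """Pick the kernel registered for this op type and device, then run it."""
    try:
        kernel = _KERNELS[(op_type, device_type)]
    except KeyError:
        raise NotImplementedError(
            "no %s kernel registered for %s" % (op_type, device_type))
    return kernel(*args)


print(dispatch("RealDiv", "CPU", 30.0, 30.0))  # 1.0
```

Registering a new (op type, device type) pair extends the table without touching dispatch, which is the extensibility point the passage describes.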

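To summarize:

Every constant, variable, and computation is an OP
A variable is made up of several OPs and is therefore more complex
graph.get_operation_by_name or x.op gives you the details of an OP
An OP can be thought of as a small, specific task that is carried out by a kernel, with kernels running concurrently where possible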

    Original author: lonlon ago
    Original URL: https://zhuanlan.zhihu.com/p/32399032