C++ Code Generation in PyTorch

Background

Some of PyTorch's C++ code is only created during the build of PyTorch itself, and this creation is done by Python scripts. Why does this step exist? There are two main reasons: first, a lot of common logic can be reused; second, templates can be rendered according to the build configuration.
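
To make the second reason concrete, here is a minimal, hypothetical sketch of template-based generation (this is not PyTorch's actual generator; the real machinery lives in scripts such as aten/src/ATen/gen.py): one C++ template is rendered once per dtype, producing a whole family of .cpp files from a single piece of logic.

# A minimal sketch (hypothetical, for illustration only): render one .cpp per dtype.
from string import Template

CPP_TEMPLATE = Template("""\
#include <ATen/${class_name}.h>

namespace at {
Tensor ${class_name}::add(const Tensor& self, const Tensor& other) {
  // dtype-specific body would be rendered here
}
} // namespace at
""")

for dtype in ["Bool", "Byte", "Short"]:
    class_name = "CPU{}Type".format(dtype)
    with open(class_name + ".cpp", "w") as f:
        f.write(CPP_TEMPLATE.substitute(class_name=class_name))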

Frankly, generating C++ code dynamically like this is a rather dated approach; in the era of C++11/14/17, most of it could be replaced with C++ templates. Gemfield may rewrite the current scheme when the opportunity arises.

Generating the ONNX protos

This one comes from the third-party onnx library:

gemfield@ThinkPad-X1C:~/github/pytorch$ python3 \
    /home/gemfield/github/pytorch/third_party/onnx/onnx/gen_proto.py \
    -p onnx_torch \
    -o /home/gemfield/github/pytorch/build/third_party/onnx/onnx \
    onnx
Processing /home/gemfield/github/pytorch/third_party/onnx/onnx/onnx.in.proto
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch.proto
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch.proto3
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx.pb.h
generating /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx_pb.py


gemfield@ThinkPad-X1C:~/github/pytorch$ python3 \
    /home/gemfield/github/pytorch/third_party/onnx/onnx/gen_proto.py \
    -p onnx_torch \
    -o /home/gemfield/github/pytorch/build/third_party/onnx/onnx \
    onnx-operators
Processing /home/gemfield/github/pytorch/third_party/onnx/onnx/onnx-operators.in.proto
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch.proto
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch.proto3
Writing /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx-operators.pb.h
generating /home/gemfield/github/pytorch/build/third_party/onnx/onnx/onnx_operators_pb.py
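
Judging from the output above, the core job of gen_proto.py is to rewrite the proto package name in the .in.proto template so that PyTorch's vendored copy (onnx_torch) cannot clash with a separately installed onnx. Below is a rough, hypothetical sketch of just that renaming step; the real script also handles the .proto3 variants and the onnx_pb.py shim, and its details may differ.

# Hypothetical sketch of the package-renaming step; not gen_proto.py's actual code.
import re

def rename_package(proto_text, package):
    # e.g. turn "package onnx;" into "package onnx_torch;"
    return re.sub(r"package\s+\w+\s*;", "package {};".format(package), proto_text)

with open("onnx.in.proto") as f:
    renamed = rename_package(f.read(), "onnx_torch")
with open("onnx_onnx_torch.proto", "w") as f:
    f.write(renamed)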

Dynamic generation of PyTorch's ATen code

This part has already been described in detail in the column article Gemfield: Dynamic Generation of PyTorch ATen Code. In short:

gemfield@ThinkPad-X1C:~/github/pytorch$ python3 \
/home/gemfield/github/pytorch/aten/src/ATen/gen.py \
--source-path /home/gemfield/github/pytorch/aten/src/ATen \
--install_dir /home/gemfield/github/pytorch/build/aten/src/ATen \
/home/gemfield/github/pytorch/aten/src/ATen/Declarations.cwrap \
/home/gemfield/github/pytorch/aten/src/THNN/generic/THNN.h \
/home/gemfield/github/pytorch/aten/src/THCUNN/generic/THCUNN.h \
/home/gemfield/github/pytorch/aten/src/ATen/nn.yaml \
/home/gemfield/github/pytorch/aten/src/ATen/native/native_functions.yaml

This command generates the following files:

gemfield@ThinkPad-X1C:~/github/pytorch$ ls -l /home/gemfield/github/pytorch/build/aten/src/ATen
total 9728
drwxrwxr-x 2 gemfield gemfield    4096 Mar 16 09:13 core_tmp
-rw-rw-r-- 1 gemfield gemfield    8476 Mar 16 09:13 CPUBoolType.cpp
-rw-rw-r-- 1 gemfield gemfield    2266 Mar 16 09:13 CPUBoolType.h
......
-rw-rw-r-- 1 gemfield gemfield    1657 Mar 16 09:13 CUDABoolType.cpp
-rw-rw-r-- 1 gemfield gemfield     827 Mar 16 09:13 CUDABoolType.h
......
-rw-rw-r-- 1 gemfield gemfield  188888 Mar 16 09:13 CUDAShortType.cpp
-rw-rw-r-- 1 gemfield gemfield   49331 Mar 16 09:13 CUDAShortType.h
-rw-rw-r-- 1 gemfield gemfield 1409757 Mar 16 09:13 Declarations.yaml
-rw-rw-r-- 1 gemfield gemfield     596 Mar 16 09:13 ExtensionBackendRegistration.h
-rw-rw-r-- 1 gemfield gemfield  488123 Mar 16 09:13 Functions.h
-rw-rw-r-- 1 gemfield gemfield     199 Mar 16 09:13 LegacyTHCPUBoolDispatcher.cpp
-rw-rw-r-- 1 gemfield gemfield     239 Mar 16 09:13 LegacyTHCPUBoolDispatcher.h
......
-rw-rw-r-- 1 gemfield gemfield     202 Mar 16 09:13 LegacyTHCPUShortDispatcher.cpp
-rw-rw-r-- 1 gemfield gemfield     241 Mar 16 09:13 LegacyTHCPUShortDispatcher.h
-rw-rw-r-- 1 gemfield gemfield     203 Mar 16 09:13 LegacyTHCUDABoolDispatcher.cpp
-rw-rw-r-- 1 gemfield gemfield     241 Mar 16 09:13 LegacyTHCUDABoolDispatcher.h
......
-rw-rw-r-- 1 gemfield gemfield     206 Mar 16 09:13 LegacyTHCUDAShortDispatcher.cpp
-rw-rw-r-- 1 gemfield gemfield     243 Mar 16 09:13 LegacyTHCUDAShortDispatcher.h
-rw-rw-r-- 1 gemfield gemfield     140 Mar 16 09:13 LegacyTHDispatcher.cpp
-rw-rw-r-- 1 gemfield gemfield     333 Mar 16 09:13 LegacyTHDispatcher.h
-rw-rw-r-- 1 gemfield gemfield     849 Mar 16 09:13 LegacyTHFunctions.h
-rw-rw-r-- 1 gemfield gemfield     435 Mar 16 09:13 MSNPUBoolType.cpp
-rw-rw-r-- 1 gemfield gemfield     354 Mar 16 09:13 MSNPUBoolType.h
......
-rw-rw-r-- 1 gemfield gemfield  565480 Mar 16 09:13 MSNPUType.cpp
-rw-rw-r-- 1 gemfield gemfield  191815 Mar 16 09:13 MSNPUType.h
-rw-rw-r-- 1 gemfield gemfield  147261 Mar 16 09:13 NativeFunctions.h
-rw-rw-r-- 1 gemfield gemfield    4314 Mar 16 09:13 RegisterCPU.cpp
-rw-rw-r-- 1 gemfield gemfield     147 Mar 16 09:13 RegisterCPU.h
-rw-rw-r-- 1 gemfield gemfield    2266 Mar 16 09:13 RegisterCUDA.cpp
-rw-rw-r-- 1 gemfield gemfield     148 Mar 16 09:13 RegisterCUDA.h
-rw-rw-r-- 1 gemfield gemfield    9238 Mar 16 09:13 SparseCPUBoolType.cpp
-rw-rw-r-- 1 gemfield gemfield    3784 Mar 16 09:13 SparseCPUBoolType.h
......
-rw-rw-r-- 1 gemfield gemfield    9286 Mar 16 09:13 SparseCPUShortType.cpp
-rw-rw-r-- 1 gemfield gemfield    3786 Mar 16 09:13 SparseCPUShortType.h
-rw-rw-r-- 1 gemfield gemfield    9437 Mar 16 09:13 SparseCUDABoolType.cpp
-rw-rw-r-- 1 gemfield gemfield    3928 Mar 16 09:13 SparseCUDABoolType.h
......
-rw-rw-r-- 1 gemfield gemfield  407809 Mar 16 09:13 TypeDefault.cpp
-rw-rw-r-- 1 gemfield gemfield  199659 Mar 16 09:13 TypeDefault.h
-rw-rw-r-- 1 gemfield gemfield  170316 Mar 16 09:13 TypeExtendedInterface.h
......
-rw-rw-r-- 1 gemfield gemfield     348 Mar 16 09:13 XLAShortType.h
-rw-rw-r-- 1 gemfield gemfield  558946 Mar 16 09:13 XLAType.cpp
-rw-rw-r-- 1 gemfield gemfield  191809 Mar 16 09:13 XLAType.h
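
Among the generated files, Declarations.yaml deserves special attention: it is a machine-readable summary of all ATen declarations, and it is the input consumed by the downstream generators described below (generate_code.py and gen_pyi). A quick way to poke at it (a hedged example, assuming the field layout of the file as generated here):

# Inspect the generated Declarations.yaml, which feeds the generators below.
import yaml

with open("build/aten/src/ATen/Declarations.yaml") as f:
    declarations = yaml.safe_load(f)

print(len(declarations))        # thousands of ATen declarations
print(declarations[0]["name"])  # each entry carries name, arguments, returns, ...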

Caffe2 op forwarding logic

This step generates the file pytorch/build/caffe2/contrib/aten/aten_op.h, which defines the ATenOp class used to map Caffe2 operators onto ATen; this is one of the layers at which Caffe2 and PyTorch share code. The command that generates the C++ code is as follows:

gemfield@ThinkPad-X1C:~/github/pytorch$ python3 \
    /home/gemfield/github/pytorch/caffe2/contrib/aten/gen_op.py \
    --aten_root=/home/gemfield/github/pytorch/aten \
    --template_dir=/home/gemfield/github/pytorch/caffe2/contrib/aten \
    --yaml_dir=/home/gemfield/github/pytorch/build/aten/src/ATen \
    --install_dir=/home/gemfield/github/pytorch/build/caffe2/contrib/aten
Skipping _th_multinomial Because of Arg: Generator * (Generator*) 
Skipping _th_normal Because of Arg: Generator * (Generator*) 
Skipping _th_normal Because of Arg: Generator * (Generator*) 
Skipping _th_normal Because of Arg: Generator * (Generator*) 
......
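
The "Skipping ... Because of Arg" lines show that gen_op.py walks the ATen declarations and refuses to wrap any function whose argument types cannot be expressed as Caffe2 operator inputs, Generator* being the typical offender. Below is a hypothetical sketch of that filtering; the type whitelist is illustrative and is not gen_op.py's real one.

# Illustrative, hypothetical filter mimicking the "Skipping ..." messages above.
SUPPORTED = {"Tensor", "TensorList", "bool", "int64_t", "double", "Scalar", "IntArrayRef"}

def wrappable(decl):
    for arg in decl.get("arguments", []):
        if arg["type"] not in SUPPORTED:
            print("Skipping {} Because of Arg: {}".format(decl["name"], arg["type"]))
            return False
    return True

# A declaration with a Generator* argument, like _th_normal above, is skipped:
wrappable({"name": "_th_normal", "arguments": [{"type": "Generator *"}]})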

Dynamic generation of PyTorch's Autograd code

This part has already been described in detail in the column article Gemfield: Dynamic Generation of PyTorch Autograd Code. In short:

# Depends on tools/shared/cwrap_common.py and tools/shared/_utils_internal.py (by civilnet)

gemfield@ThinkPad-X1C:~/github/pytorch$ python3 tools/setup_helpers/generate_code.py \
        --declarations-path \
        /home/gemfield/github/pytorch/build/aten/src/ATen/Declarations.yaml \
        --nn-path aten/src/
Writing torch/csrc/nn/THNN.cpp
Writing torch/csrc/nn/THCUNN.cpp
WARNING: derivative ignored for _indices
WARNING: derivative ignored for _values
WARNING: derivative ignored for indices
Writing torch/csrc/autograd/generated/VariableType.h
WARNING: derivative ignored for _indices
WARNING: derivative ignored for indices
Writing torch/csrc/autograd/generated/VariableType_0.cpp
WARNING: derivative ignored for _values
Writing torch/csrc/autograd/generated/VariableType_1.cpp
Writing torch/csrc/autograd/generated/VariableType_2.cpp
Writing torch/csrc/autograd/generated/VariableType_3.cpp
Writing torch/csrc/autograd/generated/VariableType_4.cpp
WARNING: derivative ignored for _indices
WARNING: derivative ignored for _values
WARNING: derivative ignored for indices
Writing torch/csrc/autograd/generated/VariableTypeEverything.cpp
Writing torch/csrc/autograd/generated/Functions.h
Writing torch/csrc/autograd/generated/Functions.cpp
Writing torch/csrc/autograd/generated/python_functions.h
Writing torch/csrc/autograd/generated/python_functions.cpp
Writing torch/csrc/autograd/generated/python_variable_methods.cpp
Writing torch/csrc/autograd/generated/python_variable_methods_dispatch.h
Writing torch/csrc/autograd/generated/python_torch_functions.cpp
Writing torch/csrc/autograd/generated/python_torch_functions_dispatch.h
Writing torch/csrc/autograd/generated/python_nn_functions.cpp
Writing torch/csrc/autograd/generated/python_nn_functions.h
Writing torch/csrc/autograd/generated/python_nn_functions_dispatch.h
Writing torch/csrc/autograd/generated/variable_factories.h
Writing torch/csrc/jit/generated/register_aten_ops_0.cpp
Writing torch/csrc/jit/generated/register_aten_ops_1.cpp
Writing torch/csrc/jit/generated/register_aten_ops_2.cpp

Generating the Python interface

This generates the Python 3 interface, i.e. the stub file, which can be used for type checking. In short:

gemfield@ThinkPad-X1C:~/github/pytorch$ python3 -mtools.pyi.gen_pyi --declarations-path \
    /home/gemfield/github/pytorch/build/aten/src/ATen/Declarations.yaml
writing ./torch/__init__.pyi
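
The generated torch/__init__.pyi contains only typed signatures whose bodies are "...", which type checkers such as mypy consume. The entries look roughly like this (a hand-written illustration of the shape, not a verbatim excerpt):

# Illustrative shape of the generated torch/__init__.pyi (not a verbatim excerpt):
from typing import Tuple

class Tensor:
    def add(self, other: "Tensor", *, alpha: float = 1) -> "Tensor": ...
    def size(self) -> Tuple[int, ...]: ...

def zeros(*size: int, requires_grad: bool = False) -> Tensor: ...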

Porting ops from legacy TH to ATen

As of this writing (2019), this migration is not yet finished; a small number of ops still remain under TH. Gemfield will briefly describe what porting an op involves, using the AdaptiveMaxPooling2d operator as the example: this op used to be defined under TH and has since been ported to ATen native. First, be clear about which files the port touches and what each of them is for:

# files to delete
aten/src/THCUNN/SpatialAdaptiveMaxPooling.cu
aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu
aten/src/THNN/generic/SpatialAdaptiveMaxPooling.c

# files to add
aten/src/ATen/native/AdaptiveMaxPooling2d.cpp
aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu

# YAML config files
aten/src/ATen/native/native_functions.yaml
aten/src/ATen/nn.yaml

# CMake file to update
aten/src/THCUNN/CMakeLists.txt

# dispatch files to update
aten/src/ATen/native/LegacyNNDefinitions.cpp
aten/src/THCUNN/generic/THCUNN.h
aten/src/THNN/generic/THNN.h
aten/src/THNN/init.cpp
torch/nn/_functions/thnn/auto.py

Deleting the legacy TH files

Porting the AdaptiveMaxPooling2d op requires deleting the old TH implementation:

1. Delete the op's legacy CPU implementation:

aten/src/THNN/generic/SpatialAdaptiveMaxPooling.c

2. Delete the op's legacy CUDA implementation:

aten/src/THCUNN/SpatialAdaptiveMaxPooling.cu

aten/src/THCUNN/generic/SpatialAdaptiveMaxPooling.cu

Adding the ATen files

The AdaptiveMaxPooling2d op is now implemented under ATen native:

1. Add the ATen op's CPU implementation:

Add the file aten/src/ATen/native/AdaptiveMaxPooling2d.cpp. The functions it defines, and their call stacks, are shown below:

#forward
adaptive_max_pool2d_out_cpu  / adaptive_max_pool2d_cpu
|
V
adaptive_max_pool2d_out_cpu_template
|
V
adaptive_max_pool2d_out_frame / adaptive_max_pool2d_out_cpu_template
|
V
adaptive_max_pool2d_single_out_frame

#backward
adaptive_max_pool2d_backward_out_cpu / adaptive_max_pool2d_backward_cpu
|
V
adaptive_max_pool2d_backward_out_frame / adaptive_max_pool2d_backward_out_cpu_template
|
V
adaptive_max_pool2d_backward_single_out_frame

As shown above, both the forward and the backward passes of AdaptiveMaxPooling2d are implemented.

2. Add the ATen op's CUDA implementation:

Add the file aten/src/ATen/native/cuda/AdaptiveMaxPooling2d.cu. The functions it defines, and their call stacks, are shown below:

adaptive_max_pool2d_out_cuda / adaptive_max_pool2d_cuda
|
V
adaptive_max_pool2d_out_cuda_template
|
V
adaptivemaxpool

adaptive_max_pool2d_backward_out_cuda / adaptive_max_pool2d_backward_cuda
|
V
adaptive_max_pool2d_backward_out_cuda_template
|
V
atomicadaptivemaxgradinput / adaptivemaxgradinput

Again, both the forward and the backward passes of AdaptiveMaxPooling2d are implemented.
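
Once the port is done, nothing changes from the user's point of view: the same Python call now routes into the ATen native kernels. A quick sanity check that exercises both the forward and the backward paths:

# Sanity check for the migrated op; runs the forward and backward passes on CPU.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
out, indices = F.adaptive_max_pool2d(x, output_size=(2, 2), return_indices=True)
out.sum().backward()              # exercises adaptive_max_pool2d_backward
print(out.shape, x.grad.shape)    # torch.Size([1, 3, 2, 2]) torch.Size([1, 3, 8, 8])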

Updating the YAML configuration files

1. aten/src/ATen/native/native_functions.yaml

ATen's native functions are the operator mechanism PyTorch currently promotes; the legacy TH/THC functions (defined via cwrap), by contrast, are being gradually replaced by ATen native. Native functions are declared in the native_functions.yaml file and implemented under the ATen/native directory. Porting the AdaptiveMaxPooling2d op requires the following change to this YAML file:

-- func: adaptive_max_pool2d(Tensor self, int[2] output_size, *, Tensor(a!) output, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!))
+- func: adaptive_max_pool2d(Tensor self, int[2] output_size, *, Tensor(a!) out, Tensor(b!) indices) -> (Tensor(a!), Tensor(b!))
   python_module: nn
+  dispatch:
+    CPU: adaptive_max_pool2d_out_cpu
+    CUDA: adaptive_max_pool2d_out_cuda

 # Return: (Tensor output, Tensor indices)
 - func: adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
   python_module: nn
+  dispatch:
+    CPU: adaptive_max_pool2d_cpu
+    CUDA: adaptive_max_pool2d_cuda

 - func: adaptive_max_pool2d_backward(Tensor grad_output, Tensor self, Tensor indices, *, Tensor(a!) grad_input) -> Tensor(a!)
   python_module: nn
+  dispatch:
+    CPU: adaptive_max_pool2d_backward_out_cpu
+    CUDA: adaptive_max_pool2d_backward_out_cuda

 - func: adaptive_max_pool2d_backward(Tensor grad_output, Tensor self, Tensor indices) -> Tensor
   python_module: nn
+  dispatch:
+    CPU: adaptive_max_pool2d_backward_cpu
+    CUDA: adaptive_max_pool2d_backward_cuda

The added dispatch entries route each function's execution to the corresponding ATen native implementation.
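
Conceptually, the generator reads each function's dispatch dictionary and emits the per-backend binding. A simplified, hypothetical rendering step (not gen.py's actual code) to illustrate the data flow:

# Hypothetical sketch: render per-backend registrations from a dispatch entry.
entry = {
    "func": "adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)",
    "dispatch": {"CPU": "adaptive_max_pool2d_cpu", "CUDA": "adaptive_max_pool2d_cuda"},
}

for backend, impl in entry["dispatch"].items():
    # e.g. a line destined for RegisterCPU.cpp / RegisterCUDA.cpp
    print("// Register{}.cpp: bind at::native::{}".format(backend, impl))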

2. aten/src/ATen/nn.yaml

This file is parsed by nn_parse to extract information about the _thnn_-prefixed functions. That information is put into the top env and used in later steps to fill the placeholders in the various template files; all of this serves the legacy TH machinery. To port the AdaptiveMaxPooling2d op, remove its dispatch logic from nn.yaml:

-- name: _thnn_adaptive_max_pool2d(Tensor self, IntArrayRef[2] output_size)
-  cname: SpatialAdaptiveMaxPooling
-  scalar_check:
-    output: 'false'
-    grad_input: 'false'
-

Updating the dispatch files

1. aten/src/ATen/native/LegacyNNDefinitions.cpp

Remove the definitions of the following functions:

std::tuple<Tensor &,Tensor &> adaptive_max_pool2d_out(Tensor & output, Tensor & indices, const Tensor & self, IntArrayRef output_size) {
  return at::legacy::th::_thnn_adaptive_max_pool2d_forward_out(output, indices, self, output_size);
}

std::tuple<Tensor,Tensor> adaptive_max_pool2d(const Tensor & self, IntArrayRef output_size) {
  return at::legacy::th::_thnn_adaptive_max_pool2d_forward(self, output_size);
}

Tensor & adaptive_max_pool2d_backward_out(Tensor & grad_input, const Tensor & grad_output, const Tensor & self, const Tensor & indices) {
  return at::legacy::th::_thnn_adaptive_max_pool2d_backward_out(grad_input, grad_output, self, indices);
}

Tensor adaptive_max_pool2d_backward(const Tensor & grad_output, const Tensor & self, const Tensor & indices) {
  return at::legacy::th::_thnn_adaptive_max_pool2d_backward(grad_output, self, indices);
}

2. aten/src/THCUNN/generic/THCUNN.h

Remove the declarations of the following functions:

-THC_API void THNN_(SpatialAdaptiveMaxPooling_updateOutput)(
-                  THCState *state,
-                  THCTensor *input,
-                  THCTensor *output,
-                  THCIndexTensor *indices,
-                  int osizeW,
-                  int osizeH);
-
-THC_API void THNN_(SpatialAdaptiveMaxPooling_updateGradInput)(
-                  THCState *state,
-                  THCTensor *input,
-                  THCTensor *gradOutput,
-                  THCTensor *gradInput,
-                  THCIndexTensor *indices);

3. aten/src/THNN/generic/THNN.h

Remove the declarations of the following functions:

-TH_API void THNN_(SpatialAdaptiveMaxPooling_updateOutput)(
-          THNNState *state,
-          THTensor *input,
-          THTensor *output,
-          THIndexTensor *indices,
-          int osizeW, int osizeH);
-TH_API void THNN_(SpatialAdaptiveMaxPooling_updateGradInput)(
-          THNNState *state,
-          THTensor *input,
-          THTensor *gradOutput,
-          THTensor *gradInput,
-          THIndexTensor *indices);

4. torch/nn/_functions/thnn/auto.py

Remove SpatialAdaptiveMaxPooling from the exceptions set:

def _generate_function_classes(scope_dict):
    global function_list, function_by_name
    function_list = parse_header(THNN_H_PATH)
    function_by_name = {fn.name: fn for fn in function_list}
    classes_to_generate = {fn.name.partition('_')[0] for fn in function_list}
    exceptions = {
        'Linear',
        'IndexLinear',
        'SpatialFullConvolution',
        'SpatialConvolutionMM',
        'TemporalConvolution',
        'SpatialAveragePooling',
        'SpatialMaxPooling',
        'SpatialDilatedMaxPooling',
        'SpatialMaxUnpooling',
-       'SpatialAdaptiveMaxPooling',
        'VolumetricAveragePooling',
        'VolumetricMaxPooling',
        'VolumetricMaxUnpooling',
......

Summary

The above covers the C++ code that gets created while building PyTorch. The creation is done by Python scripts and falls into these parts: the ONNX protos (third-party), the dynamic generation of PyTorch's ATen code, the Caffe2 op forwarding logic, the dynamic generation of PyTorch's Autograd code, and the generation of the Python interface.

    Original author: Gemfield
    Original article: https://zhuanlan.zhihu.com/p/59425970