Deploying PyTorch to iOS

Background

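Now that Gemfield's related PRs have been merged, using PyTorch on iOS has become straightforward. Gemfield has to admit that "deploying PyTorch to iOS" should really read "deploying caffe2 to iOS"; the title is phrased this way only because caffe2 has been merged into the PyTorch repository. Throughout this article, PyTorch on iOS therefore means caffe2 on iOS.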

So, how do you deploy a model trained with PyTorch on a GPU server to an iPhone? There are two approaches, and you can pick either one:

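1. Convert the PyTorch model to ONNX, then convert the ONNX model to Apple's Core ML format. This procedure is already described in the column article "Gemfield: Deploying PyTorch Models to Devices" and is not repeated here;

2. Convert the PyTorch model to ONNX, then convert the ONNX model to caffe2 pb model files. In addition, build the PyTorch libraries for iOS, add the built libraries to an Xcode project, and continue the engineering work from there.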

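This article covers the second approach. Converting the model is only part of the job, though: to get a complete app (even a stripped-down demo) running on an iPhone, you need some engineering around the model, such as input and output handling, loading and initializing the network, and calling forward and fetching the result. That requires the following additional work: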

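1. Use Xcode, the IDE for iOS development;

2. Configure the PyTorch/caffe2 static libraries and the compile and link flags in Xcode;

3. Write additional project source code: basic image input/output handling, model loading, the forward pass, and so on. On iOS, model loading and the forward pass are implemented by calling the PyTorch/caffe2 library. This means that when the iOS app is built, the project code has to include the PyTorch/caffe2 headers and link against the PyTorch/caffe2 libraries.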

This article describes how to build the PyTorch libraries for iOS and sorts out which PyTorch headers need to be included.

Entry point for building the PyTorch iOS libraries

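1. The build must be done on macOS;

2. Install the dependencies:

The dependencies are installed with the brew command, which macOS does not ship with. Install brew with the following command: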

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install automake and libtool:

brew install automake libtool

Install cmake:

brew install cmake

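3. Install Xcode

Install Xcode from the App Store on your Mac. After installing it, launch Xcode once to complete its initial setup; on first launch it automatically configures and installs some additional components.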

4. Clone the PyTorch repository, including its submodules (the build uses code under third_party/)

5. Pre-build setup

You can enable verbose output to debug the build flags and related information:

export VERBOSE=1

You must open Xcode and set the Command Line Tools (menu bar -> Xcode -> Preferences -> Locations -> Command Line Tools); Gemfield selected "Xcode 10.1 (10B61)". Otherwise the build fails with the following errors:

CMake Error at third_party/ios-cmake/toolchain/iOS.cmake:151 (message):
  No iOS SDK's found in default search path .  Manually set
  CMAKE_IOS_SDK_ROOT or install the iOS SDK.
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.14.0/share/cmake/Modules/CMakeDetermineSystem.cmake:93 (include)
  CMakeLists.txt:11 (project)


CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
-- Configuring incomplete, errors occurred!

This setting is also how you change which Xcode version xcodebuild uses. The same setting can be changed from the command line with xcode-select.

6. Start the build

From the PyTorch source directory, build the PyTorch iOS libraries with:

bash scripts/build_ios.sh

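The script first calls scripts/build_host_protoc.sh to build the protoc compiler for the host, because the protobuf source files have to be compiled into C++ sources. When this step finishes, the build_host_protoc/bin/protoc executable has been generated. The second step builds caffe2 itself; once CMake finishes its checks, it reports the following configuration: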

-- ******** Summary ********
-- General:
--   CMake version         : 3.13.4
--   CMake command         : /Applications/CMake.app/Contents/bin/cmake
--   System                : Darwin
--   C++ compiler          : /usr/bin/g++
--   C++ compiler id       : AppleClang
--   C++ compiler version  : 10.0.0.10001145
--   BLAS                  : Eigen
......
--   TORCH_VERSION         : 1.0.0
--   CAFFE2_VERSION        : 1.0.0
--   BUILD_ATEN_MOBILE     : ON
--   BUILD_ATEN_ONLY       : OFF
--   BUILD_BINARY          : OFF
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_SHARED_LIBS     : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_METAL             : ON
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : OFF
--   USE_DISTRIBUTED       : ON
--   Public Dependencies  : Threads::Threads
--   Private Dependencies : nnpack;cpuinfo;fp16;foxi_loader
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/gemfield/github/pytorch/build_ios

As you can see, BUILD_ATEN_MOBILE, BUILD_CAFFE2_OPS, USE_METAL, USE_NNPACK and others are enabled, all of which are distinctly mobile-oriented.

How the PyTorch iOS build works

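Assuming the PyTorch root directory is /home/gemfield/pytorch, building the PyTorch iOS libraries mainly involves the following steps:

1. Compile the proto files

See "Gemfield: Building PyTorch for Android"; as on Android, the proto compilation is done on the host machine.

2. Build the static and dynamic libraries for iOS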

Next comes cross-compiling the PyTorch C++ sources. The whole PyTorch project produces the following 15 .a static libraries and 2 .dylib shared libraries:

libclog.a
libpthreadpool.a
libonnxifi_loader.a
libonnxifi_dummy.dylib
libfoxi_loader.a
libfoxi_dummy.dylib
libprotobuf-lite.a
libnnpack_reference_layers.a
libcpuinfo.a
libcpuinfo_internals.a
libnnpack.a
libc10.a
libprotobuf.a
libcaffe2_protos.a
libonnx_proto.a
libonnx.a
libcaffe2.a

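As with the Android build, each library on iOS generally goes through three steps:

  • compile the C++ sources into .o files;
  • use ar -qc to combine the .o files into a single .a file;
  • use ranlib to generate an index that speeds up access to the .a library;

Compiling a C++ source file into a .o file: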

/usr/bin/g++ -DCPUINFO_SUPPORTED_PLATFORM=1 \
-DNNP_CONVOLUTION_ONLY=0 \
-DNNP_INFERENCE_ONLY=0 \
-DONNX_NAMESPACE=onnx_c2 \
-Ixxxx \
-isystem xxxx \
-O2 -fPIC -O3 -DNDEBUG -arch armv7 -arch armv7s -arch arm64 \
-isysroot /Applications/Xcode10.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS12.1.sdk \
-mfpu=neon-fp16 \
-DCAFFE2_BUILD_MAIN_LIB -O2 -std=gnu++11 \
-o xxxx.o \
-c xxxx.cc

Combining the .o files into a .a file:

ar qc xxxx.a xxx1.o xxx2.o ...

Generating the index to speed up symbol lookup in the .a library:

ranlib xxxx.a

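3. libcaffe2.a

The PyTorch iOS build produces 15 static libraries in total, and the most important computation lives in libcaffe2.a. This library is made up of the following 442 compilation units: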

aten/src/ATen/core/ATenGeneral.cpp.o
aten/src/ATen/core/Formatting.cpp.o
aten/src/ATen/core/LegacyDeviceTypeInit.cpp.o
aten/src/ATen/core/LegacyTypeDispatch.cpp.o
aten/src/ATen/core/Range.cpp.o
aten/src/ATen/core/Tensor.cpp.o
aten/src/ATen/core/VariableHooksInterface.cpp.o
aten/src/ATen/core/blob.cpp.o
aten/src/ATen/core/context_base.cpp.o
aten/src/ATen/core/dispatch/Dispatcher.cpp.o
aten/src/ATen/core/interned_strings.cpp.o
aten/src/ATen/core/ivalue.cpp.o
aten/src/ATen/core/register_symbols.cpp.o
aten/src/ATen/core/type.cpp.o
caffe2/core/allocator.cc.o
caffe2/core/blob_serialization.cc.o
caffe2/core/blob_stats.cc.o
caffe2/core/common.cc.o
caffe2/core/context.cc.o
caffe2/core/context_base.cc.o
caffe2/core/db.cc.o
caffe2/core/event.cc.o
caffe2/core/graph.cc.o
caffe2/core/init.cc.o
caffe2/core/init_denormals.cc.o
caffe2/core/init_intrinsics_check.cc.o
caffe2/core/init_omp.cc.o
caffe2/core/int8_serialization.cc.o
caffe2/core/memonger.cc.o
caffe2/core/module.cc.o
caffe2/core/net.cc.o
caffe2/core/net_async_base.cc.o
caffe2/core/net_async_scheduling.cc.o
caffe2/core/net_async_task.cc.o
caffe2/core/net_async_task_future.cc.o
caffe2/core/net_async_task_graph.cc.o
caffe2/core/net_async_tracing.cc.o
caffe2/core/net_dag_utils.cc.o
caffe2/core/net_parallel.cc.o
caffe2/core/net_simple.cc.o
caffe2/core/net_simple_refcount.cc.o
caffe2/core/numa.cc.o
caffe2/core/operator.cc.o
caffe2/core/operator_c10wrapper.cc.o
caffe2/core/operator_schema.cc.o
caffe2/core/plan_executor.cc.o
caffe2/core/prof_dag_counters.cc.o
caffe2/core/qtensor.cc.o
caffe2/core/qtensor_serialization.cc.o
caffe2/core/stats.cc.o
caffe2/core/tensor.cc.o
caffe2/core/tensor_int8.cc.o
caffe2/core/test_utils.cc.o
caffe2/core/transform.cc.o
caffe2/core/types.cc.o
caffe2/core/workspace.cc.o
caffe2/utils/bench_utils.cc.o
caffe2/utils/cpuid.cc.o
caffe2/utils/math/broadcast.cc.o
caffe2/utils/math/elementwise.cc.o
caffe2/utils/math/reduce.cc.o
caffe2/utils/math/transpose.cc.o
caffe2/utils/math/utils.cc.o
caffe2/utils/math_cpu.cc.o
caffe2/utils/murmur_hash3.cc.o
caffe2/utils/proto_convert.cc.o
caffe2/utils/proto_utils.cc.o
caffe2/utils/proto_wrap.cc.o
caffe2/utils/signal_handler.cc.o
caffe2/utils/smart_tensor_printer.cc.o
caffe2/utils/string_utils.cc.o
caffe2/utils/threadpool/ThreadPool.cc.o
caffe2/utils/threadpool/pthreadpool.cc.o
caffe2/utils/threadpool/pthreadpool_impl.cc.o
caffe2/predictor/predictor.cc.o
caffe2/predictor/predictor_utils.cc.o
caffe2/predictor/predictor_config.cc.o
caffe2/core/nomnigraph/Representations/NeuralNet.cc.o
caffe2/core/nomnigraph/tests/test_util.cc.o
caffe2/__/third_party/miniz-2.0.8/miniz.c.o
caffe2/serialize/inline_container.cc.o
caffe2/serialize/istream_adapter.cc.o
caffe2/serialize/file_adapter.cc.o
caffe2/serialize/read_adapter_interface.cc.o
caffe2/db/create_db_op.cc.o
caffe2/db/protodb.cc.o
caffe2/distributed/file_store_handler.cc.o
caffe2/distributed/file_store_handler_op.cc.o
caffe2/distributed/store_handler.cc.o
caffe2/distributed/store_ops.cc.o
caffe2/mobile/contrib/ios/ios_caffe.cc.o
caffe2/mobile/contrib/ios/ios_caffe_predictor.cc.o
caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm.o
caffe2/mobile/contrib/ios/mpscnn/mpscnn_context.mm.o
caffe2/mobile/contrib/ios/mpscnn/mpscnn_graph.mm.o
caffe2/mobile/contrib/ios/mpscnn/mpscnn_graph_mask.mm.o
caffe2/mobile/contrib/ios/mpscnn/mpscnn_test.mm.o
caffe2/onnx/backend.cc.o
caffe2/onnx/backend_rep.cc.o
caffe2/onnx/device.cc.o
caffe2/onnx/helper.cc.o
caffe2/onnx/onnx_exporter.cc.o
caffe2/onnx/onnxifi_graph_info.cc.o
caffe2/onnx/onnxifi_init.cc.o
caffe2/operators/abs_op.cc.o
caffe2/operators/accumulate_op.cc.o
caffe2/operators/accuracy_op.cc.o
caffe2/operators/acos_op.cc.o
caffe2/operators/adjust_batch_op.cc.o
caffe2/operators/affine_channel_op.cc.o
caffe2/operators/apmeter_op.cc.o
caffe2/operators/arg_ops.cc.o
caffe2/operators/asin_op.cc.o
caffe2/operators/assert_op.cc.o
caffe2/operators/atan_op.cc.o
caffe2/operators/atomic_ops.cc.o
caffe2/operators/batch_box_cox_op.cc.o
caffe2/operators/batch_bucketize_op.cc.o
caffe2/operators/batch_gather_ops.cc.o
caffe2/operators/batch_matmul_op.cc.o
caffe2/operators/batch_moments_op.cc.o
caffe2/operators/batch_sparse_to_dense_op.cc.o
caffe2/operators/bbox_transform_op.cc.o
caffe2/operators/bisect_percentile_op.cc.o
caffe2/operators/boolean_mask_ops.cc.o
caffe2/operators/boolean_unmask_ops.cc.o
caffe2/operators/box_with_nms_limit_op.cc.o
caffe2/operators/byte_weight_dequant_op.cc.o
caffe2/operators/cast_op.cc.o
caffe2/operators/cbrt_op.cc.o
caffe2/operators/ceil_op.cc.o
caffe2/operators/channel_backprop_stats_op.cc.o
caffe2/operators/channel_shuffle_op.cc.o
caffe2/operators/channel_stats_op.cc.o
caffe2/operators/clip_op.cc.o
caffe2/operators/collect_and_distribute_fpn_rpn_proposals_op.cc.o
caffe2/operators/communicator_op.cc.o
caffe2/operators/concat_split_op.cc.o
caffe2/operators/conditional_op.cc.o
caffe2/operators/conv_gradient_op.cc.o
caffe2/operators/conv_op.cc.o
caffe2/operators/conv_op_eigen.cc.o
caffe2/operators/conv_op_shared.cc.o
caffe2/operators/conv_transpose_gradient_op.cc.o
caffe2/operators/conv_transpose_op.cc.o
caffe2/operators/conv_transpose_op_mobile.cc.o
caffe2/operators/copy_op.cc.o
caffe2/operators/cos_op.cc.o
caffe2/operators/cosh_op.cc.o
caffe2/operators/cosine_embedding_criterion_op.cc.o
caffe2/operators/counter_ops.cc.o
caffe2/operators/crash_op.cc.o
caffe2/operators/create_scope_op.cc.o
caffe2/operators/crf_viterbi_op.cc.o
caffe2/operators/cross_entropy_op.cc.o
caffe2/operators/ctc_beam_search_decoder_op.cc.o
caffe2/operators/ctc_greedy_decoder_op.cc.o
caffe2/operators/cube_op.cc.o
caffe2/operators/data_couple.cc.o
caffe2/operators/dataset_ops.cc.o
caffe2/operators/deform_conv_gradient_op.cc.o
caffe2/operators/deform_conv_op.cc.o
caffe2/operators/dense_vector_to_id_list_op.cc.o
caffe2/operators/distance_op.cc.o
caffe2/operators/do_op.cc.o
caffe2/operators/dropout_op.cc.o
caffe2/operators/elementwise_add_gradient_op.cc.o
caffe2/operators/elementwise_add_op.cc.o
caffe2/operators/elementwise_div_gradient_op.cc.o
caffe2/operators/elementwise_div_op.cc.o
caffe2/operators/elementwise_linear_op.cc.o
caffe2/operators/elementwise_logical_ops.cc.o
caffe2/operators/elementwise_mul_gradient_op.cc.o
caffe2/operators/elementwise_mul_op.cc.o
caffe2/operators/elementwise_ops.cc.o
caffe2/operators/elementwise_ops_schema.cc.o
caffe2/operators/elementwise_ops_utils.cc.o
caffe2/operators/elementwise_sub_gradient_op.cc.o
caffe2/operators/elementwise_sub_op.cc.o
caffe2/operators/elementwise_sum_op.cc.o
caffe2/operators/elu_op.cc.o
caffe2/operators/enforce_finite_op.cc.o
caffe2/operators/ensure_clipped_op.cc.o
caffe2/operators/ensure_cpu_output_op.cc.o
caffe2/operators/erf_op.cc.o
caffe2/operators/exp_op.cc.o
caffe2/operators/expand_op.cc.o
caffe2/operators/expand_squeeze_dims_op.cc.o
caffe2/operators/fc_inference.cc.o
caffe2/operators/feature_maps_ops.cc.o
caffe2/operators/feed_blob_op.cc.o
caffe2/operators/filler_op.cc.o
caffe2/operators/find_duplicate_elements_op.cc.o
caffe2/operators/find_op.cc.o
caffe2/operators/flatten_op.cc.o
caffe2/operators/flexible_top_k.cc.o
caffe2/operators/floor_op.cc.o
caffe2/operators/free_op.cc.o
caffe2/operators/fully_connected_op.cc.o
caffe2/operators/fused_rowwise_8bit_conversion_ops.cc.o
caffe2/operators/fused_rowwise_random_quantization_ops.cc.o
caffe2/operators/gather_fused_8bit_rowwise_op.cc.o
caffe2/operators/gather_op.cc.o
caffe2/operators/gather_ranges_to_dense_op.cc.o
caffe2/operators/generate_proposals_op.cc.o
caffe2/operators/given_tensor_byte_string_to_uint8_fill_op.cc.o
caffe2/operators/given_tensor_fill_op.cc.o
caffe2/operators/glu_op.cc.o
caffe2/operators/group_norm_op.cc.o
caffe2/operators/gru_unit_op.cc.o
caffe2/operators/h_softmax_op.cc.o
caffe2/operators/half_float_ops.cc.o
caffe2/operators/hard_sigmoid_op.cc.o
caffe2/operators/heatmap_max_keypoint_op.cc.o
caffe2/operators/if_op.cc.o
caffe2/operators/im2col_op.cc.o
caffe2/operators/index_hash_ops.cc.o
caffe2/operators/index_ops.cc.o
caffe2/operators/inference_lstm_op.cc.o
caffe2/operators/instance_norm_gradient_op.cc.o
caffe2/operators/instance_norm_op.cc.o
caffe2/operators/integral_image_op.cc.o
caffe2/operators/is_empty_op.cc.o
caffe2/operators/jsd_op.cc.o
caffe2/operators/key_split_ops.cc.o
caffe2/operators/last_n_window_collector.cc.o
caffe2/operators/layer_norm_op.cc.o
caffe2/operators/leaky_relu_op.cc.o
caffe2/operators/length_split_op.cc.o
caffe2/operators/lengths_pad_op.cc.o
caffe2/operators/lengths_reducer_fused_8bit_rowwise_ops.cc.o
caffe2/operators/lengths_reducer_ops.cc.o
caffe2/operators/lengths_reducer_rowwise_8bit_ops.cc.o
caffe2/operators/lengths_tile_op.cc.o
caffe2/operators/lengths_top_k_op.cc.o
caffe2/operators/listwise_l2r_op.cc.o
caffe2/operators/load_save_op.cc.o
caffe2/operators/local_response_normalization_op.cc.o
caffe2/operators/locally_connected_op.cc.o
caffe2/operators/locally_connected_op_util.cc.o
caffe2/operators/log_op.cc.o
caffe2/operators/logit_op.cc.o
caffe2/operators/loss_op.cc.o
caffe2/operators/lp_pool_op.cc.o
caffe2/operators/lpnorm_op.cc.o
caffe2/operators/lstm_unit_op.cc.o
caffe2/operators/map_ops.cc.o
caffe2/operators/margin_ranking_criterion_op.cc.o
caffe2/operators/matmul_op.cc.o
caffe2/operators/mean_op.cc.o
caffe2/operators/merge_id_lists_op.cc.o
caffe2/operators/minmax_gradient_ops.cc.o
caffe2/operators/minmax_ops.cc.o
caffe2/operators/mod_op.cc.o
caffe2/operators/moments_op.cc.o
caffe2/operators/multi_class_accuracy_op.cc.o
caffe2/operators/negate_gradient_op.cc.o
caffe2/operators/negative_op.cc.o
caffe2/operators/ngram_ops.cc.o
caffe2/operators/norm_planar_yuv_op.cc.o
caffe2/operators/normalize_l1_op.cc.o
caffe2/operators/normalize_op.cc.o
caffe2/operators/numpy_tile_op.cc.o
caffe2/operators/one_hot_ops.cc.o
caffe2/operators/onnx_while_op.cc.o
caffe2/operators/onnxifi_op.cc.o
caffe2/operators/order_switch_ops.cc.o
caffe2/operators/pack_rnn_sequence_op.cc.o
caffe2/operators/pack_segments.cc.o
caffe2/operators/pad_op.cc.o
caffe2/operators/partition_ops.cc.o
caffe2/operators/percentile_op.cc.o
caffe2/operators/perplexity_op.cc.o
caffe2/operators/piecewise_linear_transform_op.cc.o
caffe2/operators/pool_gradient_op.cc.o
caffe2/operators/pool_op.cc.o
caffe2/operators/pool_op_util.cc.o
caffe2/operators/pow_op.cc.o
caffe2/operators/prelu_op.cc.o
caffe2/operators/prepend_dim_op.cc.o
caffe2/operators/quant_decode_op.cc.o
caffe2/operators/rank_loss_op.cc.o
caffe2/operators/reciprocal_gradient_op.cc.o
caffe2/operators/reciprocal_op.cc.o
caffe2/operators/reduce_front_back_max_ops.cc.o
caffe2/operators/reduce_front_back_mean_ops.cc.o
caffe2/operators/reduce_front_back_sum_ops.cc.o
caffe2/operators/reduce_ops.cc.o
caffe2/operators/reduction_ops.cc.o
caffe2/operators/relu_n_op.cc.o
caffe2/operators/relu_op.cc.o
caffe2/operators/remove_data_blocks_op.cc.o
caffe2/operators/replace_nan_op.cc.o
caffe2/operators/reservoir_sampling.cc.o
caffe2/operators/reshape_op.cc.o
caffe2/operators/resize_op.cc.o
caffe2/operators/reverse_packed_segs_op.cc.o
caffe2/operators/rmac_regions_op.cc.o
caffe2/operators/roi_align_gradient_op.cc.o
caffe2/operators/roi_align_op.cc.o
caffe2/operators/roi_align_rotated_gradient_op.cc.o
caffe2/operators/roi_align_rotated_op.cc.o
caffe2/operators/roi_pool_op.cc.o
caffe2/operators/rowmul_op.cc.o
caffe2/operators/rsqrt_op.cc.o
caffe2/operators/scale_op.cc.o
caffe2/operators/segment_reduction_op.cc.o
caffe2/operators/selu_op.cc.o
caffe2/operators/sequence_ops.cc.o
caffe2/operators/shape_op.cc.o
caffe2/operators/sigmoid_gradient_op.cc.o
caffe2/operators/sigmoid_op.cc.o
caffe2/operators/sin_op.cc.o
caffe2/operators/sinh_op.cc.o
caffe2/operators/sinusoid_position_encoding_op.cc.o
caffe2/operators/slice_op.cc.o
caffe2/operators/softmax_op.cc.o
caffe2/operators/softmax_shared.cc.o
caffe2/operators/softmax_with_loss_op.cc.o
caffe2/operators/softplus_op.cc.o
caffe2/operators/softsign_op.cc.o
caffe2/operators/space_batch_op.cc.o
caffe2/operators/sparse_normalize_op.cc.o
caffe2/operators/sparse_to_dense_mask_op.cc.o
caffe2/operators/sparse_to_dense_op.cc.o
caffe2/operators/spatial_batch_norm_gradient_op.cc.o
caffe2/operators/spatial_batch_norm_op.cc.o
caffe2/operators/spatial_softmax_with_loss_op.cc.o
caffe2/operators/sqr_op.cc.o
caffe2/operators/sqrt_op.cc.o
caffe2/operators/square_root_divide_op.cc.o
caffe2/operators/stats_ops.cc.o
caffe2/operators/stats_put_ops.cc.o
caffe2/operators/stop_gradient.cc.o
caffe2/operators/string_ops.cc.o
caffe2/operators/stump_func_op.cc.o
caffe2/operators/stylizer_ops.cc.o
caffe2/operators/summarize_op.cc.o
caffe2/operators/swish_op.cc.o
caffe2/operators/tan_op.cc.o
caffe2/operators/tanh_gradient_op.cc.o
caffe2/operators/tanh_op.cc.o
caffe2/operators/tensor_protos_db_input.cc.o
caffe2/operators/text_file_reader.cc.o
caffe2/operators/text_file_reader_utils.cc.o
caffe2/operators/thresholded_relu_op.cc.o
caffe2/operators/tile_op.cc.o
caffe2/operators/top_k.cc.o
caffe2/operators/transpose_op.cc.o
caffe2/operators/tt_linear_op.cc.o
caffe2/operators/unique_ops.cc.o
caffe2/operators/upsample_op.cc.o
caffe2/operators/utility_ops.cc.o
caffe2/operators/variable_length_sequence_padding.cc.o
caffe2/operators/weighted_multi_sampling_op.cc.o
caffe2/operators/weighted_sample_op.cc.o
caffe2/operators/while_op.cc.o
caffe2/operators/workspace_ops.cc.o
caffe2/operators/zero_gradient_op.cc.o
caffe2/operators/experimental/c10/cpu/flatten_cpu.cc.o
caffe2/operators/experimental/c10/cpu/averaged_loss_cpu.cc.o
caffe2/operators/experimental/c10/cpu/mul_cpu.cc.o
caffe2/operators/experimental/c10/cpu/relu_cpu.cc.o
caffe2/operators/experimental/c10/cpu/expand_dims_cpu.cc.o
caffe2/operators/experimental/c10/cpu/filler_cpu.cc.o
caffe2/operators/experimental/c10/cpu/sparse_lengths_sum_cpu.cc.o
caffe2/operators/experimental/c10/cpu/sigmoid_cpu.cc.o
caffe2/operators/experimental/c10/cpu/cast_cpu.cc.o
caffe2/operators/experimental/c10/cpu/stop_gradient_cpu.cc.o
caffe2/operators/experimental/c10/cpu/batch_gather_cpu.cc.o
caffe2/operators/experimental/c10/cpu/concat_cpu.cc.o
caffe2/operators/experimental/c10/cpu/batch_matmul_cpu.cc.o
caffe2/operators/experimental/c10/cpu/sigmoid_cross_entropy_with_logits_cpu.cc.o
caffe2/operators/experimental/c10/cpu/fc_cpu.cc.o
caffe2/operators/experimental/c10/cpu/enforce_finite_cpu.cc.o
caffe2/operators/experimental/c10/cpu/add_cpu.cc.o
caffe2/operators/experimental/c10/schemas/sigmoid.cc.o
caffe2/operators/experimental/c10/schemas/filler.cc.o
caffe2/operators/experimental/c10/schemas/expand_dims.cc.o
caffe2/operators/experimental/c10/schemas/mul.cc.o
caffe2/operators/experimental/c10/schemas/relu.cc.o
caffe2/operators/experimental/c10/schemas/stop_gradient.cc.o
caffe2/operators/experimental/c10/schemas/sigmoid_cross_entropy_with_logits.cc.o
caffe2/operators/experimental/c10/schemas/enforce_finite.cc.o
caffe2/operators/experimental/c10/schemas/cast.cc.o
caffe2/operators/experimental/c10/schemas/averaged_loss.cc.o
caffe2/operators/experimental/c10/schemas/batch_matmul.cc.o
caffe2/operators/experimental/c10/schemas/batch_gather.cc.o
caffe2/operators/experimental/c10/schemas/fc.cc.o
caffe2/operators/experimental/c10/schemas/concat.cc.o
caffe2/operators/experimental/c10/schemas/sparse_lengths_sum.cc.o
caffe2/operators/experimental/c10/schemas/add.cc.o
caffe2/operators/experimental/c10/schemas/flatten.cc.o
caffe2/operators/rnn/recurrent_network_blob_fetcher_op.cc.o
caffe2/operators/rnn/recurrent_network_executor.cc.o
caffe2/operators/rnn/recurrent_network_op.cc.o
caffe2/opt/annotations.cc.o
caffe2/opt/backend_cutting.cc.o
caffe2/opt/backend_transformer_base.cc.o
caffe2/opt/bound_shape_inferencer.cc.o
caffe2/opt/converter.cc.o
caffe2/opt/dead_code_elim.cc.o
caffe2/opt/device.cc.o
caffe2/opt/distributed.cc.o
caffe2/opt/distributed_converter.cc.o
caffe2/opt/fusion.cc.o
caffe2/opt/mobile.cc.o
caffe2/opt/onnxifi_transformer.cc.o
caffe2/opt/optimize_ideep.cc.o
caffe2/opt/optimizer.cc.o
caffe2/opt/passes.cc.o
caffe2/opt/shape_info.cc.o
caffe2/perfkernels/adagrad.cc.o
caffe2/perfkernels/embedding_lookup.cc.o
caffe2/perfkernels/fused_8bit_rowwise_embedding_lookup.cc.o
caffe2/perfkernels/math_cpu_base.cc.o
caffe2/perfkernels/typed_axpy.cc.o
caffe2/queue/blobs_queue.cc.o
caffe2/queue/blobs_queue_db.cc.o
caffe2/queue/queue_ops.cc.o
caffe2/queue/rebatching_queue.cc.o
caffe2/queue/rebatching_queue_ops.cc.o
caffe2/sgd/adadelta_op.cc.o
caffe2/sgd/adagrad_op.cc.o
caffe2/sgd/adam_op.cc.o
caffe2/sgd/clip_tensor_op.cc.o
caffe2/sgd/ftrl_op.cc.o
caffe2/sgd/gftrl_op.cc.o
caffe2/sgd/iter_op.cc.o
caffe2/sgd/lars_op.cc.o
caffe2/sgd/learning_rate_adaption_op.cc.o
caffe2/sgd/learning_rate_op.cc.o
caffe2/sgd/momentum_sgd_op.cc.o
caffe2/sgd/rmsprop_op.cc.o
caffe2/sgd/wngrad_op.cc.o
caffe2/sgd/yellowfin_op.cc.o
caffe2/share/contrib/nnpack/conv_op.cc.o
caffe2/share/contrib/depthwise/depthwise3x3_conv_op.cc.o
caffe2/transforms/common_subexpression_elimination.cc.o
caffe2/transforms/conv_to_nnpack_transform.cc.o
caffe2/transforms/pattern_net_transform.cc.o
caffe2/transforms/single_op_transform.cc.o

Note the compilation units from caffe2/mobile/contrib/ios, which define some iOS-specific algorithms.

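Using these libraries in Xcode

1. Add the headers and the library files to the Xcode project;

2. In the Xcode project code, call the caffe2 API along the following lines: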

#include <caffe2/predictor/predictor.h>
#include <caffe2/core/operator.h>
#include <caffe2/core/timer.h>
#include <caffe2/core/flags.h>
#include <caffe2/core/init.h>
#include <caffe2/utils/proto_utils.h>

// Network definitions and the predictor, kept alive for the lifetime of the app.
static caffe2::NetDef _initNet, _predictNet;
static caffe2::Predictor *_predictor;

// Initialize caffe2 once; no command-line arguments are passed.
int argc = 0;
char** argv = nullptr;
caffe2::GlobalInit(&argc, &argv);

// Locate the two pb model files bundled with the app.
const char *gemfield_init_net_path = [[NSBundle mainBundle] pathForResource:@"gemfield_init_net.pb" ofType:nil].UTF8String;
const char *gemfield_predict_net_path = [[NSBundle mainBundle] pathForResource:@"gemfield_predict_net.pb" ofType:nil].UTF8String;

// Parse the init net (parameters) and the predict net (graph).
CAFFE_ENFORCE(ReadProtoFromFile(gemfield_init_net_path, &_initNet));
CAFFE_ENFORCE(ReadProtoFromFile(gemfield_predict_net_path, &_predictNet));

_predictor = new caffe2::Predictor(_initNet, _predictNet);

// Copy the preprocessed image (input_data) into a CPU tensor.
caffe2::TensorCPU input = caffe2::Tensor(1, caffe2::DeviceType::CPU);
input.Resize(std::vector<int>({1, IMG_C, IMG_W, IMG_H}));
memcpy(input.mutable_data<float>(), input_data, IMG_H * IMG_W * IMG_C * sizeof(float));

caffe2::Predictor::TensorList input_vec;
input_vec.push_back(input.UnsafeSharedInstance());
caffe2::Predictor::TensorList output_vec;

// Run the forward pass.
_predictor->operator()(input_vec, &output_vec);
// Copy out the result; output index 4 and the 7 channels are specific to this model.
memcpy(output_data, output_vec[4].mutable_data<float>(), IMG_H * IMG_W * 7 * sizeof(float));
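The snippet above takes for granted that input_data already holds IMG_C * IMG_H * IMG_W floats in the planar, channel-first order the tensor expects. Image buffers on iOS are commonly interleaved 8-bit pixels, so a conversion step is usually needed first. Below is a minimal sketch of that step, not the author's code: the function name InterleavedToPlanar and the 1/255 scaling are assumptions, and the right channel order and normalization depend on how the model was trained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of caffe2): converts an interleaved
// HWC uint8 image (c0,c1,c2,c0,c1,c2,...) into a planar CHW float buffer.
// The 1/255 scaling is an assumption; use your model's own normalization.
std::vector<float> InterleavedToPlanar(const std::uint8_t* pixels,
                                       std::size_t height,
                                       std::size_t width,
                                       std::size_t channels) {
  assert(pixels != nullptr);
  std::vector<float> planar(channels * height * width);
  for (std::size_t y = 0; y < height; ++y) {
    for (std::size_t x = 0; x < width; ++x) {
      for (std::size_t c = 0; c < channels; ++c) {
        // Source index: pixels are stored one after another, channels adjacent.
        const std::size_t src = (y * width + x) * channels + c;
        // Destination index: one full image plane per channel.
        const std::size_t dst = (c * height + y) * width + x;
        planar[dst] = static_cast<float>(pixels[src]) / 255.0f;
      }
    }
  }
  return planar;
}
```

The data() pointer of the returned vector can then play the role of input_data in the memcpy above.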

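3. Possible errors

When you build and run your code, you may hit the following fatal error, which is caused by how the library was linked: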

[F operator.h:1278] You might have made a build error: the Caffe2 library does not seem to be linked with whole-static library option. To do so, use -Wl,-force_load (clang) or -Wl,--whole-archive (gcc) to link the Caffe2 library.

To fix it, go to "Build Settings" -> "Linking" -> "Other Linker Flags" in the project and append -force_load libcaffe2.a.
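Why -force_load matters can be illustrated with a small, self-contained sketch. caffe2 operators register themselves through static objects, and when a static library is linked in the ordinary way, object files whose symbols nothing references can be left out, so their registrations never run. The code below is a simplified illustration of that self-registration pattern, not caffe2's actual registry: Registry, Registrar, RunOp, and the two toy operators are all hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an operator registry (not caffe2's real API).
using OpFn = std::function<float(float)>;

std::map<std::string, OpFn>& Registry() {
  // Function-local static: constructed on first use, so it is ready
  // before any Registrar object tries to write into it.
  static std::map<std::string, OpFn> registry;
  return registry;
}

// Constructing a Registrar at namespace scope registers an operator
// as a side effect, before main() starts.
struct Registrar {
  Registrar(const std::string& name, OpFn fn) {
    Registry()[name] = std::move(fn);
  }
};

// In a real library each of these would live in its own source file.
// Nothing else refers to these objects by name, which is why a linker
// may skip their object files when pulling from a static archive.
static Registrar g_relu("Relu", [](float x) { return x > 0.0f ? x : 0.0f; });
static Registrar g_negative("Negative", [](float x) { return -x; });

// Looks up an operator by name and applies it; asserts that it exists.
float RunOp(const std::string& name, float x) {
  auto it = Registry().find(name);
  assert(it != Registry().end() && "operator was never registered");
  return it->second(x);
}
```

In a single translation unit, as here, both registrations always run. Split across a static archive and linked without -force_load, the object files holding g_relu and g_negative could be dropped, leaving the registry empty, which appears to be the condition the caffe2 message above is reporting.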

Summary

For iOS, the PyTorch build mainly compiles protobuf, ATen, C10, Caffe2, caffe2/mobile/contrib/ios, and so on.

    Original author: Gemfield
    Original URL: https://zhuanlan.zhihu.com/p/60563168