Understanding the Caffe2 Source Code: Storage

June 9, 2024

Caffe2 Storage

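Caffe2's storage hierarchy, from top to bottom, is Workspace, Blob, Tensor. A Workspace holds every Blob and every instantiated Net that exists at runtime. A Blob is a wrapper class around an object of arbitrary type, such as a Tensor, a float, or a string. A Tensor is a multi-dimensional array, similar to the Blob in Caffe1. The calls that actually allocate memory live in the Context classes, CPUContext and CUDAContext. The sections below walk through Caffe2's storage allocation from the bottom up.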

Context
Tensor
Blob
Workspace
Summary

Context

This section focuses on storage management on the CPU side; the GPU side will be covered later.

CPUContext

An excerpt of CPUContext:

class CPUContext final {
 public:
   ......
   // Allocate storage
   static std::pair<void*, MemoryDeleter> New(size_t nbytes) {
      auto data_and_deleter = GetCPUAllocator()->New(nbytes);
      if (FLAGS_caffe2_report_cpu_memory_usage) {
          reporter_.New(data_and_deleter.first, nbytes);
          data_and_deleter.second = ReportAndDelete;
      }
      return data_and_deleter;
   }
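   // Copy n items of type T (n is an element count, not a byte count)
   template <class T, class SrcContext, class DstContext>
   inline void Copy(size_t n, const T* src, T* dst) {
     if (std::is_fundamental<T>::value) {
       // Fundamental types can be copied as raw bytes
       CopyBytes<SrcContext, DstContext>(n * sizeof(T),
                                      static_cast<const void*>(src),
                                      static_cast<void*>(dst));
     } else {
       // Other types are copied item by item through their assignment operator
       for (int i = 0; i < n; ++i) {
         dst[i] = src[i];
       }
     }
   }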

   ......
 protected:
   static MemoryAllocationReporter reporter_;
   ......
};
CPUAllocator* GetCPUAllocator() {
  return g_cpu_allocator.get();
}

CPUContext has two basic jobs: allocating nbytes bytes of memory, and copying data within one Context or between different Contexts.
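The dispatch inside Copy can be sketched in isolation. The block below is a minimal, CPU-only illustration and not Caffe2's code: CopyBytesCPU and CopyItems are hypothetical names, and the context template parameters are dropped on the assumption that source and destination both live in host memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical stand-in for CopyBytes<SrcContext, DstContext> when both
// contexts are the CPU: a plain byte-wise copy.
inline void CopyBytesCPU(std::size_t nbytes, const void* src, void* dst) {
  std::memcpy(dst, src, nbytes);
}

// Copies n items of type T. Fundamental types (int, float, ...) are copied
// as raw bytes; everything else is copied item by item so that the type's
// own copy assignment runs.
template <class T>
void CopyItems(std::size_t n, const T* src, T* dst) {
  assert(n == 0 || (src != nullptr && dst != nullptr));
  if (std::is_fundamental<T>::value) {
    CopyBytesCPU(n * sizeof(T), src, dst);
  } else {
    for (std::size_t i = 0; i < n; ++i) {
      dst[i] = src[i];
    }
  }
}
```

Copying an array of float goes through the byte-wise path, while copying an array of std::string goes through the loop, so each string is properly duplicated instead of having its internal pointers copied bit for bit.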
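GetCPUAllocator returns a raw pointer to the CPUAllocator owned by the global unique_ptr g_cpu_allocator. CPUAllocator is the abstract interface for memory allocation. Caffe2 ships a default host-side allocator, DefaultCPUAllocator, which returns aligned memory of the requested size; you can also implement your own, more efficient allocator, for example one modeled on the two-level allocator in the SGI STL.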

// A virtual allocator class to do memory allocation and deallocation.
struct CPUAllocator {
  CPUAllocator() {}
  virtual ~CPUAllocator() noexcept {}
  virtual std::pair<void*, MemoryDeleter> New(size_t nbytes) = 0; // Returns the start address of the allocated memory and a function pointer that frees it
  virtual MemoryDeleter GetDeleter() = 0;
}; // MemoryDeleter is simply a function pointer: using MemoryDeleter = void (*)(void*);
static std::unique_ptr<CPUAllocator> g_cpu_allocator(new DefaultCPUAllocator());
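To make the interface concrete, here is a minimal sketch of a custom allocator plugged into a global slot. The CPUAllocator and MemoryDeleter declarations are copied from the excerpt above so the block compiles on its own; MallocAllocator, g_allocator, GetAllocator and SetAllocator are hypothetical names used for illustration, and the allocator gives no alignment guarantee beyond what malloc provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

using MemoryDeleter = void (*)(void*);

// Interface as shown in the excerpt above.
struct CPUAllocator {
  CPUAllocator() {}
  virtual ~CPUAllocator() noexcept {}
  virtual std::pair<void*, MemoryDeleter> New(std::size_t nbytes) = 0;
  virtual MemoryDeleter GetDeleter() = 0;
};

// Hypothetical allocator: plain malloc/free. The deleter has to be a
// non-capturing function because MemoryDeleter is a raw function pointer.
struct MallocAllocator final : CPUAllocator {
  std::pair<void*, MemoryDeleter> New(std::size_t nbytes) override {
    return {std::malloc(nbytes), &Delete};
  }
  MemoryDeleter GetDeleter() override { return &Delete; }
  static void Delete(void* ptr) { std::free(ptr); }
};

// Hypothetical global slot, mirroring the g_cpu_allocator pattern.
static std::unique_ptr<CPUAllocator> g_allocator(new MallocAllocator());

CPUAllocator* GetAllocator() { return g_allocator.get(); }

// Takes ownership of alloc and destroys the previous allocator.
void SetAllocator(CPUAllocator* alloc) {
  assert(alloc != nullptr);
  g_allocator.reset(alloc);
}
```

Returning the deleter together with the pointer means the caller never needs to know which allocator produced a block; it simply calls the function pointer it was handed.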

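Caffe2's default allocator, DefaultCPUAllocator, allocates 32-byte-aligned memory by calling _aligned_malloc on Windows (posix_memalign on Linux) and releases it with _aligned_free on Windows (free on Linux).
A typical aligned_malloc implementation looks like this: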

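void* aligned_malloc(size_t size, size_t alignment) {
    // Only non-zero power-of-two alignments are supported.
    if (alignment == 0 || (alignment & (alignment - 1))) {
        return nullptr;
    } else {
        // Over-allocate: room for the saved raw pointer plus alignment slack.
        void *praw = malloc(sizeof(void*) + size + alignment);
        if (praw) {
            void *pbuf = reinterpret_cast<void*>(reinterpret_cast<size_t>(praw) + sizeof(void*));
            // Round up to the next multiple of alignment strictly above pbuf.
            void *palignedbuf = reinterpret_cast<void*>((reinterpret_cast<size_t>(pbuf) | (alignment - 1)) + 1);
            // Stash the raw pointer just before the aligned block so aligned_free can find it.
            static_cast<void**>(palignedbuf)[-1] = praw;
            return palignedbuf;
        }
        else {
           return nullptr;
        }
    }
}

void aligned_free(void *palignedmem) {
    // Recover the raw pointer saved by aligned_malloc and free it.
    free(static_cast<void**>(palignedmem)[-1]);
}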
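On POSIX systems the same result is available directly from posix_memalign. The sketch below assumes a POSIX platform and a 32-byte alignment; AllocAligned, IsAligned and kAlignment are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX only)

// Assumed alignment for this sketch.
constexpr std::size_t kAlignment = 32;

// Returns nbytes of memory aligned to kAlignment, or nullptr on failure.
// The block must be released with free().
void* AllocAligned(std::size_t nbytes) {
  void* ptr = nullptr;
  if (posix_memalign(&ptr, kAlignment, nbytes) != 0) {
    return nullptr;
  }
  return ptr;
}

// True if ptr is a multiple of alignment.
bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Unlike the hand-written aligned_malloc, memory from posix_memalign is released with plain free, which is why no separate aligned_free is needed on Linux.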

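MemoryAllocationReporter is a thread-safe class that records allocation information: the start address and size of each allocated block, plus the total number of bytes currently allocated. Only one instance should exist; it is declared as a static member of CPUContext.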

class MemoryAllocationReporter {
 public:
 ......

 private:
  std::mutex mutex_;
  std::unordered_map<void*, size_t> size_table_; // Maps the start address of each block to its size
  size_t allocated_; // Total memory currently allocated by Caffe2, in bytes
};
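Since the public interface is elided above, the following is only a sketch of how such a reporter could work, built from the three data members shown. AllocationTracker and its method names are hypothetical and are not claimed to match Caffe2's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical thread-safe allocation tracker.
class AllocationTracker {
 public:
  // Records a freshly allocated block. Assumes ptr is not already tracked.
  void New(void* ptr, std::size_t nbytes) {
    assert(ptr != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    size_table_[ptr] = nbytes;
    allocated_ += nbytes;
  }

  // Forgets a block that is about to be freed; unknown pointers are ignored.
  void Delete(void* ptr) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = size_table_.find(ptr);
    if (it == size_table_.end()) {
      return;
    }
    allocated_ -= it->second;
    size_table_.erase(it);
  }

  // Total number of bytes currently tracked.
  std::size_t allocated() {
    std::lock_guard<std::mutex> guard(mutex_);
    return allocated_;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<void*, std::size_t> size_table_;
  std::size_t allocated_ = 0;
};
```

The size table is what lets Delete subtract the right amount: a deleter only receives the pointer, so the size has to be looked up.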

Tensor

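Tensor is the core data structure of Caffe2. Every Op that does matrix computation, such as FCOp, ConvOp, ReluOp, and PoolOp, takes Tensors as its main input. A Tensor is a device-specific multi-dimensional array that wraps a contiguous block of memory together with its dimension information, much like the Blob in Caffe1 or the ndarray in numpy.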

template <class Context>
class Tensor {
  protected:
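    vector<TIndex> dims_; // Dimensions of the tensor
    TIndex size_ = -1; // Number of elements in the tensor (the byte count is size_ * meta_.itemsize()); -1 means not yet initialized
    TypeMeta meta_; // data_ points to memory of arbitrary type, so meta_ records the element type
    std::shared_ptr<void> data_; // All actual reads and writes go through the memory managed by this smart pointer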
  ...
 public:
  Tensor() {} //初始化一个空的tensor

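  // Creates a tensor with the given dimensions. Resize does not allocate memory: allocation is deferred (Caffe1 does the same) until the first call to mutable_data.
  explicit Tensor(const vector<int>& dims) { Resize(dims); }
  // Returns the start address of the data as the given type, e.g. data<int>, data<float>, data<double>. It enforces that the memory has already been allocated. Like cpu_data or gpu_data on a Caffe1 Blob, call it when the data does not need to be modified.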
  template <typename T>
  inline const T* data() const {
    ...//做一些必要的类型检查
    return static_cast<T*>(data_.get());
  }
  // Like mutable_cpu_data and mutable_gpu_data on a Caffe1 Blob.
   template <typename T>
   inline T* mutable_data() {
      if ((size_ == 0 || data_.get()) && IsType<T>()) {
        return static_cast<T*>(data_.get());
      }
      return static_cast<T*>(raw_mutable_data(TypeMeta::Make<T>()));
    }

// raw_mutable_data is the function that actually frees the old buffer and allocates a new one
  inline void* raw_mutable_data(const TypeMeta& meta) {
    // For 0-size tensors it's fine to return any pointer (including nullptr)
    if (meta_ == meta && (data_.get() || size_ == 0)) {
      return data_.get();
    } else {
      bool had_special_dtor = meta_.dtor() != nullptr;
      meta_ = meta;
      CAFFE_ENFORCE_WITH_CALLER(
          size_ >= 0,
          "Tensor is not initialized. You probably need to call Resize() "
          "before calling mutable_data()");

      // We can reuse the existing buffer if the current data does not have
      // a special destructor and the new data doesn't have a special
      // constructor.
      if (size_ == 0 ||
          (meta.ctor() == nullptr && !had_special_dtor &&
           capacity_ >= size_ * meta_.itemsize())) {
        return data_.get();
      }
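      if (meta.ctor()) { // This branch is rarely taken: Tensors mostly hold numeric data, so meta is a fundamental type such as float, int, or double. It runs only when the elements are objects with special constructors and destructors.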
        // For types that need placement new, we will call it, as well as
        // making sure that when the data is freed, it calls the right
        // destruction procedure.
        auto size = size_;
        auto dtor = meta_.dtor();
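        auto ptr_and_deleter = Context::New(size_ * meta_.itemsize()); // Calls New on DefaultCPUAllocator (or on the CUDA allocator) and returns the start address of the new buffer plus a function pointer that frees it.
        auto deleter = ptr_and_deleter.second; // Function pointer that frees the buffer
        data_.reset( // The custom deleter mimics object destruction: first run the destructor to clean up, then free or delete the memory the object occupied.
            ptr_and_deleter.first, [size, dtor, deleter](void* ptr) -> void {
              dtor(ptr, size); // Run the destructors
              deleter(ptr); // Free the memory (free or _aligned_free)
            }); // Releases the previous buffer and takes ownership of the new one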
        meta_.ctor()(data_.get(), size_);
      } else { // Ordinary numeric data takes this branch.
        // For fundamental type, new and delete is easier.
        auto ptr_and_deleter = Context::New(size_ * meta_.itemsize());
        data_.reset(ptr_and_deleter.first, ptr_and_deleter.second);
      }
      capacity_ = size_ * meta_.itemsize();
      return data_.get();
    }
  }
};
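The deferred allocation and buffer reuse described in the comments above can be reduced to a small sketch. MiniTensor is a hypothetical, float-only stand-in: it ignores TypeMeta and Context, uses malloc/free directly, and is meant only to show that Resize records the shape while mutable_data allocates on first use and reuses the buffer when the capacity is sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Hypothetical float-only tensor illustrating lazy allocation.
class MiniTensor {
 public:
  // Records the shape only; no memory is allocated here.
  void Resize(const std::vector<std::size_t>& dims) {
    dims_ = dims;
    size_ = 1;
    for (std::size_t d : dims_) {
      size_ *= d;
    }
  }

  // Allocates on first use, or when the current buffer is too small.
  float* mutable_data() {
    const std::size_t needed = size_ * sizeof(float);
    if (size_ == 0) {
      return data();  // For 0-size tensors any pointer, including nullptr, is fine
    }
    if (!data_ || capacity_ < needed) {
      void* raw = std::malloc(needed);
      if (raw == nullptr) {
        throw std::bad_alloc();
      }
      // shared_ptr<void> with a custom deleter: resetting it frees the old buffer.
      data_.reset(raw, [](void* p) { std::free(p); });
      capacity_ = needed;
    }
    assert(capacity_ >= needed);
    return data();
  }

  bool allocated() const { return data_ != nullptr; }
  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }

 private:
  float* data() { return static_cast<float*>(data_.get()); }

  std::vector<std::size_t> dims_;
  std::size_t size_ = 0;      // Number of elements
  std::size_t capacity_ = 0;  // Allocated bytes
  std::shared_ptr<void> data_;
};
```

Resizing to a smaller shape and calling mutable_data again returns the same pointer, which corresponds to the reuse branch in raw_mutable_data.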

Blob

The comments in the Caffe2 source describe Blob very clearly:

A Blob hosts a pointer as well as its type, and takes charge of deleting it properly when the blob is deallocated or re-allocated with a new type. A blob could contain anything, although the most common case is to contain a Tensor. (from caffe2/core/blob.h)

class Blob {
 public:
  Blob() : meta_(), pointer_(nullptr) {}
  ~Blob() { Reset(); }
  /**
   * @brief Gets the const reference of the stored object. The code checks if
   * the stored object is of the desired type.
   */
  template <class T>
  const T& Get() const {
    CAFFE_ENFORCE(IsType<T>(),
        "wrong type for the Blob instance. Blob contains ",
        meta_.name(), " while caller expects ", TypeMeta::Name<T>());
    return *static_cast<const T*>(pointer_);
  }

  /**
   * @brief Gets a mutable pointer to the stored object.
   *
   * If the current object is not of the right type, a new object is created
   * and the old object is freed. Note that type T should have a default
   * constructor. Otherwise, create the object yourself first, and use
   * Reset().
   */
  template <class T>
  T* GetMutable(bool* is_new_object=nullptr) {
    if (IsType<T>()) {
      if (is_new_object) *is_new_object = false;
      return static_cast<T*>(pointer_);
    } else {
      if (is_new_object) *is_new_object = true;
      VLOG(1) << "Create new mutable object " << TypeMeta::Name<T>();
      return Reset<T>(new T());
    }
  }

  /**
   * Sets the underlying object to the allocated one. The Blob then takes over
   * the ownership of the passed in pointer. If there is already an object in
   * the Blob, the old object is freed.
   *
   * This is used when the underlying class T does not have a default ctor, or
   * complex initializations needs to be done outside the blob.
   */
  template <class T>
  T* Reset(T* allocated) {
    if (pointer_ && destroy_) {
      destroy_(pointer_);
    }
    meta_ = TypeMeta::Make<T>();
    pointer_ = static_cast<void*>(allocated);
    destroy_ = &Destroy<T>;
    return allocated;
  }
...
 private:
  /**
   * @brief A destroy call that is used to properly deconstruct objects.
   */
  template <class T>
  static void Destroy(void* pointer) {
    delete static_cast<T*>(pointer);
  }
  typedef void (*DestroyCall)(void *);
  TypeMeta meta_;
  void* pointer_ = nullptr;
  DestroyCall destroy_ = nullptr;
  DISABLE_COPY_AND_ASSIGN(Blob);
};

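The Blob implementation is fairly simple: it is essentially a thin wrapper, plus some operations for serialization and deserialization.
The following snippet, taken from Caffe2/binaries/tutorial_blob.cc, shows that the same Blob object can hold an int, a float, a double, or even a string: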

  Blob myblob;
  int* myint = myblob.GetMutable<int>();
  *myint = 10;
  const int& myint_const = myblob.Get<int>();
  // const float& myfloat = myblob.Get<float>(); // wrong! Throws an exception: type mismatch.
  double* mydouble = myblob.GetMutable<double>(); // Frees the 4-byte int and allocates an 8-byte double
  *mydouble = 3.14;
  std::string* pvec = new std::string();
  myblob.Reset(pvec); // no need to release pvec, myblob takes ownership.
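The type-erasure idea behind Blob (a void pointer, a type tag, and a destroy function chosen when the object is stored) can be reproduced in a few lines. MiniBlob below is a hypothetical sketch that uses std::type_info in place of Caffe2's TypeMeta and throws std::runtime_error where Caffe2 uses CAFFE_ENFORCE.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical type-erased container modeled on the Blob excerpt above.
class MiniBlob {
 public:
  MiniBlob() = default;
  MiniBlob(const MiniBlob&) = delete;
  MiniBlob& operator=(const MiniBlob&) = delete;
  ~MiniBlob() { Clear(); }

  template <class T>
  bool IsType() const {
    return type_ != nullptr && *type_ == typeid(T);
  }

  // Read-only access; throws if the stored type is not T.
  template <class T>
  const T& Get() const {
    if (!IsType<T>()) {
      throw std::runtime_error("wrong type for the MiniBlob instance");
    }
    return *static_cast<const T*>(pointer_);
  }

  // Returns the stored T, or replaces the content with a new T().
  template <class T>
  T* GetMutable() {
    if (IsType<T>()) {
      return static_cast<T*>(pointer_);
    }
    return Reset<T>(new T());
  }

  // Takes ownership of allocated and destroys the previous object.
  template <class T>
  T* Reset(T* allocated) {
    assert(allocated != nullptr);
    Clear();
    type_ = &typeid(T);
    pointer_ = allocated;
    destroy_ = &Destroy<T>;
    return allocated;
  }

 private:
  void Clear() {
    if (pointer_ && destroy_) {
      destroy_(pointer_);
    }
    pointer_ = nullptr;
    destroy_ = nullptr;
    type_ = nullptr;
  }

  // Remembers the concrete type so delete runs the right destructor.
  template <class T>
  static void Destroy(void* pointer) {
    delete static_cast<T*>(pointer);
  }

  const std::type_info* type_ = nullptr;
  void* pointer_ = nullptr;
  void (*destroy_)(void*) = nullptr;
};
```

The Destroy<T> function pointer is the key design choice: once the object is stored as void*, its static type is gone, so the correct destructor has to be captured at the moment Reset is called.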

Workspace

Workspace is a class that holds all the related objects created during runtime: (1) all blobs, and (2) all instantiated networks. It is the owner of all these objects and deals with the scaffolding logistics.

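A Workspace is where almost all Blobs and Nets in Caffe2 live; in general, a Blob can only be created through it. Every piece of memory is registered under a name (its key). Memory is isolated between workspaces. Every Operator constructor takes a Workspace pointer; usually there is only one workspace, which means all Operators receive the same pointer. An Operator's inputs and outputs are all stored in the workspace, which makes memory optimization easier.
The core data members of Workspace: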

  std::map<string, unique_ptr<Blob> > blob_map_;
  std::map<string, unique_ptr<NetBase> > net_map_;

To create a new Blob you must give it a name. The CreateBlob member function is:

Blob* Workspace::CreateBlob(const string& name) {
  if (HasBlob(name)) {
    VLOG(1) << "Blob " << name << " already exists. Skipping.";
  } else if (forwarded_blobs_.count(name)) {
    // possible if parent workspace deletes forwarded blob
    VLOG(1) << "Blob " << name << " is already forwarded from parent workspace "
            << "(blob " << forwarded_blobs_[name].second << "). Skipping.";
  } else {
    VLOG(1) << "Creating blob " << name;
    blob_map_[name] = unique_ptr<Blob>(new Blob());
  }
  return GetBlob(name); // For a newly created Blob this returns the object just made by new Blob(); otherwise it returns the existing entry from blob_map_.
}
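Stripped of logging and parent-workspace forwarding, the name-to-Blob bookkeeping comes down to a map of owning pointers. The sketch below uses a hypothetical MiniWorkspace and an empty placeholder Blob type; it is an illustration of the ownership pattern, not Caffe2's Workspace.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical placeholder; the real Blob is the type-erased container.
struct Blob {};

// Hypothetical workspace: owns blobs and hands out non-owning pointers.
class MiniWorkspace {
 public:
  bool HasBlob(const std::string& name) const {
    return blob_map_.count(name) > 0;
  }

  // Creates the blob if it does not exist yet; always returns the blob
  // registered under name.
  Blob* CreateBlob(const std::string& name) {
    if (!HasBlob(name)) {
      blob_map_[name] = std::unique_ptr<Blob>(new Blob());
    }
    Blob* blob = GetBlob(name);
    assert(blob != nullptr);
    return blob;
  }

  // Returns nullptr when no blob has that name.
  Blob* GetBlob(const std::string& name) const {
    auto it = blob_map_.find(name);
    return it == blob_map_.end() ? nullptr : it->second.get();
  }

 private:
  std::map<std::string, std::unique_ptr<Blob>> blob_map_;
};
```

Because the map holds unique_ptr values, every blob is destroyed together with the workspace, while operators only ever see raw pointers that they must not delete.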

Summary

The flow in an Operator from creating a Blob to actually allocating memory is:

ws->CreateBlob(output_str)
blob_map_[name] = unique_ptr<Blob>(new Blob());
Blob::GetMutable<Tensor>(); (creates the Tensor via new Tensor())
Tensor::Resize();
Tensor::mutable_data<T>();
Tensor::raw_mutable_data();
CPUContext::New();
CPUAllocator::New();
aligned allocation (posix_memalign or _aligned_malloc)
allocation complete

That is the whole path Caffe2 follows when allocating a Blob.