Caffe common: program analysis, defining a class inside a class

Published: 2023/12/20 · Programming Q&A · 42 · 豆豆

Caffe has a common.hpp and a common.cpp.

// The main singleton class of Caffe. It encapsulates the Boost and CUDA random number
// generation functions and provides a unified interface.

The Caffe singleton class wraps Boost, CUDA, and similar operations behind one unified interface. The singleton is a common design pattern.

(1) Setting up CUDA random numbers

In the actual implementation, a class is also defined inside the class, for example:

class Caffe {
 public:
  ~Caffe();
  inline static Caffe& Get() {
    if (!singleton_.get()) {
      singleton_.reset(new Caffe());
    }
    return *singleton_;
  }
  enum Brew { CPU, GPU };

  // This random number generator facade hides boost and CUDA rng
  // implementation from one another (for cross-platform compatibility).
  class RNG {
   public:
    RNG();
    explicit RNG(unsigned int seed);
    explicit RNG(const RNG&);
    RNG& operator=(const RNG&);
    void* generator();
   private:
    class Generator;
    shared_ptr<Generator> generator_;
  };

  // ... (remaining members omitted in this excerpt)
};

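Defining a class inside a class is allowed, but it is best avoided where possible because it hurts readability. Every class should be an abstraction that can stand on its own.

This technique is mainly used for encapsulation. To use the RNG class from outside, refer to it as Caffe::RNG.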
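The qualified-name access described above can be sketched with a small stand-alone example. Engine, Counter, MakeCounter, and ThirdValue are hypothetical names chosen for illustration; they are not part of Caffe.

```cpp
// Hypothetical outer class, standing in for the role Caffe plays above.
class Engine {
 public:
  // Nested helper class. Outside Engine it must be named Engine::Counter.
  class Counter {
   public:
    explicit Counter(int start) : value_(start) {}
    // Returns the current value, then advances by one.
    int Next() { return value_++; }
   private:
    int value_;
  };

  // Inside Engine the nested class can be named without qualification.
  Counter MakeCounter() const { return Counter(0); }
};

// Code outside Engine uses the qualified name, just like Caffe::RNG.
inline int ThirdValue() {
  Engine::Counter counter(10);
  counter.Next();         // yields 10
  counter.Next();         // yields 11
  return counter.Next();  // yields 12
}
```

Writing a bare Counter outside of Engine would not compile, which is the encapsulation effect the text refers to.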

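The same technique can be used to wrap a struct inside a class. In C++, however, a struct and a class are really the same thing. The only difference is the default access: members of a class are private by default, while members of a struct are public by default.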

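By long-standing convention, though, a struct is usually just a data structure that stores data and has no behavior of its own. This can also be seen as a difference from a class, because a class has behavior (member functions).

A struct can be defined either inside or outside a class, but for readability it is usually defined outside.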
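The default-access difference between struct and class can be seen in a minimal sketch. PointStruct, PointClass, and SumX are hypothetical names used only for this illustration.

```cpp
// Hypothetical types for illustration only.
struct PointStruct {
  int x = 1;  // public by default in a struct
};

class PointClass {
  int x = 1;  // private by default in a class
 public:
  // A public member function is needed to read the private member.
  int GetX() const { return x; }
};

inline int SumX() {
  PointStruct s;
  PointClass c;
  // s.x is directly accessible; writing c.x here would not compile.
  return s.x + c.GetX();
}
```

Apart from this default, the two keywords declare the same kind of type, so the choice between them is mostly a matter of convention.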

----------------------------------------------------------------------------------------------------------------------------

The code also uses a macro, CUDA_KERNEL_LOOP,

which is defined in common.hpp.

#define CUDA_KERNEL_LOOP(i, n) \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
       i < (n); \
       i += blockDim.x * gridDim.x)

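First, look at how Caffe sets up the dimensions of the thread grid and the thread blocks.

Again in common.hpp we find

CAFFE_CUDA_NUM_THREADS

CAFFE_GET_BLOCKS(const int N)

Both are clearly one-dimensional.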
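A common way to choose a one-dimensional block count is ceiling division of the element count by the threads per block. The sketch below illustrates that idea only. kNumThreads and GetBlocks are hypothetical stand-ins, and the value 512 is an assumption; the constant and helper in a given Caffe version may differ.

```cpp
// Assumed threads-per-block value, for illustration only.
const int kNumThreads = 512;

// Hypothetical stand-in for a block-count helper: the smallest number of
// 1-D blocks such that blocks * kNumThreads >= n (ceiling division).
inline int GetBlocks(const int n) {
  return (n + kNumThreads - 1) / kNumThreads;
}
```

For example, 1000 elements need 2 blocks of 512 threads, which gives 1024 threads in total. The surplus 24 threads are kept idle by the i < (n) condition in the loop.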

Tidying up the formatting of CUDA_KERNEL_LOOP gives:

for (int i = blockIdx.x * blockDim.x + threadIdx.x;
     i < (n);
     i += blockDim.x * gridDim.x)

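blockDim.x * gridDim.x is the total number of threads in the grid.

n is the total number of elements the kernel has to process.

Sometimes n is larger than blockDim.x * gridDim.x, so it is not possible to have each thread handle exactly one element.

The loop above solves this by letting a single thread process several elements serially (through the for loop), stepping by the total thread count each time.

This is a widely used trick, often called a grid-stride loop, and it is worth learning from.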
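The index pattern of the loop can be checked on the CPU. The sketch below is an emulation, not CUDA code: the built-ins blockIdx.x, threadIdx.x, blockDim.x, and gridDim.x are replaced by ordinary loop variables, and VisitCounts is a hypothetical helper that counts how often each element is visited.

```cpp
#include <vector>

// CPU-side emulation of the index pattern of a 1-D grid-stride loop.
// Returns how many times each of the n elements is visited when grid_dim
// blocks of block_dim threads each run the loop.
inline std::vector<int> VisitCounts(int n, int grid_dim, int block_dim) {
  std::vector<int> visits(n, 0);
  const int total_threads = block_dim * grid_dim;
  for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
      // Same three clauses as CUDA_KERNEL_LOOP, with the CUDA built-ins
      // replaced by the emulated block and thread indices.
      for (int i = block_idx * block_dim + thread_idx; i < n;
           i += total_threads) {
        ++visits[i];
      }
    }
  }
  return visits;
}
```

With n = 10 and 2 blocks of 2 threads (4 threads in total), thread 0 handles elements 0, 4, 8, thread 1 handles 1, 5, 9, thread 2 handles 2, 6, and thread 3 handles 3, 7. Every element is visited exactly once.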

Now look at the implementation of this kernel.

template <typename Dtype>
__global__ void mul_kernel(const int n, const Dtype* a,
                           const Dtype* b, Dtype* y) {
  // Each thread handles indices index, index + stride, ... while index < n.
  CUDA_KERNEL_LOOP(index, n) {
    y[index] = a[index] * b[index];
  }
}

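This clearly computes the element-wise product of the two vectors: each y[index] is a[index] * b[index]. It is not a dot product, because nothing is summed and y has n elements.

The vector length may exceed the total number of threads in the grid the kernel is launched with,

so some threads may have to process several elements serially.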
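The kernel's behavior when there are fewer threads than elements can be emulated on the CPU. The sketch below is not CUDA code. MulKernelOneThread and MulEmulated are hypothetical helpers that run the loop body once per emulated (block, thread) pair, and b is assumed to be at least as long as a.

```cpp
#include <vector>

// CPU emulation of the kernel body for one (block, thread) pair.
template <typename Dtype>
void MulKernelOneThread(int n, const Dtype* a, const Dtype* b, Dtype* y,
                        int block_idx, int thread_idx,
                        int grid_dim, int block_dim) {
  for (int index = block_idx * block_dim + thread_idx; index < n;
       index += block_dim * grid_dim) {
    y[index] = a[index] * b[index];  // one output per index, no summation
  }
}

// Runs every emulated thread in turn and returns the output vector.
// Assumes b has at least as many elements as a.
template <typename Dtype>
std::vector<Dtype> MulEmulated(const std::vector<Dtype>& a,
                               const std::vector<Dtype>& b,
                               int grid_dim, int block_dim) {
  const int n = static_cast<int>(a.size());
  std::vector<Dtype> y(a.size());
  for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
      MulKernelOneThread(n, a.data(), b.data(), y.data(),
                         block_idx, thread_idx, grid_dim, block_dim);
    }
  }
  return y;
}
```

With five elements and a single block of two threads, the first emulated thread writes indices 0, 2, 4 and the second writes 1, 3, so the full output is still produced.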