日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 人文社科 > 生活经验 >内容正文

生活经验

OpenCL框架与示例

發布時間:2023/11/28 生活经验 32 豆豆
生活随笔 收集整理的這篇文章主要介紹了 OpenCL框架与示例 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

OpenCL框架與示例
下面的圖簡單說明了OpenCL的編程框架,圖是用的GPU,其他類似;

名詞的概念:
Platform (平臺):主機加上OpenCL框架管理下的若干設備構成了這個平臺,通過這個平臺,應用程序可以與設備共享資源并在設備上執行kernel。實際使用中基本上一個廠商對應一個Platform,比如Intel, AMD都是這樣。
Device(設備):官方的解釋是計算單元(Compute Units)的集合。舉例來說,GPU是典型的device。Intel和AMD的多核CPU也提供OpenCL接口,所以也可以作為Device。
Context(上下文):OpenCL的Platform上共享和使用資源的環境,包括kernel、device、memory objects、command queue等。使用中一般一個Platform對應一個Context。
Program:OpenCL程序,由kernel函數、其他函數和聲明等組成。
Kernel(核函數):可以從主機端調用,運行在設備端的函數。
Memory Object(內存對象):在主機和設備之間傳遞數據的對象,一般映射到OpenCL程序中的global memory。有兩種具體的類型:Buffer Object(緩存對象)和Image Object(圖像對象)。
Command Queue(指令隊列):在指定設備上管理多個指令(Command)。隊列里指令執行可以順序也可以亂序。一個設備可以對應多個指令隊列。
NDRange:主機端運行設備端kernel函數的主要接口。實際上還有其他的,NDRange是非常常見的,用于分組運算,以后具體用到的時候就知道區別了。
Host端來看,OpenCL的組要執行流程是這樣的:

內存模型
最后寫一下Opencl的內存模型,看下面的示意圖:

用核函數中的內存變量來簡單地解釋:用clCreateBuffer 創建、用clSetKernelArg 傳遞的數據在global memory 和constant memory中;核函數中的寄存器變量在private memory 中;核函數的內部變量、緩存等,在local memory 中。圖例中可以看到Device 并不直接訪問global memory,而是通過Cache 來訪問。可以想象當同時運行的work-item,使用的內存都在同一塊cache 中,則內存吞吐的效率最高。對應到work group 中,就是在程序設計上盡量使同一個work group 中的work item 操作連續的內存,以提高訪存效率。

TVM OpenCL示例

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* “License”); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

/*!* \file opencl_module.cc*/
#include "opencl_module.h"#include <dmlc/memory_io.h>
#include <tvm/runtime/registry.h>#include <string>
#include <unordered_map>
#include <vector>#include "../source_utils.h"
#include "opencl_common.h"namespace tvm {
namespace runtime {class OpenCLWrappedFunc {public:// initialize the OpenCL function.void Init(OpenCLModuleNode* m, ObjectPtr<Object> sptr, OpenCLModuleNode::KTRefEntry entry,std::string func_name, std::vector<size_t> arg_size,const std::vector<std::string>& launch_param_tags) {w_ = m->GetGlobalWorkspace();m_ = m;sptr_ = sptr;entry_ = entry;func_name_ = func_name;arg_size_ = arg_size;launch_param_config_.Init(arg_size.size(), launch_param_tags);}// invoke the function with void argumentsvoid operator()(TVMArgs args, TVMRetValue* rv, void** void_args) const {ICHECK(w_->context != nullptr) << "No OpenCL device";cl::OpenCLThreadEntry* t = w_->GetThreadEntry();// get the kernel from thread local kernel table.if (entry_.kernel_id >= t->kernel_table.size()) {t->kernel_table.resize(entry_.kernel_id + 1);}const auto& e = t->kernel_table[entry_.kernel_id];cl_kernel kernel = e.kernel;if (kernel == nullptr || e.version != entry_.version) {kernel = m_->InstallKernel(w_, t, func_name_, entry_);}// setup arguments.for (cl_uint i = 0; i < arg_size_.size(); ++i) {void* arg = nullptr;if (args.type_codes[i] == DLDataTypeCode::kDLOpaqueHandle) {arg = static_cast<cl::BufferDescriptor*>(void_args[i])->buffer;} else {arg = void_args[i];}OPENCL_CALL(clSetKernelArg(kernel, i, arg_size_[i], arg));}cl_command_queue queue = w_->GetQueue(t->device);ThreadWorkLoad wl = launch_param_config_.Extract(args);cl_uint work_dim = static_cast<cl_uint>(launch_param_config_.work_dim());for (cl_uint i = 0; i < work_dim; ++i) {wl.work_size[i] *= wl.work_size[i + 3];}// launch kernelOPENCL_CALL(clEnqueueNDRangeKernel(queue, kernel, work_dim, nullptr, wl.work_size,wl.work_size + 3, 0, nullptr, nullptr));}private:// global workspace.cl::OpenCLWorkspace* w_;// The moduleOpenCLModuleNode* m_;// resource handleObjectPtr<Object> sptr_;// global kernel id in the kernel table.OpenCLModuleNode::KTRefEntry entry_;// The name of the function.std::string func_name_;// convert code for void argumentstd::vector<size_t> arg_size_;// launch parameters configLaunchParamConfig launch_param_config_;
};OpenCLModuleNode::~OpenCLModuleNode() {{// free the kernel ids in global table.std::lock_guard<std::mutex> lock(workspace_->mu);for (auto& kv : kid_map_) {workspace_->free_kernel_ids.push_back(kv.second.kernel_id);}}// free the kernelsfor (cl_kernel k : kernels_) {OPENCL_CALL(clReleaseKernel(k));}// free the programsfor (auto& kv : programs_) {for (auto& program : kv.second) {if (program) {OPENCL_CALL(clReleaseProgram(program));}}}
}cl::OpenCLWorkspace* OpenCLModuleNode::GetGlobalWorkspace() {return cl::OpenCLWorkspace::Global();
}PackedFunc OpenCLModuleNode::GetFunction(const std::string& name,const ObjectPtr<Object>& sptr_to_self) {ICHECK_EQ(sptr_to_self.get(), this);ICHECK_NE(name, symbol::tvm_module_main) << "Device function do not have main";auto it = fmap_.find(name);if (it == fmap_.end()) return PackedFunc();const FunctionInfo& info = it->second;OpenCLWrappedFunc f;std::vector<size_t> arg_size(info.arg_types.size());for (size_t i = 0; i < info.arg_types.size(); ++i) {DLDataType t = info.arg_types[i];ICHECK_EQ(t.lanes, 1U);if (t.code == kTVMOpaqueHandle) {// specially store pointer type size in OpenCL driverarg_size[i] = sizeof(void*);} else {uint32_t bits = t.bits;ICHECK_EQ(bits % 8, 0U);arg_size[i] = bits / 8;}}// initialize the wrapped func.f.Init(this, sptr_to_self, kid_map_.at(name), name, arg_size, info.launch_param_tags);return PackFuncVoidAddr(f, info.arg_types);
}void OpenCLModuleNode::SaveToFile(const std::string& file_name, const std::string& format) {std::string fmt = GetFileFormat(file_name, format);ICHECK_EQ(fmt, fmt_) << "Can only save to format=" << fmt_;std::string meta_file = GetMetaFilePath(file_name);SaveMetaDataToFile(meta_file, fmap_);SaveBinaryToFile(file_name, data_);
}void OpenCLModuleNode::SaveToBinary(dmlc::Stream* stream) {stream->Write(fmt_);stream->Write(fmap_);stream->Write(data_);
}std::string OpenCLModuleNode::GetSource(const std::string& format) {if (format == fmt_) return data_;if (fmt_ == "cl") {return data_;} else {return source_;}
}void OpenCLModuleNode::Init() {workspace_ = GetGlobalWorkspace();workspace_->Init();// initialize the kernel id, need to lock global table.std::lock_guard<std::mutex> lock(workspace_->mu);for (const auto& kv : fmap_) {const std::string& key = kv.first;KTRefEntry e;if (workspace_->free_kernel_ids.size() != 0) {e.kernel_id = workspace_->free_kernel_ids.back();workspace_->free_kernel_ids.pop_back();} else {e.kernel_id = workspace_->num_registered_kernels++;}e.version = workspace_->timestamp++;kid_map_[key] = e;}// split into source artifacts for each kernelparsed_kernels_ = SplitKernels(GetSource("cl"));ICHECK(!parsed_kernels_.empty()) << "The OpenCL module expects a kernel delimited "<< "source from code generation, but no kernel "<< "delimiter was found.";ICHECK_EQ(fmap_.size(), parsed_kernels_.size())<< "The number of parsed kernel sources does not match the number of kernel functions";// zero initialize cl_program pointers for each device kernelfor (auto& kv : parsed_kernels_) {programs_.insert({kv.first, std::vector<cl_program>(workspace_->devices.size(), nullptr)});}
}cl_kernel OpenCLModuleNode::InstallKernel(cl::OpenCLWorkspace* w, cl::OpenCLThreadEntry* t,const std::string& func_name, const KTRefEntry& e) {std::lock_guard<std::mutex> lock(build_lock_);int device_id = t->device.device_id;if (programs_[func_name][device_id] == nullptr) {// create programif (fmt_ == "cl") {const char* s = parsed_kernels_[func_name].c_str();size_t len = parsed_kernels_[func_name].length();cl_int err;programs_[func_name][device_id] = clCreateProgramWithSource(w->context, 1, &s, &len, &err);OPENCL_CHECK_ERROR(err);} else if (fmt_ == "xclbin" || fmt_ == "awsxclbin" || fmt_ == "aocx") {const unsigned char* s = (const unsigned char*)data_.c_str();size_t len = data_.length();cl_int err;cl_device_id dev = w->devices[device_id];programs_[func_name][device_id] =clCreateProgramWithBinary(w->context, 1, &dev, &len, &s, NULL, &err);OPENCL_CHECK_ERROR(err);} else {LOG(FATAL) << "Unknown OpenCL format " << fmt_;}// build programcl_int err;cl_device_id dev = w->devices[device_id];err = clBuildProgram(programs_[func_name][device_id], 1, &dev, nullptr, nullptr, nullptr);if (err != CL_SUCCESS) {size_t len;std::string log;clGetProgramBuildInfo(programs_[func_name][device_id], dev, CL_PROGRAM_BUILD_LOG, 0, nullptr,&len);log.resize(len);clGetProgramBuildInfo(programs_[func_name][device_id], dev, CL_PROGRAM_BUILD_LOG, len,&log[0], nullptr);LOG(FATAL) << "OpenCL build error for device=" << dev << "\n" << log;}}// build kernelcl_int err;cl_kernel kernel = clCreateKernel(programs_[func_name][device_id], func_name.c_str(), &err);OPENCL_CHECK_ERROR(err);t->kernel_table[e.kernel_id].kernel = kernel;t->kernel_table[e.kernel_id].version = e.version;kernels_.push_back(kernel);return kernel;
}Module OpenCLModuleCreate(std::string data, std::string fmt,std::unordered_map<std::string, FunctionInfo> fmap, std::string source) {auto n = make_object<OpenCLModuleNode>(data, fmt, fmap, source);n->Init();return Module(n);
}// Load module from module.
Module OpenCLModuleLoadFile(const std::string& file_name, const std::string& format) {std::string data;std::unordered_map<std::string, FunctionInfo> fmap;std::string fmt = GetFileFormat(file_name, format);std::string meta_file = GetMetaFilePath(file_name);LoadBinaryFromFile(file_name, &data);LoadMetaDataFromFile(meta_file, &fmap);return OpenCLModuleCreate(data, fmt, fmap, std::string());
}Module OpenCLModuleLoadBinary(void* strm) {dmlc::Stream* stream = static_cast<dmlc::Stream*>(strm);std::string data;std::unordered_map<std::string, FunctionInfo> fmap;std::string fmt;stream->Read(&fmt);stream->Read(&fmap);stream->Read(&data);return OpenCLModuleCreate(data, fmt, fmap, std::string());
}TVM_REGISTER_GLOBAL("runtime.module.loadfile_cl").set_body_typed(OpenCLModuleLoadFile);TVM_REGISTER_GLOBAL("runtime.module.loadfile_clbin").set_body_typed(OpenCLModuleLoadFile);TVM_REGISTER_GLOBAL("runtime.module.loadbinary_opencl").set_body_typed(OpenCLModuleLoadBinary);
}  // namespace runtime
}  // namespace tvm

參考鏈接:
參考鏈接:https://blog.csdn.net/xbinworld/article/details/46494275
https://github.com/apache/tvm/blob/main/src/runtime/opencl/opencl_module.cc

總結

以上是生活随笔為你收集整理的OpenCL框架与示例的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。