An Introductory OpenCL Program

Published: 2023/12/9

Note: to copy data from one device-memory buffer to another, use clEnqueueCopyBuffer rather than clEnqueueWriteBuffer.

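I had heard of OpenCL for a while, and today I finally tried it out. My machine has an NVIDIA card, so I first installed the CUDA development toolkit. Since the CUDA toolkit supports OpenCL fairly well, I chose to run the GPU parallel computation on the NVIDIA card.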

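OpenCL is an open standard and specification; the name stands for Open Computing Language. Its purpose is to put all of a computer's compute resources to work, including CPUs, GPUs, and multi-core processors. OpenCL is therefore an open standard that spans hardware and software platforms, and parallel programs written within this framework should be easy to port to other platforms, or so the claim goes. The general pattern of GPU parallel computing is that the host (CPU) copies the input data into GPU device memory, issues a compute command to the GPU, and then copies the results from device memory back to host memory. That is the rough outline, but it involves a great many details. The simple example below walks through the general steps and model of OpenCL programming.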

Step 1: get the number of OpenCL platforms available for computation

    cl_uint numPlatforms = 0;        // number of OpenCL platforms
    cl_platform_id platform = NULL;
    clGetPlatformIDs(0, NULL, &numPlatforms);

Step 2: get the list of platforms and choose one of them as the compute platform

    // Get the list of platforms
    cl_platform_id *platforms = (cl_platform_id *)malloc(numPlatforms * sizeof(cl_platform_id));
    clGetPlatformIDs(numPlatforms, platforms, NULL);

    // Iterate over the OpenCL platforms and print each name;
    // the last platform in the list ends up selected
    for (cl_uint i = 0; i < numPlatforms; i++)
    {
        char pBuf[100];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(pBuf), pBuf, NULL);
        printf("%s\n", pBuf);
        platform = platforms[i];
    }
    free(platforms);

Step 3: get the hardware device and create a context

    // Get a GPU device
    cl_int status;
    cl_device_id device;
    status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // Create the context
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &status);

At this point the OpenCL initialization is complete. It is best to wrap this sequence in a function.

Step 4: load the kernel source code and build the program

    // Load the kernel source. shrFindFilePath and oclLoadProgSource are helper
    // functions from the NVIDIA GPU Computing SDK samples, not part of OpenCL itself.
    size_t szKernelLength = 0;
    const char *cFileName = "kernel.cl";
    char *cPathAndName = shrFindFilePath(cFileName, argv[0]);
    const char *kernelSourceCode = oclLoadProgSource(cPathAndName, "", &szKernelLength);
    cl_program program = clCreateProgramWithSource(context, 1, &kernelSourceCode, &szKernelLength, &status);

    // Build the program for the specified device
    status = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    if (status != CL_SUCCESS)
    {
        // On failure, print the build log and exit
        size_t len = 0;
        char buf[2048];
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, sizeof(buf), buf, &len);
        printf("%s\n", buf);
        exit(1);
    }
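The loading step depends on helpers that ship with the NVIDIA SDK samples. Where those are not available, the kernel file can be read with the standard C library instead. The following is a minimal sketch, not the author's method; `load_kernel_source` is a hypothetical helper name introduced here for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenCL or the NVIDIA SDK):
 * reads a whole file into a NUL-terminated heap buffer.
 * Returns NULL on failure. On success, stores the number of bytes
 * read in *length (if length is not NULL). The caller must free()
 * the returned buffer. */
char *load_kernel_source(const char *path, size_t *length)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *text = (char *)malloc((size_t)size + 1);
    if (text == NULL) { fclose(fp); return NULL; }

    size_t got = fread(text, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(text); return NULL; }

    text[got] = '\0';              /* terminate so it can be printed as a string */
    if (length != NULL) *length = got;
    return text;
}
```

The returned pointer and length can then be passed to clCreateProgramWithSource in place of kernelSourceCode and szKernelLength, and the buffer freed once the program object has been created.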

Step 5: create a command queue and enqueue the kernel on it for execution

    // Create an OpenCL command queue
    cl_command_queue commandQueue = clCreateCommandQueue(context, device, 0, &status);

    // Create the OpenCL buffer object (4 x 4 elements of 4 bytes each)
    cl_mem outputBuffer = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, 4*4*4, NULL, &status);

    // Get a handle to the kernel with the given name
    cl_kernel kernel = clCreateKernel(program, "hellocl", &status);

    // Set the kernel arguments, i.e. the function parameters
    status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &outputBuffer);

    // Global and work-group sizes for a 2-D launch
    size_t globalThreads[] = {4, 4};
    size_t localThreads[] = {2, 2};

    // Enqueue the kernel and wait for the device to finish executing it
    status = clEnqueueNDRangeKernel(commandQueue, kernel, 2, NULL, globalThreads, localThreads, 0, NULL, NULL);
    status = clFinish(commandQueue);
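The launch above uses a 4 x 4 global size split into 2 x 2 work-groups. As an illustration of how those two sizes relate, the sketch below computes, for one dimension, which work-group a work-item falls into and its index inside that group. It assumes a zero global offset (the listing passes NULL for the offset), and the names `WorkItemPos`, `num_groups`, and `locate` are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work-item along a single NDRange dimension. */
typedef struct {
    size_t group_id;  /* corresponds to get_group_id(d) in a kernel */
    size_t local_id;  /* corresponds to get_local_id(d) in a kernel */
} WorkItemPos;

/* Number of work-groups along one dimension.
 * OpenCL 1.x requires the global size to be a multiple of the local size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Split a global id into (group id, local id), assuming a zero global offset,
 * so that global_id == group_id * local_size + local_id. */
WorkItemPos locate(size_t global_id, size_t local_size)
{
    assert(local_size > 0);
    WorkItemPos p;
    p.group_id = global_id / local_size;
    p.local_id = global_id % local_size;
    return p;
}
```

With the listing's sizes, each dimension has num_groups(4, 2) = 2 work-groups, so the 16 work-items are arranged as 2 x 2 groups of 2 x 2 items each.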

Step 6: copy the results back to host memory

    // Copy the data from GPU device memory back to host memory (blocking read)
    unsigned int *outbuffer = new unsigned int[4*4];
    memset(outbuffer, 0, 4*4*4);
    status = clEnqueueReadBuffer(commandQueue, outputBuffer, CL_TRUE, 0, 4*4*4, outbuffer, 0, NULL, NULL);

Step 7: print the results and release resources

    printf("out:\n");
    for (int i = 0; i < 16; i++)
    {
        printf("%x ", outbuffer[i]);
        if ((i + 1) % 4 == 0)
        {
            printf("\n");
        }
    }

    // Cleanup
    status = clReleaseKernel(kernel);
    status = clReleaseProgram(program);
    status = clReleaseMemObject(outputBuffer);
    status = clReleaseCommandQueue(commandQueue);
    status = clReleaseContext(context);
    delete[] outbuffer;   // allocated with new[], so release with delete[]

The kernel is as follows:

    __kernel void hellocl(__global uint *buffer)
    {
        uint dim = get_work_dim();   // number of dimensions of the work space
        size_t gidx, gidy, gidz;
        size_t gsizx, gsizy, gsizz;

        if (dim == 1)
        {
            gidx = get_global_id(0);
            gsizx = get_global_size(0);
            buffer[gidx] = gidx;
        }
        else if (dim == 2)
        {
            gidx = get_global_id(0);
            gidy = get_global_id(1);
            gsizx = get_global_size(0);
            gsizy = get_global_size(1);
            buffer[gidx + gidy*gsizx] = (1 << gidx) | (0x10 << gidy);
        }
        else
        {
            gidx = get_global_id(0);
            gidy = get_global_id(1);
            gidz = get_global_id(2);   // the original assigned this to gidy by mistake
            gsizx = get_global_size(0);
            gsizy = get_global_size(1);
            gsizz = get_global_size(2);
            buffer[gidx + gidy*gsizx + gidz*gsizx*gsizy] = gidx;
        }
    }

The original post shows the program's output here as a screenshot.
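As a cross-check that does not need a GPU, the 2-D branch of the kernel can be replayed on the CPU with two nested loops standing in for the work-items. This is only a sketch of the same index arithmetic, not output captured from the author's machine; `hellocl_reference_2d` and `print_grid` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* CPU replay of the kernel's 2-D branch: each (gidx, gidy) pair plays the
 * role of one work-item and writes the same value to the same index. */
void hellocl_reference_2d(unsigned int *buffer, size_t gsizx, size_t gsizy)
{
    assert(buffer != NULL);
    for (size_t gidy = 0; gidy < gsizy; gidy++) {
        for (size_t gidx = 0; gidx < gsizx; gidx++) {
            buffer[gidx + gidy * gsizx] = (1u << gidx) | (0x10u << gidy);
        }
    }
}

/* Print the buffer in hexadecimal, one row of gsizx values per line,
 * in the same layout as the host program's display loop. */
void print_grid(const unsigned int *buffer, size_t gsizx, size_t gsizy)
{
    for (size_t i = 0; i < gsizx * gsizy; i++) {
        printf("%x ", buffer[i]);
        if ((i + 1) % gsizx == 0) {
            printf("\n");
        }
    }
}
```

For the 4 x 4 launch used in the host code, this arithmetic gives the rows 11 12 14 18, 21 22 24 28, 41 42 44 48, and 81 82 84 88: the low hex digit encodes the column (bit gidx) and the high hex digit encodes the row (bit gidy).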

This is only the simplest possible program; parallelizing complex algorithms requires much deeper study.
