summaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* standalone utest for unified OpenCL implementation.Luo Xionghu2016-04-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | use the variable NOT_BUILD_STAND_ALONE_UTEST to control the build type: for beignet build, set it to 1; for stand alone build, do NOT need set it. remove all clXXXIntel extension call and such kind of tests since we intend to provide the unit test independently for viariant OpenCL implementation; replace the clMapBufferIntel/clMapBufferGTTIntel with clEnqueueMapBuffer/clEnqueueMapImage; link the utest binary to libOpenCL to follow the icd standard; remove the useless env in setenv.sh since we need make install the package after build. v2: fix the indent error; use function pointer for extesion case like vme and libva since we link to libOpenCL; v3: builtin_kernel_block_motion_estimate_intel released kernel twice; v4: find OpenCL library for standalone utest and link to libcl for not standalone utest; check default variables in setenv.sh whether empty before use. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* add sanity check for Image Region in runtime.Luo Xionghu2016-04-221-0/+5
| | | | | Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement printf buffer management.Yan Wang2016-04-227-123/+44
| | | | | | | Contributor: Junyan He <junyan.he@linux.intel.com> Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
* Runtime: Add SKL device id for new SKL devicePan Xiuli2016-04-082-4/+20
| | | | | | | Add skylakd workstation device and desktop GT4 Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Increase size for compile log outputGiuseppe Bilotta2016-04-081-2/+3
| | | | | | | | | | | | | | | The space to hold the output log from the compiler is currently preallocated. The previous size (1000 chars) is too small when many warnings and/or errors are present. Enlarge the buffer to 1024*1024 chars, in order to allow up to 1024 errors, each taking 1024 bytes to report. This is still a rather arbitrary choice, but should hopefully fit a wider range of cases. (This fixes an issue reported on #opencl) Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add condition checking of residuals because it may be NULL.Yan Wang2016-04-051-3/+6
| | | | | Signed-off-by: Yan Wang <yan.wang@linux.intel.com> Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
* fix typo for DEBUGP to avoid print extra empty lineGuo Yejun2016-01-191-3/+5
| | | | | | | | | | | for the case without {}: if (...) DEBUGP(...) the second printf of DEBUGP is always enabled, fixed it. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: He Junyan <junyan.he@linux.intel.com>
* change built-in function name from get_sub_group_size to get_max_sub_group_sizeGuo Yejun2016-01-081-1/+1
| | | | | | | | | | | Fix bug at https://bugs.freedesktop.org/show_bug.cgi?id=93469 The fucntion is mapped to OP_SIMD_SIZE which returns the constant SIMD width, the correct function name is get_max_sub_group_size. contributor: Georg Kolling <georg.kolling@gmail.com> Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* SKL: use the hw defautl value mocs index before linux 4.3.Yang Rong2016-01-051-1/+15
| | | | | | | | From linux 4.3, kernel redefined the mocs table's value, But before 4.3, still used the hw defautl value. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* do not call memcpy for cl_enqueue_read_buffer if userptr is enabledGuo Yejun2016-01-051-1/+4
| | | | | | | | sometimes, application invokes read buffer, instead of map buffer, even if userptr is enabled. memcpy is not necessary for such case. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* output warning message if do not find a good local_work_sizeGuo Yejun2015-12-233-5/+17
| | | | | | | | | If the user provides local_work_size as NULL in clEnqueueNDRangeKernel, and we could not find a good value inside driver, output a warning message with macro DEBUGP, and also refine the macro. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Driver: Fix GPGPU delete bugPan Xiuli2015-12-231-2/+2
| | | | | | | | | The first patch 192feb51 has something wrong in rebase and takes new bug in. Now fix both the original bug and revert the wrong patch. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: because double's built-ins haven't completely support, so disable ↵Yang Rong2015-12-231-0/+21
| | | | | | | | | it by default. Add a cmake option for it, cmake with option -DEXPERIMENTAL_DOUBLE=true to enable it. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* add support for build option -cl-fast-relaxed-mathGuo Yejun2015-12-211-1/+1
| | | | | | | | | current code handles this option at clang level, actually, it is also necessary at LLVM -> GEN stage. V2: check if options is NULL Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Add the threadid calculation for curbe.Junyan He2015-12-141-1/+11
| | | | | Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* add Broxton supportGuo Yejun2015-12-143-7/+41
| | | | | | | | special versions of linux kernel and libdrm are needed. utest and conformance test PASSED. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Junyan He <junyan.he@linux.intel.com>
* runtime: add macro DEBUGP() to handle debug printf.Ruiling Song2015-12-102-3/+9
| | | | | | | | It will output debug message under debug mode, and will not output under release mode. Signed-off-by: Ruiling Song <ruiling.song@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* SKL: Use kernel-defined MOCS values instead of assuming hardware defaults.Francisco Jerez2015-12-091-2/+2
| | | | | | | | | Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975 Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* fix gcc build error.Luo Xionghu2015-12-092-2/+2
| | | | | | | this link fail appears on gcc 5.2.1. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: add missing supported format image_1d_buffer.Luo Xionghu2015-12-091-0/+1
| | | | | Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: fix clLinkProgram bug.Luo Xionghu2015-12-091-0/+10
| | | | | | | | | clLinkProgram need check the existence of "-cl-kernel-arg-info" build_option of all the input_programs. User may link two SPIR program and call clGetKernelArgInfo to query kernel args. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: fix clCompileProgram bug.Luo Xionghu2015-12-093-13/+6
| | | | | | | | | | | forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is BINARY_TYPE_INTERMIDIATE if create from SPIR binary. v2: refine the source_type logic: source_type is already set in clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set in clBuildProgram or clCompileProgram. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: initialize the memory content to 0.Luo Xionghu2015-12-091-0/+1
| | | | | | | v2: move the memset inside the pointer check. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* make Beignet as intermedia layer of CMRTGuo Yejun2015-12-0914-14/+491
| | | | | | | | | | | | | | | | | | | | | | | | | CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt. There is a request to make Beignet as intermedia layer of CMRT, in other words, application programer write OpenCL APIs to execute the CM kernel on GPU, the following shows the key code, and please refer to the next patch of unit test for detail. prog = clCreateProgramWithBinary("cm kernel"); clBuildProgram(prog); kernel = clCreateKernel(prog, "kernel name"); image = clCreateImage(); clSetKernelArg(kernel, image); clEnqueueNDRangeKernel(kernel); Inside Beignet, once cm kernel is invoked, the following relative APIs will be directly passed to CMRT library (libcmrt.so) which is loaded via dlopen only when necessary. Since we use this simple method to keep the code clean, OpenCL spec is not strictly followed, and cl_event is not supported for this case. v2: add comments about the cm queue in fuction cmrt_enqueue Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* check image from buffer's base address alignment.Luo Xionghu2015-11-261-0/+9
| | | | | | | | | | per spec, if create image from USE_HOST_PTR buffer, the buffer's base address need be aligned. v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to kernel's ↵Zhigang Gong2015-11-255-6/+13
| | | | | | | | | | | | SIMD_WIDTH. It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the corresponding SIMD size. Then it provides a way for intel's OCL application to get SIMD width at runtime and make some SIMD width dependant optimization possible. Signed-off-by: Zhigang Gong <zhigang.gong@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: return the correct error code in cl_event_check_waitlist.Yang Rong2015-11-191-2/+4
| | | | | | | | Return CL_INVALID_CONTEXT if the context associated with command_queue and events in event_wait_list are not the same. Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
* Runtime: Bind the profiling buffer when profiling enabled.Junyan He2015-11-176-1/+126
| | | | | Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add profiling info APIs to runtime.Junyan He2015-11-172-0/+18
| | | | | Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* CMake: Add -lrt to the link command of libcl.soJunyan He2015-11-171-0/+1
| | | | | | | | | The clock_gettime will cause the linkage error on some version of GCC, we need to add -lrt at the end of the link command line. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Full support of cl_intel_motion_estimation extension.Chuanbo Weng2015-11-171-33/+166
| | | | | | | | | | | The following items are supported in this commit: 1. Return residuals. 2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in cl_motion_estimation_desc_intel. After this commit, cl_intel_motion_estimation is fully supported. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* runtime: extension size not enough.Luo Xionghu2015-11-113-3/+10
| | | | | | | | | | define a MACRO to hold the value. v2: use same MACRO in cl_extensions.h; add header file protection for cl_extension.h. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Add extensions intel_accelerator and basic intel_motion_estimation.Chuanbo Weng2015-11-1021-33/+914
| | | | | | | | | | | | | | | | | | | | | | | | | v2: 1. Just upload the first vme_state. 2. Remove duplicated code in check_opt1_extension. 3. Check image format before cl_gpgpu_bind_image_for_vme. 4. Fix error of getting mv. Because we suppose this kernel run in SIMD16 mode, so dword 0 of grf 1 should be __gen_ocl_region(8,vme_result.s0), not __gen_ocl_region(0,vme_result.s1). v3: Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required one. v4: Fix two conflicts after code rebase and wordaround a curbe related bug. v6: Treat simd8 and simd16 differently when getting mv. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Add extension clCreateImageFromFdINTEL to create cl image by external fd.Chuanbo Weng2015-11-106-0/+125
| | | | | | | | | | | | | | | | | | | | Before this patch, Beignet can only create cl image from external bo by its handle using clCreateImageFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type. Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Add extension clCreateBufferFromFdINTEL to create cl buffer by external ↵Chuanbo Weng2015-11-106-3/+93
| | | | | | | | | | | | | | | | | | | | | buffer object's fd. Before this patch, Beignet can only create cl buffer from external bo by its handle using clCreateBufferFromLibvaIntel. Render node is the first choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode. So it's necessary to add this extension to support buffer sharing between different libraries. v2: Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL. v3: Fix rebase conflict: add a parameter when invoke cl_mem_allocate. Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* set the pitch of image from buffer to the buffer's pitch.Luo Xionghu2015-11-041-1/+6
| | | | | | | | | | | | if image from buffer, the image's pitch should be same with buffer bo's row pitch. v2: correct style. image from buffer need update both aligned_pitch and aligned_h, while image from user ptr only set aligned_pitch, so just keep them independently. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo Yejun <yejun.guo@intel.com>
* fix regression issue for climage + uesrptrGuo Yejun2015-11-031-1/+2
| | | | | | | | | | userptr requires the exact same memory layout between cpu and gpu, since the current implementation uses the value of row_pitch*h, ignoring the slice_pitch provided by the application. so, enable userptr only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* runtime: dynamically get global memory size and max alloc sizePan Xiuli2015-11-032-4/+20
| | | | | | | | | | | Now device and driver can support bigger memory, we need to abandon our old 2G hard code. We get global memory by considering device limitation, drm driver and kernel support and raw, this will ensure a bigger global memory and a more stable system. We get max mem alloc size from global memory size and the device limition. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* driver: add setup_bti_gen9 for bigger buffer up to 4GPan Xiuli2015-11-031-2/+39
| | | | | | | | Now gen9 can support bigger buffer size, and it can also support 4G global memory. We add new function to support it. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* runtime: refine the cl_device_id to support bigger memoryPan Xiuli2015-11-036-11/+73
| | | | | | | | | Now gen8 and gen9 support 4G global memory, and gen9 support 4G single buffer. Need to move the global_mem and max_mem_alloc size into each define header. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* drivers: change the buf size to size_tPan Xiuli2015-11-032-9/+12
| | | | | | | | | The uint32_t size is not enough for coming bigger gpu memory, now GEN9 support 4G buffer. Also add assertion for invalid size. Signed-off-by: Pan Xiuli <xiuli.pan@intel.com> Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
* Runtime: Refine ext enable function for platform.Junyan He2015-10-273-19/+54
| | | | | | | | | | | | | We enable fp64 extension just on BDW platform. The platforms before Gen7 will not have fp64 support. We will enable fp64 on gen8 later platforms after this feature is stable. V3: Unify the extersion setting for FP16 and FP64. Signed-off-by: Junyan He <junyan.he@linux.intel.com> Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: add CL_DEVICE_SPIR_VERSIONS to clGetDeviceInfo.Yang Rong2015-10-213-0/+4
| | | | | Signed-off-by: Yang Rong <rong.r.yang@intel.com> Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
* use table to define and query binary headers.Luo Xionghu2015-10-212-26/+39
| | | | | | | | | | | | currently, we support create program from 4 types of binary: SPIR(BITCODE), LLVM Compiled Object, LLVM Library and GEN Binary. The detailed formats are listed in code. also use table to match or fill gen binary header in backend. v2: use enum to replace the magic number. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
* add conditions of pitch and h to enable userptr for climage_use_host_ptrGuo Yejun2015-10-201-1/+4
| | | | | | | | | | | this regression issue is reported from conformance test, to enable userptr for climage + use_host_ptr, the memory layout between the host_ptr (for CPU) and drm bo (for GPU) must be the same. it means bo's row pitch should be the same as image's row pitch, and h should be the same as aligned h. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* pitchalignment should be set to 1.Luo Xionghu2015-10-201-1/+1
| | | | | | | fix case precesion fail: opencv_test_video/OCL_Video/PyrLKOpticalFlow. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
* Revert "return 32 could gain 0.2% performance on opencv optical flow case."Luo Xionghu2015-10-201-1/+1
| | | | | | | | This reverts commit 729b16fdb387437f97115e938745ab1135151553. ./opencv_test_imgproc --gtest_filter=OCL_Imgproc/CLAHETest.* failed due to this patch. Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
* alignment of NO TILING surface limitation shouldn't be removed.Luo Xionghu2015-10-201-3/+4
| | | | | | | it could cause sampler data mismatch on IVB. Signed-off-by: Luo Xionghu <xionghu.luo@intel.com> Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
* enable USE_HOST_PTR for cl image with userptr to avoid extra copyingGuo Yejun2015-10-143-18/+44
| | | | | | | | the pointer must be 64 byte aligned, and only when w,h equals to its aligned value, otherwise, roll back to the old method with extra copying. Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* refine code to separate the usage of data and image2d_from_bufferGuo Yejun2015-10-143-26/+38
| | | | | | | | | | | currently, 'void* data' has two meanings: the pointer from application, and the buffer to create image2d from. It makes the code a little complex when supporting userptr for image. So, add a new function parameter to separate the two meanings. V2: fix when HAS_USERPTR is not enabled Signed-off-by: Guo Yejun <yejun.guo@intel.com> Reviewed-by: xionghu.luo@intel.com