| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
use the variable NOT_BUILD_STAND_ALONE_UTEST to control the build type:
for beignet build, set it to 1; for stand alone build, do NOT need set
it.
remove all clXXXIntel extension call and such kind of tests since we
intend to provide the unit test independently for viariant OpenCL
implementation; replace the clMapBufferIntel/clMapBufferGTTIntel
with clEnqueueMapBuffer/clEnqueueMapImage; link the utest binary to
libOpenCL to follow the icd standard; remove the useless env in
setenv.sh since we need make install the package after build.
v2:
fix the indent error;
use function pointer for extesion case like vme and libva since we link to libOpenCL;
v3: builtin_kernel_block_motion_estimate_intel released kernel twice;
v4:
find OpenCL library for standalone utest and link to libcl for not
standalone utest;
check default variables in setenv.sh whether empty before use.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
Contributor: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
|
|
|
|
|
| |
Add skylakd workstation device and desktop GT4
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The space to hold the output log from the compiler is currently
preallocated. The previous size (1000 chars) is too small when many
warnings and/or errors are present.
Enlarge the buffer to 1024*1024 chars, in order to allow up to 1024
errors, each taking 1024 bytes to report. This is still a rather
arbitrary choice, but should hopefully fit a wider range of cases.
(This fixes an issue reported on #opencl)
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
for the case without {}:
if (...)
DEBUGP(...)
the second printf of DEBUGP is always enabled, fixed it.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: He Junyan <junyan.he@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix bug at https://bugs.freedesktop.org/show_bug.cgi?id=93469
The fucntion is mapped to OP_SIMD_SIZE which returns the constant
SIMD width, the correct function name is get_max_sub_group_size.
contributor: Georg Kolling <georg.kolling@gmail.com>
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
From linux 4.3, kernel redefined the mocs table's value,
But before 4.3, still used the hw defautl value.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
| |
sometimes, application invokes read buffer, instead of map buffer,
even if userptr is enabled. memcpy is not necessary for such case.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
If the user provides local_work_size as NULL in clEnqueueNDRangeKernel,
and we could not find a good value inside driver, output a warning
message with macro DEBUGP, and also refine the macro.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
The first patch 192feb51 has something wrong in rebase and takes new
bug in. Now fix both the original bug and revert the wrong
patch.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
it by default.
Add a cmake option for it, cmake with option -DEXPERIMENTAL_DOUBLE=true to enable it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
| |
current code handles this option at clang level, actually, it is
also necessary at LLVM -> GEN stage.
V2: check if options is NULL
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
special versions of linux kernel and libdrm are needed.
utest and conformance test PASSED.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
|
|
|
|
|
|
| |
It will output debug message under debug mode, and will
not output under release mode.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Reported to fix fix a ~50% performance regression (in OpenCV 3.0 and
Luxmark 2.1 among others) with v4.3 kernels on Gen9 hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92975
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
this link fail appears on gcc 5.2.1.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
clLinkProgram need check the existence of "-cl-kernel-arg-info"
build_option of all the input_programs. User may link two SPIR
program and call clGetKernelArgInfo to query kernel args.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
forgot to add FROM_LLVM_SPIR in compileProgram; the BINARY_TYPE is
BINARY_TYPE_INTERMIDIATE if create from SPIR binary.
v2: refine the source_type logic: source_type is already set in
clCreateProgramWithSource or clCreateProgramWithBinary, shouldn't be set
in clBuildProgram or clCompileProgram.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
v2: move the memset inside the pointer check.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CMRT is C for Media Runtime on Intel GPU, see https://github.com/01org/cmrt.
There is a request to make Beignet as intermedia layer of CMRT, in
other words, application programer write OpenCL APIs to execute the
CM kernel on GPU, the following shows the key code, and please refer
to the next patch of unit test for detail.
prog = clCreateProgramWithBinary("cm kernel");
clBuildProgram(prog);
kernel = clCreateKernel(prog, "kernel name");
image = clCreateImage();
clSetKernelArg(kernel, image);
clEnqueueNDRangeKernel(kernel);
Inside Beignet, once cm kernel is invoked, the following relative APIs
will be directly passed to CMRT library (libcmrt.so) which is loaded
via dlopen only when necessary. Since we use this simple method to
keep the code clean, OpenCL spec is not strictly followed, and cl_event
is not supported for this case.
v2: add comments about the cm queue in fuction cmrt_enqueue
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
per spec, if create image from USE_HOST_PTR buffer, the buffer's base
address need be aligned.
v2: return error code CL_VALID_IMAGE_FORMAT_DESCRIPTOR.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD_WIDTH.
It makes sense to set CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the
corresponding SIMD size. Then it provides a way for intel's OCL application
to get SIMD width at runtime and make some SIMD width dependant optimization
possible.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
| |
Return CL_INVALID_CONTEXT if the context associated with
command_queue and events in event_wait_list are not the same.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
The clock_gettime will cause the linkage error on some
version of GCC, we need to add -lrt at the end of the
link command line.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The following items are supported in this commit:
1. Return residuals.
2. All types of mb_block_type, subpixel_mode, sad_adjust_mode in
cl_motion_estimation_desc_intel.
After this commit, cl_intel_motion_estimation is fully supported.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
define a MACRO to hold the value.
v2: use same MACRO in cl_extensions.h; add header file protection for
cl_extension.h.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
1. Just upload the first vme_state.
2. Remove duplicated code in check_opt1_extension.
3. Check image format before cl_gpgpu_bind_image_for_vme.
4. Fix error of getting mv. Because we suppose this kernel run in SIMD16
mode, so dword 0 of grf 1 should be
__gen_ocl_region(8,vme_result.s0), not
__gen_ocl_region(0,vme_result.s1).
v3:
Return CL_IMAGE_FORMAT_NOT_SUPPORTED if image format is not the required
one.
v4:
Fix two conflicts after code rebase and wordaround a curbe related bug.
v6:
Treat simd8 and simd16 differently when getting mv.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch, Beignet can only create cl image from external bo by
its handle using clCreateImageFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions: clCreateBufferFromFdINTEL
and clCreateImageFromFdINTEL.
v3:
Set depth of _cl_mem_image to 0 because it's CL_MEM_OBJECT_IMAGE2D type.
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
buffer object's fd.
Before this patch, Beignet can only create cl buffer from external bo by
its handle using clCreateBufferFromLibvaIntel. Render node is the first
choice of accessing gpu in currect Beignet implementation. DRM_IOCTL_GEM_OPEN
is used by clCreateBufferFromLibvaIntel but forbidden in Render node mode.
So it's necessary to add this extension to support buffer sharing between
different libraries.
v2:
Seperate clCreateMemObjectFromFdIntel into two extensions:
clCreateBufferFromFdINTEL and clCreateImageFromFdINTEL.
v3:
Fix rebase conflict: add a parameter when invoke cl_mem_allocate.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
if image from buffer, the image's pitch should be same with
buffer bo's row pitch.
v2: correct style. image from buffer need update both aligned_pitch and
aligned_h, while image from user ptr only set aligned_pitch, so just
keep them independently.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
userptr requires the exact same memory layout between cpu and gpu,
since the current implementation uses the value of row_pitch*h, ignoring
the slice_pitch provided by the application. so, enable userptr
only if slice_pitch == row_pitch*h for image3d, 2darray and 1darray.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Now device and driver can support bigger memory, we need to abandon
our old 2G hard code. We get global memory by considering device
limitation, drm driver and kernel support and raw, this will ensure
a bigger global memory and a more stable system. We get max mem alloc
size from global memory size and the device limition.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
| |
Now gen9 can support bigger buffer size, and it can also support
4G global memory. We add new function to support it.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
| |
Now gen8 and gen9 support 4G global memory, and gen9 support
4G single buffer. Need to move the global_mem and max_mem_alloc
size into each define header.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
| |
The uint32_t size is not enough for coming bigger
gpu memory, now GEN9 support 4G buffer. Also add
assertion for invalid size.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We enable fp64 extension just on BDW platform. The
platforms before Gen7 will not have fp64 support.
We will enable fp64 on gen8 later platforms after
this feature is stable.
V3:
Unify the extersion setting for FP16 and FP64.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
currently, we support create program from 4 types of binary: SPIR(BITCODE),
LLVM Compiled Object, LLVM Library and GEN Binary. The detailed formats are
listed in code.
also use table to match or fill gen binary header in backend.
v2: use enum to replace the magic number.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
this regression issue is reported from conformance test, to enable
userptr for climage + use_host_ptr, the memory layout between the
host_ptr (for CPU) and drm bo (for GPU) must be the same. it means
bo's row pitch should be the same as image's row pitch, and h should
be the same as aligned h.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
fix case precesion fail: opencv_test_video/OCL_Video/PyrLKOpticalFlow.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
| |
This reverts commit 729b16fdb387437f97115e938745ab1135151553.
./opencv_test_imgproc --gtest_filter=OCL_Imgproc/CLAHETest.* failed due
to this patch.
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
|
|
|
|
|
| |
it could cause sampler data mismatch on IVB.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Guo, Yejun <yejun.guo@intel.com>
|
|
|
|
|
|
|
|
| |
the pointer must be 64 byte aligned, and only when w,h equals to its
aligned value, otherwise, roll back to the old method with extra copying.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
currently, 'void* data' has two meanings: the pointer from application,
and the buffer to create image2d from. It makes the code a little
complex when supporting userptr for image. So, add a new function
parameter to separate the two meanings.
V2: fix when HAS_USERPTR is not enabled
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: xionghu.luo@intel.com
|