path: root/src
Commit message | Author | Age | Files | Lines
* Allow creating out-of-order queues with clCreateCommandQueue [HEAD, master] | Rebecca N. Palmer | 2018-08-20 | 1 | -29/+5
  clCreateCommandQueueWithProperties can already create them, but that's a 2.0 function.
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Make in-order command queues actually be in-order | Rebecca N. Palmer | 2018-08-20 | 6 | -34/+71
  When beignet added out-of-order execution support (7fd45f15), it made *all* command queues out-of-order, even if they were created as (and are reported by clGetCommandQueueInfo as) in-order.
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Don't leak memory on long chains of events | Rebecca N. Palmer | 2018-08-20 | 2 | -8/+25
  Delete event->depend_events when it is no longer needed, to allow the event objects it refers to to be freed. This avoids out-of-memory hangs in large dependency trees (e.g. long iterative calculations): https://launchpad.net/bugs/1354086
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
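The idea in the message above can be illustrated with a minimal sketch. Only the field name depend_events comes from the commit message; the struct layout, the reference counting scheme, and the function names are hypothetical and are not beignet's actual code. The sketch shows that an event holding references to its dependencies keeps the whole chain alive until that list is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event object; not beignet's real cl_event layout. */
struct event {
    int refcount;
    struct event **depend_events; /* events this one waits on */
    size_t depend_count;
};

/* Counter used only to make the effect observable in this sketch. */
int live_events = 0;

/* Create an event that takes one reference on each dependency. */
struct event *event_new(struct event **deps, size_t n)
{
    struct event *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->refcount = 1;
    if (n > 0) {
        e->depend_events = malloc(n * sizeof *e->depend_events);
        if (e->depend_events == NULL) {
            free(e);
            return NULL;
        }
        for (size_t i = 0; i < n; i++) {
            deps[i]->refcount++;
            e->depend_events[i] = deps[i];
        }
        e->depend_count = n;
    }
    live_events++;
    return e;
}

void event_release(struct event *e);

/* Drop the dependency list once it is no longer needed (for example
 * when the event has completed), so the referenced events can be freed
 * even while this event itself stays alive. */
void event_drop_dependencies(struct event *e)
{
    for (size_t i = 0; i < e->depend_count; i++)
        event_release(e->depend_events[i]);
    free(e->depend_events);
    e->depend_events = NULL;
    e->depend_count = 0;
}

/* Release one reference; free the event when the last one is gone. */
void event_release(struct event *e)
{
    if (--e->refcount > 0)
        return;
    event_drop_dependencies(e);
    free(e);
    live_events--;
}
```

Without the call to event_drop_dependencies at completion time, every event in a long chain would stay allocated for as long as the newest event is referenced.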
* Enable Coffee Lake support | Mark Thompson | 2018-02-05 | 2 | -3/+148
  Little change is needed here because the graphics core is the same as Kaby Lake. Includes all PCI IDs currently supported by the kernel driver in the drm-intel tree (Coffee Lake S, H and U devices in GT 1, 2 and 3 configurations).
  Signed-off-by: Mark Thompson <sw@jkqxz.net>
* Fix enabling of fp64 extension | Mark Thompson | 2018-02-05 | 1 | -8/+8
  This should only be enabled after setting the default extensions, because the default setup overwrites the current extension string rather than adding to it.
  Signed-off-by: Mark Thompson <sw@jkqxz.net>
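The ordering problem described above can be sketched as follows. All function names, the buffer size, and the default extension list are illustrative assumptions rather than beignet's real code; cl_khr_fp64 is assumed to be the extension meant by "fp64". The point is that an overwrite followed by an append keeps the extension, while the reverse order loses it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXT_BUF_SIZE 256

/* Hypothetical stand-in for the default setup: it overwrites the
 * buffer instead of adding to it. The list itself is a placeholder. */
void set_default_extensions(char *buf, size_t size)
{
    snprintf(buf, size, "%s", "cl_khr_byte_addressable_store cl_khr_icd");
}

/* Hypothetical helper: append one extension name, space separated. */
void append_extension(char *buf, size_t size, const char *ext)
{
    size_t used = strlen(buf);
    assert(used < size);
    snprintf(buf + used, size - used, "%s%s", used ? " " : "", ext);
}
```

Calling append_extension before set_default_extensions silently discards the appended name, which is the failure mode the commit message describes.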
* Ensure that DRM device uses the i915 driver | Mark Thompson | 2018-02-05 | 1 | -0/+30
  This avoids calling random ioctl()s and returning nonsensical errors for unsupported devices. In particular, loading is much cleaner on setups where the driver needs to iterate over multiple devices to find the correct one because the Intel graphics device is not the first DRM device.
  Signed-off-by: Mark Thompson <sw@jkqxz.net>
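A check of this kind might look like the sketch below. The helper name is hypothetical and the commit message does not say how the driver name is obtained; one plausible source, assumed here, is the name and name_len fields of the drmVersion structure returned by libdrm's drmGetVersion. The helper itself is pure, so it can be tried without a DRM device.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true only when the reported kernel driver name
 * is exactly "i915". The name is passed with an explicit length, so it
 * does not need to be NUL terminated. */
bool driver_is_i915(const char *name, size_t name_len)
{
    static const char expected[] = "i915";

    if (name == NULL)
        return false;
    return name_len == strlen(expected) &&
           memcmp(name, expected, name_len) == 0;
}
```

A device that fails this check would be skipped before any i915-specific ioctl is issued.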
* Runtime: Remove X11 dri2 connection failed warning message. | Yang Rong | 2018-01-10 | 1 | -2/+0
  This message is only relevant to X11; under Wayland it is not an error, so delete it. If opening the X11 device fails, another warning message is printed below.
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: implement clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects. | Yang Rong | 2017-09-21 | 1 | -0/+150
  The application is responsible for synchronizing access to shared objects, so GL has finished using them before clEnqueueAcquireGLObjects is called; just set the event status. clEnqueueReleaseGLObjects is handled the same way.
  V2: V1 was the wrong version; correct it.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix a build warning. | Yang, Rong R | 2017-07-31 | 1 | -5/+6
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix the "context ref is not 0" assert on delete. | Yang, Rong R | 2017-07-27 | 1 | -22/+8
  The CL_ENQUEUE_FILL_BUFFER_ALIGN8_* internal programs are the same program, so the program's ref is only added once, but when the context is deleted the internal program count adds them individually. This mismatch causes the context to be freed by mistake. Create a different program for each CL_ENQUEUE_FILL_BUFFER_ALIGN8_* for clarity.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix a cl_gpgpu_bind_image_for_vme NULL SIGSEGV. | Yang, Rong R | 2017-07-27 | 1 | -1/+2
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Implement extension cl_intel_device_side_avc_motion_estimation. | Chuanbo Weng | 2017-07-12 | 6 | -3/+148
  This patch mainly contains:
  1. An implementation of the built-in function __gen_ocl_ime.
  2. Implementations of many built-in functions of cl_intel_device_side_avc_motion_estimation.
  3. This extension is required to run in simd16 mode.
  v2: move the utests to separate patches one by one; as all the utests have an extension function check, there is no need to put them in a stand-alone utest; uncomment the self test; fix an extension check logic issue, which should be && instead of ||.
  Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
  Signed-off-by: Xionghu Luo <xionghu.luo@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: remove ctx's useless fields. | Yang, Rong R | 2017-07-10 | 3 | -43/+5
  built_in_prgs and built_in_kernels seem useless; remove them.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: fix a recurrent release context error. | Yang, Rong R | 2017-07-10 | 1 | -10/+8
  Internal resources must be set to null before they are released; otherwise, deleting these resources calls release context again. ctx->built_in_prgs should be released by the application.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* Runtime: refine max group size for SKL & KBL | rander | 2017-07-04 | 1 | -9/+9
  Change max group size to 256, which is a reasonable size for Gen9. According to performance tests, 256 gives good gains in OpenCV with no regressions.
  Signed-off-by: rander.wang <rander.wang@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: clean llvm module's clone and release. | Yang, Rong R | 2017-06-23 | 3 | -1/+7
  Changes:
  1. Clone the module before calling LLVMLinkModules2, and remove the other clones made for it.
  2. Don't delete the module in function llvmToGen.
  3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM and buildFromLLVMModule only handle an llvm module. programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel, which seems useless; maybe it could be deleted altogether.
  V2: define errDiag beside #if/#endif.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Add missed kernel names into built-in kernel list. | Yan Wang | 2017-06-22 | 1 | -1/+16
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Add missing SKL device ID | Pan Xiuli | 2017-06-22 | 2 | -1/+9
  It seems we missed some newly added device IDs for SKL.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix context leak with internal kernels | Patrick Beaulieu | 2017-06-16 | 1 | -1/+21
  Account for internal program ctx references in cl_context_delete
  Signed-off-by: Patrick Beaulieu <patrick.beaulieu@avigilon.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Add new API enums for cl_intel_required_subgroup_size extension | Pan Xiuli | 2017-06-16 | 5 | -0/+40
  Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for clGetKernelSubGroupInfo. This extension is only available with LLVM 40+, which is needed for frontend support.
  V2: Add opencl-c define
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Use aligned16 and aligned4 kernels to copy large 3D images with TILE_Y. | Yan Wang | 2017-06-14 | 7 | -37/+149
  As with 2D images, this avoids truncation of the extended image width.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Optimize clEnqueueWriteImageByKernel and clEnqueuReadImageByKernel. | Yan Wang | 2017-06-13 | 1 | -7/+18
  1. Only copy the data defined by origin and region.
  2. Add clFinish to guarantee the kernel copy has finished for blocking writes.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix bugs in clEnqueueUnmapMemObjectForKernel and clEnqueueMapImageByKernel. | Yan Wang | 2017-06-13 | 1 | -34/+113
  1. Support writing data in mapping/unmapping mode.
  2. Add mapping record logic.
  3. Add clFinish to guarantee the kernel copy has finished.
  4. Fix the error in calling clEnqueueMapImageByKernel: blocking_map and map_flags need to be switched.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add clFinish to guarantee the kernel copy has finished when creating a large TILE_Y image. | Yan Wang | 2017-06-13 | 1 | -0/+7
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add cl_mem_record_map_mem_for_kernel() to record the map address for TILE_Y images copied by kernel. | Yan Wang | 2017-06-13 | 2 | -26/+88
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: Fix a missing llvm version macro for LLVM40+ | Pan Xiuli | 2017-06-09 | 1 | -1/+1
  Found a missing macro that needs to change to support LLVM40+.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. | Yan Wang | 2017-05-25 | 5 | -28/+89
  The "imagedim_non_pow_2" cases of the basic module of the conformance tests regressed after the previous patch started using TILE_Y mode for large images. The bug comes from the non-align16 kernel of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer. It forces the CL_RGBA/CL_UNORM_INT8/8191x8192 image of the conformance test to a CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying. The width becomes 8191 x 4 = 32764, which exceeds the maximum width (16 x 1024 = 16384) of the GEN surface state structure, which only has 14 bits. Use the align4 copy kernel to avoid this bug.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
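The arithmetic in the message above can be checked with a small sketch. The function names are hypothetical; the 14-bit limit, the factor of 4 and the 8191 width are the figures quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Maximum width quoted in the commit message for the 14-bit field of
 * the GEN surface state: 16 x 1024 = 16384. */
#define MAX_SURFACE_WIDTH ((size_t)1 << 14)

/* Hypothetical helper: width of an image after each pixel of
 * bytes_per_pixel bytes is reinterpreted as single-byte texels. */
size_t reinterpreted_width(size_t width, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * bytes_per_pixel;
}

/* True when the width does not exceed the quoted maximum. */
bool fits_surface_state(size_t width)
{
    return width <= MAX_SURFACE_WIDTH;
}
```

With these definitions, reinterpreted_width(8191, 4) is 32764, which fits_surface_state rejects, while the original width of 8191 is accepted.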
* build: fix cmake code generation dependencies. | Ismo Puustinen | 2017-05-25 | 1 | -2/+2
  There is a race condition between building .bc and header files and generating code from .cl targets. Fix the race by adding the dependency to generated files.
  Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com>
* Implement TILE_Y large image in clEnqueueWriteImage. | Yan Wang | 2017-05-18 | 1 | -0/+46
  Copying data from the host ptr to a large TILE_Y image by memcpy fails. Use clEnqueueCopyBufferToImage to do this on the GPU side.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement TILE_Y large image in clEnqueueReadImage. | Yan Wang | 2017-05-18 | 1 | -0/+55
  Copying data from a large TILE_Y image to a buffer by memcpy fails. Use clEnqueueCopyImageToBuffer to do this on the GPU side.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Implement TILE_Y large image in clEnqueueMapImage and clEnqueueUnmapMemObject. | Yan Wang | 2017-05-18 | 1 | -0/+111
  Copying data from a large TILE_Y image to a buffer by memcpy fails. Use clEnqueueCopyImageToBuffer to do this.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Still create images in TILE_Y mode when image size > 128MB, for performance. | Yan Wang | 2017-05-18 | 4 | -6/+111
  Copying data from the host ptr to a large TILE_Y image may fail, so use clCopyBufferToImage to do this on the GPU side.
  Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GLK: add geminilake runtime support. | Yang Rong | 2017-05-15 | 2 | -2/+47
  Geminilake is almost the same as bxt, except for the intel_gpgpu_read_ts_reg function.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* GLK: add Geminilake pciids. | Yang Rong | 2017-05-15 | 1 | -1/+8
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Limit get_program_global_data() calls to OpenCL 2.0 | Jan Beich | 2017-03-23 | 1 | -2/+4
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217635
  Signed-off-by: Jan Beich <jbeich@freebsd.org>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* intel: Check that we can reserve the zero-offset | Yang Rong | 2017-03-17 | 1 | -11/+20
  commit ff57cee0519d ("ocl20/runtime: take the first 64KB page table entries") tries to allocate a bo at 0 offset, but failed to take into account that something may already be allocated there that it is not allowed to evict (particularly when not using full-ppgtt separation). Failure to do so causes all execution to subsequently fail with "drm_intel_gem_bo_context_exec() failed: Device or resource busy"
  Reported-by: Kenneth Johansson <ken@kenjo.org>
  Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98647
  Contributor: Chris Wilson <chris@chris-wilson.co.uk>
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Ruiling Song <ruiling.song@intel.com>
* add extension cl_intel_media_block_io READ related function | Luo Xionghu | 2017-03-13 | 1 | -0/+1
  v2: add #define intel_media_block_io in libocl; move extension check code to this patch;
  Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* add extension intel_planar_yuv. | Luo Xionghu | 2017-03-13 | 9 | -24/+217
  Create a bo of size w * (3/2 * h) for the whole CL_NV12_INTEL format surface. The y surface (format CL_R) shares the first w * h part and the uv surface (format CL_RG) shares the remaining w * 1/2 h part; set the correct bo offset for the uv surface on each platform.
  v2: add extension define in libocl; fix error check.
  Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
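The size split described above can be worked through in a short sketch. The struct and function names are hypothetical, and the sketch assumes one byte per sample, even dimensions, and no platform-specific padding or alignment of the uv offset, which the commit message says does vary per platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of an NV12 buffer split into two planes. */
struct nv12_layout {
    size_t y_size;     /* w * h bytes for the y plane            */
    size_t uv_offset;  /* where the uv plane starts in the bo    */
    size_t uv_size;    /* w * (h / 2) bytes for the uv plane     */
    size_t total_size; /* w * (3/2 * h) bytes for the whole bo   */
};

/* Compute the layout for a w x h surface (assumption: tightly packed,
 * no per-platform alignment of the uv offset). */
struct nv12_layout nv12_layout_for(size_t w, size_t h)
{
    struct nv12_layout l;

    assert(w % 2 == 0 && h % 2 == 0);
    l.y_size = w * h;
    l.uv_offset = l.y_size;
    l.uv_size = w * (h / 2);
    l.total_size = l.y_size + l.uv_size;
    return l;
}
```

For a 640 x 480 surface this gives a 307200-byte y plane, a 153600-byte uv plane starting at offset 307200, and a 460800-byte bo in total.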
* CMAKE: Refine builtin kernel bin generator | Pan Xiuli | 2017-03-07 | 1 | -7/+7
  Move the generated builtin str and bin files into the Cmake build directory to avoid chaos when changing LLVM.
  V2: Fix a bug that the builtin.cl was not written into build dir.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Runtime: add a warning when loading a gen binary fails. | Yang Rong | 2017-02-14 | 1 | -0/+1
  Some applications use the program's binary by default. If they load a gen binary from a former version, clCreateProgramWithBinary fails because the fields of the gen binary have changed and there is no version checking, which may cause applications to fail silently. Add a warning to alert the user.
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Runtime: fix bug returning unsupported device types. | Yang Rong | 2017-02-14 | 2 | -4/+8
  Only return supported device types (GPU and default) in function cl_get_gt_device.
  Contributor: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Enable OpenCL 2.0 only where supported | Pan Xiuli | 2017-02-14 | 5 | -8/+18
  This allows a single beignet binary to both offer 2.0 where available, and still work on older hardware.
  V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec, and also likely to be faster).
  V3: Only enable OpenCL 2.0 when llvm version is 39.
  V4: Only enable OpenCL 2.0 on x64 host.
  V5: Always return 32 as address bits.
  Contributor: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Enable support for two-component 16-bit planes | Mark Thompson | 2017-02-14 | 1 | -0/+2
  This is needed to support the chroma plane of P010 surfaces being mapped from VAAPI.
  Signed-off-by: Mark Thompson <sw@jkqxz.net>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Free context devices on context release | Giuseppe Bilotta | 2017-02-10 | 1 | -0/+1
  The context owns the array of devices passed to cl_context_new, so it's its duty to free it.
  Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix obvious copy-paste | Giuseppe Bilotta | 2017-02-10 | 1 | -1/+1
  The conditional was equal to the one before, and would never be hit because internal kernels were reset after release. Instead, since the body is resetting built-in kernels, it appears obvious that the conditional should be on the existence of built-in kernels.
  Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* API: Fix local memory type to CL_LOCAL | Pan Xiuli | 2017-02-10 | 4 | -4/+4
  We are using SLM as local memory and we should return CL_LOCAL for CL_DEVICE_LOCAL_MEM_TYPE.
  Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
  Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Typo in error message | Giuseppe Bilotta | 2017-02-08 | 1 | -1/+1
  Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
  Reviewed-by: He Junyan <junyan.he@inbox.com>
* Make CL-GL sharing available via ICD | Rebecca N. Palmer | 2017-02-06 | 1 | -12/+16
  Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
  Reviewed-by: Chuanbo Weng <chuanbo.weng@intel.com>
* Android.mk: update Android.mk for android build. | Yang Rong | 2017-01-19 | 1 | -3/+15
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>
* Add some pointer access checks. | Yang Rong | 2017-01-11 | 3 | -1/+5
  Signed-off-by: Yang Rong <rong.r.yang@intel.com>
  Reviewed-by: Pan Xiuli <xiuli.pan@intel.com>