Commit message | Author | Age | Files | Lines
* Remove the generated test cases list. [Release_v0.9.1] | Yi Sun | 2014-07-04 | 1 | -2/+0
    Signed-off-by: Yi Sun <yi.sun@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Build: check whether lspci exists. | Zhigang Gong | 2014-07-04 | 1 | -0/+7
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* runtime: fix a gpgpu event and thread local gpgpu handling bug. | Zhigang Gong | 2014-07-03 | 13 | -74/+97
    When a command queue is pending, we need to record the whole gpgpu structure, not just the batch buffer, for the following reasons:
    1. We need to keep its private buffers, for example the printf buffers.
    2. We need to make sure this gpgpu will not be reused by another enqueue.
    v2: Don't try to flush all user events attached to the queue. Only the current event needs to be flushed when flushing the command queue.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* runtime: recover the maximum read image args to 128. | Zhigang Gong | 2014-07-03 | 1 | -1/+1
    To comply with the full profile.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: Junyan He <junyan.he@linux.intel.com>
* Refine some event code. | Yang Rong | 2014-07-03 | 2 | -8/+19
    1. Do not add the user event to cb->wait_list, to avoid referencing it twice.
    2. Add an assert when updating the status.
    3. Set the queue's last wait event and barrier event to NULL when removing the last event.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: Check family of spilled register correctly. | Ruiling Song | 2014-07-02 | 1 | -7/+6
    We currently support spilling only DWORD and QWORD registers. So if we cannot spill a register, simply return false instead of asserting.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Refine the logic when suspend a batch buffer. | Yang Rong | 2014-07-02 | 3 | -3/+14
    Clear the gpgpu's batch buffer on suspend to avoid potential issues.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Fix some event ref count error. | Yang Rong | 2014-07-02 | 1 | -16/+16
    Move the event add-ref into the function cl_event_new_enqueue_callback for clarity. Also increment the ref count of the user events being waited on.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
* runtime: fix potential curbe allocation issue. | Zhigang Gong | 2014-07-01 | 2 | -15/+31
    According to the spec, different platforms have different curbe allocation restrictions. The previous code statically set the curbe allocation size to 480, which is not correct. This patch always sets the curbe entry count to 64, which is the maximum work group size, and sets a proper curbe allocation size according to the platform's hard limit and a reasonable limit on kernel argument usage.
    v3: When we call load_vte_state, we already know the exact constant urb size used by the current kernel, so we can choose the smallest valid curbe size for this kernel. If the size exceeds the hardware limit, we report it as a warning here.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* runtime: fix max group size calculation issue. | Zhigang Gong | 2014-07-01 | 5 | -33/+53
    If the kernel doesn't use SLM/barrier, there is no hard limit on the max group size. If the max work group size is more than 1024, the original urb entry count of 64 is not sufficient to hold all the curbe payload. Change the entry count to the max thread count to fix this potential issue. I found this bug when I tried to run the Phoronix Test Suite's juliagpu test case on my MBA.
    v2: Refine the max kernel work group size calculation. wg_sz should not be a member variable of the device; it should be derived from the kernel's and the device's attributes at runtime. Also fix a wrong configuration for IVB GT1.
    v3: Add an important max thread limit in the GPGPU_WALKER command. For non-Baytrail, max thread depth * max thread height * max thread width should be less than 64 (under either simd16 or simd8), whether or not SLM/barrier is used. We overlooked that limit before, so a simd8 kernel that uses a work group size of 1024 exceeds it and half of the threads are not executed at all.
    Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
    Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
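The thread arithmetic behind the v3 note can be sketched as follows. This is a minimal illustration, not code from the patch: the helper names are hypothetical, and the cap of 64 is treated as an inclusive bound, which is an assumption chosen because it matches the "half of the threads" remark for a simd8 kernel with 1024 work items.

```python
# Assumed cap on hardware threads per work group for non-Baytrail parts.
# The figure 64 comes from the commit message; treating it as inclusive
# is an assumption made for this sketch.
MAX_THREADS_PER_GROUP = 64


def threads_needed(work_group_size: int, simd_width: int) -> int:
    """Hardware threads needed when each thread covers simd_width work items."""
    return (work_group_size + simd_width - 1) // simd_width


def max_work_group_size(simd_width: int,
                        max_threads: int = MAX_THREADS_PER_GROUP) -> int:
    """Largest work group that still fits under the thread cap."""
    return max_threads * simd_width


def executed_fraction(work_group_size: int, simd_width: int,
                      max_threads: int = MAX_THREADS_PER_GROUP) -> float:
    """Fraction of the needed threads that fit under the cap."""
    needed = threads_needed(work_group_size, simd_width)
    return min(needed, max_threads) / needed


if __name__ == "__main__":
    for simd in (8, 16):
        print(simd, threads_needed(1024, simd), max_work_group_size(simd),
              executed_fraction(1024, simd))
```

Under these assumptions a simd8 kernel with 1024 work items needs 128 threads, twice the cap, while a simd16 kernel needs exactly 64.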
* add the usage of link program from llvm binary. | Luo | 2014-07-01 | 1 | -3/+38
    User A can compile and link the kernel source to an LLVM binary first, then query the binary and save it to a file. With that binary, user B can call clCreateProgramWithBinary without compiling the source again. This usage helps those who need to protect their kernel source.
    Signed-off-by: Luo <xionghu.luo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: disable GVN pass when optLevel is zero. | Ruiling Song | 2014-06-30 | 1 | -1/+2
    The GVN pass may generate i256 data types, which our backend cannot handle. So only enable it when optLevel > 0.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
* Bump to 0.9.1 (development version). | Zhigang Gong | 2014-06-30 | 1 | -1/+1
    Bump to development version after 0.9.0.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Bump to version 0.9. [Release_v0.9] | Zhigang Gong | 2014-06-26 | 1 | -2/+2
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Fix call cl_mem_copy_image_region bug. | Yang Rong | 2014-06-26 | 4 | -16/+25
    When calling cl_mem_copy_image_region, an offset sometimes needs to be added to the src or dst address and sometimes does not. Add two parameters to indicate which. Also fix the wrong offset in clEnqueueMapImage with CL_MEM_USE_HOST_PTR.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* docs: fixup markup format. | Zhigang Gong | 2014-06-26 | 1 | -1/+1
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* docs: fix some markdown links and correct some information. | Zhigang Gong | 2014-06-26 | 3 | -12/+21
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* docs: update some documents. | Zhigang Gong | 2014-06-26 | 4 | -48/+44
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* GBE: fix some bugs in ocl stdlib header files. | Zhigang Gong | 2014-06-26 | 1 | -4/+2
    The printf prototype was incorrectly added twice.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
* gbe_bin_generator: fix the incorrect type of cl_internal_built_in_kernel_str_size. | Zhigang Gong | 2014-06-26 | 3 | -28/+29
    We should define it as size_t.
    v2: correct some extern definitions in cl_mem.c.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
* Add optimization guide. | Yang Rong | 2014-06-26 | 1 | -0/+28
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* runtime: Remove 'Experiment' from the platform name. | Zhigang Gong | 2014-06-25 | 1 | -1/+1
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* add how to for cross compiler | Guo Yejun | 2014-06-25 | 1 | -0/+60
    Signed-off-by: Guo Yejun <yejun.guo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Fix clEnqueueMapImage with CL_MEM_USE_HOST_PTR bug. | Yang Rong | 2014-06-25 | 3 | -30/+60
    It should return the host row pitch and host slice pitch. It should also copy the data back to the image on unmap.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* add image_1d_to_1d builtin kernel name. | Luo | 2014-06-25 | 1 | -0/+1
    Signed-off-by: Luo <xionghu.luo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* utests: fix one bug when create image at one test case. | Zhigang Gong | 2014-06-25 | 1 | -0/+2
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* Implement the %p in the printf | Junyan He | 2014-06-24 | 3 | -5/+22
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add the support for vector type in printf. | Junyan He | 2014-06-24 | 4 | -81/+167
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: Further optimize exp(). | Ruiling Song | 2014-06-24 | 1 | -29/+11
    Use native_exp() as much as possible.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* add cpu copy for 1Darray and 2darray related copy APIs. | Luo | 2014-06-24 | 5 | -6/+91
    Detailed cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray. 1D uses GPU copy.
    v2: Fix 1D array to 1D array copy; there is no need to swap depth and height.
    Signed-off-by: Luo <xionghu.luo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* add BEIGNET_INSTALL_DIR to clean code | Guo Yejun | 2014-06-24 | 3 | -17/+17
    Signed-off-by: Guo Yejun <yejun.guo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* set LD_LIBRARY_PATH of libgbe.so for gbe_bin_generater | Guo Yejun | 2014-06-24 | 1 | -1/+1
    It is needed for the cross compiler.
    Signed-off-by: Guo Yejun <yejun.guo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* implement API clEnqueueFillImage. | Luo | 2014-06-24 | 15 | -26/+261
    Enqueues a command to fill an image object with a specified color. Also fix a typo in cl_context_get_static_kernel_from_bin.
    v2: Fix an image 1D array bug.
    Signed-off-by: Luo <xionghu.luo@intel.com>
    Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* fix crash when OCL_STRICT_CONFORMANCE is unset | Guo Yejun | 2014-06-24 | 1 | -1/+1
    Signed-off-by: Guo Yejun <yejun.guo@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add the format and flag support for printf. | Junyan He | 2014-06-23 | 3 | -57/+203
    Formats and flags such as -, + and #, as well as precision requests, are now applied to the output.
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
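For readers unfamiliar with these flags, the sketch below shows what they do, using Python's `%` operator, which follows the C printf conventions for the cases shown. It only illustrates the flag semantics and says nothing about how the Beignet printf implementation is written; the helper name `fmt` is hypothetical.

```python
def fmt(spec: str, value) -> str:
    """Format one value with a printf-style conversion specification."""
    return spec % value


if __name__ == "__main__":
    print(fmt("%+d", 5))           # '+' forces a sign on positive numbers
    print(fmt("%-5d|", 5))         # '-' left-justifies within the field width
    print(fmt("%#x", 255))         # '#' adds the 0x prefix for hexadecimal
    print(fmt("%.3f", 3.14159))    # precision limits the digits after the point
    print(fmt("%08.3f", 3.14159))  # zero padding combined with a precision
```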
* update docs on environment variables. | Ruiling Song | 2014-06-23 | 2 | -2/+48
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: switch to non strict conformance mode by default. | Zhigang Gong | 2014-06-23 | 1 | -1/+1
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
* utest_generator.py: add OCL_STRICT_CONFORMANCE enviroment condition. | Yi Sun | 2014-06-23 | 2 | -3/+12
    For auto-generated math cases, when OCL_STRICT_CONFORMANCE is not set, the expected diff increases to 1000x.
    Signed-off-by: Yi Sun <yi.sun@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
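The tolerance rule described here could look roughly like the sketch below. It is not taken from utest_generator.py: the function name and the `environ` parameter are hypothetical, and treating the variable as "set" whenever it is present in the environment, whatever its value, is an assumption.

```python
import os

# The 1000x factor comes from the commit message.
RELAXED_FACTOR = 1000


def expected_diff(base_diff: float, environ=os.environ) -> float:
    """Return the allowed error for a generated math test case.

    Assumption: OCL_STRICT_CONFORMANCE counts as "set" whenever it is
    present in the environment, regardless of its value.
    """
    if "OCL_STRICT_CONFORMANCE" in environ:
        return base_diff
    return base_diff * RELAXED_FACTOR


if __name__ == "__main__":
    print(expected_diff(2.0, {}))
    print(expected_diff(2.0, {"OCL_STRICT_CONFORMANCE": "1"}))
```

Passing the environment as a parameter keeps the rule easy to exercise without touching the real process environment.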
* GBE: declare correct prototype for fastpath_rootn | Ruiling Song | 2014-06-23 | 1 | -1/+1
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: fix some builtin math function | Ruiling Song | 2014-06-23 | 1 | -3/+3
    __gen_ocl_exp stands for 2^x. So, use __gen_ocl_pow to implement native_exp(). Fix the atanh implementation.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
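The underlying math issue can be reproduced in a few lines. In this sketch every function is a hypothetical stand-in written in plain Python, not a Beignet builtin: `exp2_primitive` plays the role of a primitive that computes 2^x, and the other functions show why using it directly as exp() is wrong and two ways to obtain e^x instead.

```python
import math


def exp2_primitive(x: float) -> float:
    """Stand-in for a primitive that computes 2^x."""
    return 2.0 ** x


def native_exp_wrong(x: float) -> float:
    """Using the 2^x primitive directly as exp() gives the wrong base."""
    return exp2_primitive(x)


def native_exp_via_pow(x: float) -> float:
    """e^x through a general power function, as the commit describes."""
    return math.pow(math.e, x)


def native_exp_via_exp2(x: float) -> float:
    """Equivalent identity: e^x = 2^(x * log2(e))."""
    return exp2_primitive(x * math.log2(math.e))


if __name__ == "__main__":
    x = 1.5
    print(math.exp(x), native_exp_wrong(x),
          native_exp_via_pow(x), native_exp_via_exp2(x))
```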
* Add some OpenCL1.2 parameters of function clGetDeviceInfo. | Yang Rong | 2014-06-23 | 3 | -0/+9
    Include CL_DEVICE_LINKER_AVAILABLE, CL_DEVICE_PRINTF_BUFFER_SIZE, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Fix a CL_MEM_HOST_PTR bug. | Yang Rong | 2014-06-23 | 1 | -2/+6
    sub_offset cannot be added if the mem is an image.
    Signed-off-by: Yang Rong <rong.r.yang@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: replace OwningPtr with std::unique_ptr | Ruiling Song | 2014-06-23 | 1 | -4/+3
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* GBE: improve builtin exp. | Ruiling Song | 2014-06-23 | 1 | -11/+11
    Put some variables into registers. This could improve LuxMark sala by about 10% under strict conformance.
    Signed-off-by: Ruiling Song <ruiling.song@intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Add the test cases for 1D Image Array | Junyan He | 2014-06-23 | 5 | -0/+183
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
* Update the printf test case. | Junyan He | 2014-06-23 | 1 | -0/+19
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add the support for %s in printf | Junyan He | 2014-06-23 | 3 | -42/+70
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Fix a crash bug when no %d appears in the printf fmt | Junyan He | 2014-06-23 | 2 | -4/+12
    If no printf statement contains a %d, the curbe ignores the content buffer ptr because nothing uses it. Binding the buffer ptr at run time then crashes.
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* Add %f and %c support for printf. | Junyan He | 2014-06-23 | 3 | -52/+94
    Add %c and %f support for printf. Also add the int to float and int to char conversions. Some minor errors, such as wrong index flags, have been fixed.
    Signed-off-by: Junyan He <junyan.he@linux.intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>
* GBE: fix some get kernel arg info bugs. | Zhigang Gong | 2014-06-23 | 5 | -3/+16
    Still cannot handle a sampler_t that is not actually used. The access qualifier seems broken with LLVM 3.3.
    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: Yang Rong <rong.r.yang@intel.com>