| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a command queue is pending, we need to record the whole gpgpu
structure, not just the batch buffer, for the following reasons:
1. We need to keep the private buffers, for example the printf buffers.
2. We need to make sure this gpgpu will not be reused by another enqueue.
v2:
Don't try to flush all user events attached to the queue.
We only need to flush the current event when flushing the command queue.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
To comply with the full profile.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
1. Do not add the user event to cb->wait_list, to avoid referencing this user event twice.
2. Add an assert when updating the status.
3. Set the queue's last wait event and barrier event to NULL when removing the last event.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
We currently only support DWORD and QWORD register spills.
So if we cannot spill a register, simply return false
instead of asserting.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
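The fallback described in this commit can be sketched in C as follows. This is a minimal illustration, not Beignet's actual register allocator: `reg_info`, `can_spill`, and `try_spill` are hypothetical names, and the 4-byte and 8-byte element sizes are an assumption about what DWORD and QWORD mean here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a virtual register (not Beignet's real type). */
struct reg_info {
  size_t elem_size; /* size of one element in bytes */
  bool spilled;     /* set once the register has been spilled */
};

/* Assumption: only DWORD (4-byte) and QWORD (8-byte) registers can be spilled. */
static bool can_spill(const struct reg_info *reg) {
  return reg->elem_size == 4 || reg->elem_size == 8;
}

/* Report failure to the caller instead of asserting. */
static bool try_spill(struct reg_info *reg) {
  if (!can_spill(reg))
    return false;
  reg->spilled = true;
  return true;
}
```

Returning false lets the caller decide how to recover from an unsupported register instead of aborting the whole process.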
|
|
|
|
|
|
|
| |
Clear the gpgpu's batch buffer when suspending to avoid potential issues.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Move the event add-ref into the function cl_event_new_enqueue_callback for clarity.
Also need to increase the ref count of the waited user events.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the spec, different platforms have different curbe
allocation restrictions. The previous code statically set the curbe
allocated size to 480, which is not correct.
This patch changes it to always set the curbe entry num to 64,
which is the maximum work group size, and to set a proper curbe
allocation size according to the platform's hard limitation
and a relatively reasonable kernel argument usage limitation.
v3:
When we call load_vte_state, we already know the exact constant urb
size used in the current kernel. We can choose the smallest valid curbe
size for this kernel. And if the size exceeds the hardware limitation,
we report it as a warning here.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
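The v3 approach of this commit (pick the smallest valid size, warn when it exceeds the hardware limit) might look roughly like the sketch below. The function name `pick_curbe_size`, the 32-byte allocation unit, and the behaviour of returning the oversized value unchanged are all assumptions made for illustration; the real driver code may differ.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed allocation granularity in bytes (hypothetical value). */
#define CURBE_UNIT 32u

/*
 * Return the smallest multiple of CURBE_UNIT that holds `used_bytes`.
 * If that exceeds `hw_limit_bytes`, print a warning and set *over_limit.
 */
static unsigned pick_curbe_size(unsigned used_bytes, unsigned hw_limit_bytes,
                                bool *over_limit) {
  unsigned size = (used_bytes + CURBE_UNIT - 1) / CURBE_UNIT * CURBE_UNIT;
  *over_limit = size > hw_limit_bytes;
  if (*over_limit)
    fprintf(stderr, "warning: curbe size %u exceeds hardware limit %u\n",
            size, hw_limit_bytes);
  return size;
}
```

Sizing the allocation from the kernel's actual usage avoids both the fixed 480 value and needless over-allocation on platforms with tighter limits.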
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the kernel doesn't use slm/barrier, there is no hard limitation
on the max group size. And if the max work group size is more than
1024, the original 64 urb entry count will not be sufficient to hold
all the curbe payload. Change the entry count to the max thread count to
fix this potential issue.
I found this bug when I tried to run the phoronix test suite's juliagpu
test case on my MBA.
v2:
Refine the max kernel work group size calculation mechanism.
The wg_sz should not be a member variable of the device; it should be
a variable derived from the kernel's and device's attributes at runtime.
Also fix the wrong configuration for IVB GT1.
v3:
Add an important max thread limitation in the GPGPU_WALKER command.
For non-Baytrail, the max thread depth * max thread height * max thread width
should be less than 64 (under either simd16 or simd8), no matter whether
SLM/barrier is used. We overlooked that limitation before, so
a simd8 kernel which uses work group size 1024 will exceed this limitation
and half of the threads will not be executed at all.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
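The arithmetic behind the v3 note can be checked with a small C sketch. The helper names are hypothetical, and the sketch assumes the limit is inclusive (exactly 64 threads per work group is allowed), which the "half of the threads" remark suggests but the message does not state outright.

```c
#include <stddef.h>

/* Assumed per-work-group hardware thread limit, taken from the v3 note. */
#define MAX_THREADS_PER_GROUP 64u

/* Hardware threads needed for a work group: ceil(wg_size / simd_width). */
static size_t threads_for_group(size_t wg_size, size_t simd_width) {
  return (wg_size + simd_width - 1) / simd_width;
}

/* Largest work group size that stays within the assumed thread limit. */
static size_t max_group_size(size_t simd_width) {
  return MAX_THREADS_PER_GROUP * simd_width;
}
```

Under these assumptions a simd8 kernel with 1024 work items needs 128 threads, twice the limit, while simd16 needs exactly 64; the derived maximum group size is 512 for simd8 and 1024 for simd16.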
|
|
|
|
|
|
|
|
|
|
| |
User A can compile and link the kernel source to an llvm binary first, then
query the binary and save it to a file. With the binary, user B can call
clCreateProgramWithBinary without compiling the source again.
This usage helps those who need to protect the kernel source.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
The GVN pass may generate i256 data types, which our backend cannot handle.
So, only enable it when optLevel > 0.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
| |
Bump to development version after 0.9.0.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
| |
When calling cl_mem_copy_image_region, the offset sometimes needs to be added to the src or dst address
and sometimes does not. Add two parameters to indicate this.
Also fix the wrong offset in clEnqueueMapImage with CL_MEM_USE_HOST_PTR.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
| |
The printf's prototype was added twice incorrectly.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
cl_internal_built_in_kernel_str_size.
We should define it as size_t.
v2:
correct some extern definitions in cl_mem.c.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Should return the host row pitch and host slice pitch.
Also should copy back to the image when unmapping.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Use native_exp() as much as possible.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Detailed cases: 1Darray, 2Darray, 2Darrayto2D, 2Darrayto3D, 2Dto2Darray, 3Dto2Darray.
1D uses GPU copy.
v2:
Fix 1D array to 1D array copy; there is no need to switch depth and height.
Signed-off-by: Luo <xionghu.luo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
| |
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
It is needed for the cross compiler.
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enqueues a command to fill an image object with a specified color.
Fix typo cl_context_get_static_kernel_from_bin.
v2:
Fix image 1D array bug.
Signed-off-by: Luo <xionghu.luo@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Guo Yejun <yejun.guo@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Format flags such as -, + and #, and precision requests, are now
applied to the output.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
| |
For auto-generated math cases, when OCL_STRICT_CONFORMANCE is not set,
the expected diff increases to 1000x.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
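The tolerance switch described in this commit could be sketched as below. This is only an illustration: `allowed_diff` and `allowed_diff_from_env` are hypothetical helpers, and the way the variable is parsed (unset, empty, or "0" means not strict) is an assumption, not the test suite's confirmed behaviour.

```c
#include <stdlib.h>

/* Assumed relaxation factor, taken from the "1000x" in the message. */
#define RELAXED_FACTOR 1000.0

/*
 * Scale a base tolerance depending on whether strict conformance is
 * requested. Parsing rule (an assumption): any value other than unset,
 * empty, or "0" counts as strict.
 */
static double allowed_diff(double base_diff, const char *strict_env) {
  int strict = strict_env && strict_env[0] != '\0' &&
               !(strict_env[0] == '0' && strict_env[1] == '\0');
  return strict ? base_diff : base_diff * RELAXED_FACTOR;
}

/* Convenience wrapper that reads OCL_STRICT_CONFORMANCE from the environment. */
static double allowed_diff_from_env(double base_diff) {
  return allowed_diff(base_diff, getenv("OCL_STRICT_CONFORMANCE"));
}
```

Keeping the parsing separate from the environment lookup makes the scaling rule easy to test without touching the process environment.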
|
|
|
|
|
| |
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
__gen_ocl_exp stands for 2^x. So, use __gen_ocl_pow to implement native_exp().
Fix atanh implementation.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Include CL_DEVICE_LINKER_AVAILABLE, CL_DEVICE_PRINTF_BUFFER_SIZE, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
Can't add sub_offset if mem is an image.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Put some variables into registers.
This could improve LuxMark sala by about 10% under strict conformance.
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
If there is no %d in any of the printf statements, the curbe
will ignore the content buffer ptr because nothing uses it.
So a crash happens when binding the buffer ptr at run time.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Add the %c and %f support for printf.
Also add the int to float and int to char conversion.
Some minor errors such as wrong index flags have been fixed.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
Still can't handle a sampler_t that is not actually used.
The access qualifier seems broken with llvm 3.3.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|