| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Delete event->depend_events when it is no longer needed, to allow
the event objects it refers to to be freed.
This avoids out-of-memory hangs in large dependency trees
(e.g. long iterative calculations):
https://launchpad.net/bugs/1354086
Signed-off-by: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
When a event complete, we need to notify all the command_queue
within the same context. But sometime, some command_queue in
the context is already invalid.
Modify to ensure all the command_queue to be notified are
valid.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
|
|
|
|
|
|
|
| |
Also remove the useless function cl_context_add_svm.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Ruiling Song <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modify the event exec function, make it as the uniformal entry
for all event command execution. This will help the timestamp
record and profiling feature a lot.
V2:
1. Set event init state to bigger than CL_QUEUED.
Event state should be set to CL_QUEUED exactly when it is to be queued.
Profiling feature make this requirement clearer. We need to record the
timestamp exactly when it it to be queued. So we need to add a additional
state beyond CL_QUEUED.
2. Fix cl_event_update_timestamp_gen bugi, the CL_SUMITTED time may be less.
GPU may record the timestamp of CL_RUNNING before CPU record timestamp of
CL_SUMITTED. It is a async process and it is hard for us to control.
According to SPEC, we need to record timestamp after some state is done.
We can just now set CL_SUMITTED to CL_RUNNING timestamp if the CL_SUBMITTED
timestamp is the bigger one.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TODO:
In opencl 2.0, a new profiling item called CL_PROFILING_COMMAND_COMPLETE
is imported. It means that we need to record the time stamp of all the
child events created by the "Kernel enqueing kernels" feature finish.
This should be done after the "Kernel enqueing kernels" feature enabled.
V2:
Update event time stamp before inserting to queue thread, avoid MT issue.
V3:
Fixup overflow problem.
V4:
Fixup overflow to 0xfffffffffffffffff problem.
Just take ownership and release event lock when call the update timestamp
function. The update timestamp function may have block system call can
should not hold the lock to call it.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
Make the list related functions more clear and readable.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
V2:
Move the event list status check to clWaitForEvents API.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rewrite the cl_event, and modify all the event functions
using this new event manner. Event will co-operate with
command queue's thread together.
v2:
Fix a logic problem in event create failed.
V3:
Set enqueue default to do nothing, handle some enqueue has nothing
to do.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
We use context's lock when we add and delete cl objects.
Every cl object should use it's own lock to protect itself.
We also add some helper functions to ease the adding and
removing operations.
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Junyan He <junyan.he@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
| |
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
| |
this link fail appears on gcc 5.2.1.
Signed-off-by: Luo Xionghu <xionghu.luo@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
| |
Return CL_INVALID_CONTEXT if the context associated with
command_queue and events in event_wait_list are not the same.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Luo Xionghu <xionghu.luo@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refine the event struct to make last_event become a list to store
all uncompeleted events and update them every queue flush. This can
make sure all events created in the runtime have a chance to update
status and run callback functions and then be deleted. We will also
fix the memory leak problem casued by uncompeted events.
This is a bugfix for https://bugs.freedesktop.org/show_bug.cgi?id=91710
The leaked events with gpu buffers will be unreferenced and cause other
drm buffer leak and result in terrible memory leak.
Signed-off-by: Pan Xiuli <xiuli.pan@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Fix to calculate the current cpu monotonic raw timestamp in nanoseconds
for enqueued,submitted,start and finshed and send this to application
based on the parameter queries.
Signed-off-by: Midhun Kodiyath <midhunchandra.kodiyath@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
When the event parament is not NULL, the event will point to a new event, so
need to check address of the event and the wait events.
V2: check the address of the event and the wait events.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
| |
last_event and current_event should be thread private data.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
Beignet uses drm_intel_gem_bo_context_exec() to flush command queue to
linux drm driver layer. We need to check the return value of that function,
as it may fail when the application uses very large array.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CL_COMPLETE + thread safety for callbacks
When trying to register a callback on the clEnqueueReadBuffer command, since it is processed
synchroniously all the time, the command was marked CL_COMPLETE every time. If the event returned
by clEnqueueReadBuffer was then used to register a callback function, the callback function did
no check to execute it if nessary.
Modified the handling of the callback registration in cl_set_event_callback to only call the callback being created if it's status is already reached.
Added thread safety measures for pfn_notify calls since the status value can be changed while executing the callback.
Grouped the pfn_notify calls to a unified function cl_event_call_callback that handles thread safety: it queues callbacks in a node list while under the protection of pthread_mutex and then calls the callbacks outside of the pthread_mutex (this is required because the callback can deadlock if it calls a cl_api function that uses the mutex)
Signed-off-by: David Couturier <david.couturier@polymtl.ca>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
If call cl_event_delete before call back, then event will be deleted if
application release event in the call back. So must move the cl_event_delete at the last.
V2: V1 will not delete event if not user event, also need delete it.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
To make the license statement consistent to each other, adjust
all license versions to v2.1+. Thus beignet should have a pure
LGPL v2.1+ license.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
1.fix data structure redefine warnings.
2.fix 'data' with variable sized type 'union<*>' not at the end of a class warning(in immediate.hpp).
3.fix implicitly conversion warning.
4.fix explicitly assigning a variable type warning.
5.fix comparison of unsigned expression < 0 is always false warning(in cl_api.c).
Signed-off-by: Lv Meng <meng.lv@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fix the following two bugs in event handling.
1. When it's time to call a event's user call back function, we need to
set the executed to true before the call. As that call back function
may call into clReleaseEvent(), and if we don't set the executed status
to true, it will enter infinite recursive loop.
2. After the user call clEnqueueNDRangeKernel to get a valid event, the
user set a call back function to that event, and in that call back
function, it will release that event. This scenario is totally correct.
But our current event handling doesn't have a deadicated timer thread to
update those on-the-fly events' status. Thus those events will not have
a chance to get updated, and those call back function will not executed
forever. To introduce a complete timer style thread to maintain this type
of events is too heavy for this fix release. This patch choose an easy
way to work around it. It will make sure the last gpgpu event to be finished
before current task to be enqueued.
After this patch, most of the OpenCV 3.0 cases could run smoothly without
any serious issue.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When pending a command queue, we need to record the whole gpgpu
structure not just the batch buffer. For the following reason:
1. We need to keep those private buffer, for example those printf buffers.
2. We need to make sure this gpgpu will not be reused by other enqueuement.
v2:
Don't try to flush all user event attached to the queue.
Just need to flush the current event when doing command queue flush.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
| |
1. Do not add user event to cb->wait_list to avoid ref this user event twice.
2. Add assert when update status.
3. Set the queue's last wait event and barrier event to NULL when remove last event.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Move the event add ref to function cl_event_new_enqueue_callback for clear.
Also need add the wait user events' ref count.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: "Luo, Xionghu" <xionghu.luo@intel.com>
|
|
|
|
|
|
|
|
| |
If event status is an Error code, the status of events wait on this event also should set to Error code.
V2: should not execute the enqueue command wait on the event whose status is error.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
| |
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
Event's status should be CL_COMPLETE if all wait events are complete in the wait list, in function
clEnqueueBarrierWithWaitList and clEnqueueMarkerWithWaitList.
v2: revert delete the event change in v1.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This command blocks command execution, that is, any following commands
enqueued after it do not execute until it completes;
API clEnqueueMarkerWithWaitList patch didn't push the latest, update in
this patch.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Signed-off-by: Luo <xionghu.luo@intel.com>
Conflicts:
src/cl_event.c
|
|
|
|
| |
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use the floatn's assigment to do the copy.
128 pattern size is according to double16, and because
the double problem on our platform, we use to float16
to handle this.
unaligned cases is not optimized now, just use the char
assigment.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
| |
The GPU timestamp should be lower 36 bit on HASWELL
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
|
|
|
|
|
|
|
|
| |
1. remove repeated user events in list.
2. missed braces in loops.
3. fix barrier event reference not incresed.
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
|
|
|
|
|
|
| |
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
Reviewed-by: "Song, Ruiling" <ruiling.song@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The profiling feature is now all supported. We use
drm_intel_reg_read to get the current time of GPU
when the event is queued and submitted, and use
PIPI_CONTROL cmd to get the executing time of the
GPU for kernel start and end.
One trivial problem is that:
The GPU timer counter is 36 bits with resolution of
80ns, so 2^36*80 = 5500s, about half an hour.
Some test may last about 2~5 min and if it starts at
about half an hour, this may cause a wrap back problem
and cause the case fail.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We find some cases will use multi-threads to run on the same queue,
executing the same kernel. This will cause the gpgpu struct which
is very important for GPU context setting be destroyed because we
do not implement any sync protect on it now.
Move the gpgpu struct into thread specific space will fix this problem
because the lib_drm will do the GPU command serialization for us.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: "Zou, Nanhai" <nanhai.zou@intel.com>
|
|
|
|
|
|
|
|
|
| |
ctx->events points to the head of 'event list' under the ctx.
When deleting an event from the list, we should also update
the head pointer besides updating its neighbour's next & prev,
Signed-off-by: Ruiling Song <ruiling.song@intel.com>
Reviewedy-by: "Xing, Homer" <homer.xing@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use PIPE_CONTROL to get the time stamps from GPU just after batch
start and before batch flush. Using the first one the caculate the
CL_PROFILING_COMMAND_START time and uing the second one to caculate
the CL_PROFILING_COMMAND_END time.
There are 2 limitations here:
1. Then end time stamp is just before the FLUSH, so the Flush time
is not included, which will cause to lose the accuracy. Because
the we do not know which event will be used to do the profling
when it is created, adding another flush for end time stamp may
add some overload.
2. The time of CPU and GPU can not be sync correctly now. So the
time of CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT
which happens on CPU side can not be caculated correctly with the
same base time of GPU. So we just simplely set them to
CL_PROFILING_COMMAND_START now. For the Event not involving GPU
operations such as ReadBuffer, all the times are 0 now.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Yang Rong <rong.r.yang@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add some event info to cl_command_queue.
One is non-complete user events, used to block marker event and barrier.
After these events become CL_COMPLETE, the events blocked by these events also
become CL_COMPLETE, so marker event will also set to CL_COMPLETE. If there is no
user events, need wait last event complete and set marker event to complete.
Add barrier_index, for clEnqueueBarrier, point to user events, indicate the enqueue
apis follow clEnqueueBarrier should wait on how many user events.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
| |
In function cl_event_set_status, between pthread_mutex_lock and pthread_mutex_unlock
will call cl_event_delete, which also require the same lock, cause deak lock.
Unlock it before call cl_event_delete.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now use the defer execute to wait events.
If there is no user event waited, then using wait rendering to wait
GPU event complete and call the enqueue api immediately.
If there is the user events waited, then should prepare the the enqueue
data, and resume the enqueue when all user events that waited complete.
The achieve these, add the enqueue callback to user event, and add the all
user event and other wait event list to enqueue callback. When set user event
to complete, check all enqueue callbacks wait this event.
Now, clEnqueueMark/clEnqueueBarrier still not impletement, and clEnqueueMapBuffer
/clEnqueueMapImage is not consistency with spec.
Signed-off-by: Yang Rong <rong.r.yang@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
|
|
|
|
|
| |
Cleaned up some warnings
Forced pedantic option removal in the Makefile
|
|
|