From eb843940ec938083ad4481ad3b28ac0d9ca1f5be Mon Sep 17 00:00:00 2001 From: Zhigang Gong Date: Mon, 15 Sep 2014 14:45:24 +0800 Subject: Remove out-of-date document. Signed-off-by: Zhigang Gong --- docs/Beignet/Backend.mdwn | 2 +- docs/Beignet/Backend/flat_address_space.mdwn | 98 -------------------------- docs/Beignet/Backend/mixed_buffer_pointer.mdwn | 46 ++++++++++++ 3 files changed, 47 insertions(+), 99 deletions(-) delete mode 100644 docs/Beignet/Backend/flat_address_space.mdwn create mode 100644 docs/Beignet/Backend/mixed_buffer_pointer.mdwn diff --git a/docs/Beignet/Backend.mdwn b/docs/Beignet/Backend.mdwn index be6081b2..319ce817 100644 --- a/docs/Beignet/Backend.mdwn +++ b/docs/Beignet/Backend.mdwn @@ -88,7 +88,7 @@ Implementation details Several key decisions may use the hardware in an usual way. See the following documents for the technical details about the compiler implementation: -- [[Flat address space|flat_address_space]] +- [[Mixed buffer pointer)|mixed_buffer_pointer]] - [[Unstructured branches|unstructured_branches]] - [[Scalar intermediate representation|gen_ir]] - [[Clean backend implementation|compiler_backend]] diff --git a/docs/Beignet/Backend/flat_address_space.mdwn b/docs/Beignet/Backend/flat_address_space.mdwn deleted file mode 100644 index 3018a295..00000000 --- a/docs/Beignet/Backend/flat_address_space.mdwn +++ /dev/null @@ -1,98 +0,0 @@ -Flat Address Space -================== - -Segmented address space... --------------------------- - -The first challenge with OpenCL is its very liberal use of pointers. The memory -is segment into several address spaces: - -- private. This is the memory for each work item - -- global. These are buffers in memory shared by all work items and work groups - -- constant. These are constant buffers in memory shared by all work items and -work groups as well - -- local. These is a memory shared by all work items in the *same* work group - -... But with no restriction inside each address space ------------------------------------------------------ - -The challenge is that there is no restriction in OpenCL inside each address -space i.e. the full C semantic applies in particular regarding pointer -arithmetic. - -Therefore the following code is valid: - - -\_\_kernel void example(\_\_global int *dst, \_\_global int *src0, \_\_global int *src1)
-{
-  \_\_global int *from;
-  if (get\_global\_id(0) % 2)
-    from = src0;
-  else
-    from = src1;
-  dst[get\_global\_id(0)] = from[get\_global\_id(0)];
-} -
- -As one may see, the load done in the last line actually mixes pointers from both -source src0 and src1. This typically makes the use of binding table indices -pretty hard. In we use binding table 0 for dst, 1 for src0 and 2 for src1 (for -example), we are not able to express the load in the last line with one send -only. - -No support for stateless in required messages ---------------------------------------------- - -Furthermore, in IVB, we are going four types of messages to implement the loads -and the stores - -- Byte scattered reads. They are used to read bytes/shorts/integers that are not -aligned on 4 bytes. This is a gather message i.e. the user provides up to 16 -addresses - -- Byte scattered writes. They are used to write bytes/shorts/integers that are not -aligned on 4 bytes. This is a scatter message i.e. the user provides up to 16 -addresses - -- Untyped reads. They allow to read from 1 to 4 double words (i.e 4 bytes) per -lane. This is also a gather message i.e. up to 16 address are provided per -message. - -- Untyped writes. They are the counter part of the untyped reads - -Problem is that IVB does not support stateless accesses for these messages. So -surfaces are required. Secondly, stateless messages are not that interesting -since all of them require a header which is still slow to assemble. - -Implemented solution --------------------- - -The solution is actually quite simple. Even with no stateless support, it is -actually possible to simulate it with a surface. As one may see in the run-time -code in `intel/intel_gpgpu.c`, we simply create a surface: - -- 2GB big - -- Which starts at offset 0 - -Surprisingly, this surface can actually map the complete GTT address space which -is 2GB big. One may look at `flat_address_space` unit test in the run-time code -that creates and copies buffers in such a way that the complete GTT address -space is traversed. - -This solution brings a pretty simple implementation in the compiler side. -Basically, there is nothing to do when translating from LLVM to Gen ISA. A -pointer to `__global` or `__constant` memory is simply a 32 bits offset in that -surface. - -Related problems ----------------- - -There is one drawback for this approach. Since we use a 2GB surface that maps -the complete GTT space, there is no protection at all. Each write can therefore -potentially modify any buffer including the command buffer, the frame buffer or -the kernel code. There is *no* protection at all in the hardware to prevent -that. diff --git a/docs/Beignet/Backend/mixed_buffer_pointer.mdwn b/docs/Beignet/Backend/mixed_buffer_pointer.mdwn new file mode 100644 index 00000000..f43ab7ed --- /dev/null +++ b/docs/Beignet/Backend/mixed_buffer_pointer.mdwn @@ -0,0 +1,46 @@ +Mixed Buffer Pointer +-------------------- + +Segmented address space... +-------------------------- + +The first challenge with OpenCL is its very liberal use of pointers. The memory +is segment into several address spaces: + +- private. This is the memory for each work item + +- global. These are buffers in memory shared by all work items and work groups + +- constant. These are constant buffers in memory shared by all work items and +work groups as well + +- local. These is a memory shared by all work items in the *same* work group + +... But with no restriction inside each address space +----------------------------------------------------- + +The challenge is that there is no restriction in OpenCL inside each address +space i.e. the full C semantic applies in particular regarding pointer +arithmetic. + +Therefore the following code is valid: + + +\_\_kernel void example(\_\_global int *dst, \_\_global int *src0, \_\_global int *src1)
+{
+  \_\_global int *from;
+  if (get\_global\_id(0) % 2)
+    from = src0;
+  else
+    from = src1;
+  dst[get\_global\_id(0)] = from[get\_global\_id(0)];
+} +
+ +As one may see, the load done in the last line actually mixes pointers from both +source src0 and src1. This typically makes the use of binding table indices +pretty hard. In we use binding table 0 for dst, 1 for src0 and 2 for src1 (for +example), we are not able to express the load in the last line with one send +only. The pointer "from" in the last line is so called a mixed buffer pointer. + +(To be updated) -- cgit v1.2.1