[Clang] New loop pragma vectorize_predicate

This adds a new vectorize predication loop hint:

  #pragma clang loop vectorize_predicate(enable)

that can be used to indicate to the vectoriser that all (load/store)
instructions should be predicated (masked). This allows, for example, folding
of the remainder loop into the main loop.

This patch will be followed up with D64916 and D65197. The former is a
refactoring in the loopvectorizer and the groundwork to make tail loop folding
a more general concept, and in the latter the actual tail loop folding
transformation will be implemented.

Differential Revision: https://reviews.llvm.org/D64744

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@366989 91177308-0d34-0410-b5e6-96231b3b80d8
author: Sjoerd Meijer <sjoerd.meijer@arm.com> 2019-07-25 07:33:13 +0000
committer: Sjoerd Meijer <sjoerd.meijer@arm.com> 2019-07-25 07:33:13 +0000
commit: dfb6cacf9c29f492202b77d55f9c174a2917b190 (patch)
tree: 8683d6e2bf33282f9b8350da39e51e593d2a7d82 /docs
parent: c7a4550a9dec3c6a09f8214a28dcbeee8032a1c1 (diff)
download: clang-dfb6cacf9c29f492202b77d55f9c174a2917b190.tar.gz
1 files changed, 18 insertions, 3 deletions
diff --git a/docs/LanguageExtensions.rst b/docs/LanguageExtensions.rst
index cb72c459c1..5bd234f2bf 100644
--- a/docs/LanguageExtensions.rst
+++ b/docs/LanguageExtensions.rst
@@ -2946,12 +2946,12 @@ Extensions for loop hint optimizations
 
 The ``#pragma clang loop`` directive is used to specify hints for optimizing the
 subsequent for, while, do-while, or c++11 range-based for loop. The directive
-provides options for vectorization, interleaving, unrolling and
+provides options for vectorization, interleaving, predication, unrolling and
 distribution. Loop hints can be specified before any loop and will be ignored if
 the optimization is not safe to apply.
 
-Vectorization and Interleaving
-------------------------------
+Vectorization, Interleaving, and Predication
+--------------------------------------------
 
 A vectorized loop performs multiple iterations of the original loop
 in parallel using vector instructions. The instruction set of the target
@@ -2994,6 +2994,21 @@ width/count of the set of target architectures supported by your application.
 Specifying a width/count of 1 disables the optimization, and is equivalent to
 ``vectorize(disable)`` or ``interleave(disable)``.
 
+Vector predication is enabled by ``vectorize_predicate(enable)``, for example:
+
+.. code-block:: c++
+
+  #pragma clang loop vectorize(enable)
+  #pragma clang loop vectorize_predicate(enable)
+  for(...) {
+    ...
+  }
+
+This predicates (masks) all instructions in the loop, which allows the scalar
+remainder loop (the tail) to be folded into the main vectorized loop. This
+might be more efficient when vector predication is efficiently supported by the
+target platform.
+
 Loop Unrolling
 --------------
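
The documentation added by this patch elides the loop header and body with ``...``. As a rough illustration only, the hint could be applied to a concrete loop as sketched below. The function name ``add_arrays`` and its parameters are hypothetical and do not come from the patch. Whether the loop is actually vectorized and predicated depends on the compiler version, optimization level, and target; a compiler that does not know the pragma is expected to ignore it, typically with a warning, and the loop then runs as ordinary scalar code with the same results.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the patch): element-wise sum of two arrays.
// The trip count n need not be a multiple of the vector width, which is the
// case where a scalar remainder (tail) loop would normally be generated.
void add_arrays(const int *a, const int *b, int *out, std::size_t n) {
  // Request vectorization, and ask that the loop be predicated (masked) so
  // the tail iterations can be folded into the main vector loop.
#pragma clang loop vectorize(enable)
#pragma clang loop vectorize_predicate(enable)
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

The hints only influence how the loop is compiled; the values written to ``out`` are the same with or without them.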