[X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics.

Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.

For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.

For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.

Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.

To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.

So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56998

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@352267 91177308-0d34-0410-b5e6-96231b3b80d8
author: Craig Topper <craig.topper@intel.com> 2019-01-26 02:42:01 +0000
committer: Craig Topper <craig.topper@intel.com> 2019-01-26 02:42:01 +0000
commit: 0ad26a6d8de91e3dc5e322d51d3cdc947e7dbe0e (patch)
tree: c832c0e4d6aceaed5b7f43c9d7e3dcfbc6656649 /test/CodeGen/avx512f-builtins.c
parent: 8a558a52a96fc5411834e6e015219513a9986bdc (diff)
download: clang-0ad26a6d8de91e3dc5e322d51d3cdc947e7dbe0e.tar.gz
1 files changed, 10 insertions, 6 deletions
diff --git a/test/CodeGen/avx512f-builtins.c b/test/CodeGen/avx512f-builtins.c
index 6b041cea71..55bdf4f5fc 100644
--- a/test/CodeGen/avx512f-builtins.c
+++ b/test/CodeGen/avx512f-builtins.c
@@ -5022,42 +5022,46 @@ __m512 test_mm512_maskz_cvt_roundph_ps(__mmask16 __U, __m256i __A)
 __m512 test_mm512_cvt_roundepi32_ps( __m512i __A)
 {
   // CHECK-LABEL: @test_mm512_cvt_roundepi32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtdq2ps.512
+  // CHECK: @llvm.x86.avx512.sitofp.round.v16f32.v16i32
   return _mm512_cvt_roundepi32_ps(__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
 
 __m512 test_mm512_mask_cvt_roundepi32_ps(__m512 __W, __mmask16 __U, __m512i __A)
 {
   // CHECK-LABEL: @test_mm512_mask_cvt_roundepi32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtdq2ps.512
+  // CHECK: @llvm.x86.avx512.sitofp.round.v16f32.v16i32
+  // CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_mask_cvt_roundepi32_ps(__W,__U,__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
 
 __m512 test_mm512_maskz_cvt_roundepi32_ps(__mmask16 __U, __m512i __A)
 {
   // CHECK-LABEL: @test_mm512_maskz_cvt_roundepi32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtdq2ps.512
+  // CHECK: @llvm.x86.avx512.sitofp.round.v16f32.v16i32
+  // CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_maskz_cvt_roundepi32_ps(__U,__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
 
 __m512 test_mm512_cvt_roundepu32_ps(__m512i __A)
 {
   // CHECK-LABEL: @test_mm512_cvt_roundepu32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtudq2ps.512
+  // CHECK: @llvm.x86.avx512.uitofp.round.v16f32.v16i32
   return _mm512_cvt_roundepu32_ps(__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
 
 __m512 test_mm512_mask_cvt_roundepu32_ps(__m512 __W, __mmask16 __U,__m512i __A)
 {
   // CHECK-LABEL: @test_mm512_mask_cvt_roundepu32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtudq2ps.512
+  // CHECK: @llvm.x86.avx512.uitofp.round.v16f32.v16i32
+  // CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_mask_cvt_roundepu32_ps(__W,__U,__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
 
 __m512 test_mm512_maskz_cvt_roundepu32_ps(__mmask16 __U,__m512i __A)
 {
   // CHECK-LABEL: @test_mm512_maskz_cvt_roundepu32_ps
-  // CHECK: @llvm.x86.avx512.mask.cvtudq2ps.512
+  // CHECK: @llvm.x86.avx512.uitofp.round.v16f32.v16i32
+  // CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
   return _mm512_maskz_cvt_roundepu32_ps(__U,__A, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
 }
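
Illustrative note (not part of the patch): the lowering rule in the commit message and the select checked by the masked tests can be modeled in plain scalar C. This is only a sketch under stated assumptions. The names SKETCH_CUR_DIRECTION, lowering_t, choose_lowering, and masked_cvt_i32_to_f32 are hypothetical and do not appear in clang or in this diff. The model ignores the rounding behavior itself and shows only the decision and the per-lane select.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* a 512-bit vector holds 16 x i32 / 16 x float */

/* Hypothetical stand-in that mirrors the value of _MM_FROUND_CUR_DIRECTION. */
enum { SKETCH_CUR_DIRECTION = 4 };

/* The two lowerings the commit message distinguishes (names are assumptions). */
typedef enum {
    LOWER_PLAIN_CONVERT,  /* plain sitofp/uitofp instruction */
    LOWER_ROUND_INTRINSIC /* one of the new rounding intrinsics */
} lowering_t;

/* Decision described in the commit message: CUR_DIRECTION gets the plain
 * conversion, any other rounding argument gets the rounding intrinsic. */
static lowering_t choose_lowering(int rounding)
{
    return rounding == SKETCH_CUR_DIRECTION ? LOWER_PLAIN_CONVERT
                                            : LOWER_ROUND_INTRINSIC;
}

/* Scalar model of "convert, then select": lane i takes the converted value
 * when bit i of the mask is set, otherwise it keeps the pass-through value. */
static void masked_cvt_i32_to_f32(float dst[LANES],
                                  const float passthru[LANES],
                                  uint16_t mask,
                                  const int32_t src[LANES])
{
    assert(dst && passthru && src);
    for (int i = 0; i < LANES; ++i) {
        float converted = (float)src[i];
        dst[i] = ((mask >> i) & 1u) ? converted : passthru[i];
    }
}
```

In this model the mask variant passes the destination operand as the pass-through vector and the maskz variant passes all zeros, which corresponds to the single select line the updated CHECK patterns expect after the conversion.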