author | Jonathan Lennox <jonathan@vidyo.com> | 2015-08-03 17:04:21 -0400 |
---|---|---|
committer | Jean-Marc Valin <jmvalin@jmvalin.ca> | 2015-09-01 17:21:31 -0400 |
commit | 1d60b49e9d95672a17ebe5578319c59fa3963224 (patch) | |
tree | 06f66c74aff5cdb65db0774d8ed9f5fa9617752d | |
parent | b4aa5dc858c905d9b09e70794584c44f7f4d2f7a (diff) | |
download | opus-1d60b49e9d95672a17ebe5578319c59fa3963224.tar.gz |
In optimized mode, don't force Clang to use explicit load/store for _mm_cvtepi16_epi32, only for _mm_cvtepi8_epi32. Adjust comment accordingly.
-rw-r--r-- | celt/x86/x86cpu.h | 20 |
1 files changed, 12 insertions, 8 deletions
diff --git a/celt/x86/x86cpu.h b/celt/x86/x86cpu.h
index ef53f0c9..cdbab9c3 100644
--- a/celt/x86/x86cpu.h
+++ b/celt/x86/x86cpu.h
@@ -55,21 +55,25 @@ int opus_select_arch(void);
  reference in the PMOVSXWD instruction itself, but gcc is not smart enough to
  optimize this out when optimizations ARE enabled.
 
- It appears clang requires us to do this always (which is fair, since
- technically the compiler is always allowed to do the dereference before
- invoking the function implementing the intrinsic). I have not investiaged
- whether it is any smarter than gcc when it comes to eliminating the extra
- load instruction.*/
+ Clang, in contrast, requires us to do this always for _mm_cvtepi8_epi32
+ (which is fair, since technically the compiler is always allowed to do the
+ dereference before invoking the function implementing the intrinsic).
+ However, it is smart enough to eliminate the extra MOVD instruction.
+ For _mm_cvtepi16_epi32, it does the right thing, though does *not* optimize out
+ the extra MOVQ if it's specified explicitly */
+
 # if defined(__clang__) || !defined(__OPTIMIZE__)
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
-
-# define OP_CVTEPI16_EPI32_M64(x) \
- (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
 # else
 # define OP_CVTEPI8_EPI32_M32(x) \
  (_mm_cvtepi8_epi32(*(__m128i *)(x)))
+#endif
 
+# if !defined(__OPTIMIZE__)
+# define OP_CVTEPI16_EPI32_M64(x) \
+ (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
+# else
 # define OP_CVTEPI16_EPI32_M64(x) \
  (_mm_cvtepi16_epi32(*(__m128i *)(x)))
 # endif
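For readers unfamiliar with the two intrinsics, the following is a minimal sketch of what the explicit-load variants of the macros compute. It is not code from the Opus tree: the helper names `widen4_i8_to_i32`, `widen4_i16_to_i32` and `widen_demo` are hypothetical, and the scalar fallback is only there so the sketch also builds when SSE4.1 is not enabled at compile time. The point it illustrates is that the 8-bit conversion needs only 4 bytes of input and the 16-bit conversion only 8 bytes, which is why the explicit forms use a 32-bit (MOVD) or 64-bit (MOVQ) load instead of dereferencing a full `__m128i`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
# include <smmintrin.h>   /* SSE4.1: _mm_cvtepi8_epi32, _mm_cvtepi16_epi32 */
#endif

/* Hypothetical helper (not part of Opus): sign-extend the four int8 values
   at p to int32, reading exactly 4 bytes of memory. */
static void widen4_i8_to_i32(const int8_t *p, int32_t out[4])
{
#if defined(__SSE4_1__)
    int32_t bits;
    __m128i v;
    memcpy(&bits, p, sizeof bits);   /* explicit 32-bit load (MOVD) */
    v = _mm_cvtepi8_epi32(_mm_cvtsi32_si128(bits));
    _mm_storeu_si128((__m128i *)out, v);
#else
    int i;
    for (i = 0; i < 4; i++) out[i] = p[i];   /* scalar sign extension */
#endif
}

/* Hypothetical helper (not part of Opus): sign-extend the four int16 values
   at p to int32, reading exactly 8 bytes of memory. */
static void widen4_i16_to_i32(const int16_t *p, int32_t out[4])
{
#if defined(__SSE4_1__)
    /* explicit 64-bit load (MOVQ); has no alignment requirement */
    __m128i v = _mm_cvtepi16_epi32(_mm_loadl_epi64((const __m128i *)p));
    _mm_storeu_si128((__m128i *)out, v);
#else
    int i;
    for (i = 0; i < 4; i++) out[i] = p[i];   /* scalar sign extension */
#endif
}

/* Small self-check exercising both helpers on boundary values. */
static int widen_demo(void)
{
    const int8_t  b[4] = { -128, -1, 0, 127 };
    const int16_t w[4] = { -32768, -1, 0, 32767 };
    int32_t ob[4], ow[4];

    widen4_i8_to_i32(b, ob);
    widen4_i16_to_i32(w, ow);
    assert(ob[0] == -128 && ob[2] == 0 && ob[3] == 127);
    assert(ow[0] == -32768 && ow[2] == 0 && ow[3] == 32767);
    return ob[1] + ow[1];   /* both are -1 after sign extension */
}
```

Building with `-msse4.1` selects the intrinsic path and building without it selects the scalar loop; both should produce the same results. Whether the compiler then folds the explicit load into the memory operand of PMOVSXBD or PMOVSXWD is the compiler-specific behaviour the commit's comment describes, and this sketch does not try to demonstrate it.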