author | Timothy B. Terriberry <tterribe@xiph.org> | 2015-01-02 15:48:54 -0800 |
---|---|---|
committer | Timothy B. Terriberry <tterribe@xiph.org> | 2015-01-02 16:16:21 -0800 |
commit | 7422189ab16de442554da7f73c3c6f3c15130d22 (patch) | |
tree | 38894f0c3d4cca820268fe881fc334d5bb1a7422 /celt | |
parent | 23f503ad1c388aa9171af931ccb2f114f0839e0e (diff) | |
download | opus-7422189ab16de442554da7f73c3c6f3c15130d22.tar.gz |
Fix silk_VQ_WMat_EC_sse4_1().
During review of c95c9a048f32, I replaced a call to
_mm_cvtepi8_epi32() with the OP_CVTEPI16_EPI32_M64() macro (note
the 16 instead of 8).
Make a separate OP_CVTEPI8_EPI32_M32() macro and use that instead.
Thanks to Wei Zhou for the report.
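To illustrate why the 16-versus-8 mix-up described above changes results, here is a minimal scalar sketch of the two sign-extensions. The helper names `model_cvtepi8_epi32` and `model_cvtepi16_epi32` are hypothetical and not part of Opus; they only model the lane semantics of the corresponding intrinsics, under the assumption that the input is read as 4 bytes in the first case and 8 bytes in the second.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm_cvtepi8_epi32(): reads 4 bytes and
   sign-extends each int8 value to an int32 lane. */
static void model_cvtepi8_epi32(const void *src, int32_t out[4]) {
  int8_t in[4];
  int i;
  assert(src != NULL && out != NULL);
  memcpy(in, src, sizeof(in));
  for (i = 0; i < 4; i++) out[i] = in[i];
}

/* Hypothetical scalar model of _mm_cvtepi16_epi32(): reads 8 bytes and
   sign-extends each int16 value to an int32 lane. */
static void model_cvtepi16_epi32(const void *src, int32_t out[4]) {
  int16_t in[4];
  int i;
  assert(src != NULL && out != NULL);
  memcpy(in, src, sizeof(in));
  for (i = 0; i < 4; i++) out[i] = in[i];
}
```

Applied to the same int8 table, the 16-bit model fuses pairs of adjacent bytes into single lanes and consumes twice as many input bytes, so every output lane differs from what the 8-bit conversion would produce.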
Diffstat (limited to 'celt')
-rw-r--r-- | celt/x86/x86cpu.h | 24 |
1 files changed, 16 insertions, 8 deletions
diff --git a/celt/x86/x86cpu.h b/celt/x86/x86cpu.h
index 2394b05e..44b3a597 100644
--- a/celt/x86/x86cpu.h
+++ b/celt/x86/x86cpu.h
@@ -44,18 +44,26 @@ int opus_select_arch(void);
 # endif
 
-/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi16_epi32()
-  when optimizations are disabled, even though the actual PMOVSXWD instruction
-  takes an m64. Unlike a normal m64 reference, these require 16-byte alignment
-  and load 16 bytes instead of 8, possibly reading out of bounds.
-
-  We can insert an explicit MOVQ using _mm_loadl_epi64(), which should have the
-  same semantics as an m64 reference in the PMOVSXWD instruction itself, but
-  gcc is not smart enough to optimize this out when optimizations ARE enabled.*/
+/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi8_epi32()
+  or _mm_cvtepi16_epi32() when optimizations are disabled, even though the
+  actual PMOVSXWD instruction takes an m32 or m64. Unlike a normal memory
+  reference, these require 16-byte alignment and load a full 16 bytes (instead
+  of 4 or 8), possibly reading out of bounds.
+
+  We can insert an explicit MOVD or MOVQ using _mm_cvtsi32_si128() or
+  _mm_loadl_epi64(), which should have the same semantics as an m32 or m64
+  reference in the PMOVSXWD instruction itself, but gcc is not smart enough to
+  optimize this out when optimizations ARE enabled.*/
 # if !defined(__OPTIMIZE__)
+# define OP_CVTEPI8_EPI32_M32(x) \
+ (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
+
 # define OP_CVTEPI16_EPI32_M64(x) \
  (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
 # else
+# define OP_CVTEPI8_EPI32_M32(x) \
+ (_mm_cvtepi8_epi32(*(__m128i *)(x)))
+
 # define OP_CVTEPI16_EPI32_M64(x) \
  (_mm_cvtepi16_epi32(*(__m128i *)(x)))
 # endif
 
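The explicit 4-byte load pattern in the patch can be sketched in isolation as follows. This is an illustration under stated assumptions, not the Opus code: `widen4_s8_to_s32` is a hypothetical helper, it uses `memcpy()` rather than the macro's pointer cast to make the 4-byte read explicit, and it falls back to a scalar loop when the compiler does not define `__SSE4_1__`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
# include <smmintrin.h>
#endif

/* Hypothetical helper, not part of Opus: widen four int8 values at p to
   four int32 values while reading exactly 4 bytes of input. */
static void widen4_s8_to_s32(const int8_t *p, int32_t out[4]) {
  assert(p != NULL && out != NULL);
#if defined(__SSE4_1__)
  {
    int32_t bits;
    /* Explicit 4-byte load; never touches memory past p[3]. */
    memcpy(&bits, p, sizeof(bits));
    /* Move the 32 bits into the low lane, then sign-extend the four
       bytes in that lane to four 32-bit lanes. */
    _mm_storeu_si128((__m128i *)out,
     _mm_cvtepi8_epi32(_mm_cvtsi32_si128(bits)));
  }
#else
  {
    /* Scalar fallback with the same result. */
    int i;
    for (i = 0; i < 4; i++) out[i] = p[i];
  }
#endif
}
```

Because the input is read through a 4-byte copy, the helper should be safe to call on a buffer that is exactly 4 bytes long and has no particular alignment, which is the property the non-optimized macro variant aims to preserve.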