author     Timothy B. Terriberry <tterribe@xiph.org>  2013-05-19 17:11:17 -0700
committer  Timothy B. Terriberry <tterribe@xiph.org>  2013-05-19 19:12:51 -0700
commit     972a34ec2c79d241318af24389b8ee042d10556a
tree       18894d8e576d351923ed57aacbdec125919d3ba8 /m4
parent     b7bd4c20acfd951ba46647e07411285997d952f4
Add ARMv4/ARMv5E macros.
Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
http://lists.xiph.org/pipermail/opus/2013-May/002078.html
Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check.
  The MDCT test passes in values larger than 2**30 for b.
  The new version should be just as fast (or faster, since it's
  easier to merge the shift with following instructions), and
  there's no appreciable impact on accuracy (FFT/MDCT SNR actually
  goes up in most cases).
- Fix register constraints.
  We were using early-clobber flags in a bunch of places that
  didn't need them, and commutative-pair flags in a bunch of
  places that weren't actually commutative.
  This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann
  <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible
  addressing, re-ordering instructions to avoid some stalls,
  allowing more flexible register allocation, and getting things
  out of the inline asm block so the compiler can schedule them
  better.
- Add C_MUL and C_MUL4 asm for the encoder's FFT, based on the
  new C_MULC.
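To make the MULT16_32_Q15() overflow concrete: the commit message does not show the old or new asm, so the C sketch below is only a model. It emulates the ARMv5E SMULWB instruction (signed 32x16 multiply that keeps the top 32 bits of the 48-bit product) and compares shifting b left by one before the multiply, which is one plausible way to overflow once b reaches 2**30 (an assumption, not taken from the patch), with shifting the product after it. The helper names (smulwb_model, mult16_32_q15_ref, mult16_32_q15_pre, mult16_32_q15_post) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The models rely on >> of a negative value being an arithmetic
   (sign-extending) shift, which C leaves implementation-defined. */
static_assert((INT64_C(-3) >> 1) == -2, "arithmetic right shift assumed");

/* Hypothetical model of ARMv5E SMULWB: signed 32x16 multiply keeping
   the top 32 bits of the 48-bit product, i.e. (w*h)>>16. */
static int32_t smulwb_model(int32_t w, int16_t h)
{
    return (int32_t)(((int64_t)w * h) >> 16);
}

/* Full-precision reference: (a*b)>>15 computed in 64 bits. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 15);
}

/* Shift AFTER the multiply: the SMULWB result always fits, and doubling
   it overflows only when the exact Q15 product would overflow too.
   Only the lowest bit of the product is lost. */
static int32_t mult16_32_q15_post(int16_t a, int32_t b)
{
    return (int32_t)((uint32_t)smulwb_model(b, a) << 1);
}

/* Shift BEFORE the multiply: b<<1 wraps like a 32-bit register
   for b >= 2**30 (or b < -2**30). */
static int32_t mult16_32_q15_pre(int16_t a, int32_t b)
{
    return smulwb_model((int32_t)((uint32_t)b << 1), a);
}
```

For example, with a = 16384 (0.5 in Q15) and b = 0x50000000 (greater than 2**30), the reference and the post-shift version both return 671088640, while the pre-shift version wraps and returns -402653184. The post-shift result is never more than 1 below the reference.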
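silk_CLZ16() counts the leading zero bits of a 16-bit value. One formulation that needs no special case for a zero input is sketched below; it is an illustration, not necessarily the form this patch adopted. The names ilog32() and clz16() are hypothetical, and the bit-by-bit loop in ilog32() stands in for what a real build would map to a count-leading-zeros instruction (ARMv5 and later provide CLZ).

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to represent x, i.e. floor(log2(x)) + 1.
   Only meaningful for x != 0. */
static int ilog32(uint32_t x)
{
    assert(x != 0);
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Leading zeros of a 16-bit value. Moving the input into the top half
   and OR-ing in bit 15 keeps the ilog32() argument nonzero, so an
   all-zero input yields 32 - 16 = 16 with no separate zero test. */
static int clz16(int16_t in16)
{
    uint32_t x = ((uint32_t)(uint16_t)in16 << 16) | 0x8000u;
    return 32 - ilog32(x);
}
```

With this trick clz16(0) is 16 and clz16(1) is 15, and any negative input gives 0 because its sign bit lands in bit 31.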
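C_MUL and C_MULC are the complex multiplies inside the FFT butterflies. As a plain-C reference for what such asm computes, the sketch below assumes 32-bit complex samples and Q15 twiddle factors: the C_MUL-style function forms a*b, while the C_MULC-style function forms a*conj(b), which lets an inverse FFT reuse the forward twiddle table. The types and functions (cpx32, cpx16, mul_q15, c_mul, c_mulc) are hypothetical stand-ins, not the kiss_fft macros themselves, and overflow of the 32-bit sums is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* mul_q15() relies on >> of a negative value being arithmetic. */
static_assert((INT64_C(-3) >> 1) == -2, "arithmetic right shift assumed");

/* Hypothetical fixed-point complex types: 32-bit data, Q15 twiddles. */
typedef struct { int32_t r, i; } cpx32;
typedef struct { int16_t r, i; } cpx16;

/* 32x16 -> 32 Q15 multiply, computed exactly in 64 bits. */
static int32_t mul_q15(int32_t x, int16_t c)
{
    return (int32_t)(((int64_t)x * c) >> 15);
}

/* m = a * b (C_MUL-style twiddle multiply). */
static cpx32 c_mul(cpx32 a, cpx16 b)
{
    cpx32 m = { mul_q15(a.r, b.r) - mul_q15(a.i, b.i),
                mul_q15(a.r, b.i) + mul_q15(a.i, b.r) };
    return m;
}

/* m = a * conj(b) (C_MULC-style: same twiddle, conjugated). */
static cpx32 c_mulc(cpx32 a, cpx16 b)
{
    cpx32 m = { mul_q15(a.r, b.r) + mul_q15(a.i, b.i),
                mul_q15(a.i, b.r) - mul_q15(a.r, b.i) };
    return m;
}
```

Multiplying a = 1000 + 2000i by a twiddle of 0.5i (Q15 value 16384 in the imaginary part) gives -1000 + 500i with c_mul() and 1000 - 500i with c_mulc().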
In total, this patch gives a 22.3% speed-up on test_opus_encoder on
a 600 MHz Cortex-A8 using gcc 4.2.1.
When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
on the same processor/compiler.
On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %
Diffstat (limited to 'm4')
-rw-r--r-- | m4/as-gcc-inline-assembly.m4 | 106 |
1 files changed, 106 insertions, 0 deletions
diff --git a/m4/as-gcc-inline-assembly.m4 b/m4/as-gcc-inline-assembly.m4
new file mode 100644
index 00000000..4437a9d0
--- /dev/null
+++ b/m4/as-gcc-inline-assembly.m4
@@ -0,0 +1,106 @@
+dnl as-gcc-inline-assembly.m4 0.1.0
+
+dnl autostars m4 macro for detection of gcc inline assembly
+
+dnl David Schleef <ds@schleef.org>
+
+dnl $Id$
+
+dnl AS_COMPILER_FLAG(ACTION-IF-ACCEPTED, [ACTION-IF-NOT-ACCEPTED])
+dnl Tries to compile with the given CFLAGS.
+dnl Runs ACTION-IF-ACCEPTED if the compiler can compile with the flags,
+dnl and ACTION-IF-NOT-ACCEPTED otherwise.
+
+AC_DEFUN([AS_GCC_INLINE_ASSEMBLY],
+[
+  AC_MSG_CHECKING([if compiler supports gcc-style inline assembly])
+
+  AC_TRY_COMPILE([], [
+#ifdef __GNUC_MINOR__
+#if (__GNUC__ * 1000 + __GNUC_MINOR__) < 3004
+#error GCC before 3.4 has critical bugs compiling inline assembly
+#endif
+#endif
+__asm__ (""::) ], [flag_ok=yes], [flag_ok=no])
+
+  if test "X$flag_ok" = Xyes ; then
+    $1
+    true
+  else
+    $2
+    true
+  fi
+  AC_MSG_RESULT([$flag_ok])
+])
+
+AC_DEFUN([AC_TRY_ASSEMBLE],
+[ac_c_ext=$ac_ext
+ ac_ext=${ac_s_ext-s}
+ cat > conftest.$ac_ext <<EOF
+ .file "configure"
+[$1]
+EOF
+if AC_TRY_EVAL(ac_compile); then
+  ac_ext=$ac_c_ext
+  ifelse([$2], , :, [  $2
+  rm -rf conftest*])
+else
+  echo "configure: failed program was:" >&AC_FD_CC
+  cat conftest.$ac_ext >&AC_FD_CC
+  ac_ext=$ac_c_ext
+ifelse([$3], , , [  rm -rf conftest*
+  $3
+])dnl
+fi
+rm -rf conftest*])
+
+
+AC_DEFUN([AS_ASM_ARM_NEON],
+[
+  AC_MSG_CHECKING([if assembler supports NEON instructions on ARM])
+
+  AC_TRY_ASSEMBLE([vorr d0,d0,d0], [flag_ok=yes], [flag_ok=no])
+
+  if test "X$flag_ok" = Xyes ; then
+    $1
+    true
+  else
+    $2
+    true
+  fi
+  AC_MSG_RESULT([$flag_ok])
+])
+
+
+AC_DEFUN([AS_ASM_ARM_MEDIA],
+[
+  AC_MSG_CHECKING([if assembler supports ARMv6 media instructions on ARM])
+
+  AC_TRY_ASSEMBLE([shadd8 r3,r3,r3], [flag_ok=yes], [flag_ok=no])
+
+  if test "X$flag_ok" = Xyes ; then
+    $1
+    true
+  else
+    $2
+    true
+  fi
+  AC_MSG_RESULT([$flag_ok])
+])
+
+
+AC_DEFUN([AS_ASM_ARM_EDSP],
+[
+  AC_MSG_CHECKING([if assembler supports EDSP instructions on ARM])
+
+  AC_TRY_ASSEMBLE([qadd r3,r3,r3], [flag_ok=yes], [flag_ok=no])
+
+  if test "X$flag_ok" = Xyes ; then
+    $1
+    true
+  else
+    $2
+    true
+  fi
+  AC_MSG_RESULT([$flag_ok])
+])