From ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5 Mon Sep 17 00:00:00 2001
From: Joseph Myers
Date: Wed, 20 Sep 2017 16:54:05 +0000
Subject: Add SSE4.1 trunc, truncf (bug 20142).

This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
/ roundss instructions, similar to the versions of ceil, floor, rint
and nearbyint functions we already have.  In my testing with the glibc
benchtests these are about 30% faster than the C versions for double,
20% faster for float.

Tested for x86_64.

	[BZ #20142]
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
---
 NEWS | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'NEWS')

diff --git a/NEWS b/NEWS
index dd381f8930..a3aa94cb3b 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,8 @@ Major new features:
 * Optimized x86-64 asin, atan2, exp, expf, log, pow, atan, sin and tan
   with FMA, contributed by Arjan van de Ven and H.J. Lu from Intel.
 
+* Optimized x86-64 trunc and truncf for processors with SSE4.1.
+
 * In order to support faster and safer process termination the malloc API
   family of functions will no longer print a failure address and stack
   backtrace after detecting heap corruption.  The goal is to minimize the
-- 
cgit v1.2.1