summaryrefslogtreecommitdiff
path: root/crypto/arm_arch.h
Commit message (Collapse)AuthorAgeFilesLines
* Apply aes-gcm unroll8+eor3 optimization patch to Neoverse V2Xiaokang Qian2023-02-081-0/+1
| | | | | | Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/20184)
* Apply SM4 optimization patch to Kunpeng-920Xu Yizhou2022-11-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the ideal scenario, performance can reach up to 2.2X. But in single block input or CFB/OFB mode, CBC encryption, performance could drop about 50%. Perf data on Kunpeng-920 2.6GHz hardware, before and after optimization: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 75318.96k 79089.62k 79736.15k 79934.12k 80325.44k 80068.61k SM4-ECB 80211.39k 84998.36k 86472.28k 87024.93k 87144.80k 86862.51k SM4-GCM 72156.19k 82012.08k 83848.02k 84322.65k 85103.65k 84896.43k SM4-CBC 77956.13k 80638.81k 81976.17k 81606.31k 82078.91k 81750.70k SM4-CFB 78078.20k 81054.87k 81841.07k 82396.38k 82203.99k 82236.76k SM4-OFB 78282.76k 82074.03k 82765.74k 82989.06k 83200.68k 83487.17k After: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 35678.07k 120687.25k 176632.27k 177192.62k 177586.18k 178295.18k SM4-ECB 35540.32k 122628.07k 175067.90k 178007.84k 178298.88k 178328.92k SM4-GCM 34215.75k 116720.50k 170275.16k 171770.88k 172714.21k 172272.30k SM4-CBC 35645.60k 36544.86k 36515.50k 36732.15k 36618.24k 36629.16k SM4-CFB 35528.14k 35690.99k 35954.86k 35843.42k 35809.18k 35809.96k SM4-OFB 35563.55k 35853.56k 35963.05k 36203.52k 36233.85k 36307.82k Signed-off-by: Xu Yizhou <xuyizhou1@huawei.com> Reviewed-by: Hugo Landau <hlandau@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/19547)
* Fix aarch64 signed bit shift issue found by UBSANTom Cosgrove2022-07-191-4/+4
| | | | | | | | | | | | | | | Also fix conditional branch out of range when using sanitisers. Fixes #18813 Signed-off-by: Tom Cosgrove <tom.cosgrove@arm.com> Change-Id: Ic543885091ed3ef2ddcbe21de0a4ac0bca1e2494 Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18816)
* Apply the AES-GCM unroll8 optimization patch to Neoverse N2XiaokangQian2022-05-231-0/+1
| | | | | | | | | | | The loop unrolling and use of EOR3 can improve N2 performance by up to 32% Signed-off-by: XiaokangQian <xiaokang.qian@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/18350)
* Update copyright yearMatt Caswell2022-05-031-1/+1
| | | | | Reviewed-by: Tomas Mraz <tomas@openssl.org> Release: yes
* Acceleration of chacha20 on aarch64 by SVEDaniel Hu2022-05-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch accelerates chacha20 on aarch64 when Scalable Vector Extension (SVE) is supported by CPU. Tested on modern micro-architecture with 256-bit SVE, it has the potential to improve performance up to 20% The solution takes a hybrid approach. SVE will handle multi-blocks that fit the SVE vector length, with Neon/Scalar to process any tail data Test result: With SVE type 1024 bytes 8192 bytes 16384 bytes ChaCha20 1596208.13k 1650010.79k 1653151.06k Without SVE (by Neon/Scalar) type 1024 bytes 8192 bytes 16384 bytes chacha20 1355487.91k 1372678.83k 1372662.44k The assembly code has been reviewed internally by ARM engineer Fangming.Fang@arm.com Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17916)
* Optimize AES-GCM for uarchs with unroll and new instructionsXiaokangQian2022-01-251-0/+6
| | | | | | | | | | | | | | | Increase the block numbers to 8 for every iteration. Increase the hash table capacity. Make use of EOR3 instruction to improve the performance. This can improve performance 25-40% on out-of-order microarchitectures with a large number of fast execution units, such as Neoverse V1. We also see 20-30% performance improvements on other architectures such as the M1. Assembly code reviewd by Tom Cosgrove (ARM). Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15916)
* SM4 optimization for ARM by HW instructionDaniel Hu2022-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the SM4 optimization for ARM processor, using SM4 HW instruction, which is an optional feature of crypto extension for aarch64 V8. Tested on some modern ARM micro-architectures with SM4 support, the performance uplift can be observed around 8X~40X over existing C implementation in openssl. Algorithms that can be parallelized (like CTR, ECB, CBC decryption) are on higher end, with algorithm like CBC encryption on lower end (due to inter-block dependency) Perf data on Yitian-710 2.75GHz hardware, before and after optimization: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 105787.80k 107837.87k 108380.84k 108462.08k 108549.46k 108554.92k SM4-ECB 111924.58k 118173.76k 119776.00k 120093.70k 120264.02k 120274.94k SM4-CBC 106428.09k 109190.98k 109674.33k 109774.51k 109827.41k 109827.41k After (7.4x - 36.6x faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes SM4-CTR 781979.02k 2432994.28k 3437753.86k 3834177.88k 3963715.58k 3974556.33k SM4-ECB 937590.69k 2941689.02k 3945751.81k 4328655.87k 4459181.40k 4468692.31k SM4-CBC 890639.88k 1027746.58k 1050621.78k 1056696.66k 1058613.93k 1058701.31k Signed-off-by: Daniel Hu <Daniel.Hu@arm.com> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17455)
* SM3 acceleration with SM3 hardware instruction on aarch64fangming.fang2022-01-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | SM3 hardware instruction is optional feature of crypto extension for aarch64. This implementation accelerates SM3 via SM3 instructions. For the platform not supporting SM3 instruction, the original C implementation still works. Thanks to AliBaba for testing and reporting the following perf numbers for Yitian710: Benchmark on T-Head Yitian-710 2.75GHz: Before: type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 49297.82k 121062.63k 223106.05k 283371.52k 307574.10k 309400.92k After (33% - 74% faster): type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes sm3 65640.01k 179121.79k 359854.59k 481448.96k 534055.59k 538274.47k Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/17454)
* Don't use __ARMEL__/__ARMEB__ in aarch64 assemblyDavid Benjamin2022-01-091-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | GCC's __ARMEL__ and __ARMEB__ defines denote little- and big-endian arm, respectively. They are not defined on aarch64, which instead use __AARCH64EL__ and __AARCH64EB__. However, OpenSSL's assembly originally used the 32-bit defines on both platforms and even define __ARMEL__ and __ARMEB__ in arm_arch.h. This is less portable and can even interfere with other headers, which use __ARMEL__ to detect little-endian arm. Over time, the aarch64 assembly has switched to the correct defines, such as in 32bbb62ea634239e7cb91d6450ba23517082bab6. This commit finishes the job: poly1305-armv8.pl needed a fix and the dual-arch armx.pl files get one more transform to convert from 32-bit to 64-bit. (There is an even more official endianness detector, __ARM_BIG_ENDIAN in the Arm C Language Extensions. But I've stuck with the GCC ones here as that would be a larger change.) Reviewed-by: Matt Caswell <matt@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> (Merged from https://github.com/openssl/openssl/pull/17373)
* Add Arm Assembly (aarch64) support for RNGOrr Toledano2021-12-161-0/+1
| | | | | | | | | | Include aarch64 asm instructions for random number generation using the RNDR and RNDRRS instructions. Provide detection functions for RNDR and RNDRRS getauxval. Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15361)
* aarch64: support BTI and pointer authentication in assemblyRuss Butler2021-10-011-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds optional support for - Armv8.3-A Pointer Authentication (PAuth) and - Armv8.5-A Branch Target Identification (BTI) features to the perl scripts. Both features can be enabled with additional compiler flags. Unless any of these are enabled explicitly there is no code change at all. The extensions are briefly described below. Please read the appropriate chapters of the Arm Architecture Reference Manual for the complete specification. Scope ----- This change only affects generated assembly code. Armv8.3-A Pointer Authentication -------------------------------- Pointer Authentication extension supports the authentication of the contents of registers before they are used for indirect branching or load. PAuth provides a probabilistic method to detect corruption of register values. PAuth signing instructions generate a Pointer Authentication Code (PAC) based on the value of a register, a seed and a key. The generated PAC is inserted into the original value in the register. A PAuth authentication instruction recomputes the PAC, and if it matches the PAC in the register, restores its original value. In case of a mismatch, an architecturally unmapped address is generated instead. With PAuth, mitigation against ROP (Return-oriented Programming) attacks can be implemented. This is achieved by signing the contents of the link-register (LR) before it is pushed to stack. Once LR is popped, it is authenticated. This way a stack corruption which overwrites the LR on the stack is detectable. The PAuth extension adds several new instructions, some of which are not recognized by older hardware. To support a single codebase for both pre Armv8.3-A targets and newer ones, only NOP-space instructions are added by this patch. These instructions are treated as NOPs on hardware which does not support Armv8.3-A. Furthermore, this patch only considers cases where LR is saved to the stack and then restored before branching to its content. There are cases in the code where LR is pushed to stack but it is not used later. We do not address these cases as they are not affected by PAuth. There are two keys available to sign an instruction address: A and B. PACIASP and PACIBSP only differ in the used keys: A and B, respectively. The keys are typically managed by the operating system. To enable generating code for PAuth compile with -mbranch-protection=<mode>: - standard or pac-ret: add PACIASP and AUTIASP, also enables BTI (read below) - pac-ret+b-key: add PACIBSP and AUTIBSP Armv8.5-A Branch Target Identification -------------------------------------- Branch Target Identification features some new instructions which protect the execution of instructions on guarded pages which are not intended branch targets. If Armv8.5-A is supported by the hardware, execution of an instruction changes the value of PSTATE.BTYPE field. If an indirect branch lands on a guarded page the target instruction must be one of the BTI <jc> flavors, or in case of a direct call or jump it can be any other instruction. If the target instruction is not compatible with the value of PSTATE.BTYPE a Branch Target Exception is generated. In short, indirect jumps are compatible with BTI <j> and <jc> while indirect calls are compatible with BTI <c> and <jc>. Please refer to the specification for the details. Armv8.3-A PACIASP and PACIBSP are implicit branch target identification instructions which are equivalent with BTI c or BTI jc depending on system register configuration. BTI is used to mitigate JOP (Jump-oriented Programming) attacks by limiting the set of instructions which can be jumped to. BTI requires active linker support to mark the pages with BTI-enabled code as guarded. For ELF64 files BTI compatibility is recorded in the .note.gnu.property section. For a shared object or static binary it is required that all linked units support BTI. This means that even a single assembly file without the required note section turns-off BTI for the whole binary or shared object. The new BTI instructions are treated as NOPs on hardware which does not support Armv8.5-A or on pages which are not guarded. To insert this new and optional instruction compile with -mbranch-protection=standard (also enables PAuth) or +bti. When targeting a guarded page from a non-guarded page, weaker compatibility restrictions apply to maintain compatibility between legacy and new code. For detailed rules please refer to the Arm ARM. Compiler support ---------------- Compiler support requires understanding '-mbranch-protection=<mode>' and emitting the appropriate feature macros (__ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT). The current state is the following: ------------------------------------------------------- | Compiler | -mbranch-protection | Feature macros | +----------+---------------------+--------------------+ | clang | 9.0.0 | 11.0.0 | +----------+---------------------+--------------------+ | gcc | 9 | expected in 10.1+ | ------------------------------------------------------- Available Platforms ------------------ Arm Fast Model and QEMU support both extensions. https://developer.arm.com/tools-and-software/simulation-models/fast-models https://www.qemu.org/ Implementation Notes -------------------- This change adds BTI landing pads even to assembly functions which are likely to be directly called only. In these cases, landing pads might be superfluous depending on what code the linker generates. Code size and performance impact for these cases would be negligible. Interaction with C code ----------------------- Pointer Authentication is a per-frame protection while Branch Target Identification can be turned on and off only for all code pages of a whole shared object or static binary. Because of these properties if C/C++ code is compiled without any of the above features but assembly files support any of them unconditionally there is no incompatibility between the two. Useful Links ------------ To fully understand the details of both PAuth and BTI it is advised to read the related chapters of the Arm Architecture Reference Manual (Arm ARM): https://developer.arm.com/documentation/ddi0487/latest/ Additional materials: "Providing protection for complex software" https://developer.arm.com/architectures/learn-the-architecture/providing-protection-for-complex-software Arm Compiler Reference Guide Version 6.14: -mbranch-protection https://developer.arm.com/documentation/101754/0614/armclang-Reference/armclang-Command-line-Options/-mbranch-protection?lang=en Arm C Language Extensions (ACLE) https://developer.arm.com/docs/101028/latest Addional Notes -------------- This patch is a copy of the work done by Tamas Petz in boringssl. It contains the changes from the following commits: aarch64: support BTI and pointer authentication in assembly Change-Id: I4335f92e2ccc8e209c7d68a0a79f1acdf3aeb791 URL: https://boringssl-review.googlesource.com/c/boringssl/+/42084 aarch64: Improve conditional compilation Change-Id: I14902a64e5f403c2b6a117bc9f5fb1a4f4611ebf URL: https://boringssl-review.googlesource.com/c/boringssl/+/43524 aarch64: Fix name of gnu property note section Change-Id: I6c432d1c852129e9c273f6469a8b60e3983671ec URL: https://boringssl-review.googlesource.com/c/boringssl/+/44024 Change-Id: I2d95ebc5e4aeb5610d3b226f9754ee80cf74a9af Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from https://github.com/openssl/openssl/pull/16674)
* Update copyright yearMatt Caswell2021-05-201-1/+1
| | | | | Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15381)
* crypto/arm_arch.h: add a variable declarationXiaofei Bai2021-05-141-0/+1
| | | | | | | | | | Add this variable declaration to prevent "-Werror,-Wmissing-variable-declarations" error from compiler. This error currently only happens on clang. Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> (Merged from https://github.com/openssl/openssl/pull/15240)
* Read MIDR_EL1 system register on aarch64Fangming.Fang2020-12-091-0/+44
| | | | | | | | | | | | | | MIDR_EL1 system register exposes microarchitecture information so that people can make micro-arch related optimization such as exposing as much instruction level parallelism as possible. MIDR_EL1 register can be read only if HWCAP_CPUID feature is supported. Change-Id: Iabb8a36c5d31b184dba6399f378598058d394d4e Reviewed-by: Paul Dale <paul.dale@oracle.com> Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org> (Merged from https://github.com/openssl/openssl/pull/11744)
* Fix header file include guard namesDr. Matthias St. Pierre2019-09-281-2/+2
| | | | | | | | | | | | | Make the include guards consistent by renaming them systematically according to the naming conventions below For the public header files (in the 'include/openssl' directory), the guard names try to match the path specified in the include directives, with all letters converted to upper case and '/' and '.' replaced by '_'. For the private header files files, an extra 'OSSL_' is added as prefix. Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/9333)
* Following the license change, modify the boilerplates in crypto/Richard Levitte2018-12-061-1/+1
| | | | | | | [skip ci] Reviewed-by: Matt Caswell <matt@openssl.org> (Merged from https://github.com/openssl/openssl/pull/7827)
* Fix building linux-armv4 with --strict-warningsBernd Edlinger2018-04-201-1/+1
| | | | | | Reviewed-by: Rich Salz <rsalz@openssl.org> Reviewed-by: Richard Levitte <levitte@openssl.org> (Merged from https://github.com/openssl/openssl/pull/6026)
* Update copyright yearMatt Caswell2018-02-131-1/+1
| | | | Reviewed-by: Richard Levitte <levitte@openssl.org>
* crypto/armcap.c: detect hardware-assisted SHA512 support.Andy Polyakov2018-02-121-0/+1
| | | | Reviewed-by: Rich Salz <rsalz@openssl.org>
* Many spelling fixes/typo's corrected.Josh Soref2017-11-111-1/+1
| | | | | | | | | Around 138 distinct errors found and fixed; thanks! Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/3459)
* Copyright consolidation 07/10Rich Salz2016-05-171-0/+9
| | | | Reviewed-by: Richard Levitte <levitte@openssl.org>
* Run util/openssl-format-source -v -c .Matt Caswell2015-01-221-53/+53
| | | | Reviewed-by: Tim Hudson <tjh@openssl.org>
* Remove inconsistency in ARM support.Andy Polyakov2015-01-041-0/+12
| | | | | | | | | This facilitates "universal" builds, ones that target multiple architectures, e.g. ARMv5 through ARMv7. See commentary in Configure for details. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matt Caswell <matt@openssl.org>
* Remove OPENSSL_FIPSCANISTER code.Dr. Stephen Henson2014-12-081-4/+0
| | | | | | | OPENSSL_FIPSCANISTER is only set if the fips module is being built (as opposed to being used). Since the fips module wont be built in master this is redundant. Reviewed-by: Tim Hudson <tjh@openssl.org>
* Add linux-aarch64 taget.Andy Polyakov2014-06-011-2/+13
| | | | | | | armcap.c is shared between 32- and 64-bit builds and features link-time detection of getauxval. Submitted by: Ard Biesheuvel.
* crypto/armcap.c: detect ARMv8 capabilities [in 32-bit build].Andy Polyakov2014-05-041-0/+4
|
* arm_arch.h: allow to specify __ARM_ARCH__ elsewhere.Andy Polyakov2011-11-091-1/+1
|
* arm_arch.h: add missing pre-defined macro, __ARM_ARCH_5TEJ__.Andy Polyakov2011-10-191-1/+2
|
* Make sure OPENSSL_FIPSCANISTER is visible to ARM assembly language files.Dr. Stephen Henson2011-07-221-1/+1
|
* ARM assembler pack: add platform run-time detection.Andy Polyakov2011-07-171-1/+8
|
* Now the FIPS capable OpenSSL is available simplify the various FIPS testDr. Stephen Henson2011-06-221-1/+1
| | | | | | | | | | | build options. All fispcanisterbuild builds only build fipscanister.o and include symbol renaming. Move all renamed symbols to fipssyms.h Update README.FIPS
* Include fipssyms.h for ARM builds to translate symbols.Dr. Stephen Henson2011-05-041-0/+4
| | | | Translate arm symbol to fips_*.
* ARM assembler pack: add missing arm_arch.h.Andy Polyakov2011-04-011-0/+39