author    | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2019-01-27 11:19:56 +0200
committer | Jussi Kivilinna <jussi.kivilinna@iki.fi> | 2019-01-27 11:19:56 +0200
commit    | d6330dfb4b0e9fb3f8eef65ea13146060b804a97
tree      | 585a2b0108970d3b5d9fcc57d2333c22dd004cdc /cipher/cipher-internal.h
parent    | 7d9b2f114f3edf4d13640616cf34c79364234781
Add stitched ChaCha20-Poly1305 SSSE3 and AVX2 implementations
* cipher/asm-poly1305-amd64.h: New.
* cipher/Makefile.am: Add 'asm-poly1305-amd64.h'.
* cipher/chacha20-amd64-avx2.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New.
* cipher/chacha20-amd64-ssse3.S (QUATERROUND2): Add interleave
operators.
(_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1): New.
* cipher/chacha20.c (_gcry_chacha20_poly1305_amd64_ssse3_blocks4)
(_gcry_chacha20_poly1305_amd64_ssse3_blocks1)
(_gcry_chacha20_poly1305_amd64_avx2_blocks8): New prototypes.
(chacha20_encrypt_stream): Split tail to...
(do_chacha20_encrypt_stream_tail): ... new function.
(_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New.
* cipher/cipher-internal.h (_gcry_chacha20_poly1305_encrypt)
(_gcry_chacha20_poly1305_decrypt): New prototypes.
* cipher/cipher-poly1305.c (_gcry_cipher_poly1305_encrypt): Call
'_gcry_chacha20_poly1305_encrypt' if cipher is ChaCha20.
(_gcry_cipher_poly1305_decrypt): Call
'_gcry_chacha20_poly1305_decrypt' if cipher is ChaCha20.
* cipher/poly1305-internal.h (_gcry_poly1305_update_burn): New
prototype.
* cipher/poly1305.c (poly1305_blocks): Make static.
(_gcry_poly1305_update): Split main function body to ...
(_gcry_poly1305_update_burn): ... new function.
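Stitching means the Poly1305 work is interleaved with ChaCha20 keystream generation (at instruction level, via the interleave operators added to `QUATERROUND2`), so the data is encrypted and authenticated in one pass instead of two. C cannot express that instruction-level interleaving, so the following is only a minimal sketch of the one-pass loop structure, under stated assumptions: `toy_stream`, `toy_mac`, `stitched_encrypt`, `stitched_decrypt` and `CHUNK` are hypothetical names, and a toy xorshift keystream and an FNV-1a hash stand in for ChaCha20 and Poly1305 (neither offers any security).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy primitives: NOT ChaCha20 or Poly1305, no security. */
typedef struct { uint64_t s; } toy_stream;  /* keystream state (xorshift64) */
typedef struct { uint64_t h; } toy_mac;     /* streaming MAC state (FNV-1a) */

#define CHUNK 64  /* hypothetical chunk size: one 64-byte ChaCha20 block */

static void toy_stream_init(toy_stream *ks, uint64_t seed)
{
  ks->s = seed ? seed : 1;  /* xorshift64 must not start at zero */
}

static uint8_t toy_stream_next(toy_stream *ks)
{
  ks->s ^= ks->s << 13;
  ks->s ^= ks->s >> 7;
  ks->s ^= ks->s << 17;
  return (uint8_t)ks->s;    /* low byte of the state is the keystream byte */
}

static void toy_xor(toy_stream *ks, uint8_t *out, const uint8_t *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint8_t)(in[i] ^ toy_stream_next(ks));
}

static void toy_mac_init(toy_mac *m)
{
  m->h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */
}

static void toy_mac_update(toy_mac *m, const uint8_t *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      m->h ^= p[i];
      m->h *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
    }
}

/* One pass: MAC each ciphertext chunk right after producing it. */
static void stitched_encrypt(toy_stream *ks, toy_mac *m,
                             uint8_t *out, const uint8_t *in, size_t len)
{
  while (len > 0)
    {
      size_t n = len < CHUNK ? len : CHUNK;
      toy_xor(ks, out, in, n);
      toy_mac_update(m, out, n);  /* the MAC covers the ciphertext */
      out += n;
      in += n;
      len -= n;
    }
}

/* One pass: MAC the ciphertext chunk before decrypting it, so in-place use
   (out == in) still authenticates ciphertext rather than plaintext. */
static void stitched_decrypt(toy_stream *ks, toy_mac *m,
                             uint8_t *out, const uint8_t *in, size_t len)
{
  while (len > 0)
    {
      size_t n = len < CHUNK ? len : CHUNK;
      toy_mac_update(m, in, n);
      toy_xor(ks, out, in, n);
      out += n;
      in += n;
      len -= n;
    }
}
```

The one-pass encrypt gives the same ciphertext and tag as "encrypt everything, then MAC everything", because both primitives consume bytes strictly in order. The decrypt loop updates the MAC before it XORs because the ChaCha20-Poly1305 construction (RFC 8439) authenticates the ciphertext; with `out == in` the chunk would otherwise already be plaintext when the MAC reads it.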
--
Benchmark on Intel Skylake (i5-6500, 3200 MHz):

Before, 8-way AVX2:
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc     |     0.378 ns/B      2526 MiB/s      1.21 c/B
STREAM dec     |     0.373 ns/B      2560 MiB/s      1.19 c/B
POLY1305 enc   |     0.685 ns/B      1392 MiB/s      2.19 c/B
POLY1305 dec   |     0.686 ns/B      1390 MiB/s      2.20 c/B
POLY1305 auth  |     0.315 ns/B      3031 MiB/s      1.01 c/B

After, 8-way AVX2 (~36% faster):
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
POLY1305 enc   |     0.503 ns/B      1896 MiB/s      1.61 c/B
POLY1305 dec   |     0.485 ns/B      1965 MiB/s      1.55 c/B
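The three columns are different views of the same measurement: MiB/s = 10^9 / (ns/B × 2^20), and c/B = ns/B × clock rate in GHz. The "~36% faster" figure corresponds to before/after - 1 on the ns/B of the enc row. A minimal C sketch of those conversions follows (the helper names are hypothetical); recomputed values can differ from the table in the last digit because the printed ns/B values are already rounded.

```c
/* Hypothetical helpers for reading bench-slope style output. */

/* ns/B -> MiB/s, where 1 MiB = 2^20 bytes. */
static double ns_per_byte_to_mib_per_s(double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* ns/B -> CPU cycles per byte at a clock rate given in MHz. */
static double ns_per_byte_to_cycles(double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* How much faster "after" is than "before", in percent. */
static double speedup_percent(double before_ns, double after_ns)
{
  return (before_ns / after_ns - 1.0) * 100.0;
}
```

For example, 0.685 ns/B gives about 1392 MiB/s, 0.378 ns/B at 3200 MHz gives about 1.21 c/B, and 0.685 to 0.503 ns/B is about a 36% speedup.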
Benchmark on Intel Haswell (i7-4790K, 3998 MHz):

Before, 8-way AVX2:
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc     |     0.318 ns/B      2999 MiB/s      1.27 c/B
STREAM dec     |     0.317 ns/B      3004 MiB/s      1.27 c/B
POLY1305 enc   |     0.586 ns/B      1627 MiB/s      2.34 c/B
POLY1305 dec   |     0.586 ns/B      1627 MiB/s      2.34 c/B
POLY1305 auth  |     0.271 ns/B      3524 MiB/s      1.08 c/B

After, 8-way AVX2 (~30% faster):
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
POLY1305 enc   |     0.452 ns/B      2108 MiB/s      1.81 c/B
POLY1305 dec   |     0.440 ns/B      2167 MiB/s      1.76 c/B

Before, 4-way SSSE3:
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc     |     0.627 ns/B      1521 MiB/s      2.51 c/B
STREAM dec     |     0.626 ns/B      1523 MiB/s      2.50 c/B
POLY1305 enc   |     0.895 ns/B      1065 MiB/s      3.58 c/B
POLY1305 dec   |     0.896 ns/B      1064 MiB/s      3.58 c/B
POLY1305 auth  |     0.271 ns/B      3521 MiB/s      1.08 c/B

After, 4-way SSSE3 (~20% faster):
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
POLY1305 enc   |     0.733 ns/B      1301 MiB/s      2.93 c/B
POLY1305 dec   |     0.726 ns/B      1314 MiB/s      2.90 c/B

Before, 1-way SSSE3:
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
POLY1305 enc   |      1.56 ns/B     609.6 MiB/s      6.25 c/B
POLY1305 dec   |      1.56 ns/B     609.4 MiB/s      6.26 c/B

After, 1-way SSSE3 (~18% faster):
CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
POLY1305 enc   |      1.31 ns/B     725.4 MiB/s      5.26 c/B
POLY1305 dec   |      1.31 ns/B     727.3 MiB/s      5.24 c/B
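One way to read the Haswell numbers: the "before" POLY1305 enc figure is almost exactly the sum of STREAM enc and POLY1305 auth (0.318 + 0.271 = 0.589 vs 0.586 ns/B for 8-way AVX2; 0.627 + 0.271 = 0.898 vs 0.895 ns/B for 4-way SSSE3), which is what two separate passes would cost. The sketch below (hypothetical helpers) estimates how much of the standalone Poly1305 cost the stitched code no longer pays. This is an interpretation of the published figures, not an additional measurement.

```c
/* Hypothetical helpers; inputs are ns/B figures from the tables above. */

/* Expected cost of the non-stitched path: a ChaCha20 pass plus a separate
   Poly1305 pass over the same bytes. */
static double two_pass_estimate(double stream_ns, double auth_ns)
{
  return stream_ns + auth_ns;
}

/* Fraction of the standalone Poly1305 cost that the stitched code no longer
   pays (0 = nothing saved, 1 = the MAC is completely free). */
static double hidden_mac_fraction(double stream_ns, double auth_ns,
                                  double stitched_ns)
{
  return (two_pass_estimate(stream_ns, auth_ns) - stitched_ns) / auth_ns;
}
```

With the Haswell enc rows this comes out to roughly half of the Poly1305 cost hidden for 8-way AVX2 (0.137 / 0.271) and about 60% for 4-way SSSE3 (0.165 / 0.271).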
For comparison to other libraries (on Intel i7-4790K, 3998 MHz):

bench-slope-openssl: OpenSSL 1.1.1 11 Sep 2018
Cipher:
chacha20       |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc     |     0.301 ns/B    3166.4 MiB/s      1.20 c/B
STREAM dec     |     0.300 ns/B    3174.7 MiB/s      1.20 c/B
POLY1305 enc   |     0.463 ns/B    2060.6 MiB/s      1.85 c/B
POLY1305 dec   |     0.462 ns/B    2063.8 MiB/s      1.85 c/B
POLY1305 auth  |     0.162 ns/B    5899.3 MiB/s     0.646 c/B

bench-slope-nettle: Nettle 3.4
Cipher:
chacha         |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc     |      1.65 ns/B     578.2 MiB/s      6.59 c/B
STREAM dec     |      1.65 ns/B     578.2 MiB/s      6.59 c/B
POLY1305 enc   |      2.05 ns/B     464.8 MiB/s      8.20 c/B
POLY1305 dec   |      2.05 ns/B     464.7 MiB/s      8.20 c/B
POLY1305 auth  |     0.404 ns/B    2359.1 MiB/s      1.62 c/B

bench-slope-botan: Botan 2.6.0
Cipher:
ChaCha         |  nanosecs/byte   mebibytes/sec   cycles/byte
STREAM enc/dec |     0.855 ns/B    1116.0 MiB/s      3.42 c/B
POLY1305 enc   |      1.60 ns/B     595.4 MiB/s      6.40 c/B
POLY1305 dec   |      1.60 ns/B     595.8 MiB/s      6.40 c/B
POLY1305 auth  |     0.752 ns/B    1268.3 MiB/s      3.01 c/B
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Diffstat (limited to 'cipher/cipher-internal.h'):
-rw-r--r--  cipher/cipher-internal.h  9
1 file changed, 9 insertions, 0 deletions
```diff
diff --git a/cipher/cipher-internal.h b/cipher/cipher-internal.h
index 89886962..78f05dbb 100644
--- a/cipher/cipher-internal.h
+++ b/cipher/cipher-internal.h
@@ -542,6 +542,15 @@ void _gcry_cipher_poly1305_setkey
 /* */ (gcry_cipher_hd_t c);
 
 
+/*-- chacha20.c --*/
+gcry_err_code_t _gcry_chacha20_poly1305_encrypt
+/* */ (gcry_cipher_hd_t c, byte *outbuf, const byte *inbuf,
+       size_t length);
+gcry_err_code_t _gcry_chacha20_poly1305_decrypt
+/* */ (gcry_cipher_hd_t c, byte *outbuf, const byte *inbuf,
+       size_t length);
+
+
 /*-- cipher-ocb.c --*/
 gcry_err_code_t _gcry_cipher_ocb_encrypt
 /* */ (gcry_cipher_hd_t c,
```
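The two prototypes added to cipher-internal.h are the entry points that, per the ChangeLog, `_gcry_cipher_poly1305_encrypt` and `_gcry_cipher_poly1305_decrypt` now call when the cipher is ChaCha20. A minimal sketch of that dispatch follows. Every type and function in it is a hypothetical stand-in (the real `gcry_cipher_hd_t` and the generic Poly1305 path are more involved), and the up-front buffer-length check is an assumption about where validation happens, suggested by the new prototypes taking a single `length`.

```c
#include <stddef.h>

/* Every name here is a hypothetical stand-in, not a libgcrypt internal. */
typedef unsigned char byte;

enum sketch_algo { SKETCH_CHACHA20, SKETCH_OTHER };
enum sketch_path { PATH_NONE, PATH_STITCHED, PATH_GENERIC };

struct sketch_hd
{
  enum sketch_algo algo;  /* cipher underlying the Poly1305 mode */
  enum sketch_path last;  /* which path ran last, recorded for the demo */
};

/* Plays the role of _gcry_chacha20_poly1305_encrypt: one stitched pass. */
static int sketch_chacha20_poly1305_encrypt(struct sketch_hd *c, byte *outbuf,
                                            const byte *inbuf, size_t length)
{
  (void)outbuf; (void)inbuf; (void)length;  /* real work omitted */
  c->last = PATH_STITCHED;
  return 0;
}

/* Plays the role of the pre-existing path: cipher pass, then MAC pass. */
static int sketch_generic_encrypt(struct sketch_hd *c, byte *outbuf,
                                  const byte *inbuf, size_t length)
{
  (void)outbuf; (void)inbuf; (void)length;  /* real work omitted */
  c->last = PATH_GENERIC;
  return 0;
}

/* Dispatch in the spirit of the cipher-poly1305.c ChangeLog entry:
   validate the buffers once, then pass a single length to the chosen path. */
static int sketch_poly1305_encrypt(struct sketch_hd *c, byte *outbuf,
                                   size_t outbuflen, const byte *inbuf,
                                   size_t inbuflen)
{
  if (outbuflen < inbuflen)
    return -1;  /* hypothetical "buffer too short" error code */
  if (c->algo == SKETCH_CHACHA20)
    return sketch_chacha20_poly1305_encrypt(c, outbuf, inbuf, inbuflen);
  return sketch_generic_encrypt(c, outbuf, inbuf, inbuflen);
}
```

Keeping the ChaCha20 check in the mode layer leaves other Poly1305-mode users untouched while giving ChaCha20 its own fused path; the decrypt side would mirror this with `_gcry_chacha20_poly1305_decrypt`.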