Don't pass -Wno-sign-compare on Windows.
Add a #define HAVE_WINDOWS_H if _WIN32 is defined.
Don't assume sys/uio.h is available on Windows.
PiperOrigin-RevId: 524416809
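
A sketch of the guard described above; where the define lives (e.g. a config header) is an assumption, not spelled out in the message:
```
// Let the build know <windows.h> is available when targeting Windows.
#if defined(_WIN32)
#define HAVE_WINDOWS_H 1
#endif
```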

PiperOrigin-RevId: 524135175

PiperOrigin-RevId: 524031046

PiperOrigin-RevId: 523460180

PiperOrigin-RevId: 523287305

PiperOrigin-RevId: 518358512

snappy-internal.h uses std::pair, which is defined in the <utility> header. Typically, this works because existing C++ standard library implementations provide <utility> via other transitive includes; however, these transitive includes are not guaranteed to exist, and don't exist in certain contexts (e.g. compiling against LLVM's libc++ with Clang modules).
PiperOrigin-RevId: 517213822
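
The fix this describes is a one-line explicit include (its exact placement in snappy-internal.h is assumed):
```
#include <utility>  // for std::pair; do not rely on transitive includes
```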

and three cycles.
PiperOrigin-RevId: 517141646

PiperOrigin-RevId: 515161676

buffers are never written. This allows us to defer any writes to the output buffer for an arbitrary amount of time, as long as the writes all occur in the proper order. When a MemCopy64 would normally have occurred, we save away the source address and length. Once we reach the location of the next write to the output buffer, we first perform the deferred copy. This gives the source address and length computations time to finish before the deferred copy executes.
This change gives 1.84% on CLX and 0.97% on Milan.
PiperOrigin-RevId: 504012310
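
A minimal sketch of the deferral pattern described above; the names and structure are illustrative, not snappy's actual internals:
```
#include <cstddef>
#include <cstring>

// Record a pending copy instead of performing it immediately.
struct DeferredCopy {
  char* dst = nullptr;
  const char* src = nullptr;
  size_t len = 0;
};

// Flush the pending copy right before the next write to the output buffer,
// giving the source-address and length computations time to complete.
inline void FlushDeferred(DeferredCopy* d) {
  if (d->len != 0) {
    std::memcpy(d->dst, d->src, d->len);
    d->len = 0;
  }
}
```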

PiperOrigin-RevId: 501489679

The existing code uses a series of 8-bit loads with shifts and ors to emulate an (unaligned) load of a larger type. These are then expected to become single loads in the compiler, producing optimal assembly. Whilst this is true, it happens very late in the compiler, meaning that throughout most of the pipeline it is treated (and cost-modelled) as multiple loads, shifts and ors. This can make the compiler make poor decisions (such as not unrolling loops that should be), or break up the pattern before it is turned into a single load. For example, the loops in CompressFragment do not get unrolled as expected due to a higher cost than the unroll threshold in clang.
Instead this patch uses a more conventional method of loading unaligned data: a direct memcpy, which the compiler can deal with much more straightforwardly, modelling it as a single unaligned load. The old code is left as-is for big-endian systems.
This helps improve the performance of the BM_ZFlat benchmarks by up to 10-15% on an Arm Neoverse N1.
Change-Id: I986f845ebd0a0806d052d2be3e4dbcbee91713d7
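
The memcpy idiom in question, as a sketch (little-endian; the function name is illustrative):
```
#include <cstdint>
#include <cstring>

// The compiler recognizes this memcpy as a single unaligned load and models
// it as such throughout the optimization pipeline.
inline uint32_t UnalignedLoad32(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}
```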

Calls to memcpy seem to be quite expensive. The table below compares time per op before and after this change (columns: benchmark [input and compression ratio], old, new, delta, significance):
```
BM_ZFlat/0 [html (22.24 %) ] 114µs ± 6% 110µs ± 6% -3.97% (p=0.000 n=118+115)
BM_ZFlat/1 [urls (47.84 %) ] 1.63ms ± 5% 1.58ms ± 5% -3.39% (p=0.000 n=117+115)
BM_ZFlat/2 [jpg (99.95 %) ] 7.84µs ± 6% 7.70µs ± 6% -1.66% (p=0.000 n=119+117)
BM_ZFlat/3 [jpg_200 (73.00 %)] 265ns ± 6% 255ns ± 6% -3.48% (p=0.000 n=101+98)
BM_ZFlat/4 [pdf (83.31 %) ] 11.8µs ± 6% 11.6µs ± 6% -2.14% (p=0.000 n=118+116)
BM_ZFlat/5 [html4 (22.52 %) ] 525µs ± 6% 513µs ± 6% -2.36% (p=0.000 n=117+116)
BM_ZFlat/6 [txt1 (57.87 %) ] 494µs ± 5% 480µs ± 6% -2.84% (p=0.000 n=118+116)
BM_ZFlat/7 [txt2 (62.02 %) ] 444µs ± 4% 428µs ± 7% -3.51% (p=0.000 n=119+117)
BM_ZFlat/8 [txt3 (55.17 %) ] 1.34ms ± 5% 1.30ms ± 5% -2.40% (p=0.000 n=120+116)
BM_ZFlat/9 [txt4 (66.41 %) ] 1.84ms ± 5% 1.78ms ± 5% -3.55% (p=0.000 n=110+111)
BM_ZFlat/10 [pb (19.61 %) ] 101µs ± 5% 97µs ± 5% -4.67% (p=0.000 n=118+118)
BM_ZFlat/11 [gaviota (37.73 %)] 368µs ± 5% 360µs ± 6% -2.13% (p=0.000 n=91+90)
BM_ZFlat/12 [cp (48.25 %) ] 38.9µs ± 6% 36.8µs ± 6% -5.36% (p=0.000 n=88+87)
BM_ZFlat/13 [c (42.52 %) ] 13.4µs ± 6% 13.1µs ± 8% -2.38% (p=0.000 n=115+116)
BM_ZFlat/14 [lsp (48.94 %) ] 4.05µs ± 4% 3.94µs ± 4% -2.58% (p=0.000 n=91+85)
BM_ZFlat/15 [xls (41.10 %) ] 1.42ms ± 5% 1.39ms ± 7% -2.49% (p=0.000 n=116+117)
BM_ZFlat/16 [xls_200 (78.00 %)] 313ns ± 6% 307ns ± 5% -1.89% (p=0.000 n=89+84)
BM_ZFlat/17 [bin (18.12 %) ] 518µs ± 5% 506µs ± 5% -2.42% (p=0.000 n=118+116)
BM_ZFlat/18 [bin_200 (7.50 %) ] 86.8ns ± 6% 85.3ns ± 6% -1.76% (p=0.000 n=118+114)
BM_ZFlat/19 [sum (48.99 %) ] 67.9µs ± 4% 61.1µs ± 6% -9.96% (p=0.000 n=114+117)
BM_ZFlat/20 [man (59.45 %) ] 5.64µs ± 6% 5.47µs ± 7% -3.06% (p=0.000 n=117+115)
BM_ZFlatAll [21 kTestDataFiles] 9.23ms ± 4% 9.01ms ± 5% -2.44% (p=0.000 n=80+83)
BM_ZFlatIncreasingTableSize [7 tables ] 30.4µs ± 5% 29.3µs ± 7% -3.45% (p=0.000 n=96+96)
```
PiperOrigin-RevId: 490184133

PiperOrigin-RevId: 489554313

As far as we know, the lack of "cc" in the clobbers hasn't caused problems yet, but it could. This change is to improve correctness, and is also almost certainly performance neutral.
PiperOrigin-RevId: 487133620
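
The general rule, for illustration: GCC-style inline asm that modifies the condition flags should name "cc" in the clobber list. A minimal x86-64 sketch, not snappy's actual asm:
```
#include <cstdint>

inline uint64_t AddModifiesFlags(uint64_t a, uint64_t b) {
  // addq sets the condition flags, so "cc" must appear in the clobbers.
  asm("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
  return a;
}
```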

This change replaces the hashing function used during compression with one that is roughly as good but faster. This speeds up compression by two to a few percent on the Intel-, AMD-, and Arm-based machines we tested. The amount of compression is roughly unchanged.
PiperOrigin-RevId: 485960303
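
For context, the general shape of such a hash is a single multiply and shift over a few input bytes. A generic sketch; the multiplier shown is snappy's classic one, not the constant introduced by this change:
```
#include <cstdint>

// Multiplicative hash over 4 input bytes; `shift` selects the table size.
inline uint32_t HashBytes(uint32_t bytes, int shift) {
  constexpr uint32_t kMul = 0x1e35a7bd;  // illustrative multiplier
  return (bytes * kMul) >> shift;
}
```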

capable x86 platforms. This gives an average speedup of 6.87% on Milan and 1.90% on Skylake.
PiperOrigin-RevId: 480370725

PiperOrigin-RevId: 479818960

`std::string::data()` is const-only until C++17.
PiperOrigin-RevId: 479708109
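
The portable workaround, as a sketch:
```
#include <string>

char* MutableData(std::string& s) {
  // std::string::data() only gained a non-const overload in C++17;
  // &s[0] is the portable pre-C++17 way to get a mutable pointer.
  return s.empty() ? nullptr : &s[0];
}
```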

PiperOrigin-RevId: 478984028

This reads from an `iovec` array rather than from a `char` array as in `snappy::Compress`.
PiperOrigin-RevId: 476930623
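
A usage sketch, assuming the iovec-based entry point has the natural signature (the name `CompressFromIOVec` and its parameters are an assumption here; check snappy.h for the exact declaration):
```
#include <sys/uio.h>
#include <string>
#include "snappy.h"

// Hypothetical usage: compress two non-contiguous buffers in one call.
std::string CompressTwoBuffers(const char* a, size_t a_len,
                               const char* b, size_t b_len) {
  struct iovec iov[2];
  iov[0].iov_base = const_cast<char*>(a);
  iov[0].iov_len = a_len;
  iov[1].iov_base = const_cast<char*>(b);
  iov[1].iov_len = b_len;
  std::string compressed;
  snappy::CompressFromIOVec(iov, 2, &compressed);  // assumed API
  return compressed;
}
```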

PiperOrigin-RevId: 463090354

As `i + offset` is promoted to a "negative" size_t, UBSan would complain when adding the resulting offset to `dst`:
```
/tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43: runtime error: addition of unsigned offset to 0x6120003c5ec1 overflowed to 0x6120003c5ec0
#0 0x7f9ebd21769c in snappy::(anonymous namespace)::Copy64BytesWithPatternExtension(char*, unsigned long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:343:43
#1 0x7f9ebd21769c in std::__1::pair<unsigned char const*, long> snappy::DecompressBranchless<char*>(unsigned char const*, unsigned char const*, long, char*, long) /tmp/RtmptDX1SS/file584e37df4e/snappy_ep-prefix/src/snappy_ep/snappy.cc:1160:15
```
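
The underlying pitfall, sketched: unsigned index arithmetic can wrap, and applying the wrapped value to a pointer trips UBSan. Doing the sum in a signed type avoids this (illustrative, not the exact fix):
```
#include <cstddef>

char* Advance(char* dst, size_t i, std::ptrdiff_t offset) {
  // Bad:  dst + (i + offset)  -- if offset < 0, i + offset wraps as size_t.
  // Good: sum the indices in a signed type, then index the pointer once.
  return dst + (static_cast<std::ptrdiff_t>(i) + offset);
}
```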

contract of `MemCopy64()`, and clarify that it applies to `size`, not to 64.
PiperOrigin-RevId: 453920284

By default MemCpy() / MemMove() always copies 64 bytes in DecompressBranchless(). Profiling shows that the vast majority of the time we need to copy many fewer bytes (typically <= 16 bytes). It is safe to copy fewer than 64 bytes as long as we copy at least `len` bytes.
This change improves throughput by ~12% on ARM, ~35% on AMD Milan, and ~7% on Intel Cascade Lake.
PiperOrigin-RevId: 453917840
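
A sketch of the idea, assuming (as the message implies) that writing up to 64 bytes past the start is always safe here:
```
#include <cstddef>
#include <cstring>

// Copy at least `len` bytes (len <= 64). The common case (len <= 16) does a
// single 16-byte copy; only the rare long case pays for the full 64 bytes.
inline void MemCopy64(char* dst, const void* src, size_t len) {
  std::memcpy(dst, src, 16);
  if (len > 16) {
    std::memcpy(dst + 16, static_cast<const char*>(src) + 16, 48);
  }
}
```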

PiperOrigin-RevId: 444863689

Not everything defining __GNUC__ supports flag outputs from asm statements; in particular, some Clang versions on macOS do not. The correct test per the GCC documentation is __GCC_ASM_FLAG_OUTPUTS__, so use that instead.
PiperOrigin-RevId: 423749308
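
An illustrative guard and use of a flag output (x86-64; snappy's call sites differ):
```
#include <cstdint>

#if defined(__GCC_ASM_FLAG_OUTPUTS__)
// Flag-output constraints let asm return a condition flag directly.
inline bool SubBorrows(uint64_t a, uint64_t b, uint64_t* out) {
  bool borrow;
  asm("subq %2, %0" : "+r"(a), "=@ccc"(borrow) : "r"(b));
  *out = a;
  return borrow;
}
#endif
```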

* Align CONTRIBUTING.md with the google/new-project template.
* Explain the support story for the CMake config.
PiperOrigin-RevId: 421311695

to avoid UB of passing an uninitialized argument by value.
PiperOrigin-RevId: 406052814

PiperOrigin-RevId: 394247182

PiperOrigin-RevId: 394061345

The final ip advance value doesn't have to wait for the result of offset to load *tag. It can be computed along with the offset, so the codegen will use one csinc in parallel with ldrb. This will improve the throughput.
With this change a ~4.2% uplift is observed in UFlat/10 and ~3.7% in UFlatMedley.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I20ab211235bbf578c6c978f2bbd9160a49e920da

PiperOrigin-RevId: 393681630

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3fade568ff92b4303387705f843d0051d5e88349

After the SHUFFLE code blocks were refactored, the "tmmintrin.h" include went missing, and the BMI2 code path fails to build due to type conflicts.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7800cd7e050f4d349e5a227206b14b9c566e547f
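
The fix is to restore the include ahead of the code that uses the SSE types (exact placement assumed):
```
#include <tmmintrin.h>  // SSSE3 intrinsics (e.g. _mm_shuffle_epi8) and the __m128i type
```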

The #if predicate evaluates to false if the macro is undefined, or defined to 0. #ifdef (and its synonym #if defined) evaluates to false only if the macro is undefined.
The new setup allows differentiating between setting a macro to 0 (to express that the capability definitely does not exist / should not be used) and leaving a macro undefined (to express not knowing whether a capability exists / not caring if a capability is used).
PiperOrigin-RevId: 391094241
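
Illustrative use of the tri-state convention, using one of snappy's capability macros as the example:
```
// A macro under this convention has three meaningful states:
//   defined to 1 -> capability present and usable
//   defined to 0 -> capability definitely absent / must not be used
//   undefined    -> unknown; #if treats it as 0
// #ifdef, by contrast, cannot distinguish "defined to 0" from "defined to 1".
#if SNAPPY_HAVE_SSSE3   // false when undefined *or* explicitly set to 0
#include <tmmintrin.h>
#endif
```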

PiperOrigin-RevId: 391082698

PiperOrigin-RevId: 390767998

Clang doesn't realize the load comes with free zero-extension, and emits an extra 'and xn, xm, 0xff' to calculate the offset. With this change, the extra op is removed, and a consistent 1.7% performance uplift is observed.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ica4617852c4b93eadc6c5c551dc3961ffbadb8f0

PiperOrigin-RevId: 390715690

Inspired by kExtractMasksCombined, this patch uses shifts to replace the table lookup. On Arm the codegen is 2 shift ops (lsl+lsr). Compared to the previous ldr, which has a 4-cycle latency, the lsl+lsr pair needs only 2 cycles.
A slight (~0.3%) uplift is observed on N1, and ~3% on A72.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I5b53632d22d9e5cf1a49d0c5cdd16265a15de23b
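
The general pattern, sketched (the exact expression in the patch may differ): derive the byte mask from a shift rather than loading it from a table.
```
#include <cstdint>

// Keep the low n bytes of v (n in [0, 4]). The 64-bit intermediate keeps the
// shift well-defined at n == 4; compilers lower this to a shift pair instead
// of a table load.
inline uint32_t ExtractLowBytes(uint32_t v, int n) {
  const uint64_t mask = (uint64_t{1} << (8 * n)) - 1;
  return v & static_cast<uint32_t>(mask);
}
```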

improvement for ARM, probably because ARM has more relaxed address computation than x86 (https://www.godbolt.org/z/bfM1ezx41). I don't think this is a compiler bug, or something the compiler can do anything about.
PiperOrigin-RevId: 387569896

PiperOrigin-RevId: 387356237

code: clang for ARM and gcc for x86 (https://gcc.godbolt.org/z/oxeGG7aEx).
PiperOrigin-RevId: 383467656

generation (csinc). For codegen see https://gcc.godbolt.org/z/a8z9j95Pv
PiperOrigin-RevId: 382688740

The SSSE3 intrinsics we use have their direct analogues in NEON, so making this optimization portable requires a very thin translation layer.
PiperOrigin-RevId: 381280165
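
The shape of such a layer, sketched for the byte-shuffle intrinsic (the type and function names are illustrative):
```
#if defined(__aarch64__)
#include <arm_neon.h>
using V128 = uint8x16_t;
// NEON analogue of SSSE3's byte shuffle.
inline V128 V128_Shuffle(V128 input, V128 shuffle_mask) {
  return vqtbl1q_u8(input, shuffle_mask);
}
#else
#include <tmmintrin.h>
using V128 = __m128i;
inline V128 V128_Shuffle(V128 input, V128 shuffle_mask) {
  return _mm_shuffle_epi8(input, shuffle_mask);
}
#endif
```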

Xcode (drives macOS image) : 12.2 => 12.5
Clang : 10 => 12
GCC : 10 => 11
PiperOrigin-RevId: 375610083

context, because the other 5 bits in the byte are used for len-4 and the tag.
PiperOrigin-RevId: 374926553
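
For reference, the tag-byte layout this refers to, per snappy's format description (a copy with a 1-byte offset packs the tag type, length, and the high offset bits into one byte); the decoder sketch below is illustrative:
```
#include <cstddef>
#include <cstdint>

// Decode a COPY_1_BYTE_OFFSET element: bits [0,2) hold the tag type,
// bits [2,5) hold length-4, bits [5,8) hold offset bits [8,11); the low
// 8 offset bits follow in the next byte.
inline void DecodeCopy1(uint8_t tag, uint8_t next_byte,
                        size_t* length, size_t* offset) {
  *length = ((tag >> 2) & 0x7) + 4;
  *offset = (static_cast<size_t>(tag & 0xe0) << 3) | next_byte;
}
```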