delta/samba.git - git.samba.org: samba.git

	Commit message	Author	Age	Files	Lines
*	lib/compression: Fix length check	Joseph Sutton	2023-01-10	1	-1/+1
	Put the division on the correct side of the inequality.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	lib/compression: add simple python bindings	Douglas Bagnall	2022-12-22	2	-0/+309
	There are four functions, allowing compression and decompression in the two formats we support so far. The functions will accept bytes or unicode strings which are treated as utf-8.
	The LZ77+Huffman decompression algorithm requires an exact target length to decompress, so this is mandatory.
	The plain decompression algorithm does not need an exact length, but you can provide one to help it know how much space to allocate. As currently written, you can provide a short length and it will often succeed in decompressing to a different shorter string.
	These bindings are intended to make ad-hoc investigation easier, not for production use. This is reflected in the guesses about output size that plain_decompress() makes if you don't supply one -- either they are stupidly wasteful or ridiculously insufficient, depending on whether or not you were trying to decompress a 20MB string.
	>>> a = '12345678'
	>>> import compression
	>>> b = compression.huffman_compress(a)
	>>> b
	b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 #....
	>>> len(b)
	262
	>>> c = compression.huffman_decompress(b, len(a))
	>>> c
	b'12345678' # note, c is bytes, a is str
	>>> a
	'12345678'
	>>> d = compression.plain_compress(a)
	>>> d
	b'\xff\xff\xff\x0012345678'
	>>> compression.plain_decompress(d) # no size specified, guesses
	b'12345678'
	>>> compression.plain_decompress(d,5)
	b'12345'
	>>> compression.plain_decompress(d,0) # 0 for auto
	b'12345678'
	>>> compression.plain_decompress(d,1)
	b'1'
	>>> compression.plain_decompress(a,444)
	Traceback (most recent call last):
	compression.CompressionError: unable to decompress data into a buffer of 444 bytes.
	>>> compression.plain_decompress(b,444)
	b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 #...
	That last one decompresses the Huffman compressed file with the plain decompressor; pretty much any string is valid for plain decompression.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression/huffman: debug function bails upon disaster (CID 1517261)	Douglas Bagnall	2022-12-19	1	-0/+5
	We shouldn't get a node with a zero code, and there's probably nothing to do but stop.
	CID 1517261 (#1-2 of 2): Bad bit shift operation (BAD_SHIFT)
	11. negative_shift: In expression j >> offset - k, shifting by a negative amount has undefined behavior. The shift amount, offset - k, is -3.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
	Autobuild-User(master): Jeremy Allison <jra@samba.org>
	Autobuild-Date(master): Mon Dec 19 23:29:04 UTC 2022 on sn-devel-184
*	compression/huffman: double check distance in matches (CID 1517278)	Douglas Bagnall	2022-12-19	1	-0/+3
	Because we just wrote the intermediate representation to have no zero distances, we can be sure it doesn't, but Coverity doesn't know. If distance is zero, `bitlen_nonzero_16(distance)` would be bad.
	CID 1517278 (#1 of 1): Bad bit shift operation (BAD_SHIFT)
	41. large_shift: In expression 1 << code_dist, left shifting by more than 31 bits has undefined behavior. The shift amount, code_dist, is 65535.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression: fix sign extension of long matches (CID 1517275)	Douglas Bagnall	2022-12-19	1	-1/+1
	Very long matches would be written instead as very very long matches. We can't in fact hit this because we have a MAX_MATCH_LENGTH defined as 64M, but if we could, it might make certain 2GB+ strings impossible to compress.
	CID 1517275 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)
	sign_extension: Suspicious implicit sign extension: intermediate[i + 2UL] with type uint16_t (16 bits, unsigned) is promoted in intermediate[i + 2UL] << 16 to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If intermediate[i + 2UL] << 16 is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
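The promotion that Coverity describes can be avoided by widening to an unsigned type before shifting. The following is a minimal sketch only: `join_match_length()` is a hypothetical helper, not a function from the Samba source, and the choice of which array slot holds the low or high half is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the Samba source: combine the
 * 16-bit values at intermediate[i + 1] (assumed low half) and
 * intermediate[i + 2] (assumed high half) into one length.
 *
 * Written as `intermediate[i + 2] << 16`, the uint16_t operand is
 * promoted to a signed int before the shift, and a result with bit 31
 * set is sign-extended when converted to a 64-bit unsigned type.
 * Widening to uint32_t first keeps the arithmetic unsigned.
 */
static size_t join_match_length(const uint16_t *intermediate, size_t i)
{
	uint32_t low = intermediate[i + 1];
	uint32_t high = intermediate[i + 2];

	return (size_t)(low | (high << 16));
}
```

With a high half of 0x8000, the sketch yields 0x80000001 rather than a value whose upper 32 bits are all ones.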
*	compression tests: avoid div by zero in failure (CID 1517297)	Douglas Bagnall	2022-12-19	2	-0/+2
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression/tests: calm the static analysts (CID: numerous)	Douglas Bagnall	2022-12-19	2	-5/+21
	None of our test vectors are 18446744073709551615 bytes long, which means we can know an `expected_length == returned_length` check will catch the case where the compression function returns -1 for error. We know that, but Coverity doesn't.
	It's the same thing over and over again, in two different patterns:
	>>> CID 1517301: Memory - corruptions (OVERRUN)
	>>> Calling "memcmp" with "original.data" and "original.length" is suspicious because of the very large index, 18446744073709551615. The index may be due to a negative parameter being interpreted as unsigned.
	393 if (original.length != decomp_written ||
	394 memcmp(decompressed.data,
	395 original.data,
	396 original.length) != 0) {
	397 debug_message("\033[1;31mgot %zd, expected %zu\033[0m\n",
	398 decomp_written,
	*** CID 1517299: Memory - corruptions (OVERRUN)
	/lib/compression/tests/test_lzxpress_plain.c: 296 in test_lzxpress_plain_decompress_more_compressed_files()
	290 debug_start_timer();
	291 written = lzxpress_decompress(p.compressed.data,
	292 p.compressed.length,
	293 dest,
	294 p.decompressed.length);
	295 debug_end_timer("decompress", p.decompressed.length);
	>>> CID 1517299: Memory - corruptions (OVERRUN)
	>>> Calling "memcmp" with "p.decompressed.data" and "p.decompressed.length" is suspicious because of the very large index, 18446744073709551615. The index may be due to a negative parameter being interpreted as unsigned.
	296 if (written == p.decompressed.length &&
	297 memcmp(dest, p.decompressed.data, p.decompressed.length) == 0) {
	298 debug_message("\033[1;32mdecompressed %s!
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression/huffman: check again for invalid codes (CID 1517302)	Douglas Bagnall	2022-12-19	1	-0/+3
	We know that code is non-zero, because it comes from the combination of the intermediate representation and the symbol tables that were generated at the same time. But Coverity doesn't know that, and it thinks we could be doing undefined things in the subsequent shift.
	CID 1517302: Integer handling issues (BAD_SHIFT)
	In expression "1 << code_bit_len", shifting by a negative amount has
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression/huffman: tighten bit_len checks (fix SUSE -O3 build)	Douglas Bagnall	2022-12-19	1	-2/+3
	The struct write_context bit_len attribute is always between 0 and 31, but if the next patches are applied without this, SUSE GCC -O3 will worry thusly:
	../../lib/compression/lzxpress_huffman.c: In function ‘lzxpress_huffman_compress’:
	../../lib/compression/lzxpress_huffman.c:953:5: error: assuming signed overflow does not occur when simplifying conditional to constant [-Werror=strict-overflow]
	if (wc->bit_len > 16) {
	^
	cc1: all warnings being treated as errors
	Inspection tells us that the invariant holds. Nevertheless, we can safely use an unsigned type and insist that over- or under-flow is bad.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	compression/huffman: avoid semi-defined behaviour in decompress	Douglas Bagnall	2022-12-19	1	-5/+7
	We had
	output[output_pos - distance];
	where output_pos and distance are size_t and distance can be greater than output_pos (because it refers to a place in the previous block).
	The underflow is defined, leading to a big number, and when sizeof(size_t) == sizeof(uint8_t *) the subsequent overflow works as expected. But if size_t is smaller than a pointer, bad things will happen.
	This was found by OSSFuzz with 'UBSAN_OPTIONS=print_stacktrace=1:silence_unsigned_overflow=1'.
	Credit to OSSFuzz.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Volker Lendecke <vl@samba.org>
	Reviewed-by: Jeremy Allison <jra@samba.org>
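One way to avoid relying on wrap-around is to compute the match source from an absolute position that is checked before the subtraction. This is a sketch under stated assumptions, not the code from the commit: `match_source()`, `base`, and `block_start` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the code from the commit. `base` points at
 * the start of the whole output buffer, `block_start` is the offset of
 * the current block within it, and `output_pos` is relative to that
 * block, so a match may legitimately reach back past the block start.
 *
 * Returns a pointer to the match source, or NULL if the distance is
 * zero or would reach before the start of the buffer.
 */
static const uint8_t *match_source(const uint8_t *base,
				   size_t block_start,
				   size_t output_pos,
				   size_t distance)
{
	size_t abs_pos = block_start + output_pos;

	if (distance == 0 || distance > abs_pos) {
		return NULL;
	}
	/* abs_pos - distance cannot wrap here, so no overflow is relied on. */
	return base + (abs_pos - distance);
}
```

In the sketch a distance of 5 at position 2 of a block starting at offset 8 resolves to `base + 5`, while the same distance in the first block is rejected.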
*	lib/compression: Include missing stat header file	Anoop C S	2022-12-06	2	-0/+2
	<sys/stat.h> was missing from compression library tests which resulted in the following compile time error:
	../../lib/compression/tests/test_lzx_huffman.c: In function ‘datablob_from_file’:
	../../lib/compression/tests/test_lzx_huffman.c:383:21: error: storage size of ‘s’ isn’t known
	383 | struct stat s;
	    | ^
	../../lib/compression/tests/test_lzx_huffman.c:389:15: warning: implicit declaration of function ‘fstat’ [-Wimplicit-function-declaration]
	389 | ret = fstat(fileno(fh), &s);
	    | ^~~~~
	Signed-off-by: Anoop C S <anoopcs@samba.org>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Volker Lendecke <vl@samba.org>
	Autobuild-User(master): Volker Lendecke <vl@samba.org>
	Autobuild-Date(master): Tue Dec 6 11:39:16 UTC 2022 on sn-devel-184
*	lib:compression: Initialize variables	Andreas Schneider	2022-12-04	1	-2/+2
	lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches’:
	lib/compression/tests/test_lzx_huffman.c:1013:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
	1013 | assert_int_equal(score, i * j);
	     | ^
	lib/compression/tests/test_lzx_huffman.c:979:19: note: ‘j’ was declared here
	979 | size_t i, j;
	    | ^
	lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches_abc’:
	lib/compression/tests/test_lzx_huffman.c:1059:39: error: ‘k’ may be used uninitialized [-Werror=maybe-uninitialized]
	1059 | assert_int_equal(score, i * j * k);
	     | ^
	lib/compression/tests/test_lzx_huffman.c:1020:22: note: ‘k’ was declared here
	1020 | size_t i, j, k;
	     | ^
	lib/compression/tests/test_lzx_huffman.c:1059:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
	1059 | assert_int_equal(score, i * j * k);
	     | ^
	lib/compression/tests/test_lzx_huffman.c:1020:19: note: ‘j’ was declared here
	1020 | size_t i, j, k;
	     | ^
	Signed-off-by: Andreas Schneider <asn@samba.org>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
	Autobuild-Date(master): Sun Dec 4 09:12:30 UTC 2022 on sn-devel-184
*	lib/compression/lzxpress: fix our slow compression	Douglas Bagnall	2022-12-02	1	-46/+164
	This uses the same hash table method as lzxpress_huffman, though the code can't be directly reused as the sizes of the offsets are different, and there is not a block processing step here.
	This will worsen the compression ratio compared to the exhaustive search we previously used, though we still perform better than Windows. To put numbers on it, the test files used to compress to 0.91 of Windows' compression size, and now they compress to 0.96.
	On the other hand this is many orders of magnitude faster. It is difficult to say exactly how much faster -- while the testsuite time has only improved 200-fold (from 7 minutes to 2 seconds), most of the remaining 2 seconds is used in data generation and management, not compression.
	OSSFuzz consistently finds new vectors that time out after a minute; on these we'll see nearly an order of magnitude of orders of magnitude improvement.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Autobuild-User(master): Joseph Sutton <jsutton@samba.org>
	Autobuild-Date(master): Fri Dec 2 00:00:04 UTC 2022 on sn-devel-184
*	lib/compression/lzxpress: shift encoding into helper functions	Douglas Bagnall	2022-12-01	1	-74/+104
	This makes it easier to rework the encoding decision to depend on a hash table match rather than the current exhaustive search.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression/lzxpress compression: use a write context struct	Douglas Bagnall	2022-12-01	1	-50/+62
	This will make it possible to move encoding operations into helper functions, which will make it easier to restructure the code to use a hash table for faster matching.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: more tests for lzxpress plain compression	Douglas Bagnall	2022-12-01	1	-0/+749
	These are based on (i.e. copied and pasted from) the LZ77 + Huffman tests.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	testdata: move compression examples to re-use with lzxpress plain	Douglas Bagnall	2022-12-01	1	-3/+3
	Everything that is in testdata/compression/lzxpress-huffman/ can also be used for lzxpress plain tests, which is something we really need.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression/lzx-plain: relax size requirements on long file	Douglas Bagnall	2022-12-01	1	-2/+8
	We are going to change from a slow exact match algorithm to a fast heuristic search that will not always get the same results as the exhaustive search.
	To be precise, a million zeros will compress to 112 rather than 93 bytes.
	We don't insist on an exact size, because that is not an issue here.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/comression: convert test_lzxpress_plain to cmocka	Douglas Bagnall	2022-12-01	2	-128/+76
	Mainly so I can go
	make bin/test_lzxpress_plain && bin/test_lzxpress_plain
	valgrind bin/test_lzxpress_plain
	rr bin/test_lzxpress_plain
	rr replay
	in a tight loop.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: add test scripts README	Douglas Bagnall	2022-12-01	1	-0/+19
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: test util to generate fuzzing seeds	Douglas Bagnall	2022-12-01	1	-0/+45
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: Windows utility to generate test vectors	Douglas Bagnall	2022-12-01	1	-0/+206
	If compiled on Windows using Cygwin, MSYS2, or similar, this will output compressed versions of files exactly as specified by MS-XCA, if the following conditions are met:
	1. The file is larger than 300 bytes.
	2. The compressed file is smaller than the decompressed file.
	Otherwise it returns the data unchanged. Without warning; that's just how the API works.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: script to test 3 byte hash	Douglas Bagnall	2022-12-01	1	-0/+49
	Compression uses a 3 byte hash to remember LZ77 matches in a 14-bit table. This script runs the hash over all 16M combinations, then again over all ASCII combinations, counting collisions to find hot-spots.
	If you think you have a better hash, you are probably right, but you should try it here -- alter h() -- before committing to it.
	This one is literally the first one I thought of.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
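To make the idea of a 3 byte hash indexing a 14-bit table concrete, here is an illustrative sketch. It is not the hash used in lib/compression, whose definition is not shown in this log: the name `hash3()`, the `HASH_BITS` macro, and the multiplier are all assumptions chosen for the example.

```c
#include <stdint.h>

#define HASH_BITS 14

/*
 * Illustrative 3-byte hash into a 14-bit table index. This is NOT the
 * hash used in lib/compression; the multiplier is an arbitrary odd
 * constant chosen for the sketch.
 */
static uint16_t hash3(uint8_t a, uint8_t b, uint8_t c)
{
	/* Pack the three bytes into the low 24 bits of a 32-bit word. */
	uint32_t v = (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16);

	/* Multiply (mod 2^32) and keep the top HASH_BITS bits. */
	v *= UINT32_C(2654435761);
	return (uint16_t)(v >> (32 - HASH_BITS));
}
```

Any candidate of this shape always returns a value below 1 << 14, so it can index a 16384-entry table directly; how evenly it spreads real input is exactly what a collision-counting script would measure.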
*	lib/compression: helper script to make unbalanced data	Douglas Bagnall	2022-12-01	1	-0/+185
	Huffman tree re-quantisation and perhaps other code paths are only triggered by pathological data like this.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: add a debug script to describe headers	Douglas Bagnall	2022-12-01	1	-0/+54
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression/tests: add lzhuffman timer functions	Douglas Bagnall	2022-12-01	1	-5/+36
	With LZXHUFF_DEBUG_VERBOSE set, we measure the compression and decompression rate relative to the decompressed size.
	On reasonably long strings on my laptop, compiled with -O0, it turns out to be between 20 and 500 MB/s, both ways, depending on the complexity of the string. Very short strings are of course dominated by overhead.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: debug routines for lzxpress-huffman	Douglas Bagnall	2022-12-01	1	-1/+249
	If you need to see a Huffman tree (and sometimes you do), set DEBUG_HUFFMAN_TREE to true at the top of lzxpress_huffman.c, and run:
	make bin/test_lzx_huffman && bin/test_lzx_huffman
	Actually, that will show you hundreds of trees, and you'll be glad of that if you are ever trying to understand this.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression/lzhuff: add debug flag to skip LZ77	Douglas Bagnall	2022-12-01	1	-1/+10
	Encoding without LZ77 matches is valid, and it is useful for isolating bugs.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: LZ77 + Huffman compression	Douglas Bagnall	2022-12-01	3	-0/+1861
	This compresses files as described in MS-XCA 2.2, and as decompressed by the decompressor in the previous commit.
	As with the decompressor, there are two public functions -- one that uses a talloc context, and one that uses pre-allocated memory. The compressor requires a tightly bound amount of auxiliary memory (>220kB) in a few different buffers, which is all gathered together in the public struct lzxhuff_compressor_mem. An instantiated but not initialised copy of this struct is required by the non-talloc function; it can be used over and over again.
	Our compression speed is about the same as the decompression speed (between 20 and 500 MB/s on this laptop, depending on the data), and our compression ratio is very similar to that of Windows.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: add LZ77 + Huffman decompression	Douglas Bagnall	2022-12-01	4	-3/+1218
	This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1].
	The two public functions are:
	ssize_t lzxpress_huffman_decompress(const uint8_t *input, size_t input_size, uint8_t *output, size_t output_size);
	uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx, const uint8_t *input_bytes, size_t input_size, size_t output_size);
	In both cases the caller needs to know the exact decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. The non-talloc function will allocate the working buffer on the stack.
	This compression format gives better compression for messages of several kilobytes than the "plain" LZXPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table.
	Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations.
	This compression format is used in AD claims and in SMB, but that doesn't happen with this commit.
	I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia.
	[1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib/compression: move lzxpress_plain test into tests/	Douglas Bagnall	2022-12-01	1	-0/+0
	We are going to add more tests for lib/compression, and they can't all be called "testsuite.c".
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
*	lib: Fix the 32-bit build	Volker Lendecke	2022-07-23	1	-1/+1
	Signed-off-by: Volker Lendecke <vl@samba.org>
	Reviewed-by: Jeremy Allison <jra@samba.org>
*	lzxpress: compress shortcut if we've reached maximum length	Douglas Bagnall	2022-05-17	1	-0/+4
	A simple degenerate case for our compressor has been a large number of repeated bytes that will match the maximum length (~64k) at all 8192 search positions, 8191 of which searches are in vain because the matches are not of greater length than the first one.
	Here we recognise the inevitable and reduce runtime proportionately.
	Credit to OSS-Fuzz.
	REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47428
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
	Autobuild-User(master): Douglas Bagnall <dbagnall@samba.org>
	Autobuild-Date(master): Tue May 17 23:11:21 UTC 2022 on sn-devel-184
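The shortcut can be pictured with a small exhaustive search that stops as soon as a match reaches the longest length it could possibly have. This is a sketch, not the Samba code: `longest_match()` and its parameters are hypothetical, and the real compressor's window and length limits are passed in here rather than hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical exhaustive LZ77 search, not the Samba implementation.
 * Looks back up to `window` bytes from `pos` for the longest run that
 * matches data[pos..], capped at `max_len`. Stores the winning distance
 * in *best_offset (0 if nothing matched) and returns the match length.
 */
static size_t longest_match(const uint8_t *data, size_t size, size_t pos,
			    size_t window, size_t max_len,
			    size_t *best_offset)
{
	size_t best_len = 0;
	size_t limit;
	size_t offset;

	*best_offset = 0;
	if (pos >= size) {
		return 0;
	}
	limit = size - pos;
	if (limit > max_len) {
		limit = max_len;
	}
	for (offset = 1; offset <= window && offset <= pos; offset++) {
		size_t len = 0;

		while (len < limit &&
		       data[pos - offset + len] == data[pos + len]) {
			len++;
		}
		if (len > best_len) {
			best_len = len;
			*best_offset = offset;
		}
		if (best_len == limit) {
			/* No later offset can do better: stop searching. */
			break;
		}
	}
	return best_len;
}
```

On a buffer of identical bytes the first offset already reaches the cap, so the loop exits after one iteration instead of trying every position in the window.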
*	lzxpress/test: time performance of long boring sequences	Douglas Bagnall	2022-05-17	1	-0/+69
	We get very slow when long runs of bytes are the same. On this laptop the test takes 18s; with the next commit it will be 0.006s.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression:tests: align test names with functions	Douglas Bagnall	2022-05-12	1	-5/+5
	You'll thank me if you're ever debugging these and wondering why 'lzxpress4' calls 'lzxpress2' (or is it the other way round?).
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: add a few comments, including MS-XCA pointers.	Douglas Bagnall	2022-05-12	1	-0/+19
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: remove always false constant comparison	Douglas Bagnall	2022-05-12	1	-3/+0
	We set `uncompressed_pos = 0;` unconditionally, just ~10 lines up.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: lzxpress decompress empty string as empty string	Douglas Bagnall	2022-05-12	1	-0/+4
	This mirrors the behaviour of lzxpress_compress, which "encodes" an empty string as an empty string.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: fix lzxpress decompress with trailing flags	Douglas Bagnall	2022-05-12	1	-0/+7
	Every so often, lzxpress adds a 32-bit block of indicator flags to help decode the next clump of 32 code words. A naive compressor (such as we have) might do this at the very end for flags that aren't actually used because there are no more bytes to decompress. If that happens we need to stop processing, or we'll come to a worse outcome at the next CHECK_INPUT_BYTES.
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression:tests: test lzxpress in some edge cases	Douglas Bagnall	2022-05-12	1	-1/+61
	Empty strings and trailing flag blocks.
	(Found with Honggfuzz and a round-trip fuzzer that aborts if the strings differ.)
	Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: Move maximum length calculation out of inner loop	Joseph Sutton	2022-05-12	1	-6/+3
	This makes the code clearer.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
*	compression: Use correct values for max len and offset	Joseph Sutton	2022-05-12	1	-2/+2
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
*	compression: Replace divisions with shifts	Joseph Sutton	2022-05-12	1	-4/+5
	This is more consistent with the compression code.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
*	compression: Remove unneeded loop variable	Joseph Sutton	2022-05-12	1	-2/+1
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: Reduce scope of variables	Joseph Sutton	2022-05-12	1	-14/+13
	This makes the code clearer.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: Use PUSH_LE_U32 for first output buffer write	Joseph Sutton	2022-05-12	1	-1/+1
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
	Reviewed-by: Andrew Bartlett <abartlet@samba.org>
*	compression: Add bounds check for first output buffer write	Joseph Sutton	2022-05-12	1	-1/+3
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
*	compression: Remove helper variables str1 and str2	Joseph Sutton	2022-05-12	1	-6/+4
	This simplifies the code and makes it clearer.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
*	compression: Fix writing output flags	Joseph Sutton	2022-05-12	1	-2/+4
	If indic_bit == 0, the shift amount of 32 - indic_bit == 32 will equal the width of a 32-bit integer type, and these shifts will invoke undefined behaviour, which is likely to cause incorrect output.
	Fix this by not shifting a 32-bit integer type by 32 bits or more.
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
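A guard of the kind described might look like the following. This is a sketch under assumptions, not the code from the commit: `finish_indicator()` is a hypothetical helper, and both the padding of unused flag bits with ones and the handling of the zero case are choices made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the code from the commit. `indic` holds
 * `indic_bit` flag bits in its low bits. Move them to the top of the
 * 32-bit word and fill the unused low bits with ones (an assumption of
 * this sketch). When indic_bit is 0 or 32 the shift would be by 32 or
 * by 0 bits, so the value is returned unchanged and no shift by the
 * full width of the type is ever attempted.
 */
static uint32_t finish_indicator(uint32_t indic, unsigned int indic_bit)
{
	unsigned int shift;

	if (indic_bit == 0 || indic_bit >= 32) {
		return indic;
	}
	shift = 32 - indic_bit; /* now in the range 1..31 */
	return (indic << shift) | ((UINT32_C(1) << shift) - 1);
}
```

With eight zero flags the sketch produces 0x00FFFFFF, which written little-endian is ff ff ff 00; that appears consistent with the prefix of the plain_compress() output shown in the Python bindings commit earlier in this log.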
*	compression: Remove byte_left variable	Joseph Sutton	2022-05-12	1	-5/+2
	We can simplify this code using the identity:
	byte_left + uncompressed_pos = uncompressed_size
	Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
	Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>