author    snappy.mirrorbot@gmail.com <snappy.mirrorbot@gmail.com@03e5f5b5-db94-4691-08a0-1a8bf15f6143> 2011-06-03 20:53:06 +0000
committer snappy.mirrorbot@gmail.com <snappy.mirrorbot@gmail.com@03e5f5b5-db94-4691-08a0-1a8bf15f6143> 2011-06-03 20:53:06 +0000
commit    23805f9d5fbcdf4b0d4b94b530f3d81124d8ba63
tree      c1a6652e87450a5c74fec18eddddab6e6d4e8a4e
parent    d737ce7568c794f0a92cf1190706b9b7b79e395d
Speed up decompression by removing a fast-path attempt.

Whenever we try to enter a copy fast-path, there is a certain cost in checking that all the preconditions are in place, but it's normally offset by the fact that we can usually take the cheaper path. However, in a certain path we've already established that "avail < literal_length", which usually means that either the available space is small, or the literal is big. Either one will disqualify us from taking the fast path, and thus we take the hit from the precondition checking without gaining much from having a fast path. Thus, simply don't try the fast path in this situation -- we're already on a slow path anyway (one where we need to refill more data from the reader).

I'm a bit surprised at how much this gained; it could be that this path is more common than I thought, or that the simpler structure somehow makes the compiler happier. I haven't looked at the assembler, but it's a win across the board on Core 2, Core i7 and Opteron, at least for the cases we typically care about. The gains seem to be the largest on Core i7, though.

Results from my Core i7 workstation:

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_UFlat/0              73337      73091     190996   1.3GB/s  html     [ +1.7%]
  BM_UFlat/1             696379     693501      20173 965.5MB/s  urls     [ +2.7%]
  BM_UFlat/2               9765       9734    1472135  12.1GB/s  jpg      [ +0.7%]
  BM_UFlat/3              29720      29621     472973   3.0GB/s  pdf      [ +1.8%]
  BM_UFlat/4             294636     293834      47782   1.3GB/s  html4    [ +2.3%]
  BM_UFlat/5              28399      28320     494700 828.5MB/s  cp       [ +3.5%]
  BM_UFlat/6              12795      12760    1000000 833.3MB/s  c        [ +1.2%]
  BM_UFlat/7               3984       3973    3526448 893.2MB/s  lsp      [ +5.7%]
  BM_UFlat/8             991996     989322      14141 992.6MB/s  xls      [ +3.3%]
  BM_UFlat/9             228620     227835      61404 636.6MB/s  txt1     [ +4.0%]
  BM_UFlat/10            197114     196494      72165 607.5MB/s  txt2     [ +3.5%]
  BM_UFlat/11            605240     603437      23217 674.4MB/s  txt3     [ +3.7%]
  BM_UFlat/12            804157     802016      17456 573.0MB/s  txt4     [ +3.9%]
  BM_UFlat/13            347860     346998      40346   1.4GB/s  bin      [ +1.2%]
  BM_UFlat/14             44684      44559     315315 818.4MB/s  sum      [ +2.3%]
  BM_UFlat/15              5120       5106    2739726 789.4MB/s  man      [ +3.3%]
  BM_UFlat/16             76591      76355     183486   1.4GB/s  pb       [ +2.8%]
  BM_UFlat/17            238564     237828      58824 739.1MB/s  gaviota  [ +1.6%]
  BM_UValidate/0          42194      42060     333333   2.3GB/s  html     [ -0.1%]
  BM_UValidate/1         433182     432005      32407   1.5GB/s  urls     [ -0.1%]
  BM_UValidate/2            197        196   71428571 603.3GB/s  jpg      [ +0.5%]
  BM_UValidate/3          14494      14462     972222   6.1GB/s  pdf      [ +0.5%]
  BM_UValidate/4         168444     167836      83832   2.3GB/s  html4    [ +0.1%]

R=jeff

Revision created by MOE tool push_codebase.

git-svn-id: http://snappy.googlecode.com/svn/trunk@42 03e5f5b5-db94-4691-08a0-1a8bf15f6143
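
The trade-off described in the commit message can be illustrated with a minimal sketch. This is not the code in snappy.cc: `SketchWriter`, `BytesWritten` and the exact preconditions are hypothetical, and the 16-byte width is only assumed from the `avail >= 16` check that the commit removes. The point is that every call pays for the precondition test, so passing `false` where the fast path rarely applies skips work that seldom pays off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the decompressor's output writer.
class SketchWriter {
 public:
  SketchWriter(char* dst, size_t capacity)
      : start_(dst), op_(dst), op_limit_(dst + capacity) {}

  // Appends len bytes from ip. Passing allow_fast_path == true is a promise
  // by the caller that at least 16 bytes are readable at ip.
  bool Append(const char* ip, size_t len, bool allow_fast_path) {
    const size_t space_left = static_cast<size_t>(op_limit_ - op_);
    if (allow_fast_path && len <= 16 && space_left >= 16) {
      // Fast path: one fixed-size copy. It may write more than len bytes,
      // which is safe because 16 bytes of output space were checked above.
      std::memcpy(op_, ip, 16);
    } else {
      // Slow path: exact-length copy with a bounds check.
      if (space_left < len) return false;
      std::memcpy(op_, ip, len);
    }
    op_ += len;
    return true;
  }

  size_t BytesWritten() const { return static_cast<size_t>(op_ - start_); }

 private:
  char* start_;
  char* op_;
  char* op_limit_;
};
```

In this sketch a caller that already knows the input chunk is short, or the literal is long, would pass `false` directly, as the patched call site does.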
-rw-r--r-- snappy.cc 3
1 files changed, 1 insertions, 2 deletions
diff --git a/snappy.cc b/snappy.cc
index dc6c2f3..596f597 100644
--- a/snappy.cc
+++ b/snappy.cc
@@ -677,8 +677,7 @@ class SnappyDecompressor {
         uint32 avail = ip_limit_ - ip;
         while (avail < literal_length) {
-          bool allow_fast_path = (avail >= 16);
-          if (!writer->Append(ip, avail, allow_fast_path)) return;
+          if (!writer->Append(ip, avail, false)) return;
           literal_length -= avail;
           reader_->Skip(peeked_);
           size_t n;
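
For readers unfamiliar with the loop this hunk touches, the following sketch shows the general shape of a refill loop for a literal that spans several input chunks. It is an illustration under assumptions, not the surrounding snappy code: `ChunkedSource` and `CopyLiteral` are hypothetical names, and the output is a `std::string` rather than snappy's writer. Each partial chunk is copied with a plain append, with no fast-path attempt, before more input is fetched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical input source that exposes at most chunk_size bytes per Peek.
class ChunkedSource {
 public:
  ChunkedSource(const std::string& data, size_t chunk_size)
      : data_(data), pos_(0), chunk_size_(chunk_size) {}

  // Returns a pointer to the next chunk and stores its length in *n.
  const char* Peek(size_t* n) const {
    *n = std::min(chunk_size_, data_.size() - pos_);
    return data_.data() + pos_;
  }

  void Skip(size_t n) { pos_ += n; }

 private:
  std::string data_;
  size_t pos_;
  size_t chunk_size_;
};

// Copies literal_length bytes from src into out, refilling from the source
// whenever the current chunk is shorter than what remains of the literal.
bool CopyLiteral(ChunkedSource* src, size_t literal_length, std::string* out) {
  size_t avail;
  const char* ip = src->Peek(&avail);
  while (avail < literal_length) {
    if (avail == 0) return false;  // Input ended before the literal did.
    out->append(ip, avail);        // Plain copy; no fast-path attempt here.
    literal_length -= avail;
    src->Skip(avail);
    ip = src->Peek(&avail);
  }
  out->append(ip, literal_length);
  src->Skip(literal_length);
  return true;
}
```

Calling `CopyLiteral` on a source that yields 5 bytes at a time with a 12-byte literal goes around the loop twice before the final partial copy, which is the situation the commit message calls the slow path.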