| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the following common local transformations
(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y
The compiler itself matches such patterns many times. This also aligns with other popular compilers.
Fixes #59111
Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev: 1aad82e62e61e89789932c2070851a023f130dd8
GitHub-Pull-Request: golang/go#59119
Reviewed-on: https://go-review.googlesource.com/c/go/+/477555
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Manually consolidate the remaining ppc64/ppc64le test which
are not so trivial to automatically merge.
The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).
Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use a small python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.
This makes small assumption that anytime two tests with
for different arch/variant combos exists, those tests
can be combined into a single ppc64x test.
E.x:
// ppc64le: foo
// ppc64le/power9: foo
into
// ppc64x: foo
or
// ppc64: foo
// ppc64le: foo
into
// ppc64x: foo
import glob
import re
files = glob.glob("codegen/*.go")
for file in files:
with open(file) as f:
text = [l for l in f]
i = 0
while i < len(text):
first = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[i])
if first:
j = i+1
while j < len(text):
second = re.match("\s*// ?ppc64(le)?(/power[89])?:(.*)", text[j])
if not second:
break
if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
text[i] = re.sub(" ?ppc64(le|x)?"," ppc64x",text[i])
text=text[:j] + (text[j+1:])
else:
j += 1
i+=1
with open(file, 'w') as f:
f.write("".join(text))
Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.
This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.
This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is only for pointerful
types, large enough to need a duffzero, and have their address passed
somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.
Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Usually optimization rules have corresponding priorities, some need to
be run first, some run next, and some run last, which produces the best
code. But currently our optimization rules have no priority, this CL
adds a late lower pass that runs those rules that need to be run at last,
such as split unreasonable constant folding. This pass can be seen as
the second round of the lower pass.
For example:
func foo(a, b uint64) uint64 {
d := a+0x1234568
d1 := b+0x1234568
return d&d1
}
The code generated by the master branch:
0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add
0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add
0x001c 00028 AND R1, R2, R0
This is because the current constant folding optimization rules do not
take into account the range of constants, causing the constant to be
loaded repeatedly. This CL splits these unreasonable constants folding
in the late lower pass. With this CL the generated code:
0x0004 00004 MOVD $19088744, R2 // movz+movk
0x000c 00012 ADD R0, R2, R3
0x0010 00016 ADD R1, R2, R1
0x0014 00020 AND R1, R3, R0
This CL also adds constant folding optimization for ADDS instruction.
In addition, in order not to introduce the codegen regression, an
optimization rule is added to change the addition of a negative number
into a subtraction of a positive number.
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 1.22s ± 1% 1.24s ± 0% +1.56% (p=0.008 n=5+5)
Fannkuch11-8 1.54s ± 0% 1.53s ± 0% -0.69% (p=0.016 n=4+5)
FmtFprintfEmpty-8 14.1ns ± 0% 14.1ns ± 0% ~ (p=0.079 n=4+5)
FmtFprintfString-8 26.0ns ± 0% 26.1ns ± 0% +0.23% (p=0.008 n=5+5)
FmtFprintfInt-8 32.3ns ± 0% 32.9ns ± 1% +1.72% (p=0.008 n=5+5)
FmtFprintfIntInt-8 54.5ns ± 0% 55.5ns ± 0% +1.83% (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8 61.5ns ± 0% 62.0ns ± 0% +0.93% (p=0.008 n=5+5)
FmtFprintfFloat-8 72.0ns ± 0% 73.6ns ± 0% +2.24% (p=0.008 n=5+5)
FmtManyArgs-8 221ns ± 0% 224ns ± 0% +1.22% (p=0.008 n=5+5)
GobDecode-8 1.91ms ± 0% 1.93ms ± 0% +0.98% (p=0.008 n=5+5)
GobEncode-8 1.40ms ± 1% 1.39ms ± 0% -0.79% (p=0.032 n=5+5)
Gzip-8 115ms ± 0% 117ms ± 1% +1.17% (p=0.008 n=5+5)
Gunzip-8 19.4ms ± 1% 19.3ms ± 0% -0.71% (p=0.016 n=5+4)
HTTPClientServer-8 27.0µs ± 0% 27.3µs ± 0% +0.80% (p=0.008 n=5+5)
JSONEncode-8 3.36ms ± 1% 3.33ms ± 0% ~ (p=0.056 n=5+5)
JSONDecode-8 17.5ms ± 2% 17.8ms ± 0% +1.71% (p=0.016 n=5+4)
Mandelbrot200-8 2.29ms ± 0% 2.29ms ± 0% ~ (p=0.151 n=5+5)
GoParse-8 1.35ms ± 1% 1.36ms ± 1% ~ (p=0.056 n=5+5)
RegexpMatchEasy0_32-8 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8 131ns ±11% 118ns ± 6% ~ (p=0.056 n=5+5)
RegexpMatchEasy1_32-8 22.9ns ± 0% 22.9ns ± 0% ~ (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8 126ns ± 0% 127ns ± 0% ~ (p=0.063 n=4+5)
RegexpMatchMedium_32-8 486ns ± 5% 483ns ± 0% ~ (p=0.381 n=5+4)
RegexpMatchMedium_1K-8 15.4µs ± 1% 15.5µs ± 0% ~ (p=0.151 n=5+5)
RegexpMatchHard_32-8 687ns ± 0% 686ns ± 0% ~ (p=0.103 n=5+5)
RegexpMatchHard_1K-8 20.7µs ± 0% 20.7µs ± 1% ~ (p=0.151 n=5+5)
Revcomp-8 175ms ± 2% 176ms ± 3% ~ (p=1.000 n=5+5)
Template-8 20.4ms ± 6% 20.1ms ± 2% ~ (p=0.151 n=5+5)
TimeParse-8 112ns ± 0% 113ns ± 0% +0.97% (p=0.016 n=5+4)
TimeFormat-8 156ns ± 0% 145ns ± 0% -7.14% (p=0.029 n=4+4)
Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For signed comparisons, the following four optimization rules hold:
(CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y)
(CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y)
(CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y)
(CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y)
But currently they only apply to jump instructions, not to conditional
instructions within a block, such as cset, csel, etc. This CL extends
the above rules into blocks so that conditional instructions can also be
optimized.
name old time/op new time/op delta
DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5)
DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5)
DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5)
DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5)
Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de
Reviewed-on: https://go-review.googlesource.com/c/go/+/420434
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated multiple tests in test/codegen/arithmetic.go to verify
on ppc64/ppc64le as well
Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/358546
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode.
Looking at Go binary itself, before the change there was:
ANDL: 2337
ANDQ: 4476
After the change:
ANDL: 3790
ANDQ: 3024
So we got rid of 1452 ANDQs
This makes the Linux x86_64 binary 0.03% smaller.
There seems to be an impact on performance.
Intel Cascade Lake benchmarks (with perflock):
name old time/op new time/op delta
BinaryTree17-8 1.91s ± 1% 1.89s ± 1% -1.22% (p=0.000 n=21+18)
Fannkuch11-8 2.34s ± 0% 2.34s ± 0% ~ (p=0.052 n=20+20)
FmtFprintfEmpty-8 27.7ns ± 1% 27.4ns ± 3% ~ (p=0.497 n=21+21)
FmtFprintfString-8 53.2ns ± 0% 51.5ns ± 0% -3.21% (p=0.000 n=20+19)
FmtFprintfInt-8 57.3ns ± 0% 55.7ns ± 0% -2.89% (p=0.000 n=19+19)
FmtFprintfIntInt-8 92.3ns ± 0% 88.4ns ± 1% -4.23% (p=0.000 n=20+21)
FmtFprintfPrefixedInt-8 103ns ± 0% 103ns ± 0% +0.23% (p=0.000 n=20+21)
FmtFprintfFloat-8 147ns ± 0% 148ns ± 0% +0.75% (p=0.000 n=20+21)
FmtManyArgs-8 384ns ± 0% 381ns ± 0% -0.63% (p=0.000 n=21+21)
GobDecode-8 3.86ms ± 1% 3.88ms ± 1% +0.52% (p=0.000 n=20+21)
GobEncode-8 2.77ms ± 1% 2.77ms ± 0% ~ (p=0.078 n=21+21)
Gzip-8 168ms ± 1% 168ms ± 0% +0.24% (p=0.000 n=20+20)
Gunzip-8 25.1ms ± 0% 24.3ms ± 0% -3.03% (p=0.000 n=21+21)
HTTPClientServer-8 61.4µs ± 8% 59.1µs ±10% ~ (p=0.088 n=20+21)
JSONEncode-8 6.86ms ± 0% 6.70ms ± 0% -2.29% (p=0.000 n=20+19)
JSONDecode-8 30.8ms ± 1% 30.6ms ± 1% -0.82% (p=0.000 n=20+20)
Mandelbrot200-8 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.191 n=16+17)
GoParse-8 2.61ms ± 2% 2.60ms ± 1% ~ (p=0.561 n=21+20)
RegexpMatchEasy0_32-8 48.5ns ± 2% 45.9ns ± 3% -5.26% (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8 139ns ± 0% 139ns ± 0% +0.27% (p=0.000 n=18+20)
RegexpMatchEasy1_32-8 41.3ns ± 0% 42.1ns ± 4% +1.95% (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8 216ns ± 2% 216ns ± 0% +0.17% (p=0.020 n=21+19)
RegexpMatchMedium_32-8 790ns ± 7% 803ns ± 8% ~ (p=0.178 n=21+21)
RegexpMatchMedium_1K-8 23.5µs ± 5% 23.7µs ± 5% ~ (p=0.421 n=21+21)
RegexpMatchHard_32-8 1.09µs ± 1% 1.09µs ± 1% -0.53% (p=0.000 n=19+18)
RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.610 n=21+20)
Revcomp-8 348ms ± 0% 353ms ± 0% +1.38% (p=0.000 n=17+18)
Template-8 42.0ms ± 1% 41.9ms ± 1% -0.30% (p=0.049 n=20+20)
TimeParse-8 185ns ± 0% 185ns ± 0% ~ (p=0.387 n=20+18)
TimeFormat-8 237ns ± 1% 241ns ± 1% +1.57% (p=0.000 n=21+21)
[Geo mean] 35.4µs 35.2µs -0.66%
name old speed new speed delta
GobDecode-8 199MB/s ± 1% 198MB/s ± 1% -0.52% (p=0.000 n=20+21)
GobEncode-8 277MB/s ± 1% 277MB/s ± 0% ~ (p=0.075 n=21+21)
Gzip-8 116MB/s ± 1% 115MB/s ± 0% -0.25% (p=0.000 n=20+20)
Gunzip-8 773MB/s ± 0% 797MB/s ± 0% +3.12% (p=0.000 n=21+21)
JSONEncode-8 283MB/s ± 0% 290MB/s ± 0% +2.35% (p=0.000 n=20+19)
JSONDecode-8 63.0MB/s ± 1% 63.5MB/s ± 1% +0.82% (p=0.000 n=20+20)
GoParse-8 22.2MB/s ± 2% 22.3MB/s ± 1% ~ (p=0.539 n=21+20)
RegexpMatchEasy0_32-8 660MB/s ± 2% 697MB/s ± 3% +5.57% (p=0.000 n=20+21)
RegexpMatchEasy0_1K-8 7.36GB/s ± 0% 7.34GB/s ± 0% -0.26% (p=0.000 n=18+20)
RegexpMatchEasy1_32-8 775MB/s ± 0% 761MB/s ± 4% -1.88% (p=0.000 n=17+21)
RegexpMatchEasy1_1K-8 4.74GB/s ± 2% 4.74GB/s ± 0% -0.18% (p=0.020 n=21+19)
RegexpMatchMedium_32-8 40.6MB/s ± 7% 39.9MB/s ± 9% ~ (p=0.191 n=21+21)
RegexpMatchMedium_1K-8 43.7MB/s ± 5% 43.2MB/s ± 5% ~ (p=0.435 n=21+21)
RegexpMatchHard_32-8 29.3MB/s ± 1% 29.4MB/s ± 1% +0.53% (p=0.000 n=19+18)
RegexpMatchHard_1K-8 31.0MB/s ± 0% 31.0MB/s ± 0% ~ (p=0.572 n=21+20)
Revcomp-8 730MB/s ± 0% 720MB/s ± 0% -1.36% (p=0.000 n=17+18)
Template-8 46.2MB/s ± 1% 46.3MB/s ± 1% +0.30% (p=0.041 n=20+20)
[Geo mean] 204MB/s 205MB/s +0.30%
Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514
Reviewed-on: https://go-review.googlesource.com/c/go/+/354970
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This test is not executed by default (see #48247) and does not
actually pass. It was added in CL 346689. The code generation
changes made in that CL only change how instructions are assembled,
they do not actually affect the output of the compiler. This test
is unfortunately therefore invalid and will never pass.
Updates #48247.
Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f
Reviewed-on: https://go-review.googlesource.com/c/go/+/348390
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL simplifies riscv addition (add r, imm) to
(ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges.
(-4096 <= imm <= -2049 or 2048 <= imm <= 4094)
There is little impact to the go1 benchmark, while the total
size of pkg/linux_riscv64 decreased by about 11KB.
Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e
GitHub-Last-Rev: a2f56a07635344a40d6b8a9571f236743122be34
GitHub-Pull-Request: golang/go#48110
Reviewed-on: https://go-review.googlesource.com/c/go/+/346689
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Munday <mike.munday@lowrisc.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
arithmetic expressions
This applies the following generic integer addition/subtraction transformations:
x - (x + y) = -y
(x - y) - x = -y
y + (x - y) = x
y + (z + (x - y) = x + z
There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself:
runtime.stackfree()
runtime.runqdrain()
Go binary size reduced by 0.05% on Linux x86_64.
StackCopy bench (perflocked Cascade Lake x86):
name old time/op new time/op delta
StackCopyPtr-8 87.3ms ± 1% 86.9ms ± 0% -0.45% (p=0.000 n=20+20)
StackCopy-8 77.6ms ± 1% 77.0ms ± 0% -0.76% (p=0.000 n=20+20)
StackCopyNoCache-8 2.28ms ± 2% 2.26ms ± 2% -0.93% (p=0.008 n=19+20)
test/bench/go1 benchmarks (perflocked Cascade Lake x86):
name old time/op new time/op delta
BinaryTree17-8 1.88s ± 1% 1.88s ± 0% ~ (p=0.373 n=15+12)
Fannkuch11-8 2.31s ± 0% 2.35s ± 0% +1.52% (p=0.000 n=15+14)
FmtFprintfEmpty-8 26.6ns ± 0% 26.6ns ± 0% ~ (p=0.081 n=14+13)
FmtFprintfString-8 48.6ns ± 0% 50.0ns ± 0% +2.86% (p=0.000 n=15+14)
FmtFprintfInt-8 56.9ns ± 0% 54.8ns ± 0% -3.70% (p=0.000 n=15+15)
FmtFprintfIntInt-8 90.4ns ± 0% 88.8ns ± 0% -1.78% (p=0.000 n=15+15)
FmtFprintfPrefixedInt-8 104ns ± 0% 104ns ± 0% ~ (p=0.905 n=14+13)
FmtFprintfFloat-8 148ns ± 0% 144ns ± 0% -2.19% (p=0.000 n=14+15)
FmtManyArgs-8 389ns ± 0% 390ns ± 0% +0.35% (p=0.000 n=12+15)
GobDecode-8 3.90ms ± 1% 3.88ms ± 0% -0.49% (p=0.000 n=15+14)
GobEncode-8 2.73ms ± 0% 2.73ms ± 0% ~ (p=0.425 n=15+14)
Gzip-8 169ms ± 0% 168ms ± 0% -0.52% (p=0.000 n=13+13)
Gunzip-8 24.7ms ± 0% 24.8ms ± 0% +0.61% (p=0.000 n=15+15)
HTTPClientServer-8 60.5µs ± 6% 60.4µs ± 7% ~ (p=0.595 n=15+15)
JSONEncode-8 6.97ms ± 1% 6.93ms ± 0% -0.69% (p=0.000 n=14+14)
JSONDecode-8 31.2ms ± 1% 30.8ms ± 1% -1.27% (p=0.000 n=14+14)
Mandelbrot200-8 3.87ms ± 0% 3.87ms ± 0% ~ (p=0.652 n=15+14)
GoParse-8 2.65ms ± 2% 2.64ms ± 1% ~ (p=0.202 n=15+15)
RegexpMatchEasy0_32-8 45.1ns ± 0% 45.9ns ± 0% +1.68% (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8 140ns ± 0% 139ns ± 0% -0.44% (p=0.000 n=15+14)
RegexpMatchEasy1_32-8 40.9ns ± 3% 40.5ns ± 0% -0.88% (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8 215ns ± 1% 220ns ± 1% +2.27% (p=0.000 n=15+15)
RegexpMatchMedium_32-8 783ns ± 7% 738ns ± 0% ~ (p=0.361 n=15+15)
RegexpMatchMedium_1K-8 24.1µs ± 6% 23.4µs ± 6% -2.94% (p=0.004 n=15+15)
RegexpMatchHard_32-8 1.10µs ± 1% 1.09µs ± 1% -0.40% (p=0.006 n=15+14)
RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.535 n=12+14)
Revcomp-8 354ms ± 0% 353ms ± 0% -0.23% (p=0.002 n=15+13)
Template-8 42.0ms ± 1% 41.8ms ± 2% -0.37% (p=0.023 n=14+15)
TimeParse-8 181ns ± 0% 180ns ± 1% -0.18% (p=0.014 n=12+13)
TimeFormat-8 240ns ± 0% 242ns ± 1% +0.69% (p=0.000 n=12+15)
[Geo mean] 35.2µs 35.1µs -0.43%
name old speed new speed delta
GobDecode-8 197MB/s ± 1% 198MB/s ± 0% +0.49% (p=0.000 n=15+14)
GobEncode-8 281MB/s ± 0% 281MB/s ± 0% ~ (p=0.419 n=15+14)
Gzip-8 115MB/s ± 0% 115MB/s ± 0% +0.52% (p=0.000 n=13+13)
Gunzip-8 786MB/s ± 0% 781MB/s ± 0% -0.60% (p=0.000 n=15+15)
JSONEncode-8 278MB/s ± 1% 280MB/s ± 0% +0.69% (p=0.000 n=14+14)
JSONDecode-8 62.3MB/s ± 1% 63.1MB/s ± 1% +1.29% (p=0.000 n=14+14)
GoParse-8 21.9MB/s ± 2% 22.0MB/s ± 1% ~ (p=0.205 n=15+15)
RegexpMatchEasy0_32-8 709MB/s ± 0% 697MB/s ± 0% -1.65% (p=0.000 n=14+15)
RegexpMatchEasy0_1K-8 7.34GB/s ± 0% 7.37GB/s ± 0% +0.43% (p=0.000 n=15+15)
RegexpMatchEasy1_32-8 783MB/s ± 2% 790MB/s ± 0% +0.88% (p=0.000 n=15+13)
RegexpMatchEasy1_1K-8 4.77GB/s ± 1% 4.66GB/s ± 1% -2.23% (p=0.000 n=15+15)
RegexpMatchMedium_32-8 41.0MB/s ± 7% 43.3MB/s ± 0% ~ (p=0.360 n=15+15)
RegexpMatchMedium_1K-8 42.5MB/s ± 6% 43.8MB/s ± 6% +3.07% (p=0.004 n=15+15)
RegexpMatchHard_32-8 29.2MB/s ± 1% 29.3MB/s ± 1% +0.41% (p=0.006 n=15+14)
RegexpMatchHard_1K-8 31.1MB/s ± 0% 31.1MB/s ± 0% ~ (p=0.495 n=12+14)
Revcomp-8 718MB/s ± 0% 720MB/s ± 0% +0.23% (p=0.002 n=15+13)
Template-8 46.3MB/s ± 1% 46.4MB/s ± 2% +0.38% (p=0.021 n=14+15)
[Geo mean] 205MB/s 206MB/s +0.57%
Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0
Reviewed-on: https://go-review.googlesource.com/c/go/+/344930
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In codegen/arithmetic.go, previously there are MOVD's that match
for loads of arguments. With register ABI there are no more such
loads. Remove the MOVD matches.
Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64
Reviewed-on: https://go-review.googlesource.com/c/go/+/324251
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some codegen tests were written with the assumption that
arguments and results are in memory, and with a specific stack
layout. With the register ABI, the assumption is no longer true.
Adjust the tests to work with both cases.
- For tests expecting in memory arguments/results, change to use
global variables or memory-assigned argument/results.
- Allow more registers. E.g. some tests expecting register names
contain only letters (e.g. AX), but it can also contain numbers
(e.g. R10).
- Some instruction selection changes when operate on register vs.
memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both.
TODO: mathbits.go and memops.go still need fix.
Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/309335
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current layout algorithm tries to put consecutive blocks together,
so the priority of the successor block is higher than the priority of
the zero indegree block. This algorithm is beneficial for subsequent
register allocation, but will result in more branch instructions.
The depth-first topological sorting algorithm is a well-known layout
algorithm, which has applications in many languages, and it helps to
reduce branch instructions. This CL applies it to the layout pass.
The test results show that it helps to reduce the code size.
This CL also includes the following changes:
1, Removed the primary predecessor mechanism. The new layout algorithm is
not very friendly to register allocator in some cases, in order to adapt
to the new layout algorithm, a new primary predecessor selection strategy
is introduced.
2, Since the new layout implementation may place non-loop blocks between
loop blocks, some adaptive modifications have also been made to looprotate
pass.
3, The layout also affects the results of codegen, so this CL also adjusted
several codegen tests accordingly.
It is inevitable that this CL will cause the code size or performance of a
few functions to decrease, but the number of cases it improves is much larger
than the number of cases it drops.
Statistical data from compilecmp on linux/amd64 is as follow:
name old time/op new time/op delta
Template 382ms ± 4% 382ms ± 4% ~ (p=0.497 n=49+50)
Unicode 170ms ± 9% 169ms ± 8% ~ (p=0.344 n=48+50)
GoTypes 2.01s ± 4% 2.01s ± 4% ~ (p=0.628 n=50+48)
Compiler 190ms ±10% 189ms ± 9% ~ (p=0.734 n=50+50)
SSA 11.8s ± 2% 11.8s ± 3% ~ (p=0.877 n=50+50)
Flate 241ms ± 9% 241ms ± 8% ~ (p=0.897 n=50+49)
GoParser 366ms ± 3% 361ms ± 4% -1.21% (p=0.004 n=47+50)
Reflect 835ms ± 3% 838ms ± 3% ~ (p=0.275 n=50+49)
Tar 336ms ± 4% 335ms ± 3% ~ (p=0.454 n=48+48)
XML 433ms ± 4% 431ms ± 3% ~ (p=0.071 n=49+48)
LinkCompiler 706ms ± 4% 705ms ± 4% ~ (p=0.608 n=50+49)
ExternalLinkCompiler 1.85s ± 3% 1.83s ± 2% -1.47% (p=0.000 n=49+48)
LinkWithoutDebugCompiler 437ms ± 5% 437ms ± 6% ~ (p=0.953 n=49+50)
[Geo mean] 615ms 613ms -0.37%
name old alloc/op new alloc/op delta
Template 38.7MB ± 1% 38.7MB ± 1% ~ (p=0.834 n=50+50)
Unicode 28.1MB ± 0% 28.1MB ± 0% -0.22% (p=0.000 n=49+50)
GoTypes 168MB ± 1% 168MB ± 1% ~ (p=0.054 n=47+47)
Compiler 23.0MB ± 1% 23.0MB ± 1% ~ (p=0.432 n=50+50)
SSA 1.54GB ± 0% 1.54GB ± 0% +0.21% (p=0.000 n=50+50)
Flate 23.6MB ± 1% 23.6MB ± 1% ~ (p=0.153 n=43+46)
GoParser 35.1MB ± 1% 35.1MB ± 2% ~ (p=0.202 n=50+50)
Reflect 84.7MB ± 1% 84.7MB ± 1% ~ (p=0.333 n=48+49)
Tar 34.5MB ± 1% 34.5MB ± 1% ~ (p=0.406 n=46+49)
XML 44.3MB ± 2% 44.2MB ± 3% ~ (p=0.981 n=50+50)
LinkCompiler 131MB ± 0% 128MB ± 0% -2.74% (p=0.000 n=50+50)
ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.007 n=50+50)
LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% -0.02% (p=0.000 n=50+50)
[Geo mean] 69.3MB 69.1MB -0.22%
file before after Δ %
addr2line 4104220 4043684 -60536 -1.475%
api 5342502 5249678 -92824 -1.737%
asm 4973785 4858257 -115528 -2.323%
buildid 2667844 2625660 -42184 -1.581%
cgo 4686849 4616313 -70536 -1.505%
compile 23667431 23268406 -399025 -1.686%
cover 4959676 4874108 -85568 -1.725%
dist 3515934 3450422 -65512 -1.863%
doc 3995581 3925469 -70112 -1.755%
fix 3379202 3318522 -60680 -1.796%
link 6743249 6629913 -113336 -1.681%
nm 4047529 3991777 -55752 -1.377%
objdump 4456151 4388151 -68000 -1.526%
pack 2435040 2398072 -36968 -1.518%
pprof 13804080 13565808 -238272 -1.726%
test2json 2690043 2645987 -44056 -1.638%
trace 10418492 10232716 -185776 -1.783%
vet 7258259 7121259 -137000 -1.888%
total 113145867 111204202 -1941665 -1.716%
The situation on linux/arm64 is as follow:
name old time/op new time/op delta
Template 280ms ± 1% 282ms ± 1% +0.75% (p=0.000 n=46+48)
Unicode 124ms ± 2% 124ms ± 2% +0.37% (p=0.045 n=50+50)
GoTypes 1.69s ± 1% 1.70s ± 1% +0.56% (p=0.000 n=49+50)
Compiler 122ms ± 1% 123ms ± 1% +0.93% (p=0.000 n=50+50)
SSA 12.6s ± 1% 12.7s ± 0% +0.72% (p=0.000 n=50+50)
Flate 170ms ± 1% 172ms ± 1% +0.97% (p=0.000 n=49+49)
GoParser 262ms ± 1% 263ms ± 1% +0.39% (p=0.000 n=49+48)
Reflect 639ms ± 1% 650ms ± 1% +1.63% (p=0.000 n=49+49)
Tar 243ms ± 1% 245ms ± 1% +0.82% (p=0.000 n=50+50)
XML 324ms ± 1% 327ms ± 1% +0.72% (p=0.000 n=50+49)
LinkCompiler 597ms ± 1% 596ms ± 1% -0.27% (p=0.001 n=48+47)
ExternalLinkCompiler 1.90s ± 1% 1.88s ± 1% -1.00% (p=0.000 n=50+50)
LinkWithoutDebugCompiler 364ms ± 1% 363ms ± 1% ~ (p=0.220 n=49+50)
[Geo mean] 485ms 488ms +0.49%
name old alloc/op new alloc/op delta
Template 38.7MB ± 0% 38.8MB ± 1% ~ (p=0.093 n=43+49)
Unicode 28.4MB ± 0% 28.4MB ± 0% +0.03% (p=0.000 n=49+45)
GoTypes 169MB ± 1% 169MB ± 1% +0.23% (p=0.010 n=50+50)
Compiler 23.2MB ± 1% 23.2MB ± 1% +0.11% (p=0.000 n=40+44)
SSA 1.54GB ± 0% 1.55GB ± 0% +0.45% (p=0.000 n=47+49)
Flate 23.8MB ± 2% 23.8MB ± 1% ~ (p=0.543 n=50+50)
GoParser 35.3MB ± 1% 35.4MB ± 1% ~ (p=0.792 n=50+50)
Reflect 85.2MB ± 1% 85.2MB ± 0% ~ (p=0.055 n=50+47)
Tar 34.5MB ± 1% 34.5MB ± 1% +0.06% (p=0.015 n=50+50)
XML 43.8MB ± 2% 43.9MB ± 2% +0.19% (p=0.000 n=48+48)
LinkCompiler 137MB ± 0% 136MB ± 0% -0.92% (p=0.000 n=50+50)
ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.516 n=50+50)
LinkWithoutDebugCompiler 84.0MB ± 0% 84.0MB ± 0% ~ (p=0.057 n=50+50)
[Geo mean] 70.4MB 70.4MB +0.01%
file before after Δ %
addr2line 4021557 4002933 -18624 -0.463%
api 5127847 5028503 -99344 -1.937%
asm 5034716 4936836 -97880 -1.944%
buildid 2608118 2594094 -14024 -0.538%
cgo 4488592 4398320 -90272 -2.011%
compile 22501129 22213592 -287537 -1.278%
cover 4742301 4713573 -28728 -0.606%
dist 3388071 3365311 -22760 -0.672%
doc 3802250 3776082 -26168 -0.688%
fix 3306147 3216939 -89208 -2.698%
link 6404483 6363699 -40784 -0.637%
nm 3941026 3921930 -19096 -0.485%
objdump 4383330 4295122 -88208 -2.012%
pack 2404547 2389515 -15032 -0.625%
pprof 12996234 12856818 -139416 -1.073%
test2json 2668500 2586788 -81712 -3.062%
trace 9816276 9609580 -206696 -2.106%
vet 6900682 6787338 -113344 -1.643%
total 108535806 107056973 -1478833 -1.363%
Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255239
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backstop support for non-sse2 chips now that 387 is gone.
RELNOTE=yes
Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013
Reviewed-on: https://go-review.googlesource.com/c/go/+/260017
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This merges an lis + subf into subfic, and for 32b constants
lwa + subf into oris + ori + subf.
The carry bit is no longer used in code generation, therefore
I think we can clobber it as needed. Note, lowered borrow/carry
arithmetic is self-contained and thus is not affected.
A few extra rules are added to ensure early transformations to
SUBFCconst don't trip up earlier rules, fold constant operations,
or otherwise simplify lowering. Likewise, tests are added to
ensure all rules are hit. Generic constant folding catches
trivial cases, however some lowering rules insert arithmetic
which can introduce new opportunities (e.g BitLen or Slicemask).
I couldn't find a specific benchmark to demonstrate noteworthy
improvements, but this is generating subfic in many of the default
bent test binaries, so we are at least saving a little code space.
Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012
Reviewed-on: https://go-review.googlesource.com/c/go/+/249461
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, opt pass can expose more obvious constant-folding
opportunites.
Example:
func test(i int) int {return (i+8)-(i+4)}
The previous version:
MOVD "".i(FP), R0
ADD $8, R0, R1
ADD $4, R0, R0
SUB R0, R1, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
The optimized version:
MOVD $4, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
This patch removes some existing reassociation rules, such as "x+(z-C)",
because the current generic rewrite rules will canonicalize "x-const"
to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))".
This patch also adds test cases.
Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797
Reviewed-on: https://go-review.googlesource.com/c/go/+/237137
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new lowering rule to match and replace such instances
with the MADDLD instruction available on power9 where
possible.
Likewise, this plumbs in a new ppc64 ssa opcode to house
the newly generated MADDLD instructions.
When testing ed25519, this reduced binary size by 936B.
Similarly, MADDLD combination occcurs in a few other less
obvious cases such as division by constant.
Testing of golang.org/x/crypto/ed25519 shows non-trivial
speedup during keygeneration:
name old time/op new time/op delta
KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19%
Signing 64.3µs ± 0% 64.4µs ± 0% +0.16%
Verification 147µs ± 0% 147µs ± 0% +0.11%
Similarly, this test binary has shrunk by 66488B.
Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/248723
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the existing optimizations aren't triggered because they
are handled by the generic rules so this CL removes them. Also
some constraints were copied without much thought from the amd64
rules and they don't make sense on s390x, so we remove those
constraints.
Finally, add a 'multiply by the sum of two powers of two'
optimization. This makes sense on s390x as shifts are low latency
and can also sometimes be optimized further (especially if we add
support for RISBG instructions).
name old time/op new time/op delta
IntMulByConst/3-8 1.70ns ±11% 1.10ns ± 5% -35.26% (p=0.000 n=10+10)
IntMulByConst/5-8 1.64ns ± 7% 1.10ns ± 4% -32.94% (p=0.000 n=10+9)
IntMulByConst/12-8 1.65ns ± 6% 1.20ns ± 4% -27.16% (p=0.000 n=10+9)
IntMulByConst/120-8 1.66ns ± 4% 1.22ns ±13% -26.43% (p=0.000 n=10+10)
IntMulByConst/-120-8 1.65ns ± 7% 1.19ns ± 4% -28.06% (p=0.000 n=9+10)
IntMulByConst/65537-8 0.86ns ± 9% 1.12ns ±12% +30.41% (p=0.000 n=10+10)
IntMulByConst/65538-8 1.65ns ± 5% 1.23ns ± 5% -25.11% (p=0.000 n=10+10)
Change-Id: Ib196e6bff1e97febfd266134d0a2b2a62897989f
Reviewed-on: https://go-review.googlesource.com/c/go/+/248937
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Taking over Zach's CL 212277. Just cleaned up and added a test.
For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.
These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c = n >> log(c) if n >= 0
// = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
(first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
(Rsh64x64
(Add64 <t> n (Rsh64Ux64 <t>
(Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
(Const64 <typ.UInt64> [64-log2(c)])))
(Const64 <typ.UInt64> [log2(c)]))
If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unneccesary
fixups from the generic rewrite.
There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.
Fixes #36159
By Printf count, this zeroes 137 right shifts when building std and cmd.
Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit CL 224837.
Reason for revert: Reverting partial reverts of 222782.
Update #37881
Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826
Reviewed-on: https://go-review.googlesource.com/c/go/+/225217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rolling back portions of CL 222782 to see if that helps
issue #37881 any.
Update #37881
Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/224837
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 44343c777c (CL 173557) added rules for handling
divisibility checks for powers of 2 for signed integers, x%c ==0.
This change adds the complementary indivisibility rules, x%c != 0.
Fixes #34166
Change-Id: I87379e30af7aff633371acca82db2397da9b2c07
Reviewed-on: https://go-review.googlesource.com/c/go/+/194219
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17
This general rule can compute divisibilty by one multiplication, and add
and a compare for odd divisors and an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also needed.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.
Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
On amd64, the divisibility check is 30-45% faster.
name old time/op new time/op delta`
DivisiblePow2constI64-4 0.83ns ± 1% 0.82ns ± 0% ~ (p=0.079 n=5+4)
DivisibleconstI64-4 2.68ns ± 1% 1.87ns ± 0% -30.33% (p=0.000 n=5+4)
DivisibleWDivconstI64-4 2.69ns ± 1% 2.71ns ± 3% ~ (p=1.000 n=5+5)
DivisiblePow2constI32-4 1.15ns ± 1% 1.15ns ± 0% ~ (p=0.238 n=5+4)
DivisibleconstI32-4 2.24ns ± 1% 1.20ns ± 0% -46.48% (p=0.016 n=5+4)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.683 n=5+5)
DivisiblePow2constI16-4 0.81ns ± 1% 0.82ns ± 1% ~ (p=0.135 n=5+5)
DivisibleconstI16-4 2.11ns ± 2% 1.20ns ± 1% -42.99% (p=0.008 n=5+5)
DivisibleWDivconstI16-4 2.23ns ± 0% 2.27ns ± 2% +1.79% (p=0.029 n=4+4)
DivisiblePow2constI8-4 0.81ns ± 1% 0.81ns ± 1% ~ (p=0.286 n=5+5)
DivisibleconstI8-4 2.13ns ± 3% 1.19ns ± 1% -43.84% (p=0.008 n=5+5)
DivisibleWDivconstI8-4 2.23ns ± 1% 2.25ns ± 1% ~ (p=0.183 n=5+5)
Fixes #30282
Fixes #15806
Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/173998
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17
This general rule can compute divisibilty by one multiplication and a compare
for odd divisors and an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.
Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
Additional rules to lower the generic RotateLeft* ops were also applied.
On amd64, the divisibility check is 25-50% faster.
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5)
DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5)
DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5)
DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5)
DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5)
DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5)
DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5)
DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4)
DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5)
DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5)
DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5)
DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5)
DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5)
DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5)
DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5)
DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4)
DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5)
DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5)
DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5)
DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5)
A follow-up change will address the signed division case.
Updates #30282
Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made
just by checking the trailing zeroes via a mask x&(c-1) == 0 even for signed
integers. This avoids division fix-ups when just divisibility check is needed.
To apply this rule, we match on the fixed-up version of the division. This is
neccessary because the mod and division rewrite rules are already applied
during the initial opt pass.
The speed up on amd64 due to elimination of unneccessary fix-up code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.09ns ± 1% ~ (p=0.730 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.66% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=0.683 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 1% ~ (p=0.968 n=4+5)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 1% -54.97% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.80ns ± 2% ~ (p=0.206 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.54ns ± 0% ~ (p=0.238 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.81ns ± 1% -54.72% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 1.01ns ± 1% ~ (p=0.119 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.54ns ± 2% ~ (p=0.571 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.82ns ± 8% -53.71% (p=0.008 n=5+5)
DivconstU8-4 0.93ns ± 1% 0.93ns ± 1% ~ (p=0.643 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: Iabadbbe369b6e0998c8ce85d038ebc236142e42a
Reviewed-on: https://go-review.googlesource.com/c/go/+/173557
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
| |
This reverts CL 168038 (git 68819fb6d2bab59e4eadcdf62aa4a2a54417d640)
Reason for revert: Doesn't work on 32 bit archs.
Change-Id: Idec9098060dc65bc2f774c5383f0477f8eb63a3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/173442
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made
just by checking the trailing zeroes via a mask x&(c-1)==0 even for signed
integers. This avoids division fixups when just divisibility check is needed.
To apply this rule the generic divisibility rule for A%B = A-(A/B*B) is disabled
on the "opt" pass, but this does not affect generated code as this rule is applied
later.
The speed up on amd64 due to elimination of unneccessary fixup code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.07ns ± 0% ~ (p=0.079 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.55% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 0% ~ (all equal)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 4% -54.75% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.78ns ± 1% ~ (p=1.000 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.53ns ± 0% ~ (p=0.333 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.79ns ± 1% -55.39% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 0.99ns ± 1% ~ (p=0.730 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.53ns ± 0% ~ (p=0.714 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.80ns ± 0% -55.06% (p=0.000 n=5+4)
DivconstU8-4 0.93ns ± 1% 0.95ns ± 1% +1.72% (p=0.024 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: I0d284863774b1bc8c4ce87443bbaec6103e14ef4
Reviewed-on: https://go-review.googlesource.com/c/go/+/168038
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'SUBQ $-0x80, r' is shorter to encode than 'ADDQ $0x80, r',
and functionally equivalent. Use it instead.
Shaves off a few bytes here and there:
file before after Δ %
compile 25935856 25927664 -8192 -0.032%
nm 4251840 4247744 -4096 -0.096%
Change-Id: Ia9e02ea38cbded6a52a613b92e3a914f878d931e
Reviewed-on: https://go-review.googlesource.com/c/go/+/168344
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
|
|
|
|
|
| |
Change-Id: I33f5b5051e5f75aa264ec656926223c5a3c09c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167498
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
This CL adds several test cases of arithmetic operations for
386/amd64/arm/arm64.
Change-Id: I362687c06249f31091458a1d8c45fc4d006b616a
Reviewed-on: https://go-review.googlesource.com/c/151897
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This CL fixes several typos and adds two more cases
to arithmetic test.
Change-Id: I086560162ea351e2166866e444e2317da36c1729
Reviewed-on: https://go-review.googlesource.com/c/145210
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL optimize amd64's code:
"ADDQ $-1, MEM_OP" -> "DECQ MEM_OP"
"ADDL $-1, MEM_OP" -> "DECL MEM_OP"
1. The total size of pkg/linux_amd64 (excluding cmd/compile)
decreases about 0.1KB.
2. The go1 benchmark shows little regression, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 2.60s ± 5% 2.64s ± 3% +1.53% (p=0.000 n=38+39)
Fannkuch11-4 2.37s ± 2% 2.38s ± 2% ~ (p=0.950 n=40+40)
FmtFprintfEmpty-4 40.4ns ± 5% 40.5ns ± 5% ~ (p=0.711 n=40+40)
FmtFprintfString-4 72.4ns ± 5% 72.3ns ± 3% ~ (p=0.485 n=40+40)
FmtFprintfInt-4 79.7ns ± 3% 80.1ns ± 3% ~ (p=0.124 n=40+40)
FmtFprintfIntInt-4 126ns ± 3% 127ns ± 3% +0.71% (p=0.027 n=40+40)
FmtFprintfPrefixedInt-4 153ns ± 4% 153ns ± 2% ~ (p=0.604 n=40+40)
FmtFprintfFloat-4 206ns ± 5% 210ns ± 5% +1.79% (p=0.002 n=40+40)
FmtManyArgs-4 498ns ± 3% 496ns ± 3% ~ (p=0.099 n=40+40)
GobDecode-4 6.48ms ± 6% 6.47ms ± 7% ~ (p=0.686 n=39+40)
GobEncode-4 5.95ms ± 7% 5.96ms ± 6% ~ (p=0.670 n=40+34)
Gzip-4 224ms ± 6% 223ms ± 5% ~ (p=0.143 n=40+40)
Gunzip-4 36.5ms ± 4% 36.5ms ± 4% ~ (p=0.556 n=40+40)
HTTPClientServer-4 60.7µs ± 2% 59.9µs ± 3% -1.20% (p=0.000 n=39+39)
JSONEncode-4 9.03ms ± 4% 9.04ms ± 4% ~ (p=0.589 n=40+40)
JSONDecode-4 49.4ms ± 4% 49.2ms ± 4% ~ (p=0.276 n=40+40)
Mandelbrot200-4 3.80ms ± 4% 3.79ms ± 4% ~ (p=0.837 n=40+40)
GoParse-4 3.15ms ± 5% 3.13ms ± 5% ~ (p=0.240 n=40+40)
RegexpMatchEasy0_32-4 72.9ns ± 3% 72.0ns ± 8% -1.25% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 229ns ± 5% 230ns ± 4% ~ (p=0.318 n=40+40)
RegexpMatchEasy1_32-4 66.9ns ± 3% 67.3ns ± 7% ~ (p=0.817 n=40+40)
RegexpMatchEasy1_1K-4 371ns ± 5% 370ns ± 4% ~ (p=0.275 n=40+40)
RegexpMatchMedium_32-4 106ns ± 4% 104ns ± 7% -2.28% (p=0.000 n=40+40)
RegexpMatchMedium_1K-4 32.0µs ± 2% 31.4µs ± 3% -2.08% (p=0.000 n=40+40)
RegexpMatchHard_32-4 1.54µs ± 7% 1.52µs ± 3% -1.80% (p=0.007 n=39+40)
RegexpMatchHard_1K-4 45.8µs ± 4% 45.5µs ± 3% ~ (p=0.707 n=40+40)
Revcomp-4 401ms ± 5% 401ms ± 6% ~ (p=0.935 n=40+40)
Template-4 62.4ms ± 4% 61.2ms ± 3% -1.85% (p=0.000 n=40+40)
TimeParse-4 315ns ± 2% 318ns ± 3% +1.10% (p=0.002 n=40+40)
TimeFormat-4 297ns ± 3% 298ns ± 3% ~ (p=0.238 n=40+40)
[Geo mean] 45.8µs 45.7µs -0.22%
name old speed new speed delta
GobDecode-4 119MB/s ± 6% 119MB/s ± 7% ~ (p=0.684 n=39+40)
GobEncode-4 129MB/s ± 7% 128MB/s ± 6% ~ (p=0.413 n=40+34)
Gzip-4 86.6MB/s ± 6% 87.0MB/s ± 6% ~ (p=0.145 n=40+40)
Gunzip-4 532MB/s ± 4% 532MB/s ± 4% ~ (p=0.556 n=40+40)
JSONEncode-4 215MB/s ± 4% 215MB/s ± 4% ~ (p=0.583 n=40+40)
JSONDecode-4 39.3MB/s ± 4% 39.5MB/s ± 4% ~ (p=0.277 n=40+40)
GoParse-4 18.4MB/s ± 5% 18.5MB/s ± 5% ~ (p=0.229 n=40+40)
RegexpMatchEasy0_32-4 439MB/s ± 3% 445MB/s ± 8% +1.28% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 4.46GB/s ± 4% 4.45GB/s ± 4% ~ (p=0.343 n=40+40)
RegexpMatchEasy1_32-4 479MB/s ± 3% 476MB/s ± 7% ~ (p=0.855 n=40+40)
RegexpMatchEasy1_1K-4 2.76GB/s ± 5% 2.77GB/s ± 4% ~ (p=0.250 n=40+40)
RegexpMatchMedium_32-4 9.36MB/s ± 4% 9.58MB/s ± 6% +2.31% (p=0.001 n=40+40)
RegexpMatchMedium_1K-4 32.0MB/s ± 2% 32.7MB/s ± 3% +2.12% (p=0.000 n=40+40)
RegexpMatchHard_32-4 20.7MB/s ± 7% 21.1MB/s ± 3% +1.95% (p=0.005 n=40+40)
RegexpMatchHard_1K-4 22.4MB/s ± 4% 22.5MB/s ± 3% ~ (p=0.689 n=40+40)
Revcomp-4 634MB/s ± 5% 634MB/s ± 6% ~ (p=0.935 n=40+40)
Template-4 31.1MB/s ± 3% 31.7MB/s ± 3% +1.88% (p=0.000 n=40+40)
[Geo mean] 129MB/s 130MB/s +0.62%
Change-Id: I9d61ee810d900920c572cbe89e2f1626bfed12b7
Reviewed-on: https://go-review.googlesource.com/c/145209
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
prove is able to find 94 occurrences in std cmd where a divisor
can't have the value -1. The change removes
the extraneous fix-up code for these cases.
Fixes #25239
Change-Id: Ic184de971f47cc57c702eb72805b8e291c14035d
Reviewed-on: https://go-review.googlesource.com/c/130215
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Adding cases for ppc64,ppc64le to the codegen tests
where appropriate.
Change-Id: Idf8cbe88a4ab4406a4ef1ea777bd15a58b68f3ed
Reviewed-on: https://go-review.googlesource.com/c/142557
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
|
|
|
|
|
|
|
|
| |
This CL adds tests for armv7's MULS and arm64's MSUBW.
Change-Id: Id0fd5d26fd477e4ed14389b0d33cad930423eb5b
Reviewed-on: https://go-review.googlesource.com/c/141651
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL optimize ADDLconstmodifyidx4 to INCL/DECL, when the
constant is +1/-1.
1. The total size of pkg/linux_386/ decreases 28 bytes, excluding
cmd/compile.
2. There is no regression in the go1 benchmark test, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.25s ± 2% 3.23s ± 3% -0.70% (p=0.040 n=30+30)
Fannkuch11-4 3.50s ± 1% 3.47s ± 1% -0.68% (p=0.000 n=30+30)
FmtFprintfEmpty-4 44.6ns ± 3% 44.8ns ± 3% +0.46% (p=0.029 n=30+30)
FmtFprintfString-4 79.0ns ± 3% 78.7ns ± 3% ~ (p=0.053 n=30+30)
FmtFprintfInt-4 89.2ns ± 2% 89.4ns ± 3% ~ (p=0.665 n=30+29)
FmtFprintfIntInt-4 142ns ± 3% 142ns ± 3% ~ (p=0.435 n=30+30)
FmtFprintfPrefixedInt-4 182ns ± 2% 182ns ± 2% ~ (p=0.964 n=30+30)
FmtFprintfFloat-4 407ns ± 3% 411ns ± 4% ~ (p=0.080 n=30+30)
FmtManyArgs-4 597ns ± 3% 593ns ± 4% ~ (p=0.222 n=30+30)
GobDecode-4 7.09ms ± 6% 7.07ms ± 7% ~ (p=0.633 n=30+30)
GobEncode-4 6.81ms ± 9% 6.81ms ± 8% ~ (p=0.982 n=30+30)
Gzip-4 398ms ± 4% 400ms ± 6% ~ (p=0.177 n=30+30)
Gunzip-4 41.3ms ± 3% 40.6ms ± 4% -1.71% (p=0.005 n=30+30)
HTTPClientServer-4 63.4µs ± 3% 63.4µs ± 4% ~ (p=0.646 n=30+28)
JSONEncode-4 16.0ms ± 3% 16.1ms ± 3% ~ (p=0.057 n=30+30)
JSONDecode-4 63.3ms ± 8% 63.1ms ± 7% ~ (p=0.786 n=30+30)
Mandelbrot200-4 5.17ms ± 3% 5.15ms ± 8% ~ (p=0.654 n=30+30)
GoParse-4 3.24ms ± 3% 3.23ms ± 2% ~ (p=0.091 n=30+30)
RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.575 n=30+30)
RegexpMatchEasy0_1K-4 823ns ± 2% 821ns ± 3% ~ (p=0.827 n=30+30)
RegexpMatchEasy1_32-4 113ns ± 3% 112ns ± 3% ~ (p=0.076 n=30+30)
RegexpMatchEasy1_1K-4 1.02µs ± 4% 1.01µs ± 5% ~ (p=0.087 n=30+30)
RegexpMatchMedium_32-4 129ns ± 3% 127ns ± 4% -1.55% (p=0.009 n=30+30)
RegexpMatchMedium_1K-4 39.3µs ± 4% 39.7µs ± 3% ~ (p=0.054 n=30+30)
RegexpMatchHard_32-4 2.15µs ± 4% 2.15µs ± 4% ~ (p=0.712 n=30+30)
RegexpMatchHard_1K-4 66.0µs ± 3% 65.1µs ± 3% -1.32% (p=0.002 n=30+30)
Revcomp-4 1.85s ± 2% 1.85s ± 3% ~ (p=0.168 n=30+30)
Template-4 69.5ms ± 7% 68.9ms ± 6% ~ (p=0.250 n=28+28)
TimeParse-4 434ns ± 3% 432ns ± 4% ~ (p=0.629 n=30+30)
TimeFormat-4 403ns ± 4% 408ns ± 3% +1.23% (p=0.019 n=30+29)
[Geo mean] 65.5µs 65.3µs -0.20%
name old speed new speed delta
GobDecode-4 108MB/s ± 6% 109MB/s ± 6% ~ (p=0.636 n=30+30)
GobEncode-4 113MB/s ±10% 113MB/s ± 9% ~ (p=0.982 n=30+30)
Gzip-4 48.8MB/s ± 4% 48.6MB/s ± 5% ~ (p=0.178 n=30+30)
Gunzip-4 470MB/s ± 3% 479MB/s ± 4% +1.72% (p=0.006 n=30+30)
JSONEncode-4 121MB/s ± 3% 120MB/s ± 3% ~ (p=0.057 n=30+30)
JSONDecode-4 30.7MB/s ± 8% 30.8MB/s ± 8% ~ (p=0.784 n=30+30)
GoParse-4 17.9MB/s ± 3% 17.9MB/s ± 2% ~ (p=0.090 n=30+30)
RegexpMatchEasy0_32-4 309MB/s ± 4% 309MB/s ± 3% ~ (p=0.530 n=30+30)
RegexpMatchEasy0_1K-4 1.24GB/s ± 2% 1.25GB/s ± 3% ~ (p=0.976 n=30+30)
RegexpMatchEasy1_32-4 282MB/s ± 3% 284MB/s ± 3% +0.81% (p=0.041 n=30+30)
RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.01GB/s ± 4% ~ (p=0.091 n=30+30)
RegexpMatchMedium_32-4 7.71MB/s ± 3% 7.84MB/s ± 4% +1.71% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 26.1MB/s ± 4% 25.8MB/s ± 3% ~ (p=0.051 n=30+30)
RegexpMatchHard_32-4 14.9MB/s ± 4% 14.9MB/s ± 4% ~ (p=0.712 n=30+30)
RegexpMatchHard_1K-4 15.5MB/s ± 3% 15.7MB/s ± 3% +1.34% (p=0.003 n=30+30)
Revcomp-4 138MB/s ± 2% 137MB/s ± 3% ~ (p=0.174 n=30+30)
Template-4 28.0MB/s ± 6% 28.2MB/s ± 6% ~ (p=0.251 n=28+28)
[Geo mean] 82.3MB/s 82.6MB/s +0.36%
Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1
Reviewed-on: https://go-review.googlesource.com/c/140701
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL implements indexed memory operands for the following instructions.
(ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4
(ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4
(ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4
1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding
cmd/compile/ .
2. There is little regression in the go1 benchmark test, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 3.25s ± 3% 3.25s ± 3% ~ (p=0.218 n=40+40)
Fannkuch11-4 3.53s ± 1% 3.53s ± 1% ~ (p=0.303 n=40+40)
FmtFprintfEmpty-4 44.9ns ± 3% 45.6ns ± 3% +1.48% (p=0.030 n=40+36)
FmtFprintfString-4 78.7ns ± 5% 80.1ns ± 7% ~ (p=0.217 n=36+40)
FmtFprintfInt-4 90.2ns ± 6% 89.8ns ± 5% ~ (p=0.659 n=40+38)
FmtFprintfIntInt-4 140ns ± 5% 141ns ± 5% +1.00% (p=0.027 n=40+40)
FmtFprintfPrefixedInt-4 185ns ± 3% 183ns ± 3% ~ (p=0.104 n=40+40)
FmtFprintfFloat-4 411ns ± 4% 406ns ± 3% -1.37% (p=0.005 n=40+40)
FmtManyArgs-4 590ns ± 4% 598ns ± 4% +1.35% (p=0.008 n=40+40)
GobDecode-4 7.16ms ± 5% 7.10ms ± 5% ~ (p=0.335 n=40+40)
GobEncode-4 6.85ms ± 7% 6.74ms ± 9% ~ (p=0.058 n=38+40)
Gzip-4 400ms ± 4% 399ms ± 2% -0.34% (p=0.003 n=40+33)
Gunzip-4 41.4ms ± 3% 41.4ms ± 4% -0.12% (p=0.020 n=40+40)
HTTPClientServer-4 64.1µs ± 4% 63.5µs ± 2% -1.07% (p=0.000 n=39+37)
JSONEncode-4 15.9ms ± 2% 15.9ms ± 3% ~ (p=0.103 n=40+40)
JSONDecode-4 62.2ms ± 4% 61.6ms ± 3% -0.98% (p=0.006 n=39+40)
Mandelbrot200-4 5.18ms ± 3% 5.14ms ± 4% ~ (p=0.125 n=40+40)
GoParse-4 3.29ms ± 2% 3.27ms ± 2% -0.66% (p=0.006 n=40+40)
RegexpMatchEasy0_32-4 103ns ± 4% 103ns ± 4% ~ (p=0.632 n=40+40)
RegexpMatchEasy0_1K-4 830ns ± 3% 828ns ± 3% ~ (p=0.563 n=40+40)
RegexpMatchEasy1_32-4 113ns ± 4% 113ns ± 4% ~ (p=0.494 n=40+40)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 4% ~ (p=0.665 n=40+40)
RegexpMatchMedium_32-4 130ns ± 4% 129ns ± 3% ~ (p=0.458 n=40+40)
RegexpMatchMedium_1K-4 39.4µs ± 3% 39.7µs ± 3% ~ (p=0.825 n=40+40)
RegexpMatchHard_32-4 2.16µs ± 4% 2.15µs ± 4% ~ (p=0.137 n=40+40)
RegexpMatchHard_1K-4 65.2µs ± 3% 65.4µs ± 4% ~ (p=0.160 n=40+40)
Revcomp-4 1.87s ± 2% 1.87s ± 1% +0.17% (p=0.019 n=33+33)
Template-4 69.4ms ± 3% 69.8ms ± 3% +0.60% (p=0.009 n=40+40)
TimeParse-4 437ns ± 4% 438ns ± 4% ~ (p=0.234 n=40+40)
TimeFormat-4 408ns ± 3% 408ns ± 3% ~ (p=0.904 n=40+40)
[Geo mean] 65.7µs 65.6µs -0.08%
name old speed new speed delta
GobDecode-4 107MB/s ± 5% 108MB/s ± 5% ~ (p=0.336 n=40+40)
GobEncode-4 112MB/s ± 6% 114MB/s ± 9% +1.95% (p=0.036 n=37+40)
Gzip-4 48.5MB/s ± 4% 48.6MB/s ± 2% +0.28% (p=0.003 n=40+33)
Gunzip-4 469MB/s ± 4% 469MB/s ± 4% +0.11% (p=0.021 n=40+40)
JSONEncode-4 122MB/s ± 2% 122MB/s ± 3% ~ (p=0.105 n=40+40)
JSONDecode-4 31.2MB/s ± 4% 31.5MB/s ± 4% +0.99% (p=0.007 n=39+40)
GoParse-4 17.6MB/s ± 2% 17.7MB/s ± 2% +0.66% (p=0.007 n=40+40)
RegexpMatchEasy0_32-4 310MB/s ± 4% 310MB/s ± 4% ~ (p=0.384 n=40+40)
RegexpMatchEasy0_1K-4 1.23GB/s ± 3% 1.24GB/s ± 3% ~ (p=0.186 n=40+40)
RegexpMatchEasy1_32-4 283MB/s ± 3% 281MB/s ± 4% ~ (p=0.855 n=40+40)
RegexpMatchEasy1_1K-4 1.00GB/s ± 4% 1.00GB/s ± 4% ~ (p=0.665 n=40+40)
RegexpMatchMedium_32-4 7.68MB/s ± 4% 7.73MB/s ± 3% ~ (p=0.359 n=40+40)
RegexpMatchMedium_1K-4 26.0MB/s ± 3% 25.8MB/s ± 3% ~ (p=0.825 n=40+40)
RegexpMatchHard_32-4 14.8MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.136 n=40+40)
RegexpMatchHard_1K-4 15.7MB/s ± 3% 15.7MB/s ± 4% ~ (p=0.150 n=40+40)
Revcomp-4 136MB/s ± 1% 136MB/s ± 1% -0.09% (p=0.028 n=32+33)
Template-4 28.0MB/s ± 3% 27.8MB/s ± 3% -0.59% (p=0.010 n=40+40)
[Geo mean] 82.1MB/s 82.3MB/s +0.25%
Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0
Reviewed-on: https://go-review.googlesource.com/c/140303
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL optimizes arm64's NEG/MVN/TST/CMN with a shifted operand.
1. The total size of pkg/android_arm64 decreases about 0.2KB, excluding
cmd/compile/ .
2. The go1 benchmark shows no regression, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.4s ± 1% 16.4s ± 1% ~ (p=0.914 n=29+29)
Fannkuch11-4 8.72s ± 0% 8.72s ± 0% ~ (p=0.274 n=30+29)
FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal)
FmtFprintfString-4 370ns ± 0% 370ns ± 0% ~ (all equal)
FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 672ns ± 1% 675ns ± 2% ~ (p=0.217 n=28+30)
FmtFprintfPrefixedInt-4 806ns ± 0% 806ns ± 0% ~ (p=0.402 n=30+28)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% +0.02% (p=0.011 n=22+27)
FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% ~ (p=0.279 n=29+30)
GobDecode-4 33.1ms ± 1% 33.1ms ± 0% ~ (p=0.052 n=28+29)
GobEncode-4 29.6ms ± 0% 29.6ms ± 0% +0.08% (p=0.013 n=28+29)
Gzip-4 1.38s ± 2% 1.39s ± 2% ~ (p=0.071 n=29+29)
Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.265 n=29+29)
HTTPClientServer-4 789µs ± 4% 785µs ± 4% ~ (p=0.206 n=29+28)
JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30)
JSONDecode-4 266ms ± 1% 267ms ± 1% +0.34% (p=0.000 n=30+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.835 n=28+30)
GoParse-4 15.9ms ± 0% 15.8ms ± 0% -0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 380ns ± 0% 381ns ± 0% +0.18% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 0% +0.23% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 357ns ± 0% 358ns ± 0% +0.28% (p=0.000 n=29+29)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.04µs ± 0% +0.06% (p=0.006 n=30+30)
RegexpMatchMedium_32-4 589ns ± 0% 590ns ± 0% +0.24% (p=0.000 n=28+30)
RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% -0.01% (p=0.027 n=26+29)
RegexpMatchHard_32-4 9.58µs ± 0% 9.58µs ± 0% ~ (p=0.935 n=30+30)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% ~ (p=0.387 n=29+30)
Revcomp-4 2.50s ± 0% 2.50s ± 0% -0.10% (p=0.020 n=28+28)
Template-4 310ms ± 0% 310ms ± 1% ~ (p=0.406 n=30+30)
TimeParse-4 1.68µs ± 0% 1.68µs ± 0% +0.03% (p=0.014 n=30+17)
TimeFormat-4 1.65µs ± 0% 1.66µs ± 0% +0.32% (p=0.000 n=27+29)
[Geo mean] 247µs 247µs +0.05%
name old speed new speed delta
GobDecode-4 23.2MB/s ± 0% 23.2MB/s ± 0% -0.08% (p=0.032 n=27+29)
GobEncode-4 26.0MB/s ± 0% 25.9MB/s ± 0% -0.10% (p=0.011 n=29+29)
Gzip-4 14.1MB/s ± 2% 14.0MB/s ± 2% ~ (p=0.081 n=29+29)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.290 n=29+29)
JSONEncode-4 39.0MB/s ± 0% 39.1MB/s ± 0% +0.25% (p=0.000 n=29+30)
JSONDecode-4 7.30MB/s ± 1% 7.28MB/s ± 1% -0.33% (p=0.000 n=30+30)
GoParse-4 3.65MB/s ± 0% 3.66MB/s ± 0% +0.29% (p=0.000 n=27+30)
RegexpMatchEasy0_32-4 84.1MB/s ± 0% 84.0MB/s ± 0% -0.17% (p=0.000 n=30+28)
RegexpMatchEasy0_1K-4 864MB/s ± 0% 862MB/s ± 0% -0.24% (p=0.000 n=30+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.3MB/s ± 0% -0.18% (p=0.000 n=28+24)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 502MB/s ± 0% -0.05% (p=0.008 n=30+29)
RegexpMatchMedium_32-4 1.70MB/s ± 0% 1.69MB/s ± 0% -0.59% (p=0.000 n=29+30)
RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.31MB/s ± 0% +0.05% (p=0.005 n=30+26)
RegexpMatchHard_32-4 3.34MB/s ± 0% 3.34MB/s ± 0% ~ (all equal)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal)
Revcomp-4 102MB/s ± 0% 102MB/s ± 0% +0.10% (p=0.022 n=28+28)
Template-4 6.26MB/s ± 0% 6.26MB/s ± 1% ~ (p=0.768 n=30+30)
[Geo mean] 24.2MB/s 24.1MB/s -0.08%
Change-Id: I494f9db7f8a568a00e9c74ae25086a58b2221683
Reviewed-on: https://go-review.googlesource.com/137976
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL implements constant folding for MADD/MSUB on arm64.
1. The total size of pkg/android_arm64/ decreases about 4KB,
excluding cmd/compile/ .
2. There is no regression in the go1 benchmark, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.4s ± 1% 16.5s ± 1% +0.24% (p=0.008 n=29+29)
Fannkuch11-4 8.73s ± 0% 8.71s ± 0% -0.15% (p=0.000 n=29+29)
FmtFprintfEmpty-4 174ns ± 0% 174ns ± 0% ~ (all equal)
FmtFprintfString-4 370ns ± 0% 372ns ± 2% +0.53% (p=0.007 n=24+30)
FmtFprintfInt-4 419ns ± 0% 419ns ± 0% ~ (all equal)
FmtFprintfIntInt-4 673ns ± 1% 661ns ± 1% -1.81% (p=0.000 n=30+27)
FmtFprintfPrefixedInt-4 806ns ± 0% 805ns ± 0% ~ (p=0.957 n=28+27)
FmtFprintfFloat-4 1.09µs ± 0% 1.09µs ± 0% -0.04% (p=0.001 n=22+30)
FmtManyArgs-4 2.67µs ± 0% 2.68µs ± 0% +0.03% (p=0.045 n=29+28)
GobDecode-4 33.2ms ± 1% 32.5ms ± 1% -2.11% (p=0.000 n=29+29)
GobEncode-4 29.5ms ± 0% 29.2ms ± 0% -1.04% (p=0.000 n=28+28)
Gzip-4 1.39s ± 2% 1.38s ± 1% -0.48% (p=0.023 n=30+30)
Gunzip-4 139ms ± 0% 139ms ± 0% ~ (p=0.616 n=30+28)
HTTPClientServer-4 766µs ± 4% 758µs ± 3% -1.03% (p=0.013 n=28+29)
JSONEncode-4 49.7ms ± 0% 49.6ms ± 0% -0.24% (p=0.000 n=30+30)
JSONDecode-4 266ms ± 0% 268ms ± 1% +1.07% (p=0.000 n=29+30)
Mandelbrot200-4 16.6ms ± 0% 16.6ms ± 0% ~ (p=0.248 n=30+29)
GoParse-4 15.9ms ± 0% 16.0ms ± 0% +0.76% (p=0.000 n=29+29)
RegexpMatchEasy0_32-4 381ns ± 0% 380ns ± 0% -0.14% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 1.18µs ± 0% 1.19µs ± 1% +0.30% (p=0.000 n=29+30)
RegexpMatchEasy1_32-4 357ns ± 0% 357ns ± 0% ~ (all equal)
RegexpMatchEasy1_1K-4 2.04µs ± 0% 2.05µs ± 0% +0.50% (p=0.000 n=26+28)
RegexpMatchMedium_32-4 590ns ± 0% 589ns ± 0% -0.12% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 162µs ± 0% 162µs ± 0% ~ (p=0.318 n=28+25)
RegexpMatchHard_32-4 9.56µs ± 0% 9.56µs ± 0% ~ (p=0.072 n=30+29)
RegexpMatchHard_1K-4 287µs ± 0% 287µs ± 0% -0.02% (p=0.005 n=28+28)
Revcomp-4 2.50s ± 0% 2.51s ± 0% ~ (p=0.246 n=29+29)
Template-4 312ms ± 1% 313ms ± 1% +0.46% (p=0.002 n=30+30)
TimeParse-4 1.68µs ± 0% 1.67µs ± 0% -0.31% (p=0.000 n=27+29)
TimeFormat-4 1.66µs ± 0% 1.64µs ± 0% -0.92% (p=0.000 n=29+26)
[Geo mean] 247µs 246µs -0.15%
name old speed new speed delta
GobDecode-4 23.1MB/s ± 1% 23.6MB/s ± 0% +2.17% (p=0.000 n=29+28)
GobEncode-4 26.0MB/s ± 0% 26.3MB/s ± 0% +1.05% (p=0.000 n=28+28)
Gzip-4 14.0MB/s ± 2% 14.1MB/s ± 1% +0.47% (p=0.026 n=30+30)
Gunzip-4 139MB/s ± 0% 139MB/s ± 0% ~ (p=0.624 n=30+28)
JSONEncode-4 39.1MB/s ± 0% 39.2MB/s ± 0% +0.24% (p=0.000 n=30+30)
JSONDecode-4 7.31MB/s ± 0% 7.23MB/s ± 1% -1.07% (p=0.000 n=28+30)
GoParse-4 3.65MB/s ± 0% 3.62MB/s ± 0% -0.77% (p=0.000 n=29+29)
RegexpMatchEasy0_32-4 84.0MB/s ± 0% 84.1MB/s ± 0% +0.18% (p=0.000 n=28+30)
RegexpMatchEasy0_1K-4 864MB/s ± 0% 861MB/s ± 1% -0.29% (p=0.000 n=29+30)
RegexpMatchEasy1_32-4 89.5MB/s ± 0% 89.5MB/s ± 0% ~ (p=0.841 n=28+28)
RegexpMatchEasy1_1K-4 502MB/s ± 0% 500MB/s ± 0% -0.51% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 1.69MB/s ± 0% 1.70MB/s ± 0% +0.41% (p=0.000 n=26+30)
RegexpMatchMedium_1K-4 6.31MB/s ± 0% 6.30MB/s ± 0% ~ (p=0.129 n=30+25)
RegexpMatchHard_32-4 3.35MB/s ± 0% 3.35MB/s ± 0% ~ (p=0.657 n=30+29)
RegexpMatchHard_1K-4 3.57MB/s ± 0% 3.57MB/s ± 0% ~ (all equal)
Revcomp-4 102MB/s ± 0% 101MB/s ± 0% ~ (p=0.213 n=29+29)
Template-4 6.22MB/s ± 1% 6.19MB/s ± 1% -0.42% (p=0.005 n=30+29)
[Geo mean] 24.1MB/s 24.2MB/s +0.08%
Change-Id: I6c02d3c9975f6bd8bc215cb1fc14d29602b45649
Reviewed-on: https://go-review.googlesource.com/138095
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MADD does MUL-ADD in a single instruction, and MSUB does the
similiar simplification for MUL-SUB.
The CL implements the optimization with MADD/MSUB.
1. The total size of pkg/android_arm64/ decreases about 20KB,
excluding cmd/compile/.
2. The go1 benchmark shows a little improvement for RegexpMatchHard_32-4
and Template-4, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 16.3s ± 1% 16.5s ± 1% +1.41% (p=0.000 n=26+28)
Fannkuch11-4 8.79s ± 1% 8.76s ± 0% -0.36% (p=0.000 n=26+28)
FmtFprintfEmpty-4 172ns ± 0% 172ns ± 0% ~ (all equal)
FmtFprintfString-4 362ns ± 1% 364ns ± 0% +0.55% (p=0.000 n=30+30)
FmtFprintfInt-4 416ns ± 0% 416ns ± 0% ~ (p=0.099 n=22+30)
FmtFprintfIntInt-4 655ns ± 1% 660ns ± 1% +0.76% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 810ns ± 0% 809ns ± 0% -0.08% (p=0.009 n=29+29)
FmtFprintfFloat-4 1.08µs ± 0% 1.09µs ± 0% +0.61% (p=0.000 n=30+29)
FmtManyArgs-4 2.70µs ± 0% 2.69µs ± 0% -0.23% (p=0.000 n=29+28)
GobDecode-4 32.2ms ± 1% 32.1ms ± 1% -0.39% (p=0.000 n=27+26)
GobEncode-4 27.4ms ± 2% 27.4ms ± 1% ~ (p=0.864 n=28+28)
Gzip-4 1.53s ± 1% 1.52s ± 1% -0.30% (p=0.031 n=29+29)
Gunzip-4 146ms ± 0% 146ms ± 0% -0.14% (p=0.001 n=25+30)
HTTPClientServer-4 1.00ms ± 4% 0.98ms ± 6% -1.65% (p=0.001 n=29+30)
JSONEncode-4 67.3ms ± 1% 67.2ms ± 1% ~ (p=0.520 n=28+28)
JSONDecode-4 329ms ± 5% 330ms ± 4% ~ (p=0.142 n=30+30)
Mandelbrot200-4 17.3ms ± 0% 17.3ms ± 0% ~ (p=0.055 n=26+29)
GoParse-4 16.9ms ± 1% 17.0ms ± 1% +0.82% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 382ns ± 0% 382ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K-4 1.33µs ± 0% 1.33µs ± 0% -0.25% (p=0.000 n=30+27)
RegexpMatchEasy1_32-4 361ns ± 0% 361ns ± 0% -0.08% (p=0.002 n=30+28)
RegexpMatchEasy1_1K-4 2.11µs ± 0% 2.09µs ± 0% -0.54% (p=0.000 n=30+29)
RegexpMatchMedium_32-4 594ns ± 0% 592ns ± 0% -0.32% (p=0.000 n=30+30)
RegexpMatchMedium_1K-4 173µs ± 0% 172µs ± 0% -0.77% (p=0.000 n=29+27)
RegexpMatchHard_32-4 10.4µs ± 0% 10.1µs ± 0% -3.63% (p=0.000 n=28+27)
RegexpMatchHard_1K-4 306µs ± 0% 301µs ± 0% -1.64% (p=0.000 n=29+30)
Revcomp-4 2.51s ± 1% 2.52s ± 0% +0.18% (p=0.017 n=26+27)
Template-4 394ms ± 3% 382ms ± 3% -3.22% (p=0.000 n=28+28)
TimeParse-4 1.67µs ± 0% 1.67µs ± 0% +0.05% (p=0.030 n=27+30)
TimeFormat-4 1.72µs ± 0% 1.70µs ± 0% -0.79% (p=0.000 n=28+26)
[Geo mean] 259µs 259µs -0.33%
name old speed new speed delta
GobDecode-4 23.8MB/s ± 1% 23.9MB/s ± 1% +0.40% (p=0.001 n=27+26)
GobEncode-4 28.0MB/s ± 2% 28.0MB/s ± 1% ~ (p=0.863 n=28+28)
Gzip-4 12.7MB/s ± 1% 12.7MB/s ± 1% +0.32% (p=0.026 n=29+29)
Gunzip-4 133MB/s ± 0% 133MB/s ± 0% +0.15% (p=0.001 n=24+30)
JSONEncode-4 28.8MB/s ± 1% 28.9MB/s ± 1% ~ (p=0.475 n=28+28)
JSONDecode-4 5.89MB/s ± 4% 5.87MB/s ± 5% ~ (p=0.174 n=29+30)
GoParse-4 3.43MB/s ± 0% 3.40MB/s ± 1% -0.83% (p=0.000 n=28+30)
RegexpMatchEasy0_32-4 83.6MB/s ± 0% 83.6MB/s ± 0% ~ (p=0.848 n=28+29)
RegexpMatchEasy0_1K-4 768MB/s ± 0% 770MB/s ± 0% +0.25% (p=0.000 n=30+27)
RegexpMatchEasy1_32-4 88.5MB/s ± 0% 88.5MB/s ± 0% ~ (p=0.086 n=29+29)
RegexpMatchEasy1_1K-4 486MB/s ± 0% 489MB/s ± 0% +0.54% (p=0.000 n=30+29)
RegexpMatchMedium_32-4 1.68MB/s ± 0% 1.69MB/s ± 0% +0.60% (p=0.000 n=30+23)
RegexpMatchMedium_1K-4 5.90MB/s ± 0% 5.95MB/s ± 0% +0.85% (p=0.000 n=18+20)
RegexpMatchHard_32-4 3.07MB/s ± 0% 3.18MB/s ± 0% +3.72% (p=0.000 n=29+26)
RegexpMatchHard_1K-4 3.35MB/s ± 0% 3.40MB/s ± 0% +1.69% (p=0.000 n=30+30)
Revcomp-4 101MB/s ± 0% 101MB/s ± 0% -0.18% (p=0.018 n=26+27)
Template-4 4.92MB/s ± 4% 5.09MB/s ± 3% +3.31% (p=0.000 n=28+28)
[Geo mean] 22.4MB/s 22.6MB/s +0.62%
Change-Id: I8f304b272785739f57b3c8f736316f658f8c1b2a
Reviewed-on: https://go-review.googlesource.com/129119
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add suport of read-modify-write for AND/SUB/AND/OR/XOR on amd64.
1. The total size of pkg/linux_amd64 decreases about 4KB, excluding
cmd/compile.
2. The go1 benchmark shows a little improvement, excluding noise.
name old time/op new time/op delta
BinaryTree17-4 2.63s ± 3% 2.65s ± 4% +1.01% (p=0.037 n=35+35)
Fannkuch11-4 2.33s ± 2% 2.39s ± 2% +2.49% (p=0.000 n=35+35)
FmtFprintfEmpty-4 45.4ns ± 5% 40.8ns ± 6% -10.09% (p=0.000 n=35+35)
FmtFprintfString-4 73.3ns ± 4% 70.9ns ± 3% -3.23% (p=0.000 n=30+35)
FmtFprintfInt-4 79.9ns ± 4% 79.5ns ± 3% ~ (p=0.736 n=34+35)
FmtFprintfIntInt-4 126ns ± 4% 125ns ± 4% ~ (p=0.083 n=35+35)
FmtFprintfPrefixedInt-4 152ns ± 6% 152ns ± 3% ~ (p=0.855 n=34+35)
FmtFprintfFloat-4 215ns ± 4% 213ns ± 4% ~ (p=0.066 n=35+35)
FmtManyArgs-4 522ns ± 3% 506ns ± 3% -3.15% (p=0.000 n=35+35)
GobDecode-4 6.45ms ± 8% 6.51ms ± 7% +0.96% (p=0.026 n=35+35)
GobEncode-4 6.10ms ± 6% 6.02ms ± 8% ~ (p=0.160 n=35+35)
Gzip-4 228ms ± 3% 221ms ± 3% -2.92% (p=0.000 n=35+35)
Gunzip-4 37.5ms ± 4% 37.2ms ± 3% -0.78% (p=0.036 n=35+35)
HTTPClientServer-4 58.7µs ± 2% 59.2µs ± 1% +0.80% (p=0.000 n=33+33)
JSONEncode-4 12.0ms ± 3% 12.2ms ± 3% +1.84% (p=0.008 n=35+35)
JSONDecode-4 57.0ms ± 4% 56.6ms ± 3% ~ (p=0.320 n=35+35)
Mandelbrot200-4 3.82ms ± 3% 3.79ms ± 3% ~ (p=0.074 n=35+35)
GoParse-4 3.21ms ± 5% 3.24ms ± 4% ~ (p=0.119 n=35+35)
RegexpMatchEasy0_32-4 76.3ns ± 4% 75.4ns ± 4% -1.14% (p=0.014 n=34+33)
RegexpMatchEasy0_1K-4 251ns ± 4% 254ns ± 3% +1.28% (p=0.016 n=35+35)
RegexpMatchEasy1_32-4 69.6ns ± 3% 70.1ns ± 3% +0.82% (p=0.005 n=35+35)
RegexpMatchEasy1_1K-4 367ns ± 4% 376ns ± 4% +2.47% (p=0.000 n=35+35)
RegexpMatchMedium_32-4 108ns ± 5% 104ns ± 4% -3.18% (p=0.000 n=35+35)
RegexpMatchMedium_1K-4 33.8µs ± 3% 32.7µs ± 3% -3.27% (p=0.000 n=35+35)
RegexpMatchHard_32-4 1.55µs ± 3% 1.52µs ± 3% -1.64% (p=0.000 n=35+35)
RegexpMatchHard_1K-4 46.6µs ± 3% 46.6µs ± 4% ~ (p=0.149 n=35+35)
Revcomp-4 416ms ± 7% 412ms ± 6% -0.95% (p=0.033 n=33+35)
Template-4 64.3ms ± 3% 62.4ms ± 7% -2.94% (p=0.000 n=35+35)
TimeParse-4 320ns ± 2% 322ns ± 3% ~ (p=0.589 n=35+35)
TimeFormat-4 300ns ± 3% 300ns ± 3% ~ (p=0.597 n=35+35)
[Geo mean] 47.4µs 47.0µs -0.86%
name old speed new speed delta
GobDecode-4 119MB/s ± 7% 118MB/s ± 7% -0.96% (p=0.027 n=35+35)
GobEncode-4 126MB/s ± 7% 127MB/s ± 6% ~ (p=0.157 n=34+34)
Gzip-4 85.3MB/s ± 3% 87.9MB/s ± 3% +3.02% (p=0.000 n=35+35)
Gunzip-4 518MB/s ± 4% 522MB/s ± 3% +0.79% (p=0.037 n=35+35)
JSONEncode-4 162MB/s ± 3% 159MB/s ± 3% -1.81% (p=0.009 n=35+35)
JSONDecode-4 34.1MB/s ± 4% 34.3MB/s ± 3% ~ (p=0.318 n=35+35)
GoParse-4 18.0MB/s ± 5% 17.9MB/s ± 4% ~ (p=0.117 n=35+35)
RegexpMatchEasy0_32-4 419MB/s ± 3% 425MB/s ± 4% +1.46% (p=0.003 n=32+33)
RegexpMatchEasy0_1K-4 4.07GB/s ± 4% 4.02GB/s ± 3% -1.28% (p=0.014 n=35+35)
RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 4% -0.82% (p=0.004 n=35+35)
RegexpMatchEasy1_1K-4 2.79GB/s ± 4% 2.72GB/s ± 4% -2.39% (p=0.000 n=35+35)
RegexpMatchMedium_32-4 9.23MB/s ± 4% 9.53MB/s ± 4% +3.16% (p=0.000 n=35+35)
RegexpMatchMedium_1K-4 30.3MB/s ± 3% 31.3MB/s ± 3% +3.38% (p=0.000 n=35+35)
RegexpMatchHard_32-4 20.7MB/s ± 3% 21.0MB/s ± 3% +1.67% (p=0.000 n=35+35)
RegexpMatchHard_1K-4 22.0MB/s ± 3% 21.9MB/s ± 4% ~ (p=0.277 n=35+33)
Revcomp-4 612MB/s ± 7% 618MB/s ± 6% +0.96% (p=0.034 n=33+35)
Template-4 30.2MB/s ± 3% 31.1MB/s ± 6% +3.05% (p=0.000 n=35+35)
[Geo mean] 123MB/s 124MB/s +0.64%
Change-Id: Ia025da272e07d0069413824bfff3471b106d6280
Reviewed-on: https://go-review.googlesource.com/121535
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New read-modify-write operations are introduced in this CL for 386.
1. The total size of pkg/linux_386 decreases about 10KB (excluding
cmd/compile).
2. The go1 benchmark shows little regression.
name old time/op new time/op delta
BinaryTree17-4 3.32s ± 4% 3.29s ± 2% ~ (p=0.059 n=30+30)
Fannkuch11-4 3.49s ± 1% 3.46s ± 1% -0.92% (p=0.001 n=30+30)
FmtFprintfEmpty-4 47.7ns ± 2% 46.8ns ± 5% -1.93% (p=0.011 n=25+30)
FmtFprintfString-4 79.5ns ± 7% 80.2ns ± 3% +0.89% (p=0.001 n=28+29)
FmtFprintfInt-4 90.5ns ± 2% 92.1ns ± 2% +1.82% (p=0.014 n=22+30)
FmtFprintfIntInt-4 141ns ± 1% 144ns ± 3% +2.23% (p=0.013 n=22+30)
FmtFprintfPrefixedInt-4 183ns ± 2% 184ns ± 3% ~ (p=0.080 n=21+30)
FmtFprintfFloat-4 409ns ± 3% 412ns ± 3% +0.83% (p=0.040 n=30+30)
FmtManyArgs-4 597ns ± 6% 607ns ± 4% +1.71% (p=0.006 n=30+30)
GobDecode-4 7.21ms ± 5% 7.18ms ± 6% ~ (p=0.665 n=30+30)
GobEncode-4 7.17ms ± 6% 7.09ms ± 7% ~ (p=0.117 n=29+30)
Gzip-4 413ms ± 4% 399ms ± 4% -3.48% (p=0.000 n=30+30)
Gunzip-4 41.3ms ± 4% 41.7ms ± 3% +1.05% (p=0.011 n=30+30)
HTTPClientServer-4 63.5µs ± 3% 62.9µs ± 2% -0.97% (p=0.017 n=30+27)
JSONEncode-4 20.3ms ± 5% 20.1ms ± 5% -1.16% (p=0.004 n=30+30)
JSONDecode-4 66.2ms ± 4% 67.7ms ± 4% +2.21% (p=0.000 n=30+30)
Mandelbrot200-4 5.16ms ± 3% 5.18ms ± 3% ~ (p=0.123 n=30+30)
GoParse-4 3.23ms ± 2% 3.27ms ± 2% +1.08% (p=0.006 n=30+30)
RegexpMatchEasy0_32-4 98.9ns ± 5% 97.1ns ± 4% -1.83% (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4 842ns ± 3% 842ns ± 3% ~ (p=0.550 n=30+30)
RegexpMatchEasy1_32-4 107ns ± 4% 105ns ± 4% -1.93% (p=0.012 n=30+30)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.04µs ± 4% ~ (p=0.304 n=30+30)
RegexpMatchMedium_32-4 132ns ± 2% 129ns ± 4% -2.02% (p=0.000 n=21+30)
RegexpMatchMedium_1K-4 44.1µs ± 4% 43.8µs ± 3% ~ (p=0.641 n=30+30)
RegexpMatchHard_32-4 2.26µs ± 4% 2.23µs ± 4% -1.28% (p=0.023 n=30+30)
RegexpMatchHard_1K-4 68.1µs ± 3% 68.6µs ± 4% ~ (p=0.089 n=30+30)
Revcomp-4 1.85s ± 2% 1.84s ± 2% ~ (p=0.072 n=30+30)
Template-4 69.2ms ± 3% 68.5ms ± 3% -1.04% (p=0.012 n=30+30)
TimeParse-4 441ns ± 3% 446ns ± 4% +1.21% (p=0.001 n=30+30)
TimeFormat-4 415ns ± 3% 415ns ± 3% ~ (p=0.436 n=30+30)
[Geo mean] 67.0µs 66.9µs -0.17%
name old speed new speed delta
GobDecode-4 107MB/s ± 5% 107MB/s ± 6% ~ (p=0.663 n=30+30)
GobEncode-4 107MB/s ± 6% 108MB/s ± 7% ~ (p=0.117 n=29+30)
Gzip-4 47.0MB/s ± 4% 48.7MB/s ± 4% +3.61% (p=0.000 n=30+30)
Gunzip-4 470MB/s ± 4% 466MB/s ± 4% -1.05% (p=0.011 n=30+30)
JSONEncode-4 95.6MB/s ± 5% 96.7MB/s ± 5% +1.16% (p=0.005 n=30+30)
JSONDecode-4 29.3MB/s ± 4% 28.7MB/s ± 4% -2.17% (p=0.000 n=30+30)
GoParse-4 17.9MB/s ± 2% 17.7MB/s ± 2% -1.06% (p=0.007 n=30+30)
RegexpMatchEasy0_32-4 323MB/s ± 5% 329MB/s ± 4% +1.93% (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4 1.22GB/s ± 3% 1.22GB/s ± 3% ~ (p=0.496 n=30+30)
RegexpMatchEasy1_32-4 298MB/s ± 4% 303MB/s ± 4% +1.84% (p=0.017 n=30+30)
RegexpMatchEasy1_1K-4 995MB/s ± 4% 989MB/s ± 4% ~ (p=0.307 n=30+30)
RegexpMatchMedium_32-4 7.56MB/s ± 4% 7.74MB/s ± 4% +2.46% (p=0.000 n=22+30)
RegexpMatchMedium_1K-4 23.2MB/s ± 4% 23.4MB/s ± 3% ~ (p=0.651 n=30+30)
RegexpMatchHard_32-4 14.2MB/s ± 4% 14.3MB/s ± 4% +1.29% (p=0.021 n=30+30)
RegexpMatchHard_1K-4 15.0MB/s ± 3% 14.9MB/s ± 4% ~ (p=0.069 n=30+29)
Revcomp-4 138MB/s ± 2% 138MB/s ± 2% ~ (p=0.072 n=30+30)
Template-4 28.1MB/s ± 3% 28.4MB/s ± 3% +1.05% (p=0.012 n=30+30)
[Geo mean] 79.7MB/s 80.2MB/s +0.60%
Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31
Reviewed-on: https://go-review.googlesource.com/120916
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IMULL/DIVSS/DIVSD all can take the source operand from memory
directly. And this CL implement that optimization.
1. The total size of pkg/linux_386 decreases about 84KB (excluding
cmd/compile).
2. The go1 benchmark shows little regression in total (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 3.29s ± 2% 3.27s ± 4% ~ (p=0.192 n=30+30)
Fannkuch11-4 3.49s ± 2% 3.54s ± 1% +1.48% (p=0.000 n=30+30)
FmtFprintfEmpty-4 45.9ns ± 3% 46.3ns ± 4% +0.89% (p=0.037 n=30+30)
FmtFprintfString-4 78.8ns ± 3% 78.7ns ± 4% ~ (p=0.209 n=30+27)
FmtFprintfInt-4 91.0ns ± 2% 90.3ns ± 2% -0.82% (p=0.031 n=30+27)
FmtFprintfIntInt-4 142ns ± 4% 143ns ± 4% ~ (p=0.136 n=30+30)
FmtFprintfPrefixedInt-4 181ns ± 3% 183ns ± 4% +1.40% (p=0.005 n=30+30)
FmtFprintfFloat-4 404ns ± 4% 408ns ± 3% ~ (p=0.397 n=30+30)
FmtManyArgs-4 601ns ± 3% 609ns ± 5% ~ (p=0.059 n=30+30)
GobDecode-4 7.21ms ± 5% 7.24ms ± 5% ~ (p=0.612 n=30+30)
GobEncode-4 6.91ms ± 6% 6.91ms ± 6% ~ (p=0.797 n=30+30)
Gzip-4 398ms ± 6% 399ms ± 4% ~ (p=0.173 n=30+30)
Gunzip-4 41.7ms ± 3% 41.8ms ± 3% ~ (p=0.423 n=30+30)
HTTPClientServer-4 62.3µs ± 2% 62.7µs ± 3% ~ (p=0.085 n=29+30)
JSONEncode-4 21.0ms ± 4% 20.7ms ± 5% -1.39% (p=0.014 n=30+30)
JSONDecode-4 66.3ms ± 3% 67.4ms ± 1% +1.71% (p=0.003 n=30+24)
Mandelbrot200-4 5.15ms ± 3% 5.16ms ± 3% ~ (p=0.697 n=30+30)
GoParse-4 3.24ms ± 3% 3.27ms ± 4% +0.91% (p=0.032 n=30+30)
RegexpMatchEasy0_32-4 101ns ± 5% 99ns ± 4% -1.82% (p=0.008 n=29+30)
RegexpMatchEasy0_1K-4 848ns ± 4% 841ns ± 2% -0.77% (p=0.043 n=30+30)
RegexpMatchEasy1_32-4 106ns ± 6% 106ns ± 3% ~ (p=0.939 n=29+30)
RegexpMatchEasy1_1K-4 1.02µs ± 3% 1.03µs ± 4% ~ (p=0.297 n=28+30)
RegexpMatchMedium_32-4 129ns ± 4% 127ns ± 4% ~ (p=0.073 n=30+30)
RegexpMatchMedium_1K-4 43.9µs ± 3% 43.8µs ± 3% ~ (p=0.186 n=30+30)
RegexpMatchHard_32-4 2.24µs ± 4% 2.22µs ± 4% ~ (p=0.332 n=30+29)
RegexpMatchHard_1K-4 68.0µs ± 4% 67.5µs ± 3% ~ (p=0.290 n=30+30)
Revcomp-4 1.85s ± 3% 1.85s ± 3% ~ (p=0.358 n=30+30)
Template-4 69.6ms ± 3% 70.0ms ± 4% ~ (p=0.273 n=30+30)
TimeParse-4 445ns ± 3% 441ns ± 3% ~ (p=0.494 n=30+30)
TimeFormat-4 412ns ± 3% 412ns ± 6% ~ (p=0.841 n=30+30)
[Geo mean] 66.7µs 66.8µs +0.13%
name old speed new speed delta
GobDecode-4 107MB/s ± 5% 106MB/s ± 5% ~ (p=0.615 n=30+30)
GobEncode-4 111MB/s ± 6% 111MB/s ± 6% ~ (p=0.790 n=30+30)
Gzip-4 48.8MB/s ± 6% 48.7MB/s ± 4% ~ (p=0.167 n=30+30)
Gunzip-4 465MB/s ± 3% 465MB/s ± 3% ~ (p=0.420 n=30+30)
JSONEncode-4 92.4MB/s ± 4% 93.7MB/s ± 5% +1.42% (p=0.015 n=30+30)
JSONDecode-4 29.3MB/s ± 3% 28.8MB/s ± 1% -1.72% (p=0.003 n=30+24)
GoParse-4 17.9MB/s ± 3% 17.7MB/s ± 4% -0.89% (p=0.037 n=30+30)
RegexpMatchEasy0_32-4 317MB/s ± 8% 324MB/s ± 4% +2.14% (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4 1.21GB/s ± 4% 1.22GB/s ± 2% +0.77% (p=0.036 n=30+30)
RegexpMatchEasy1_32-4 298MB/s ± 7% 299MB/s ± 4% ~ (p=0.511 n=30+30)
RegexpMatchEasy1_1K-4 1.00GB/s ± 3% 1.00GB/s ± 4% ~ (p=0.304 n=28+30)
RegexpMatchMedium_32-4 7.75MB/s ± 4% 7.82MB/s ± 4% ~ (p=0.089 n=30+30)
RegexpMatchMedium_1K-4 23.3MB/s ± 3% 23.4MB/s ± 3% ~ (p=0.181 n=30+30)
RegexpMatchHard_32-4 14.3MB/s ± 4% 14.4MB/s ± 4% ~ (p=0.320 n=30+29)
RegexpMatchHard_1K-4 15.1MB/s ± 4% 15.2MB/s ± 3% ~ (p=0.273 n=30+30)
Revcomp-4 137MB/s ± 3% 137MB/s ± 3% ~ (p=0.352 n=30+30)
Template-4 27.9MB/s ± 3% 27.7MB/s ± 4% ~ (p=0.277 n=30+30)
[Geo mean] 79.9MB/s 80.1MB/s +0.15%
Change-Id: I97333cd8ddabb3c7c88ca5aa9e14a005b74d306d
Reviewed-on: https://go-review.googlesource.com/120695
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DIVSSload & DIVSDload directly operate on a memory operand. And
binary size can be reduced by them, while the performance is
not affected.
The total size of pkg/linux_amd64 (excluding cmd/compile) decreases
about 6KB.
There is little regression in the go1 benchmark test (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 2.63s ± 4% 2.62s ± 4% ~ (p=0.809 n=30+30)
Fannkuch11-4 2.40s ± 2% 2.40s ± 2% ~ (p=0.109 n=30+30)
FmtFprintfEmpty-4 43.1ns ± 4% 43.2ns ± 9% ~ (p=0.168 n=30+30)
FmtFprintfString-4 73.6ns ± 4% 74.1ns ± 4% ~ (p=0.069 n=30+30)
FmtFprintfInt-4 81.0ns ± 3% 81.4ns ± 5% ~ (p=0.350 n=30+30)
FmtFprintfIntInt-4 127ns ± 4% 129ns ± 4% +0.99% (p=0.021 n=30+30)
FmtFprintfPrefixedInt-4 156ns ± 4% 155ns ± 4% ~ (p=0.415 n=30+30)
FmtFprintfFloat-4 219ns ± 4% 218ns ± 4% ~ (p=0.071 n=30+30)
FmtManyArgs-4 522ns ± 3% 518ns ± 3% -0.68% (p=0.034 n=30+30)
GobDecode-4 6.49ms ± 6% 6.52ms ± 6% ~ (p=0.832 n=30+30)
GobEncode-4 6.10ms ± 9% 6.14ms ± 7% ~ (p=0.485 n=30+30)
Gzip-4 227ms ± 1% 224ms ± 4% ~ (p=0.484 n=24+30)
Gunzip-4 37.2ms ± 3% 36.8ms ± 4% ~ (p=0.889 n=30+30)
HTTPClientServer-4 58.9µs ± 1% 58.7µs ± 2% -0.42% (p=0.003 n=28+28)
JSONEncode-4 12.0ms ± 3% 12.0ms ± 4% ~ (p=0.523 n=30+30)
JSONDecode-4 54.6ms ± 4% 54.5ms ± 4% ~ (p=0.708 n=30+30)
Mandelbrot200-4 3.78ms ± 4% 3.81ms ± 3% +0.99% (p=0.016 n=30+30)
GoParse-4 3.20ms ± 4% 3.20ms ± 5% ~ (p=0.994 n=30+30)
RegexpMatchEasy0_32-4 77.0ns ± 4% 75.9ns ± 3% -1.39% (p=0.006 n=29+30)
RegexpMatchEasy0_1K-4 255ns ± 4% 253ns ± 4% ~ (p=0.091 n=30+30)
RegexpMatchEasy1_32-4 69.7ns ± 3% 70.3ns ± 4% ~ (p=0.120 n=30+30)
RegexpMatchEasy1_1K-4 373ns ± 2% 378ns ± 3% +1.43% (p=0.000 n=21+26)
RegexpMatchMedium_32-4 107ns ± 2% 108ns ± 4% +1.50% (p=0.012 n=22+30)
RegexpMatchMedium_1K-4 34.0µs ± 1% 34.3µs ± 3% +1.08% (p=0.008 n=24+30)
RegexpMatchHard_32-4 1.53µs ± 3% 1.54µs ± 3% ~ (p=0.234 n=30+30)
RegexpMatchHard_1K-4 46.7µs ± 4% 47.0µs ± 4% ~ (p=0.420 n=30+30)
Revcomp-4 411ms ± 7% 415ms ± 6% ~ (p=0.059 n=30+30)
Template-4 65.5ms ± 5% 66.9ms ± 4% +2.21% (p=0.001 n=30+30)
TimeParse-4 317ns ± 3% 311ns ± 3% -1.97% (p=0.000 n=30+30)
TimeFormat-4 293ns ± 3% 294ns ± 3% ~ (p=0.243 n=30+30)
[Geo mean] 47.4µs 47.5µs +0.17%
name old speed new speed delta
GobDecode-4 118MB/s ± 5% 118MB/s ± 6% ~ (p=0.832 n=30+30)
GobEncode-4 125MB/s ± 7% 125MB/s ± 7% ~ (p=0.625 n=29+30)
Gzip-4 85.3MB/s ± 1% 86.6MB/s ± 4% ~ (p=0.486 n=24+30)
Gunzip-4 522MB/s ± 3% 527MB/s ± 4% ~ (p=0.889 n=30+30)
JSONEncode-4 162MB/s ± 3% 162MB/s ± 4% ~ (p=0.520 n=30+30)
JSONDecode-4 35.5MB/s ± 4% 35.6MB/s ± 4% ~ (p=0.701 n=30+30)
GoParse-4 18.1MB/s ± 4% 18.1MB/s ± 4% ~ (p=0.891 n=29+30)
RegexpMatchEasy0_32-4 416MB/s ± 4% 422MB/s ± 3% +1.43% (p=0.005 n=29+30)
RegexpMatchEasy0_1K-4 4.01GB/s ± 4% 4.04GB/s ± 4% ~ (p=0.091 n=30+30)
RegexpMatchEasy1_32-4 460MB/s ± 3% 456MB/s ± 5% ~ (p=0.123 n=30+30)
RegexpMatchEasy1_1K-4 2.74GB/s ± 2% 2.70GB/s ± 3% -1.33% (p=0.000 n=22+26)
RegexpMatchMedium_32-4 9.39MB/s ± 3% 9.19MB/s ± 4% -2.06% (p=0.001 n=28+30)
RegexpMatchMedium_1K-4 30.1MB/s ± 1% 29.8MB/s ± 3% -1.04% (p=0.008 n=24+30)
RegexpMatchHard_32-4 20.9MB/s ± 3% 20.8MB/s ± 3% ~ (p=0.234 n=30+30)
RegexpMatchHard_1K-4 21.9MB/s ± 4% 21.8MB/s ± 4% ~ (p=0.420 n=30+30)
Revcomp-4 619MB/s ± 7% 612MB/s ± 7% ~ (p=0.059 n=30+30)
Template-4 29.6MB/s ± 4% 29.0MB/s ± 4% -2.16% (p=0.002 n=30+30)
[Geo mean] 123MB/s 123MB/s -0.33%
Change-Id: Ia59e077feae4f2824df79059daea4d0f678e3e4c
Reviewed-on: https://go-review.googlesource.com/120275
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination,
and this CL optimize the compiler to emit such instructions on
386 for more compact binary.
Here is test report:
1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases
about 14KB.
(pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded)
2. The go1 benchmark shows little change, excluding ±2% noise.
name old time/op new time/op delta
BinaryTree17-4 3.34s ± 2% 3.38s ± 2% +1.27% (p=0.000 n=40+39)
Fannkuch11-4 3.55s ± 1% 3.51s ± 1% -1.33% (p=0.000 n=40+40)
FmtFprintfEmpty-4 46.3ns ± 3% 46.9ns ± 4% +1.41% (p=0.002 n=40+40)
FmtFprintfString-4 80.8ns ± 3% 80.4ns ± 6% -0.54% (p=0.044 n=40+40)
FmtFprintfInt-4 93.0ns ± 3% 92.2ns ± 4% -0.88% (p=0.007 n=39+40)
FmtFprintfIntInt-4 144ns ± 5% 145ns ± 2% +0.78% (p=0.015 n=40+40)
FmtFprintfPrefixedInt-4 184ns ± 2% 182ns ± 2% -1.06% (p=0.004 n=40+40)
FmtFprintfFloat-4 415ns ± 4% 419ns ± 4% ~ (p=0.434 n=40+40)
FmtManyArgs-4 615ns ± 3% 619ns ± 3% ~ (p=0.100 n=40+40)
GobDecode-4 7.30ms ± 6% 7.36ms ± 6% ~ (p=0.074 n=40+40)
GobEncode-4 7.10ms ± 6% 7.21ms ± 5% ~ (p=0.082 n=40+39)
Gzip-4 364ms ± 3% 362ms ± 6% -0.71% (p=0.020 n=40+40)
Gunzip-4 42.4ms ± 3% 42.2ms ± 3% ~ (p=0.303 n=40+40)
HTTPClientServer-4 62.9µs ± 1% 62.9µs ± 1% ~ (p=0.768 n=38+39)
JSONEncode-4 21.4ms ± 4% 21.5ms ± 5% ~ (p=0.210 n=40+40)
JSONDecode-4 67.7ms ± 3% 67.9ms ± 4% ~ (p=0.713 n=40+40)
Mandelbrot200-4 5.18ms ± 3% 5.21ms ± 3% +0.59% (p=0.021 n=40+40)
GoParse-4 3.35ms ± 3% 3.34ms ± 2% ~ (p=0.996 n=40+40)
RegexpMatchEasy0_32-4 98.5ns ± 5% 96.3ns ± 4% -2.15% (p=0.001 n=40+40)
RegexpMatchEasy0_1K-4 851ns ± 4% 850ns ± 5% ~ (p=0.700 n=40+40)
RegexpMatchEasy1_32-4 105ns ± 7% 107ns ± 4% +1.50% (p=0.017 n=40+40)
RegexpMatchEasy1_1K-4 1.03µs ± 5% 1.03µs ± 4% ~ (p=0.992 n=40+40)
RegexpMatchMedium_32-4 130ns ± 6% 128ns ± 4% -1.66% (p=0.012 n=40+40)
RegexpMatchMedium_1K-4 44.0µs ± 5% 43.6µs ± 3% ~ (p=0.704 n=40+40)
RegexpMatchHard_32-4 2.29µs ± 3% 2.23µs ± 4% -2.38% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 69.0µs ± 3% 68.1µs ± 3% -1.28% (p=0.003 n=40+40)
Revcomp-4 1.85s ± 2% 1.87s ± 3% +1.11% (p=0.000 n=40+40)
Template-4 69.8ms ± 3% 69.6ms ± 3% ~ (p=0.125 n=40+40)
TimeParse-4 442ns ± 5% 440ns ± 3% ~ (p=0.585 n=40+40)
TimeFormat-4 419ns ± 3% 420ns ± 3% ~ (p=0.824 n=40+40)
[Geo mean] 67.3µs 67.2µs -0.11%
name old speed new speed delta
GobDecode-4 105MB/s ± 6% 104MB/s ± 6% ~ (p=0.074 n=40+40)
GobEncode-4 108MB/s ± 7% 107MB/s ± 5% ~ (p=0.080 n=40+39)
Gzip-4 53.3MB/s ± 3% 53.7MB/s ± 6% +0.73% (p=0.021 n=40+40)
Gunzip-4 458MB/s ± 3% 460MB/s ± 3% ~ (p=0.301 n=40+40)
JSONEncode-4 90.8MB/s ± 4% 90.3MB/s ± 4% ~ (p=0.213 n=40+40)
JSONDecode-4 28.7MB/s ± 3% 28.6MB/s ± 4% ~ (p=0.679 n=40+40)
GoParse-4 17.3MB/s ± 3% 17.3MB/s ± 2% ~ (p=1.000 n=40+40)
RegexpMatchEasy0_32-4 325MB/s ± 5% 333MB/s ± 4% +2.44% (p=0.000 n=40+38)
RegexpMatchEasy0_1K-4 1.20GB/s ± 4% 1.21GB/s ± 5% ~ (p=0.684 n=40+40)
RegexpMatchEasy1_32-4 303MB/s ± 7% 298MB/s ± 4% -1.52% (p=0.022 n=40+40)
RegexpMatchEasy1_1K-4 995MB/s ± 5% 996MB/s ± 4% ~ (p=0.996 n=40+40)
RegexpMatchMedium_32-4 7.67MB/s ± 6% 7.80MB/s ± 4% +1.68% (p=0.011 n=40+40)
RegexpMatchMedium_1K-4 23.3MB/s ± 5% 23.5MB/s ± 3% ~ (p=0.697 n=40+40)
RegexpMatchHard_32-4 14.0MB/s ± 3% 14.3MB/s ± 4% +2.43% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 14.8MB/s ± 3% 15.0MB/s ± 3% +1.30% (p=0.003 n=40+40)
Revcomp-4 137MB/s ± 2% 136MB/s ± 3% -1.10% (p=0.000 n=40+40)
Template-4 27.8MB/s ± 3% 27.9MB/s ± 3% ~ (p=0.128 n=40+40)
[Geo mean] 79.6MB/s 79.9MB/s +0.28%
Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733
Reviewed-on: https://go-review.googlesource.com/110157
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL.
Bit of a special case, but the single-instruction
LEA is both shorter and faster than SHL then ADD.
Triggers 293 times during make.bash.
Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1
Reviewed-on: https://go-review.googlesource.com/108938
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SUBL instruction can take a memory operand, and this CL
implements this optimization.
The go1 benchmark shows a little improvement.
name old time/op new time/op delta
BinaryTree17-4 3.27s ± 2% 3.29s ± 3% ~ (p=0.322 n=37+40)
Fannkuch11-4 3.49s ± 0% 3.53s ± 1% +1.21% (p=0.000 n=31+40)
FmtFprintfEmpty-4 46.2ns ± 3% 46.3ns ± 2% ~ (p=0.351 n=40+28)
FmtFprintfString-4 82.0ns ± 3% 81.5ns ± 2% -0.69% (p=0.002 n=40+30)
FmtFprintfInt-4 94.6ns ± 3% 94.6ns ± 6% ~ (p=0.913 n=39+37)
FmtFprintfIntInt-4 147ns ± 3% 150ns ± 2% +1.72% (p=0.000 n=40+25)
FmtFprintfPrefixedInt-4 186ns ± 3% 186ns ± 0% -0.33% (p=0.006 n=40+25)
FmtFprintfFloat-4 388ns ± 4% 388ns ± 4% ~ (p=0.162 n=40+40)
FmtManyArgs-4 612ns ± 3% 616ns ± 4% ~ (p=0.223 n=40+40)
GobDecode-4 7.35ms ± 5% 7.42ms ± 5% ~ (p=0.095 n=40+40)
GobEncode-4 7.21ms ± 8% 7.23ms ± 4% ~ (p=0.294 n=40+40)
Gzip-4 360ms ± 4% 359ms ± 4% ~ (p=0.097 n=40+40)
Gunzip-4 46.1ms ± 3% 45.6ms ± 3% -1.20% (p=0.000 n=40+40)
HTTPClientServer-4 64.0µs ± 2% 64.1µs ± 2% ~ (p=0.648 n=39+40)
JSONEncode-4 21.9ms ± 4% 22.1ms ± 5% ~ (p=0.086 n=40+40)
JSONDecode-4 67.9ms ± 4% 66.7ms ± 4% -1.63% (p=0.000 n=40+40)
Mandelbrot200-4 5.19ms ± 3% 5.17ms ± 3% ~ (p=0.881 n=40+40)
GoParse-4 3.34ms ± 3% 3.28ms ± 2% -1.78% (p=0.000 n=40+40)
RegexpMatchEasy0_32-4 101ns ± 5% 99ns ± 3% -2.40% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 851ns ± 1% 848ns ± 3% -0.36% (p=0.004 n=33+40)
RegexpMatchEasy1_32-4 109ns ± 5% 105ns ± 3% -3.53% (p=0.000 n=39+40)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 3% ~ (p=0.638 n=40+38)
RegexpMatchMedium_32-4 131ns ± 5% 127ns ± 4% -3.36% (p=0.000 n=38+40)
RegexpMatchMedium_1K-4 43.4µs ± 4% 43.2µs ± 3% -0.46% (p=0.008 n=40+40)
RegexpMatchHard_32-4 2.21µs ± 4% 2.23µs ± 1% +0.77% (p=0.014 n=40+28)
RegexpMatchHard_1K-4 67.6µs ± 4% 67.7µs ± 3% +0.11% (p=0.016 n=40+40)
Revcomp-4 1.86s ± 3% 1.77s ± 2% -4.81% (p=0.000 n=40+40)
Template-4 71.7ms ± 3% 71.6ms ± 4% ~ (p=0.200 n=40+40)
TimeParse-4 436ns ± 4% 433ns ± 3% ~ (p=0.358 n=40+40)
TimeFormat-4 413ns ± 4% 412ns ± 3% ~ (p=0.415 n=40+40)
[Geo mean] 63.9µs 63.6µs -0.49%
name old speed new speed delta
GobDecode-4 105MB/s ± 5% 104MB/s ± 5% ~ (p=0.096 n=40+40)
GobEncode-4 106MB/s ± 7% 106MB/s ± 3% ~ (p=0.385 n=39+40)
Gzip-4 54.0MB/s ± 4% 54.0MB/s ± 4% ~ (p=0.100 n=40+40)
Gunzip-4 421MB/s ± 3% 426MB/s ± 3% +1.21% (p=0.000 n=40+40)
JSONEncode-4 88.5MB/s ± 5% 88.0MB/s ± 5% ~ (p=0.083 n=40+40)
JSONDecode-4 28.6MB/s ± 4% 29.1MB/s ± 4% +1.65% (p=0.000 n=40+40)
GoParse-4 17.3MB/s ± 3% 17.7MB/s ± 2% +1.82% (p=0.000 n=40+40)
RegexpMatchEasy0_32-4 316MB/s ± 5% 323MB/s ± 4% +2.44% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 1.20GB/s ± 1% 1.21GB/s ± 3% +0.40% (p=0.004 n=33+40)
RegexpMatchEasy1_32-4 291MB/s ± 7% 302MB/s ± 4% +3.82% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 993MB/s ± 4% 990MB/s ± 3% ~ (p=0.623 n=40+38)
RegexpMatchMedium_32-4 7.61MB/s ± 5% 7.87MB/s ± 4% +3.36% (p=0.000 n=38+40)
RegexpMatchMedium_1K-4 23.6MB/s ± 4% 23.7MB/s ± 4% +0.46% (p=0.007 n=40+40)
RegexpMatchHard_32-4 14.5MB/s ± 4% 14.3MB/s ± 1% -0.79% (p=0.017 n=40+28)
RegexpMatchHard_1K-4 15.1MB/s ± 4% 15.1MB/s ± 3% -0.11% (p=0.015 n=40+40)
Revcomp-4 137MB/s ± 3% 144MB/s ± 3% +5.06% (p=0.000 n=40+40)
Template-4 27.1MB/s ± 3% 27.1MB/s ± 4% ~ (p=0.211 n=40+40)
[Geo mean] 78.9MB/s 79.7MB/s +1.01%
Change-Id: I638fa4fef85833e8605919d693f9570cc3cf7334
Reviewed-on: https://go-review.googlesource.com/107275
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
| |
And delete them from asm_test.
Change-Id: I9a75efe9858ef9d7ac86065f860c2ae3f25b0941
Reviewed-on: https://go-review.googlesource.com/105597
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
|