summaryrefslogtreecommitdiff
path: root/test/codegen
Commit message (Collapse)AuthorAgeFilesLines
* cmd/compile: set LocalPkg.Path to -p flagMatthew Dempsky2022-05-166-23/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since CL 391014, cmd/compile now requires the -p flag to be set the build system. This CL changes it to initialize LocalPkg.Path to the provided path, rather than relying on writing out `"".` into object files and expecting cmd/link to substitute them. However, this actually involved a rather long tail of fixes. Many have already been submitted, but a few notable ones that have to land simultaneously with changing LocalPkg: 1. When compiling package runtime, there are really two "runtime" packages: types.LocalPkg (the source package itself) and ir.Pkgs.Runtime (the compiler's internal representation, for synthetic references). Previously, these ended up creating separate link symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both end up as `runtime.xxx`, which causes lsym collisions (notably inittask and funcsyms). 2. test/codegen tests need to be updated to expect symbols to be named `command-line-arguments.xxx` rather than `"".foo`. 3. The issue20014 test case is sensitive to the sort order of field tracking symbols. In particular, the local package now sorts to its natural place in the list, rather than to the front. Thanks to David Chase for helping track down all of the fixes needed for this CL. Updates #51734. Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565 Reviewed-on: https://go-review.googlesource.com/c/go/+/393715 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
* cmd/compile: use jump table on ARM64Cherry Mui2022-05-131-0/+2
| | | | | | | | | | | | | | | | | | | Following CL 357330, use jump tables on ARM64. name old time/op new time/op delta Switch8Predictable-4 3.41ns ± 0% 3.21ns ± 0% ~ (p=0.079 n=4+5) Switch8Unpredictable-4 12.0ns ± 0% 9.5ns ± 0% -21.17% (p=0.000 n=5+4) Switch32Predictable-4 3.06ns ± 0% 2.82ns ± 0% -7.78% (p=0.008 n=5+5) Switch32Unpredictable-4 13.3ns ± 0% 9.5ns ± 0% -28.87% (p=0.016 n=4+5) SwitchStringPredictable-4 3.71ns ± 0% 3.21ns ± 0% -13.43% (p=0.000 n=5+4) SwitchStringUnpredictable-4 14.8ns ± 0% 15.1ns ± 0% +2.37% (p=0.008 n=5+5) Change-Id: Ia0b85df7ca9273cf70c05eb957225c6e61822fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/403979 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>
* cmd/compile: lower Add64/Sub64 into ssa on PPC64Paul E. Murphy2022-05-101-6/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | math/bits.Add64 and math/bits.Sub64 now lower and optimize directly in SSA form. The optimization of carry chains focuses around eliding XER<->GPR transfers of the CA bit when used exclusively as an input to a single carry operations, or when the CA value is known. This also adds support for handling XER spills in the assembler which could happen if carry chains contain inter-dependencies on each other (which seems very unlikely with practical usage), or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA). With PPC64 Add64/Sub64 lowering into SSA and this patch, the net performance difference in crypto/elliptic benchmarks on P9/ppc64le are: name old time/op new time/op delta ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34% ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14% ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14% ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27% ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17% ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56% ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86% ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32% MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63% MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06% MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42% MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87% MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93% MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03% MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81% MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67% Note, P256 uses an asm implementation, thus, little variation is expected. Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71 Reviewed-on: https://go-review.googlesource.com/c/go/+/346870 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
* cmd/compile: fold constants found by proveJorropo2022-05-041-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is hit ~70k times building go. This make the go binary, 0.04% smaller. I didn't included benchmarks because this is just constant foldings and is hard to mesure objectively. For example, this enable rewriting things like: if x == 20 { return x + 30 + z } Into: if x == 20 { return 50 + z } It's not just fixing programer's code, the ssa generator generate code like this sometimes. Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc GitHub-Last-Rev: 4c2f9b521692bc61acff137a269917895f4da08a GitHub-Pull-Request: golang/go#52669 Reviewed-on: https://go-review.googlesource.com/c/go/+/403735 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
* cmd/compile: combine OR + NOT into ORN on PPC64Paul E. Murphy2022-05-041-0/+8
| | | | | | | | | | | | | This shows up in a few crypto functions, and other assorted places. Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab Reviewed-on: https://go-review.googlesource.com/c/go/+/404057 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>
* cmd/compile: support pointers to arrays in arrayClearCuong Manh Le2022-05-031-0/+36
| | | | | | | | | | | Fixes #52635 Change-Id: I85f182931e30292983ef86c55a0ab6e01282395c Reviewed-on: https://go-review.googlesource.com/c/go/+/403337 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
* cmd/compile: add jump table codegen testKeith Randall2022-04-141-0/+50
| | | | | | | | | Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/400375 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
* cmd/compile: add SHLX&SHRX without loadWayne Zuo2022-04-131-2/+18
| | | | | | | | Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391 Reviewed-on: https://go-review.googlesource.com/c/go/+/399061 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: add SARXQload and SARXLloadWayne Zuo2022-04-131-0/+16
| | | | | | | | | | Change-Id: I4e8dc5009a5b8af37d71b62f1322f94806d3e9d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/399056 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: add SARX instruction for GOAMD64>=3Wayne Zuo2022-04-121-0/+10
| | | | | | | | | | | | | | name old time/op new time/op delta ShiftArithmeticRight-8 0.68ns ± 5% 0.30ns ± 6% -56.14% (p=0.000 n=10+10) Change-Id: I052a0d7b9e6526d526276444e588b0cc288beff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/399055 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: add MOVBE index load/storeWayne Zuo2022-04-111-5/+5
| | | | | | | | | Fixes #51724 Change-Id: I94e650a7482dc4c479d597f0162a6a89d779708d Reviewed-on: https://go-review.googlesource.com/c/go/+/395474 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>
* test: adjust load and store testWayne Zuo2022-04-111-48/+44
| | | | | | | | | | | | | | | | | | | | | | | In the load tests, we only want to test the assembly produced by the load operations. If we use the global variable sink, it will produce one load operation and one store operation(assign to sink). For example: func load_be64(b []byte) uint64 { sink64 = binary.BigEndian.Uint64(b) } If we compile this function with GOAMD64=v3, it may produce MOVBEQload and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload. Discovered when developing CL 395474. Same for the store tests. Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e Reviewed-on: https://go-review.googlesource.com/c/go/+/397875 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>
* cmd/compile: use shlx&shrx instruction for GOAMD64>=v3Wayne Zuo2022-04-041-0/+16
| | | | | | | | | | | The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction. For #47120 Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a Reviewed-on: https://go-review.googlesource.com/c/go/+/385174 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>
* cmd/compile: use LZCNT instruction for GOAMD64>=3Wayne Zuo2022-04-041-10/+20
| | | | | | | | | | | | | | | | | | | | | | | LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using LZCNT can avoid a special case for zero input. Except that case, LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x). And according to https://www.agner.org/optimize/instruction_tables.pdf, LZCNT instructions are much faster than BSR on AMD CPU. name old time/op new time/op delta LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9) LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9) LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10) LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10) LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10) Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a Reviewed-on: https://go-review.googlesource.com/c/go/+/396794 Reviewed-by: Keith Randall <khr@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> TryBot-Result: Gopher Robot <gobot@golang.org>
* cmd/compile: add MOVBEWstore support for GOAMD64>=3Wayne Zuo2022-04-031-7/+10
| | | | | | | | | | | | | | This CL add MOVBE support for 16-bit version, but MOVBEWload is excluded because it does not satisfy zero extented. For #51724 Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac Reviewed-on: https://go-review.googlesource.com/c/go/+/396617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com>
* cmd/compile: merge ANDconst and UBFX into UBFX on arm64fanzha022022-03-251-0/+6
| | | | | | | | | | | | | | Add a new rewrite rule to merge ANDconst and UBFX into UBFX. Add test cases. Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670 Reviewed-on: https://go-review.googlesource.com/c/go/+/377054 Trust: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
* cmd/compile: include all entries in map literal hint sizeKeith Randall2022-03-011-0/+30
| | | | | | | | | | | | | | | | | | | | Currently we only include static entries in the hint for sizing the map when allocating a map for a map literal. Change that to include all entries. This will be an overallocation if the dynamic entries in the map have equal keys, but equal keys in map literals are rare, and at worst we waste a bit of space. Fixes #43020 Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce Reviewed-on: https://go-review.googlesource.com/c/go/+/383634 Run-TryBot: Keith Randall <khr@golang.org> Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
* test/codegen: updated arithmetic tests to verify on ppc64,ppc64leArchana R2021-11-011-0/+36
| | | | | | | | | | | | Updated multiple tests in test/codegen/arithmetic.go to verify on ppc64/ppc64le as well Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/358546 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* test/codegen: updated comparison test to verify on ppc64,ppc64leArchana R2021-10-211-0/+16
| | | | | | | | | | | | Updated test/codegen/comparison.go to verify memequal is inlined as implemented in CL 328291. Change-Id: If7824aed37ee1f8640e54fda0f9b7610582ba316 Reviewed-on: https://go-review.googlesource.com/c/go/+/357289 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: use MOVBE instruction for GOAMD64>=v3wdvxdr2021-10-191-12/+24
| | | | | | | | | | | | | | | In CL 354670, I copied some existing rules for convenience but forgot to update the last rule which broke `GOAMD64=v3 ./make.bat` Revive CL 354670 Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019 Reviewed-on: https://go-review.googlesource.com/c/go/+/356810 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org>
* Revert "cmd/compile: use MOVBE instruction for GOAMD64>=v3"Daniel Martí2021-10-191-24/+12
| | | | | | | | | | | | | | | | This reverts CL 354670. Reason for revert: broke make.bash with GOAMD64=v3. Fixes #49061. Change-Id: I7f2ed99b7c10100c4e0c1462ea91c4c9d8c609b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/356790 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Koichi Shiraishi <zchee.io@gmail.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
* cmd/compile/internal/ssagen: set BitLen32 as intrinsic on PPC64Lynn Boger2021-10-181-0/+2
| | | | | | | | | | | | | It was noticed through some other investigation that BitLen32 was not generating the best code and found that it wasn't recognized as an intrinsic. This corrects that and enables the test for PPC64. Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/355872 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: use MOVBE instruction for GOAMD64>=v3wdvxdr2021-10-181-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | encoding/binary benchmark on my laptop: name old time/op new time/op delta ReadSlice1000Int32s-8 4.42µs ± 5% 4.20µs ± 1% -4.94% (p=0.046 n=9+8) ReadStruct-8 359ns ± 8% 368ns ± 5% +2.35% (p=0.041 n=9+10) WriteStruct-8 349ns ± 1% 357ns ± 1% +2.15% (p=0.000 n=8+10) ReadInts-8 235ns ± 1% 233ns ± 1% -1.01% (p=0.005 n=10+10) WriteInts-8 265ns ± 1% 274ns ± 1% +3.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 4.61µs ± 5% 4.59µs ± 5% ~ (p=0.986 n=10+10) PutUint16-8 0.56ns ± 4% 0.57ns ± 4% ~ (p=0.101 n=10+10) PutUint32-8 0.83ns ± 2% 0.56ns ± 6% -32.91% (p=0.000 n=10+10) PutUint64-8 0.81ns ± 3% 0.62ns ± 4% -23.82% (p=0.000 n=10+10) LittleEndianPutUint16-8 0.55ns ± 4% 0.55ns ± 3% ~ (p=0.926 n=10+10) LittleEndianPutUint32-8 0.41ns ± 4% 0.42ns ± 3% ~ (p=0.148 n=10+9) LittleEndianPutUint64-8 0.55ns ± 2% 0.56ns ± 6% ~ (p=0.897 n=10+10) ReadFloats-8 60.4ns ± 4% 59.0ns ± 1% -2.25% (p=0.007 n=10+10) WriteFloats-8 72.3ns ± 2% 71.5ns ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 4.21µs ± 3% 4.18µs ± 2% ~ (p=0.197 n=10+10) WriteSlice1000Float32s-8 4.61µs ± 2% 4.68µs ± 7% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 250ns ± 4% 247ns ± 4% ~ (p=0.324 n=10+10) WriteSlice1000Uint8s-8 227ns ± 5% 229ns ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 15.3ns ± 2% 15.4ns ± 4% ~ (p=0.782 n=10+10) PutUvarint64-8 38.5ns ± 1% 38.6ns ± 5% ~ (p=0.396 n=8+10) name old speed new speed delta ReadSlice1000Int32s-8 890MB/s ±17% 953MB/s ± 1% +7.00% (p=0.027 n=10+8) ReadStruct-8 209MB/s ± 8% 204MB/s ± 5% -2.42% (p=0.043 n=9+10) WriteStruct-8 214MB/s ± 3% 210MB/s ± 1% -1.75% (p=0.003 n=9+10) ReadInts-8 127MB/s ± 1% 129MB/s ± 1% +1.01% (p=0.006 n=10+10) WriteInts-8 113MB/s ± 1% 109MB/s ± 1% -3.34% (p=0.000 n=10+10) WriteSlice1000Int32s-8 868MB/s ± 5% 872MB/s ± 5% ~ (p=1.000 n=10+10) PutUint16-8 3.55GB/s ± 4% 3.50GB/s ± 4% ~ (p=0.093 n=10+10) PutUint32-8 4.83GB/s ± 2% 7.21GB/s ± 6% +49.16% (p=0.000 n=10+10) PutUint64-8 9.89GB/s ± 3% 12.99GB/s ± 4% +31.26% (p=0.000 n=10+10) LittleEndianPutUint16-8 3.65GB/s ± 4% 3.65GB/s ± 4% ~ (p=0.912 n=10+10) LittleEndianPutUint32-8 9.74GB/s ± 3% 9.63GB/s ± 3% ~ (p=0.222 n=9+9) LittleEndianPutUint64-8 14.4GB/s ± 2% 14.3GB/s ± 5% ~ (p=0.912 n=10+10) ReadFloats-8 199MB/s ± 4% 203MB/s ± 1% +2.27% (p=0.007 n=10+10) WriteFloats-8 166MB/s ± 2% 168MB/s ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 949MB/s ± 3% 958MB/s ± 2% ~ (p=0.218 n=10+10) WriteSlice1000Float32s-8 867MB/s ± 2% 857MB/s ± 6% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 4.00GB/s ± 4% 4.06GB/s ± 4% ~ (p=0.353 n=10+10) WriteSlice1000Uint8s-8 4.40GB/s ± 4% 4.36GB/s ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 262MB/s ± 2% 260MB/s ± 4% ~ (p=0.739 n=10+10) PutUvarint64-8 208MB/s ± 1% 207MB/s ± 5% ~ (p=0.408 n=8+10) Updates #45453 Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/354670 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>
* cmd/compile: use ANDL for small immediatesJake Ciolek2021-10-123-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode. Looking at Go binary itself, before the change there was: ANDL: 2337 ANDQ: 4476 After the change: ANDL: 3790 ANDQ: 3024 So we got rid of 1452 ANDQs This makes the Linux x86_64 binary 0.03% smaller. There seems to be an impact on performance. Intel Cascade Lake benchmarks (with perflock): name old time/op new time/op delta BinaryTree17-8 1.91s ± 1% 1.89s ± 1% -1.22% (p=0.000 n=21+18) Fannkuch11-8 2.34s ± 0% 2.34s ± 0% ~ (p=0.052 n=20+20) FmtFprintfEmpty-8 27.7ns ± 1% 27.4ns ± 3% ~ (p=0.497 n=21+21) FmtFprintfString-8 53.2ns ± 0% 51.5ns ± 0% -3.21% (p=0.000 n=20+19) FmtFprintfInt-8 57.3ns ± 0% 55.7ns ± 0% -2.89% (p=0.000 n=19+19) FmtFprintfIntInt-8 92.3ns ± 0% 88.4ns ± 1% -4.23% (p=0.000 n=20+21) FmtFprintfPrefixedInt-8 103ns ± 0% 103ns ± 0% +0.23% (p=0.000 n=20+21) FmtFprintfFloat-8 147ns ± 0% 148ns ± 0% +0.75% (p=0.000 n=20+21) FmtManyArgs-8 384ns ± 0% 381ns ± 0% -0.63% (p=0.000 n=21+21) GobDecode-8 3.86ms ± 1% 3.88ms ± 1% +0.52% (p=0.000 n=20+21) GobEncode-8 2.77ms ± 1% 2.77ms ± 0% ~ (p=0.078 n=21+21) Gzip-8 168ms ± 1% 168ms ± 0% +0.24% (p=0.000 n=20+20) Gunzip-8 25.1ms ± 0% 24.3ms ± 0% -3.03% (p=0.000 n=21+21) HTTPClientServer-8 61.4µs ± 8% 59.1µs ±10% ~ (p=0.088 n=20+21) JSONEncode-8 6.86ms ± 0% 6.70ms ± 0% -2.29% (p=0.000 n=20+19) JSONDecode-8 30.8ms ± 1% 30.6ms ± 1% -0.82% (p=0.000 n=20+20) Mandelbrot200-8 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.191 n=16+17) GoParse-8 2.61ms ± 2% 2.60ms ± 1% ~ (p=0.561 n=21+20) RegexpMatchEasy0_32-8 48.5ns ± 2% 45.9ns ± 3% -5.26% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 139ns ± 0% 139ns ± 0% +0.27% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 41.3ns ± 0% 42.1ns ± 4% +1.95% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 216ns ± 2% 216ns ± 0% +0.17% (p=0.020 n=21+19) RegexpMatchMedium_32-8 790ns ± 7% 803ns ± 8% ~ (p=0.178 n=21+21) RegexpMatchMedium_1K-8 23.5µs ± 5% 23.7µs ± 5% ~ (p=0.421 n=21+21) RegexpMatchHard_32-8 1.09µs ± 1% 1.09µs ± 1% -0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.610 n=21+20) Revcomp-8 348ms ± 0% 353ms ± 0% +1.38% (p=0.000 n=17+18) Template-8 42.0ms ± 1% 41.9ms ± 1% -0.30% (p=0.049 n=20+20) TimeParse-8 185ns ± 0% 185ns ± 0% ~ (p=0.387 n=20+18) TimeFormat-8 237ns ± 1% 241ns ± 1% +1.57% (p=0.000 n=21+21) [Geo mean] 35.4µs 35.2µs -0.66% name old speed new speed delta GobDecode-8 199MB/s ± 1% 198MB/s ± 1% -0.52% (p=0.000 n=20+21) GobEncode-8 277MB/s ± 1% 277MB/s ± 0% ~ (p=0.075 n=21+21) Gzip-8 116MB/s ± 1% 115MB/s ± 0% -0.25% (p=0.000 n=20+20) Gunzip-8 773MB/s ± 0% 797MB/s ± 0% +3.12% (p=0.000 n=21+21) JSONEncode-8 283MB/s ± 0% 290MB/s ± 0% +2.35% (p=0.000 n=20+19) JSONDecode-8 63.0MB/s ± 1% 63.5MB/s ± 1% +0.82% (p=0.000 n=20+20) GoParse-8 22.2MB/s ± 2% 22.3MB/s ± 1% ~ (p=0.539 n=21+20) RegexpMatchEasy0_32-8 660MB/s ± 2% 697MB/s ± 3% +5.57% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 7.36GB/s ± 0% 7.34GB/s ± 0% -0.26% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 775MB/s ± 0% 761MB/s ± 4% -1.88% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 4.74GB/s ± 2% 4.74GB/s ± 0% -0.18% (p=0.020 n=21+19) RegexpMatchMedium_32-8 40.6MB/s ± 7% 39.9MB/s ± 9% ~ (p=0.191 n=21+21) RegexpMatchMedium_1K-8 43.7MB/s ± 5% 43.2MB/s ± 5% ~ (p=0.435 n=21+21) RegexpMatchHard_32-8 29.3MB/s ± 1% 29.4MB/s ± 1% +0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 31.0MB/s ± 0% 31.0MB/s ± 0% ~ (p=0.572 n=21+20) Revcomp-8 730MB/s ± 0% 720MB/s ± 0% -1.36% (p=0.000 n=17+18) Template-8 46.2MB/s ± 1% 46.3MB/s ± 1% +0.30% (p=0.041 n=20+20) [Geo mean] 204MB/s 205MB/s +0.30% Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514 Reviewed-on: https://go-review.googlesource.com/c/go/+/354970 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>
* cmd/compile: eliminate successive swapsAlejandro García Montoro2021-10-091-0/+12
| | | | | | | | | | | | | | | | | | | The code generated when storing eight bytes loaded from memory in big endian introduced two successive byte swaps that did not actually modified the data. The new rules match this specific pattern both for amd64 and for arm64, eliminating the double swap. Fixes #41684 Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3 Reviewed-on: https://go-review.googlesource.com/c/go/+/320073 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Trust: Dmitri Shuralyov <dmitshur@golang.org>
* cmd/compile: inline memequal(x, const, sz) for small sizesRuslan Andreev2021-10-061-0/+62
| | | | | | | | | | | | | | | | This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8 bytes size. This PoC is using the same method as CL 248404. This optimization fires about 100 times in Go compiler (1675 occurrences reduced to 1574, so -6%). Also, added unit-tests to codegen/comparisions.go file. Updates #37275 Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da Reviewed-on: https://go-review.googlesource.com/c/go/+/328291 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Trust: David Chase <drchase@google.com>
* cmd/compile: don't emit unnecessary amd64 extension checksnimelehin2021-10-052-1/+14
| | | | | | | | | | | | | In case of amd64 the compiler issues checks if extensions are available on a platform. With GOAMD64 microarchitecture levels provided, some of the checks could be eliminated. Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087 Reviewed-on: https://go-review.googlesource.com/c/go/+/352010 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org>
* cmd/compile: use TZCNT instruction for GOAMD64>=v3wdvxdr2021-10-051-8/+16
| | | | | | | | | | | | | | | | | | | on my Intel CoffeeLake CPU: name old time/op new time/op delta TrailingZeros-8 0.68ns ± 1% 0.64ns ± 1% -6.26% (p=0.000 n=10+10) TrailingZeros8-8 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.697 n=10+10) TrailingZeros16-8 0.70ns ± 1% 0.70ns ± 1% +0.57% (p=0.043 n=10+10) TrailingZeros32-8 0.66ns ± 1% 0.64ns ± 1% -3.35% (p=0.000 n=10+10) TrailingZeros64-8 0.68ns ± 1% 0.64ns ± 1% -5.84% (p=0.000 n=9+10) Updates #45453 Change-Id: I228ff2d51df24b1306136f061432f8a12bb1d6fd Reviewed-on: https://go-review.googlesource.com/c/go/+/353249 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
* test: update test/codegen/noextend.go to work with either ABI on ppc64xLynn Boger2021-09-291-76/+44
| | | | | | | | | | | | This updates the codegen tests in noextend.go so they are not dependent on the ABI. Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a Reviewed-on: https://go-review.googlesource.com/c/go/+/353070 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: add PPC64-specific inlining for runtime.memmoveArchana R2021-09-291-0/+8
| | | | | | | | | | | | | | | | Add rule to PPC64.rules to inline runtime.memmove in more cases, as is done for other target architectures Updated tests in codegen/copy.go to verify changes are done on ppc64/ppc64le Updates #41662 Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a Reviewed-on: https://go-review.googlesource.com/c/go/+/352054 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
* cmd/compile: optimise immediate operands with constants on riscv64Joel Sing2021-09-241-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instructions with immediates can be precomputed when operating on a constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI and ADDI. Additionally, optimise ANDI and ORI when the immediate is all ones or all zeroes. In particular, the RISCV64 logical left and right shift rules (Lsh*x*/Rsh*Ux*) produce sequences that check if the shift amount exceeds 64 and if so returns zero. When the shift amount is a constant we can precompute and eliminate the filter entirely. Likewise the arithmetic right shift rules produce sequences that check if the shift amount exceeds 64 and if so, ensures that the lower six bits of the shift are all ones. When the shift amount is a constant we can precompute the shift value. Arithmetic right shift sequences like: 117fc: 00100513 li a0,1 11800: 04053593 sltiu a1,a0,64 11804: fff58593 addi a1,a1,-1 11808: 0015e593 ori a1,a1,1 1180c: 40b45433 sra s0,s0,a1 Are now a single srai instruction: 117fc: 40145413 srai s0,s0,0x1 Likewise for logical left shift (and logical right shift): 1d560: 01100413 li s0,17 1d564: 04043413 sltiu s0,s0,64 1d568: 40800433 neg s0,s0 1d56c: 01131493 slli s1,t1,0x11 1d570: 0084f433 and s0,s1,s0 Which are now a single slli (or srli) instruction: 1d120: 01131413 slli s0,t1,0x11 This removes more than 30,000 instructions from the Go binary and should improve performance in a variety of areas - of note runtime.makemap_small drops from 48 to 36 instructions. Similar gains exist in at least other parts of runtime and math/bits. Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350689 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* test/codegen: add shift tests for RISCV64Joel Sing2021-09-241-28/+92
| | | | | | | | | | | | Add tests for shift by constant, masked shifts and bounded shifts. While here, sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh). Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc Reviewed-on: https://go-review.googlesource.com/c/go/+/351289 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
* cmd/compile: use BMI1 instructions for GOAMD64=v3 and higherMatthew Dempsky2021-09-221-0/+47
| | | | | | | | | | | | | | | BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are easy to peephole optimize, and which GCC always seems to favor using when available and applicable. Updates #45453. Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46 Reviewed-on: https://go-review.googlesource.com/c/go/+/351130 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Trust: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
* cmd/compile: allow rotates to be merged with logical ops on arm64Keith Randall2021-09-201-0/+23
| | | | | | | | | | | Fixes #48002 Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642 Reviewed-on: https://go-review.googlesource.com/c/go/+/350912 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: fold double negate on arm64Keith Randall2021-09-191-0/+8
| | | | | | | | | | | | Fixes #48467 Change-Id: I52305dbf561ee3eee6c1f053e555a3a6ec1ab892 Reviewed-on: https://go-review.googlesource.com/c/go/+/350910 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
* cmd/compile: implement constant rotates on arm64Keith Randall2021-09-191-2/+17
| | | | | | | | | | | | | | | Explicit constant rotates work, but constant arguments to bits.RotateLeft* needed the additional rule. Fixes #48465 Change-Id: Ia7544f21d0e7587b6b6506f72421459cd769aea6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350909 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
* cmd/compile: add support for Abs and Copysign intrinsics on riscv64Michael Munday2021-09-101-0/+4
| | | | | | | | | | | | | | Also, add the FABSS and FABSD pseudo instructions to the assembler. The compiler could use FSGNJX[SD] directly but there doesn't seem to be much advantage to doing so and the pseudo instructions are easier to understand. Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d Reviewed-on: https://go-review.googlesource.com/c/go/+/348989 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>
* cmd/compile: simiplify arm64 bitfield optimizationsfanzha022021-09-101-44/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some rewrite rules for arm64 bitfield optimizations, the bitfield lsb value and the bitfield width value are related to datasize, some of them use datasize directly to check the bitfield lsb value is valid, to get the bitfiled width value, but some of them call isARM64BFMask() and arm64BFWidth() functions. In order to be consistent, this patch changes them all to use datasize. Besides, this patch sorts the codegen test cases. Run the "toolstash-check -all" command and find one inconsistent code is as the following. new: src/math/fma.go:104 BEQ 247 master: src/math/fma.go:104 BEQ 248 The above inconsistence is due to this patch changing the range of the field lsb value in "UBFIZ" optimization rules from "lc+(32|16|8)<64" to "lc<64", so that the following code is generated as "UBFIZ". The logical of changed code is still correct. The code of src/math/fma.go:160: const uvinf = 0x7FF0000000000000 func FMA(a, b uint32) float64 { ps := a+b return Float64frombits(uint64(ps)<<63 | uvinf) } The new assembly code: TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 UBFIZ $63, R0, $1, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) The master assembly code: TEXT "".FMA(SB), LEAF|NOFRAME|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 MOVWU R0, R0 LSL $63, R0, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35 Reviewed-on: https://go-review.googlesource.com/c/go/+/265038 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* test/codegen: fix package name for test caseMichael Munday2021-09-081-1/+1
| | | | | | | | | | | | | | | | The codegen tests are currently skipped (see #48247). The test added in CL 346050 did not compile because it was in the main package but did not contain a main function. Changing the package to 'codegen' fixes the issue. Updates #48247. Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/348392 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>
* test/codegen: fix compilation of bitfield testsMichael Munday2021-09-081-4/+4
| | | | | | | | | | | | | | | | The codegen tests are currently skipped (see #48247) and the bitfield tests do not actually compile due to a duplicate function name (sbfiz5) added in CL 267602. Renaming the function fixes the issue. Updates #48247. Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21 Reviewed-on: https://go-review.googlesource.com/c/go/+/348391 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* test/codegen: remove broken riscv64 testMichael Munday2021-09-081-8/+0
| | | | | | | | | | | | | | | | | This test is not executed by default (see #48247) and does not actually pass. It was added in CL 346689. The code generation changes made in that CL only change how instructions are assembled, they do not actually affect the output of the compiler. This test is unfortunately therefore invalid and will never pass. Updates #48247. Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f Reviewed-on: https://go-review.googlesource.com/c/go/+/348390 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>
* cmd/compile: simplify less with non-negative number and constant 0 or 1wdvxdr2021-09-071-0/+31
| | | | | | | | | | | | | | | | | The most common cases: len(s) > 0 len(s) < 1 and they can be simplified to: len(s) != 0 len(s) == 0 Fixes #48054 Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000 Reviewed-on: https://go-review.googlesource.com/c/go/+/346050 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/compile: merge sign extension and shift into SBFIZfanzha022021-09-061-0/+12
| | | | | | | | | | | | | | | This patch adds some rules to rewrite "(LeftShift (SignExtend x) lc)" expression as "SBFIZ". Add the test cases. Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d Reviewed-on: https://go-review.googlesource.com/c/go/+/267602 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
* cmd/compile: merge zero/sign extensions with UBFX/SBFX on arm64fanzha022021-09-061-0/+16
| | | | | | | | | | | | | | | | | The UBFX and SBFX already zero/sign extend the result. Further zero/sign extensions are thus unnecessary as long as they leave the top bits unaltered. This patch absorbs zero/sign extensions into UBFX/SBFX. Add the related test cases. Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/265039 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
* cmd/internal/obj/riscv: simplify addition with constantBen Shi2021-09-021-0/+8
| | | | | | | | | | | | | | | | | | | | This CL simplifies riscv addition (add r, imm) to (ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges. (-4096 <= imm <= -2049 or 2048 <= imm <= 4094) There is little impact to the go1 benchmark, while the total size of pkg/linux_riscv64 decreased by about 11KB. Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e GitHub-Last-Rev: a2f56a07635344a40d6b8a9571f236743122be34 GitHub-Pull-Request: golang/go#48110 Reviewed-on: https://go-review.googlesource.com/c/go/+/346689 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@lowrisc.org>
* cmd/{asm,compile}: add fused multiply-add support on riscv64Michael Munday2021-09-011-0/+16
| | | | | | | | | | | | | | | | | Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions. Argument order is: OP RS1, RS2, RS3, RD Also, add support for the FMA intrinsic to the compiler. Automatic FMA matching is left to a future CL. Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/293030 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>
* cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer ↵Jake Ciolek2021-08-251-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arithmetic expressions This applies the following generic integer addition/subtraction transformations: x - (x + y) = -y (x - y) - x = -y y + (x - y) = x y + (z + (x - y) = x + z There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself: runtime.stackfree() runtime.runqdrain() Go binary size reduced by 0.05% on Linux x86_64. StackCopy bench (perflocked Cascade Lake x86): name old time/op new time/op delta StackCopyPtr-8 87.3ms ± 1% 86.9ms ± 0% -0.45% (p=0.000 n=20+20) StackCopy-8 77.6ms ± 1% 77.0ms ± 0% -0.76% (p=0.000 n=20+20) StackCopyNoCache-8 2.28ms ± 2% 2.26ms ± 2% -0.93% (p=0.008 n=19+20) test/bench/go1 benchmarks (perflocked Cascade Lake x86): name old time/op new time/op delta BinaryTree17-8 1.88s ± 1% 1.88s ± 0% ~ (p=0.373 n=15+12) Fannkuch11-8 2.31s ± 0% 2.35s ± 0% +1.52% (p=0.000 n=15+14) FmtFprintfEmpty-8 26.6ns ± 0% 26.6ns ± 0% ~ (p=0.081 n=14+13) FmtFprintfString-8 48.6ns ± 0% 50.0ns ± 0% +2.86% (p=0.000 n=15+14) FmtFprintfInt-8 56.9ns ± 0% 54.8ns ± 0% -3.70% (p=0.000 n=15+15) FmtFprintfIntInt-8 90.4ns ± 0% 88.8ns ± 0% -1.78% (p=0.000 n=15+15) FmtFprintfPrefixedInt-8 104ns ± 0% 104ns ± 0% ~ (p=0.905 n=14+13) FmtFprintfFloat-8 148ns ± 0% 144ns ± 0% -2.19% (p=0.000 n=14+15) FmtManyArgs-8 389ns ± 0% 390ns ± 0% +0.35% (p=0.000 n=12+15) GobDecode-8 3.90ms ± 1% 3.88ms ± 0% -0.49% (p=0.000 n=15+14) GobEncode-8 2.73ms ± 0% 2.73ms ± 0% ~ (p=0.425 n=15+14) Gzip-8 169ms ± 0% 168ms ± 0% -0.52% (p=0.000 n=13+13) Gunzip-8 24.7ms ± 0% 24.8ms ± 0% +0.61% (p=0.000 n=15+15) HTTPClientServer-8 60.5µs ± 6% 60.4µs ± 7% ~ (p=0.595 n=15+15) JSONEncode-8 6.97ms ± 1% 6.93ms ± 0% -0.69% (p=0.000 n=14+14) JSONDecode-8 31.2ms ± 1% 30.8ms ± 1% -1.27% (p=0.000 n=14+14) Mandelbrot200-8 3.87ms ± 0% 3.87ms ± 0% ~ (p=0.652 n=15+14) GoParse-8 2.65ms ± 2% 2.64ms ± 1% ~ (p=0.202 n=15+15) RegexpMatchEasy0_32-8 45.1ns ± 0% 45.9ns ± 0% +1.68% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 140ns ± 0% 139ns ± 0% -0.44% (p=0.000 n=15+14) RegexpMatchEasy1_32-8 40.9ns ± 3% 40.5ns ± 0% -0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 215ns ± 1% 220ns ± 1% +2.27% (p=0.000 n=15+15) RegexpMatchMedium_32-8 783ns ± 7% 738ns ± 0% ~ (p=0.361 n=15+15) RegexpMatchMedium_1K-8 24.1µs ± 6% 23.4µs ± 6% -2.94% (p=0.004 n=15+15) RegexpMatchHard_32-8 1.10µs ± 1% 1.09µs ± 1% -0.40% (p=0.006 n=15+14) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.535 n=12+14) Revcomp-8 354ms ± 0% 353ms ± 0% -0.23% (p=0.002 n=15+13) Template-8 42.0ms ± 1% 41.8ms ± 2% -0.37% (p=0.023 n=14+15) TimeParse-8 181ns ± 0% 180ns ± 1% -0.18% (p=0.014 n=12+13) TimeFormat-8 240ns ± 0% 242ns ± 1% +0.69% (p=0.000 n=12+15) [Geo mean] 35.2µs 35.1µs -0.43% name old speed new speed delta GobDecode-8 197MB/s ± 1% 198MB/s ± 0% +0.49% (p=0.000 n=15+14) GobEncode-8 281MB/s ± 0% 281MB/s ± 0% ~ (p=0.419 n=15+14) Gzip-8 115MB/s ± 0% 115MB/s ± 0% +0.52% (p=0.000 n=13+13) Gunzip-8 786MB/s ± 0% 781MB/s ± 0% -0.60% (p=0.000 n=15+15) JSONEncode-8 278MB/s ± 1% 280MB/s ± 0% +0.69% (p=0.000 n=14+14) JSONDecode-8 62.3MB/s ± 1% 63.1MB/s ± 1% +1.29% (p=0.000 n=14+14) GoParse-8 21.9MB/s ± 2% 22.0MB/s ± 1% ~ (p=0.205 n=15+15) RegexpMatchEasy0_32-8 709MB/s ± 0% 697MB/s ± 0% -1.65% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 7.34GB/s ± 0% 7.37GB/s ± 0% +0.43% (p=0.000 n=15+15) RegexpMatchEasy1_32-8 783MB/s ± 2% 790MB/s ± 0% +0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 4.77GB/s ± 1% 4.66GB/s ± 1% -2.23% (p=0.000 n=15+15) RegexpMatchMedium_32-8 41.0MB/s ± 7% 43.3MB/s ± 0% ~ (p=0.360 n=15+15) RegexpMatchMedium_1K-8 42.5MB/s ± 6% 43.8MB/s ± 6% +3.07% (p=0.004 n=15+15) RegexpMatchHard_32-8 29.2MB/s ± 1% 29.3MB/s ± 1% +0.41% (p=0.006 n=15+14) RegexpMatchHard_1K-8 31.1MB/s ± 0% 31.1MB/s ± 0% ~ (p=0.495 n=12+14) Revcomp-8 718MB/s ± 0% 720MB/s ± 0% +0.23% (p=0.002 n=15+13) Template-8 46.3MB/s ± 1% 46.4MB/s ± 2% +0.38% (p=0.021 n=14+15) [Geo mean] 205MB/s 206MB/s +0.57% Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/344930 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com>
* cmd/compile: intrinsify Mul64 on riscv64Meng Zhuo2021-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | According to RISCV instruction set manual v2.2 Sec 6.1 MULHU followed by MUL will be fused into one multiply by microarchitecture Benchstat on Hifive unmatched: name old time/op new time/op delta Hash8Bytes 245ns ± 3% 186ns ± 4% -23.99% (p=0.000 n=10+10) Hash320Bytes 1.94µs ± 1% 1.31µs ± 1% -32.38% (p=0.000 n=9+10) Hash1K 5.84µs ± 0% 3.84µs ± 0% -34.20% (p=0.000 n=10+9) Hash8K 45.3µs ± 0% 29.4µs ± 0% -35.04% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes 32.7MB/s ± 3% 43.0MB/s ± 4% +31.61% (p=0.000 n=10+10) Hash320Bytes 165MB/s ± 1% 244MB/s ± 1% +47.88% (p=0.000 n=9+10) Hash1K 175MB/s ± 0% 266MB/s ± 0% +51.98% (p=0.000 n=10+9) Hash8K 181MB/s ± 0% 279MB/s ± 0% +53.94% (p=0.000 n=10+10) Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12 Reviewed-on: https://go-review.googlesource.com/c/go/+/329970 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Go Bot <gobot@golang.org> Trust: Meng Zhuo <mzh@golangcn.org>
* [dev.typeparams] cmd/compile: implement clobberdead mode on ARM64Cherry Mui2021-06-031-2/+5
| | | | | | | | | | | For debugging. Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24 Reviewed-on: https://go-review.googlesource.com/c/go/+/324765 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Go Bot <gobot@golang.org>
* [dev.typeparams] test: adjust codegen test for register ABI on ARM64Cherry Mui2021-06-031-2/+2
| | | | | | | | | | | | | In codegen/arithmetic.go, previously there are MOVD's that match for loads of arguments. With register ABI there are no more such loads. Remove the MOVD matches. Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64 Reviewed-on: https://go-review.googlesource.com/c/go/+/324251 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>