<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/go-git.git/test/writebarrier.go, branch dev.typeparams</title>
<subtitle>github.com: golang/go
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/'/>
<entry>
<title>cmd/compile: improve generated code for concrete cases in type switches</title>
<updated>2020-04-14T17:34:31+00:00</updated>
<author>
<name>Josh Bleecher Snyder</name>
<email>josharian@gmail.com</email>
</author>
<published>2020-04-13T00:34:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=2db4cc38a02d6e74c45365344c6d8738fdb38a00'/>
<id>2db4cc38a02d6e74c45365344c6d8738fdb38a00</id>
<content type='text'>
Consider

switch x:= x.(type) {
case int:
  // int stmts
case error:
  // error stmts
}

Prior to this change, we lowered this roughly as:

if x, ok := x.(int); ok {
  // int stmts
} else if x, ok := x.(error); ok {
  // error stmts
}

x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2.

x, ok := x.(int) generates inline code that checks whether x has type int,
and populates x and ok as appropriate. We then immediately branch again on ok.
The shortcircuit pass in the SSA backend is designed to recognize situations
like this, in which we are immediately branching on a bool value
that we just calculated with a branch.

However, the shortcircuit pass has limitations when the intermediate state has phis.
In this case, the phi value is x (the int).
CL 222923 improved the situation, but many cases are still unhandled.
I have further improvements in progress, which is how I found this particular problem,
but they are expensive, and may or may not see the light of day.

In the common case of a lone concrete type in a type switch case,
it is easier and cheaper to simply lower a different way, roughly:

if _, ok := x.(int); ok {
  x := x.(int)
  // int stmts
}

Instead of using a type assertion, though, we extract the value of x
from the interface directly.

This removes the need to track x (the int) across the branch on ok,
which removes the phi, which lets the shortcircuit pass do its job.

Benchmarks for encoding/binary show improvements, as well as some
wild swings on the super fast benchmarks (alignment effects?):

name                      old time/op    new time/op    delta
ReadSlice1000Int32s-8       5.25µs ± 2%    4.87µs ± 3%   -7.11%  (p=0.000 n=44+49)
ReadStruct-8                 451ns ± 2%     417ns ± 2%   -7.39%  (p=0.000 n=45+46)
WriteStruct-8                412ns ± 2%     405ns ± 3%   -1.58%  (p=0.000 n=46+48)
ReadInts-8                   296ns ± 8%     275ns ± 3%   -7.23%  (p=0.000 n=48+50)
WriteInts-8                  324ns ± 1%     318ns ± 2%   -1.67%  (p=0.000 n=44+49)
WriteSlice1000Int32s-8      5.21µs ± 2%    4.92µs ± 1%   -5.67%  (p=0.000 n=46+44)
PutUint16-8                 0.58ns ± 2%    0.59ns ± 2%   +0.63%  (p=0.000 n=49+49)
PutUint32-8                 0.87ns ± 1%    0.58ns ± 1%  -33.10%  (p=0.000 n=46+44)
PutUint64-8                 0.66ns ± 2%    0.87ns ± 2%  +33.07%  (p=0.000 n=47+48)
LittleEndianPutUint16-8     0.86ns ± 2%    0.87ns ± 2%   +0.55%  (p=0.003 n=47+50)
LittleEndianPutUint32-8     0.87ns ± 1%    0.87ns ± 1%     ~     (p=0.547 n=45+47)
LittleEndianPutUint64-8     0.87ns ± 2%    0.87ns ± 1%     ~     (p=0.451 n=46+47)
ReadFloats-8                79.8ns ± 5%    75.9ns ± 2%   -4.83%  (p=0.000 n=50+47)
WriteFloats-8               89.3ns ± 1%    88.9ns ± 1%   -0.48%  (p=0.000 n=46+44)
ReadSlice1000Float32s-8     5.51µs ± 1%    4.87µs ± 2%  -11.74%  (p=0.000 n=47+46)
WriteSlice1000Float32s-8    5.51µs ± 1%    4.93µs ± 1%  -10.60%  (p=0.000 n=48+47)
PutUvarint32-8              25.9ns ± 2%    24.0ns ± 2%   -7.02%  (p=0.000 n=48+50)
PutUvarint64-8              75.1ns ± 1%    61.5ns ± 2%  -18.12%  (p=0.000 n=45+47)
[Geo mean]                  57.3ns         54.3ns        -5.33%

Despite the rarity of type switches, this generates noticeably smaller binaries.

file      before    after     Δ       %
addr2line 4413296   4409200   -4096   -0.093%
api       5982648   5962168   -20480  -0.342%
cgo       4854168   4833688   -20480  -0.422%
compile   19694784  19682560  -12224  -0.062%
cover     5278008   5265720   -12288  -0.233%
doc       4694824   4682536   -12288  -0.262%
fix       3411336   3394952   -16384  -0.480%
link      6721496   6717400   -4096   -0.061%
nm        4371152   4358864   -12288  -0.281%
objdump   4760960   4752768   -8192   -0.172%
pprof     14810820  14790340  -20480  -0.138%
trace     11681076  11668788  -12288  -0.105%
vet       8285464   8244504   -40960  -0.494%
total     115824120 115627576 -196544 -0.170%

Compiler performance is marginally improved (note that go/types has many type switches):

name        old alloc/op      new alloc/op      delta
Template         35.0MB ± 0%       35.0MB ± 0%  +0.09%  (p=0.008 n=5+5)
Unicode          28.5MB ± 0%       28.5MB ± 0%    ~     (p=0.548 n=5+5)
GoTypes           114MB ± 0%        114MB ± 0%  -0.76%  (p=0.008 n=5+5)
Compiler          541MB ± 0%        541MB ± 0%  -0.03%  (p=0.008 n=5+5)
SSA              1.17GB ± 0%       1.17GB ± 0%    ~     (p=0.841 n=5+5)
Flate            21.9MB ± 0%       21.9MB ± 0%    ~     (p=0.421 n=5+5)
GoParser         26.9MB ± 0%       26.9MB ± 0%    ~     (p=0.222 n=5+5)
Reflect          74.6MB ± 0%       74.6MB ± 0%    ~     (p=1.000 n=5+5)
Tar              32.9MB ± 0%       32.8MB ± 0%    ~     (p=0.056 n=5+5)
XML              42.4MB ± 0%       42.1MB ± 0%  -0.77%  (p=0.008 n=5+5)
[Geo mean]       73.2MB            73.1MB       -0.15%

name        old allocs/op     new allocs/op     delta
Template           377k ± 0%         377k ± 0%  +0.06%  (p=0.008 n=5+5)
Unicode            354k ± 0%         354k ± 0%    ~     (p=0.095 n=5+5)
GoTypes           1.31M ± 0%        1.30M ± 0%  -0.73%  (p=0.008 n=5+5)
Compiler          5.44M ± 0%        5.44M ± 0%  -0.04%  (p=0.008 n=5+5)
SSA               11.7M ± 0%        11.7M ± 0%    ~     (p=1.000 n=5+5)
Flate              239k ± 0%         239k ± 0%    ~     (p=1.000 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            977k ± 0%         977k ± 0%    ~     (p=0.690 n=5+5)
Tar                346k ± 0%         346k ± 0%    ~     (p=0.889 n=5+5)
XML                431k ± 0%         430k ± 0%  -0.25%  (p=0.008 n=5+5)
[Geo mean]         806k              806k       -0.10%

For packages with many type switches, this considerably shrinks function text size.
Some examples:

file                                                           before   after    Δ       %
encoding/binary.s                                              30726    29504    -1222   -3.977%
go/printer.s                                                   77597    76005    -1592   -2.052%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                 65704    63318    -2386   -3.631%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047     7714     -333    -4.138%

Text size regressions are rare.

Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28
Reviewed-on: https://go-review.googlesource.com/c/go/+/228106
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cuong Manh Le &lt;cuong.manhle.vn@gmail.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Consider

switch x:= x.(type) {
case int:
  // int stmts
case error:
  // error stmts
}

Prior to this change, we lowered this roughly as:

if x, ok := x.(int); ok {
  // int stmts
} else if x, ok := x.(error); ok {
  // error stmts
}

x, ok := x.(error) is implemented with a call to runtime.assertE2I2 or runtime.assertI2I2.

x, ok := x.(int) generates inline code that checks whether x has type int,
and populates x and ok as appropriate. We then immediately branch again on ok.
The shortcircuit pass in the SSA backend is designed to recognize situations
like this, in which we are immediately branching on a bool value
that we just calculated with a branch.

However, the shortcircuit pass has limitations when the intermediate state has phis.
In this case, the phi value is x (the int).
CL 222923 improved the situation, but many cases are still unhandled.
I have further improvements in progress, which is how I found this particular problem,
but they are expensive, and may or may not see the light of day.

In the common case of a lone concrete type in a type switch case,
it is easier and cheaper to simply lower a different way, roughly:

if _, ok := x.(int); ok {
  x := x.(int)
  // int stmts
}

Instead of using a type assertion, though, we extract the value of x
from the interface directly.

This removes the need to track x (the int) across the branch on ok,
which removes the phi, which lets the shortcircuit pass do its job.

Benchmarks for encoding/binary show improvements, as well as some
wild swings on the super fast benchmarks (alignment effects?):

name                      old time/op    new time/op    delta
ReadSlice1000Int32s-8       5.25µs ± 2%    4.87µs ± 3%   -7.11%  (p=0.000 n=44+49)
ReadStruct-8                 451ns ± 2%     417ns ± 2%   -7.39%  (p=0.000 n=45+46)
WriteStruct-8                412ns ± 2%     405ns ± 3%   -1.58%  (p=0.000 n=46+48)
ReadInts-8                   296ns ± 8%     275ns ± 3%   -7.23%  (p=0.000 n=48+50)
WriteInts-8                  324ns ± 1%     318ns ± 2%   -1.67%  (p=0.000 n=44+49)
WriteSlice1000Int32s-8      5.21µs ± 2%    4.92µs ± 1%   -5.67%  (p=0.000 n=46+44)
PutUint16-8                 0.58ns ± 2%    0.59ns ± 2%   +0.63%  (p=0.000 n=49+49)
PutUint32-8                 0.87ns ± 1%    0.58ns ± 1%  -33.10%  (p=0.000 n=46+44)
PutUint64-8                 0.66ns ± 2%    0.87ns ± 2%  +33.07%  (p=0.000 n=47+48)
LittleEndianPutUint16-8     0.86ns ± 2%    0.87ns ± 2%   +0.55%  (p=0.003 n=47+50)
LittleEndianPutUint32-8     0.87ns ± 1%    0.87ns ± 1%     ~     (p=0.547 n=45+47)
LittleEndianPutUint64-8     0.87ns ± 2%    0.87ns ± 1%     ~     (p=0.451 n=46+47)
ReadFloats-8                79.8ns ± 5%    75.9ns ± 2%   -4.83%  (p=0.000 n=50+47)
WriteFloats-8               89.3ns ± 1%    88.9ns ± 1%   -0.48%  (p=0.000 n=46+44)
ReadSlice1000Float32s-8     5.51µs ± 1%    4.87µs ± 2%  -11.74%  (p=0.000 n=47+46)
WriteSlice1000Float32s-8    5.51µs ± 1%    4.93µs ± 1%  -10.60%  (p=0.000 n=48+47)
PutUvarint32-8              25.9ns ± 2%    24.0ns ± 2%   -7.02%  (p=0.000 n=48+50)
PutUvarint64-8              75.1ns ± 1%    61.5ns ± 2%  -18.12%  (p=0.000 n=45+47)
[Geo mean]                  57.3ns         54.3ns        -5.33%

Despite the rarity of type switches, this generates noticeably smaller binaries.

file      before    after     Δ       %
addr2line 4413296   4409200   -4096   -0.093%
api       5982648   5962168   -20480  -0.342%
cgo       4854168   4833688   -20480  -0.422%
compile   19694784  19682560  -12224  -0.062%
cover     5278008   5265720   -12288  -0.233%
doc       4694824   4682536   -12288  -0.262%
fix       3411336   3394952   -16384  -0.480%
link      6721496   6717400   -4096   -0.061%
nm        4371152   4358864   -12288  -0.281%
objdump   4760960   4752768   -8192   -0.172%
pprof     14810820  14790340  -20480  -0.138%
trace     11681076  11668788  -12288  -0.105%
vet       8285464   8244504   -40960  -0.494%
total     115824120 115627576 -196544 -0.170%

Compiler performance is marginally improved (note that go/types has many type switches):

name        old alloc/op      new alloc/op      delta
Template         35.0MB ± 0%       35.0MB ± 0%  +0.09%  (p=0.008 n=5+5)
Unicode          28.5MB ± 0%       28.5MB ± 0%    ~     (p=0.548 n=5+5)
GoTypes           114MB ± 0%        114MB ± 0%  -0.76%  (p=0.008 n=5+5)
Compiler          541MB ± 0%        541MB ± 0%  -0.03%  (p=0.008 n=5+5)
SSA              1.17GB ± 0%       1.17GB ± 0%    ~     (p=0.841 n=5+5)
Flate            21.9MB ± 0%       21.9MB ± 0%    ~     (p=0.421 n=5+5)
GoParser         26.9MB ± 0%       26.9MB ± 0%    ~     (p=0.222 n=5+5)
Reflect          74.6MB ± 0%       74.6MB ± 0%    ~     (p=1.000 n=5+5)
Tar              32.9MB ± 0%       32.8MB ± 0%    ~     (p=0.056 n=5+5)
XML              42.4MB ± 0%       42.1MB ± 0%  -0.77%  (p=0.008 n=5+5)
[Geo mean]       73.2MB            73.1MB       -0.15%

name        old allocs/op     new allocs/op     delta
Template           377k ± 0%         377k ± 0%  +0.06%  (p=0.008 n=5+5)
Unicode            354k ± 0%         354k ± 0%    ~     (p=0.095 n=5+5)
GoTypes           1.31M ± 0%        1.30M ± 0%  -0.73%  (p=0.008 n=5+5)
Compiler          5.44M ± 0%        5.44M ± 0%  -0.04%  (p=0.008 n=5+5)
SSA               11.7M ± 0%        11.7M ± 0%    ~     (p=1.000 n=5+5)
Flate              239k ± 0%         239k ± 0%    ~     (p=1.000 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            977k ± 0%         977k ± 0%    ~     (p=0.690 n=5+5)
Tar                346k ± 0%         346k ± 0%    ~     (p=0.889 n=5+5)
XML                431k ± 0%         430k ± 0%  -0.25%  (p=0.008 n=5+5)
[Geo mean]         806k              806k       -0.10%

For packages with many type switches, this considerably shrinks function text size.
Some examples:

file                                                           before   after    Δ       %
encoding/binary.s                                              30726    29504    -1222   -3.977%
go/printer.s                                                   77597    76005    -1592   -2.052%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                 65704    63318    -2386   -3.631%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8047     7714     -333    -4.138%

Text size regressions are rare.

Change-Id: Ic10982bbb04876250eaa5bfee97990141ae5fc28
Reviewed-on: https://go-review.googlesource.com/c/go/+/228106
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cuong Manh Le &lt;cuong.manhle.vn@gmail.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: better write barrier removal when initializing new objects</title>
<updated>2019-03-18T21:16:19+00:00</updated>
<author>
<name>Keith Randall</name>
<email>khr@google.com</email>
</author>
<published>2019-01-05T01:34:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=ca36af215f78b670000b31e7573f0fcf0c5de594'/>
<id>ca36af215f78b670000b31e7573f0fcf0c5de594</id>
<content type='text'>
When initializing a new object, we're often writing
1) to a location that doesn't have a pointer to a heap object
2) a pointer that doesn't point to a heap object

When both those conditions are true, we can avoid the write barrier.

This CL detects case 1 by looking for writes to known-zeroed
locations.  The results of runtime.newobject are zeroed, and we
perform a simple tracking of which parts of that object are written so
we can determine what part remains zero at each write.

This CL detects case 2 by looking for addresses of globals (including
the types and itabs which are used in interfaces) and for nil pointers.

Makes cmd/go 0.3% smaller. Some particular cases, like the slice
literal in #29573, can get much smaller.

TODO: we can remove actual zero writes also with this mechanism.

Update #29573

Change-Id: Ie74a3533775ea88da0495ba02458391e5db26cb9
Reviewed-on: https://go-review.googlesource.com/c/go/+/156363
Run-TryBot: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When initializing a new object, we're often writing
1) to a location that doesn't have a pointer to a heap object
2) a pointer that doesn't point to a heap object

When both those conditions are true, we can avoid the write barrier.

This CL detects case 1 by looking for writes to known-zeroed
locations.  The results of runtime.newobject are zeroed, and we
perform a simple tracking of which parts of that object are written so
we can determine what part remains zero at each write.

This CL detects case 2 by looking for addresses of globals (including
the types and itabs which are used in interfaces) and for nil pointers.

Makes cmd/go 0.3% smaller. Some particular cases, like the slice
literal in #29573, can get much smaller.

TODO: we can remove actual zero writes also with this mechanism.

Update #29573

Change-Id: Ie74a3533775ea88da0495ba02458391e5db26cb9
Reviewed-on: https://go-review.googlesource.com/c/go/+/156363
Run-TryBot: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: eliminate write barriers when writing non-heap ptrs</title>
<updated>2018-11-29T22:23:02+00:00</updated>
<author>
<name>Keith Randall</name>
<email>khr@google.com</email>
</author>
<published>2018-11-27T20:40:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=2140975ebde164ea1eaa70fc72775c03567f2bc9'/>
<id>2140975ebde164ea1eaa70fc72775c03567f2bc9</id>
<content type='text'>
We don't need a write barrier if:
1) The location we're writing to doesn't hold a heap pointer, and
2) The value we're writing isn't a heap pointer.

The freshly returned value from runtime.newobject satisfies (1).
Pointers to globals, and the contents of the read-only data section satisfy (2).

This is particularly helpful for code like:
p := []string{"abc", "def", "ghi"}

Where the compiler generates:
   a := new([3]string)
   move(a, statictmp_)  // eliminates write barriers here
   p := a[:]

For big slice literals, this makes the code a smaller and faster to
compile.

Update #13554. Reduces the compile time by ~10% and RSS by ~30%.

Change-Id: Icab81db7591c8777f68e5d528abd48c7e44c87eb
Reviewed-on: https://go-review.googlesource.com/c/151498
Run-TryBot: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't need a write barrier if:
1) The location we're writing to doesn't hold a heap pointer, and
2) The value we're writing isn't a heap pointer.

The freshly returned value from runtime.newobject satisfies (1).
Pointers to globals, and the contents of the read-only data section satisfy (2).

This is particularly helpful for code like:
p := []string{"abc", "def", "ghi"}

Where the compiler generates:
   a := new([3]string)
   move(a, statictmp_)  // eliminates write barriers here
   p := a[:]

For big slice literals, this makes the code a smaller and faster to
compile.

Update #13554. Reduces the compile time by ~10% and RSS by ~30%.

Change-Id: Icab81db7591c8777f68e5d528abd48c7e44c87eb
Reviewed-on: https://go-review.googlesource.com/c/151498
Run-TryBot: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: don't use statictmps for SSA-able composite literals</title>
<updated>2017-05-11T18:28:40+00:00</updated>
<author>
<name>Josh Bleecher Snyder</name>
<email>josharian@gmail.com</email>
</author>
<published>2017-05-10T19:48:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=ee69c21747b40f79351dab63b5ac6715c86e04f2'/>
<id>ee69c21747b40f79351dab63b5ac6715c86e04f2</id>
<content type='text'>
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd, so writes to their fields are treated separately,
so the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it up into two functions.

Reduces code size a bit:

name        old object-bytes  new object-bytes  delta
Template           386k ± 0%         386k ± 0%    ~     (all equal)
Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (all equal)
Compiler          3.92M ± 0%        3.91M ± 0%  -0.19%  (p=0.008 n=5+5)
SSA               7.91M ± 0%        7.91M ± 0%    ~     (all equal)
Flate              228k ± 0%         228k ± 0%  -0.05%  (p=0.008 n=5+5)
GoParser           283k ± 0%         283k ± 0%    ~     (all equal)
Reflect            952k ± 0%         952k ± 0%  -0.06%  (p=0.008 n=5+5)
Tar                188k ± 0%         188k ± 0%  -0.09%  (p=0.008 n=5+5)
XML                406k ± 0%         406k ± 0%  -0.02%  (p=0.008 n=5+5)
[Geo mean]         649k              648k       -0.04%

Fixes #18872

Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The writebarrier test has to change.
Now that T23 composite literals are passed to the backend,
they get SSA'd, so writes to their fields are treated separately,
so the relevant part of the first write to t23 is now a dead store.
Preserve the intent of the test by splitting it up into two functions.

Reduces code size a bit:

name        old object-bytes  new object-bytes  delta
Template           386k ± 0%         386k ± 0%    ~     (all equal)
Unicode            202k ± 0%         202k ± 0%    ~     (all equal)
GoTypes           1.16M ± 0%        1.16M ± 0%    ~     (all equal)
Compiler          3.92M ± 0%        3.91M ± 0%  -0.19%  (p=0.008 n=5+5)
SSA               7.91M ± 0%        7.91M ± 0%    ~     (all equal)
Flate              228k ± 0%         228k ± 0%  -0.05%  (p=0.008 n=5+5)
GoParser           283k ± 0%         283k ± 0%    ~     (all equal)
Reflect            952k ± 0%         952k ± 0%  -0.06%  (p=0.008 n=5+5)
Tar                188k ± 0%         188k ± 0%  -0.09%  (p=0.008 n=5+5)
XML                406k ± 0%         406k ± 0%  -0.02%  (p=0.008 n=5+5)
[Geo mean]         649k              648k       -0.04%

Fixes #18872

Change-Id: Ifeed0f71f13849732999aa731cc2bf40c0f0e32a
Reviewed-on: https://go-review.googlesource.com/43154
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: move writebarrier pass after dse</title>
<updated>2017-04-29T16:37:02+00:00</updated>
<author>
<name>Josh Bleecher Snyder</name>
<email>josharian@gmail.com</email>
</author>
<published>2017-04-27T20:15:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=e5c9358fe2ff1d0c1e9a76e5e5d0e7d65c308da6'/>
<id>e5c9358fe2ff1d0c1e9a76e5e5d0e7d65c308da6</id>
<content type='text'>
This avoids generating writeBarrier.enabled
blocks for dead stores.

Change-Id: Ib11d8e2ba952f3f1f01d16776e40a7200a7683cf
Reviewed-on: https://go-review.googlesource.com/42012
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This avoids generating writeBarrier.enabled
blocks for dead stores.

Change-Id: Ib11d8e2ba952f3f1f01d16776e40a7200a7683cf
Reviewed-on: https://go-review.googlesource.com/42012
Run-TryBot: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: do not use "oaslit" for global</title>
<updated>2017-02-07T17:23:23+00:00</updated>
<author>
<name>Cherry Zhang</name>
<email>cherryyz@google.com</email>
</author>
<published>2017-02-06T19:24:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=160914e33ca6521d74297291d801062cc44d794d'/>
<id>160914e33ca6521d74297291d801062cc44d794d</id>
<content type='text'>
The compiler did not emit write barrier for assigning global with
struct literal, like global = T{} where T contains pointer.

The relevant code path is:
walkexpr OAS var_ OSTRUCTLIT
    oaslit
        anylit OSTRUCTLIT
            walkexpr OAS var_ nil
            return without adding write barrier
    return true
break (without adding write barrier)

This CL makes oaslit not apply to globals. See also CL
https://go-review.googlesource.com/c/36355/ for an alternative
fix.

The downside of this is that it generates static data for zeroing
struct now. Also this only covers global. If there is any lurking
bug with implicit zeroing other than globals, this doesn't fix.

Fixes #18956.

Change-Id: Ibcd27e4fae3aa38390ffa94a32a9dd7a802e4b37
Reviewed-on: https://go-review.googlesource.com/36410
Reviewed-by: Russ Cox &lt;rsc@golang.org&gt;
Run-TryBot: Cherry Zhang &lt;cherryyz@google.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The compiler did not emit write barrier for assigning global with
struct literal, like global = T{} where T contains pointer.

The relevant code path is:
walkexpr OAS var_ OSTRUCTLIT
    oaslit
        anylit OSTRUCTLIT
            walkexpr OAS var_ nil
            return without adding write barrier
    return true
break (without adding write barrier)

This CL makes oaslit not apply to globals. See also CL
https://go-review.googlesource.com/c/36355/ for an alternative
fix.

The downside of this is that it generates static data for zeroing
struct now. Also this only covers global. If there is any lurking
bug with implicit zeroing other than globals, this doesn't fix.

Fixes #18956.

Change-Id: Ibcd27e4fae3aa38390ffa94a32a9dd7a802e4b37
Reviewed-on: https://go-review.googlesource.com/36410
Reviewed-by: Russ Cox &lt;rsc@golang.org&gt;
Run-TryBot: Cherry Zhang &lt;cherryyz@google.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: disable various write barrier optimizations</title>
<updated>2016-10-28T20:05:58+00:00</updated>
<author>
<name>Austin Clements</name>
<email>austin@google.com</email>
</author>
<published>2016-10-18T14:26:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=c39918a04991387f14cab1204f54fafab81bc105'/>
<id>c39918a04991387f14cab1204f54fafab81bc105</id>
<content type='text'>
Several of our current write barrier elision optimizations are invalid
with the hybrid barrier. Eliding the hybrid barrier requires that
*both* the current and new pointer be already shaded and, since we
don't have the flow analysis to figure out anything about the slot's
current value, for now we have to just disable several of these
optimizations.

This has a slight impact on binary size. On linux/amd64, the go tool
binary increases by 0.7% and the compile binary increases by 1.5%.

It also has a slight impact on performance, as one would expect. We'll
win some of this back in subsequent commits.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.38s ± 1%     2.40s ± 1%  +0.82%  (p=0.000 n=18+20)
Fannkuch11-12                2.84s ± 1%     2.70s ± 0%  -4.97%  (p=0.000 n=18+18)
FmtFprintfEmpty-12          44.2ns ± 1%    46.4ns ± 2%  +4.89%  (p=0.000 n=16+18)
FmtFprintfString-12          131ns ± 0%     134ns ± 1%  +2.05%  (p=0.000 n=12+19)
FmtFprintfInt-12             114ns ± 1%     117ns ± 1%  +3.26%  (p=0.000 n=19+20)
FmtFprintfIntInt-12          176ns ± 1%     181ns ± 1%  +3.25%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     185ns ± 1%     190ns ± 1%  +2.77%  (p=0.000 n=19+18)
FmtFprintfFloat-12           249ns ± 1%     254ns ± 1%  +1.71%  (p=0.000 n=18+20)
FmtManyArgs-12               747ns ± 1%     743ns ± 1%  -0.58%  (p=0.000 n=19+18)
GobDecode-12                6.57ms ± 1%    6.61ms ± 0%  +0.73%  (p=0.000 n=19+20)
GobEncode-12                5.58ms ± 1%    5.60ms ± 0%  +0.27%  (p=0.001 n=18+18)
Gzip-12                      223ms ± 1%     223ms ± 1%    ~     (p=0.351 n=19+20)
Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=0.095 n=16+20)
HTTPClientServer-12         77.8µs ± 1%    78.5µs ± 1%  +0.97%  (p=0.000 n=19+20)
JSONEncode-12               14.8ms ± 1%    14.8ms ± 1%    ~     (p=0.079 n=20+19)
JSONDecode-12               53.7ms ± 1%    54.2ms ± 1%  +0.92%  (p=0.000 n=20+19)
Mandelbrot200-12            3.81ms ± 1%    3.81ms ± 0%    ~     (p=0.916 n=19+18)
GoParse-12                  3.19ms ± 1%    3.19ms ± 1%    ~     (p=0.175 n=20+19)
RegexpMatchEasy0_32-12      71.9ns ± 1%    70.6ns ± 1%  -1.87%  (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12       946ns ± 0%     944ns ± 0%  -0.22%  (p=0.000 n=19+16)
RegexpMatchEasy1_32-12      67.3ns ± 2%    66.8ns ± 1%  -0.72%  (p=0.008 n=20+20)
RegexpMatchEasy1_1K-12       374ns ± 1%     384ns ± 1%  +2.69%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 1%    ~     (p=1.000 n=20+20)
RegexpMatchMedium_1K-12     34.3µs ± 1%    34.6µs ± 1%  +0.90%  (p=0.000 n=20+20)
RegexpMatchHard_32-12       1.78µs ± 1%    1.80µs ± 1%  +1.45%  (p=0.000 n=20+19)
RegexpMatchHard_1K-12       53.6µs ± 0%    54.5µs ± 1%  +1.52%  (p=0.000 n=19+18)
Revcomp-12                   417ms ± 5%     391ms ± 1%  -6.42%  (p=0.000 n=16+19)
Template-12                 61.1ms ± 1%    64.2ms ± 0%  +5.07%  (p=0.000 n=19+20)
TimeParse-12                 302ns ± 1%     305ns ± 1%  +0.90%  (p=0.000 n=18+18)
TimeFormat-12                319ns ± 1%     315ns ± 1%  -1.25%  (p=0.000 n=18+18)
[Geo mean]                  54.0µs         54.3µs       +0.58%

name         old time/op  new time/op  delta
XGarbage-12  2.24ms ± 2%  2.28ms ± 1%  +1.68%  (p=0.000 n=18+17)
XHTTP-12     11.4µs ± 1%  11.6µs ± 2%  +1.63%  (p=0.000 n=18+18)
XJSON-12     11.6ms ± 0%  12.5ms ± 0%  +7.84%  (p=0.000 n=18+17)

Updates #17503.

Change-Id: I1899f8e35662971e24bf692b517dfbe2b533c00c
Reviewed-on: https://go-review.googlesource.com/31572
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Several of our current write barrier elision optimizations are invalid
with the hybrid barrier. Eliding the hybrid barrier requires that
*both* the current and new pointer be already shaded and, since we
don't have the flow analysis to figure out anything about the slot's
current value, for now we have to just disable several of these
optimizations.

This has a slight impact on binary size. On linux/amd64, the go tool
binary increases by 0.7% and the compile binary increases by 1.5%.

It also has a slight impact on performance, as one would expect. We'll
win some of this back in subsequent commits.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.38s ± 1%     2.40s ± 1%  +0.82%  (p=0.000 n=18+20)
Fannkuch11-12                2.84s ± 1%     2.70s ± 0%  -4.97%  (p=0.000 n=18+18)
FmtFprintfEmpty-12          44.2ns ± 1%    46.4ns ± 2%  +4.89%  (p=0.000 n=16+18)
FmtFprintfString-12          131ns ± 0%     134ns ± 1%  +2.05%  (p=0.000 n=12+19)
FmtFprintfInt-12             114ns ± 1%     117ns ± 1%  +3.26%  (p=0.000 n=19+20)
FmtFprintfIntInt-12          176ns ± 1%     181ns ± 1%  +3.25%  (p=0.000 n=20+20)
FmtFprintfPrefixedInt-12     185ns ± 1%     190ns ± 1%  +2.77%  (p=0.000 n=19+18)
FmtFprintfFloat-12           249ns ± 1%     254ns ± 1%  +1.71%  (p=0.000 n=18+20)
FmtManyArgs-12               747ns ± 1%     743ns ± 1%  -0.58%  (p=0.000 n=19+18)
GobDecode-12                6.57ms ± 1%    6.61ms ± 0%  +0.73%  (p=0.000 n=19+20)
GobEncode-12                5.58ms ± 1%    5.60ms ± 0%  +0.27%  (p=0.001 n=18+18)
Gzip-12                      223ms ± 1%     223ms ± 1%    ~     (p=0.351 n=19+20)
Gunzip-12                   37.9ms ± 0%    37.9ms ± 1%    ~     (p=0.095 n=16+20)
HTTPClientServer-12         77.8µs ± 1%    78.5µs ± 1%  +0.97%  (p=0.000 n=19+20)
JSONEncode-12               14.8ms ± 1%    14.8ms ± 1%    ~     (p=0.079 n=20+19)
JSONDecode-12               53.7ms ± 1%    54.2ms ± 1%  +0.92%  (p=0.000 n=20+19)
Mandelbrot200-12            3.81ms ± 1%    3.81ms ± 0%    ~     (p=0.916 n=19+18)
GoParse-12                  3.19ms ± 1%    3.19ms ± 1%    ~     (p=0.175 n=20+19)
RegexpMatchEasy0_32-12      71.9ns ± 1%    70.6ns ± 1%  -1.87%  (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12       946ns ± 0%     944ns ± 0%  -0.22%  (p=0.000 n=19+16)
RegexpMatchEasy1_32-12      67.3ns ± 2%    66.8ns ± 1%  -0.72%  (p=0.008 n=20+20)
RegexpMatchEasy1_1K-12       374ns ± 1%     384ns ± 1%  +2.69%  (p=0.000 n=18+20)
RegexpMatchMedium_32-12      107ns ± 1%     107ns ± 1%    ~     (p=1.000 n=20+20)
RegexpMatchMedium_1K-12     34.3µs ± 1%    34.6µs ± 1%  +0.90%  (p=0.000 n=20+20)
RegexpMatchHard_32-12       1.78µs ± 1%    1.80µs ± 1%  +1.45%  (p=0.000 n=20+19)
RegexpMatchHard_1K-12       53.6µs ± 0%    54.5µs ± 1%  +1.52%  (p=0.000 n=19+18)
Revcomp-12                   417ms ± 5%     391ms ± 1%  -6.42%  (p=0.000 n=16+19)
Template-12                 61.1ms ± 1%    64.2ms ± 0%  +5.07%  (p=0.000 n=19+20)
TimeParse-12                 302ns ± 1%     305ns ± 1%  +0.90%  (p=0.000 n=18+18)
TimeFormat-12                319ns ± 1%     315ns ± 1%  -1.25%  (p=0.000 n=18+18)
[Geo mean]                  54.0µs         54.3µs       +0.58%

name         old time/op  new time/op  delta
XGarbage-12  2.24ms ± 2%  2.28ms ± 1%  +1.68%  (p=0.000 n=18+17)
XHTTP-12     11.4µs ± 1%  11.6µs ± 2%  +1.63%  (p=0.000 n=18+18)
XJSON-12     11.6ms ± 0%  12.5ms ± 0%  +7.84%  (p=0.000 n=18+17)

Updates #17503.

Change-Id: I1899f8e35662971e24bf692b517dfbe2b533c00c
Reviewed-on: https://go-review.googlesource.com/31572
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile: add a writebarrier phase in SSA</title>
<updated>2016-10-25T21:53:40+00:00</updated>
<author>
<name>Cherry Zhang</name>
<email>cherryyz@google.com</email>
</author>
<published>2016-10-13T10:57:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=f6aec889e1c880316b1989bdc6ce3b926cbe5fe4'/>
<id>f6aec889e1c880316b1989bdc6ce3b926cbe5fe4</id>
<content type='text'>
When the compiler insert write barriers, the frontend makes
conservative decisions at an early stage. This may have false
positives which result in write barriers for stack writes.

A new phase, writebarrier, is added to the SSA backend, to delay
the decision and eliminate false positives. The frontend still
makes conservative decisions. When building SSA, instead of
emitting runtime calls directly, it emits WB ops (StoreWB,
MoveWB, etc.), which will be expanded to branches and runtime
calls in writebarrier phase. Writes to static locations on stack
are detected and write barriers are removed.

All write barriers of stack writes found by the script from
issue #17330 are eliminated (except two false positives).

Fixes #17330.

Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8
Reviewed-on: https://go-review.googlesource.com/31131
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the compiler insert write barriers, the frontend makes
conservative decisions at an early stage. This may have false
positives which result in write barriers for stack writes.

A new phase, writebarrier, is added to the SSA backend, to delay
the decision and eliminate false positives. The frontend still
makes conservative decisions. When building SSA, instead of
emitting runtime calls directly, it emits WB ops (StoreWB,
MoveWB, etc.), which will be expanded to branches and runtime
calls in writebarrier phase. Writes to static locations on stack
are detected and write barriers are removed.

All write barriers of stack writes found by the script from
issue #17330 are eliminated (except two false positives).

Fixes #17330.

Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8
Reviewed-on: https://go-review.googlesource.com/31131
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>all: make copyright headers consistent with one space after period</title>
<updated>2016-05-02T13:43:18+00:00</updated>
<author>
<name>Emmanuel Odeke</name>
<email>emm.odeke@gmail.com</email>
</author>
<published>2016-04-10T21:32:26+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=53fd522c0db58f3bd75d85295f46bb06e8ab1a9b'/>
<id>53fd522c0db58f3bd75d85295f46bb06e8ab1a9b</id>
<content type='text'>
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor &lt;iant@golang.org&gt;
Run-TryBot: Ian Lance Taylor &lt;iant@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor &lt;iant@golang.org&gt;
Run-TryBot: Ian Lance Taylor &lt;iant@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>runtime: don't rescan globals</title>
<updated>2016-04-27T18:48:16+00:00</updated>
<author>
<name>Austin Clements</name>
<email>austin@google.com</email>
</author>
<published>2016-03-18T15:27:59+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=b49b71ae192c72faf699edd321ff0637f90e794c'/>
<id>b49b71ae192c72faf699edd321ff0637f90e794c</id>
<content type='text'>
Currently the runtime rescans globals during mark 2 and mark
termination. This costs as much as 500µs/MB in STW time, which is
enough to surpass the 10ms STW limit with only 20MB of globals.

It's also basically unnecessary. The compiler already generates write
barriers for global -&gt; heap pointer updates and the regular write
barrier doesn't check whether the slot is a global or in the heap.
Some less common write barriers do cause problems.
heapBitsBulkBarrier, which is used by typedmemmove and related
functions, currently depends on having access to the pointer bitmap
and as a result ignores writes to globals. Likewise, the
reflect-related write barriers reflect_typedmemmovepartial and
callwritebarrier ignore non-heap destinations; though it appears they
can never be called with global pointers anyway.

This commit makes heapBitsBulkBarrier issue write barriers for writes
to global pointers using the data and BSS pointer bitmaps, removes the
inheap checks from the reflection write barriers, and eliminates the
rescans during mark 2 and mark termination. It also adds a test that
writes to globals have write barriers.

Programs with large data+BSS segments (with pointers) aren't common,
but for programs that do have large data+BSS segments, this
significantly reduces pause time:

name \ 95%ile-time/markTerm              old         new  delta
LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)

This very slightly improves the go1 benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
[Geo mean]                  55.3µs         55.0µs       -0.41%

Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
Reviewed-on: https://go-review.googlesource.com/20829
Run-TryBot: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Rick Hudson &lt;rlh@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently the runtime rescans globals during mark 2 and mark
termination. This costs as much as 500µs/MB in STW time, which is
enough to surpass the 10ms STW limit with only 20MB of globals.

It's also basically unnecessary. The compiler already generates write
barriers for global -&gt; heap pointer updates and the regular write
barrier doesn't check whether the slot is a global or in the heap.
Some less common write barriers do cause problems.
heapBitsBulkBarrier, which is used by typedmemmove and related
functions, currently depends on having access to the pointer bitmap
and as a result ignores writes to globals. Likewise, the
reflect-related write barriers reflect_typedmemmovepartial and
callwritebarrier ignore non-heap destinations; though it appears they
can never be called with global pointers anyway.

This commit makes heapBitsBulkBarrier issue write barriers for writes
to global pointers using the data and BSS pointer bitmaps, removes the
inheap checks from the reflection write barriers, and eliminates the
rescans during mark 2 and mark termination. It also adds a test that
writes to globals have write barriers.

Programs with large data+BSS segments (with pointers) aren't common,
but for programs that do have large data+BSS segments, this
significantly reduces pause time:

name \ 95%ile-time/markTerm              old         new  delta
LargeBSS/bss:1GB/gomaxprocs:4  148200µs ± 6%  302µs ±52%  -99.80% (p=0.008 n=5+5)

This very slightly improves the go1 benchmarks:

name                      old time/op    new time/op    delta
BinaryTree17-12              2.62s ± 3%     2.62s ± 4%    ~     (p=0.904 n=20+20)
Fannkuch11-12                2.15s ± 1%     2.13s ± 0%  -1.29%  (p=0.000 n=18+20)
FmtFprintfEmpty-12          48.3ns ± 2%    47.6ns ± 1%  -1.52%  (p=0.000 n=20+16)
FmtFprintfString-12          152ns ± 0%     152ns ± 1%    ~     (p=0.725 n=18+18)
FmtFprintfInt-12             150ns ± 1%     149ns ± 1%  -1.14%  (p=0.000 n=19+20)
FmtFprintfIntInt-12          250ns ± 0%     244ns ± 1%  -2.12%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-12     219ns ± 1%     217ns ± 1%  -1.20%  (p=0.000 n=19+20)
FmtFprintfFloat-12           280ns ± 0%     281ns ± 1%  +0.47%  (p=0.000 n=19+19)
FmtManyArgs-12               928ns ± 0%     923ns ± 1%  -0.53%  (p=0.000 n=19+18)
GobDecode-12                7.21ms ± 1%    7.24ms ± 2%    ~     (p=0.091 n=19+19)
GobEncode-12                6.07ms ± 1%    6.05ms ± 1%  -0.36%  (p=0.002 n=20+17)
Gzip-12                      265ms ± 1%     265ms ± 1%    ~     (p=0.496 n=20+19)
Gunzip-12                   39.6ms ± 1%    39.3ms ± 1%  -0.85%  (p=0.000 n=19+19)
HTTPClientServer-12         74.0µs ± 2%    73.8µs ± 1%    ~     (p=0.569 n=20+19)
JSONEncode-12               15.4ms ± 1%    15.3ms ± 1%  -0.25%  (p=0.049 n=17+17)
JSONDecode-12               53.7ms ± 2%    53.0ms ± 1%  -1.29%  (p=0.000 n=18+17)
Mandelbrot200-12            3.97ms ± 1%    3.97ms ± 0%    ~     (p=0.072 n=17+18)
GoParse-12                  3.35ms ± 2%    3.36ms ± 1%  +0.51%  (p=0.005 n=18+20)
RegexpMatchEasy0_32-12      72.7ns ± 2%    72.2ns ± 1%  -0.70%  (p=0.005 n=19+19)
RegexpMatchEasy0_1K-12       246ns ± 1%     245ns ± 0%  -0.60%  (p=0.000 n=18+16)
RegexpMatchEasy1_32-12      72.8ns ± 1%    72.5ns ± 1%  -0.37%  (p=0.011 n=18+18)
RegexpMatchEasy1_1K-12       380ns ± 1%     385ns ± 1%  +1.34%  (p=0.000 n=20+19)
RegexpMatchMedium_32-12      115ns ± 2%     115ns ± 1%  +0.44%  (p=0.047 n=20+20)
RegexpMatchMedium_1K-12     35.4µs ± 1%    35.5µs ± 1%    ~     (p=0.079 n=18+19)
RegexpMatchHard_32-12       1.83µs ± 0%    1.80µs ± 1%  -1.76%  (p=0.000 n=18+18)
RegexpMatchHard_1K-12       55.1µs ± 0%    54.3µs ± 1%  -1.42%  (p=0.000 n=18+19)
Revcomp-12                   386ms ± 1%     381ms ± 1%  -1.14%  (p=0.000 n=18+18)
Template-12                 61.5ms ± 2%    61.5ms ± 2%    ~     (p=0.647 n=19+20)
TimeParse-12                 338ns ± 0%     336ns ± 1%  -0.72%  (p=0.000 n=14+19)
TimeFormat-12                350ns ± 0%     357ns ± 0%  +2.05%  (p=0.000 n=19+18)
[Geo mean]                  55.3µs         55.0µs       -0.41%

Change-Id: I57e8720385a1b991aeebd111b6874354308e2a6b
Reviewed-on: https://go-review.googlesource.com/20829
Run-TryBot: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Rick Hudson &lt;rlh@golang.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
