<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/go-git.git/src/runtime/funcdata.h, branch dev.boringcrypto</title>
<subtitle>github.com: golang/go
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/'/>
<entry>
<title>cmd/compile, runtime: use unwrapped PC for goroutine creation tracing</title>
<updated>2022-02-11T20:01:24+00:00</updated>
<author>
<name>Cherry Mui</name>
<email>cherryyz@google.com</email>
</author>
<published>2022-02-07T17:00:44+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=bcee121ae4f67281450280c72399890a3c7a7d5b'/>
<id>bcee121ae4f67281450280c72399890a3c7a7d5b</id>
<content type='text'>
With the switch to the register ABI, we now generate wrapper
functions for go statements in many cases. A new goroutine's start
PC now points to the wrapper function. This does not affect
execution, but the runtime tracer uses the start PC and the
function name as the name/label of that goroutine. If the start
function is a named function, using the name of the wrapper loses
that information. Furthur, the tracer's goroutine view groups
goroutines by start PC. For multiple go statements with the same
callee, they are grouped together. With the wrappers, which is
context-dependent as it is a closure, they are no longer grouped.

This CL fixes the problem by providing the underlying unwrapped
PC for tracing. The compiler emits metadata to link the unwrapped
PC to the wrapper function. And the runtime reads that metadata
and record that unwrapped PC for tracing.

(This doesn't work for shared buildmode. Unfortunate.)

TODO: is there a way to test?

Fixes #50622.

Change-Id: Iaa20e1b544111c0255eb0fc04427aab7a5e3b877
Reviewed-on: https://go-review.googlesource.com/c/go/+/384158
Trust: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Than McIntosh &lt;thanm@google.com&gt;
Run-TryBot: Cherry Mui &lt;cherryyz@google.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the switch to the register ABI, we now generate wrapper
functions for go statements in many cases. A new goroutine's start
PC now points to the wrapper function. This does not affect
execution, but the runtime tracer uses the start PC and the
function name as the name/label of that goroutine. If the start
function is a named function, using the name of the wrapper loses
that information. Furthur, the tracer's goroutine view groups
goroutines by start PC. For multiple go statements with the same
callee, they are grouped together. With the wrappers, which is
context-dependent as it is a closure, they are no longer grouped.

This CL fixes the problem by providing the underlying unwrapped
PC for tracing. The compiler emits metadata to link the unwrapped
PC to the wrapper function. And the runtime reads that metadata
and record that unwrapped PC for tracing.

(This doesn't work for shared buildmode. Unfortunate.)

TODO: is there a way to test?

Fixes #50622.

Change-Id: Iaa20e1b544111c0255eb0fc04427aab7a5e3b877
Reviewed-on: https://go-review.googlesource.com/c/go/+/384158
Trust: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Than McIntosh &lt;thanm@google.com&gt;
Run-TryBot: Cherry Mui &lt;cherryyz@google.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile, runtime: track argument stack slot liveness</title>
<updated>2021-10-27T20:27:02+00:00</updated>
<author>
<name>Cherry Mui</name>
<email>cherryyz@google.com</email>
</author>
<published>2021-09-24T20:46:05+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=30a82efcf403fed76bf1542e9477047660d5f54d'/>
<id>30a82efcf403fed76bf1542e9477047660d5f54d</id>
<content type='text'>
Currently, for stack traces (e.g. at panic or when runtime.Stack
is called), we print argument values from the stack. With register
ABI, we may never store the argument to stack therefore the
argument value on stack may be meaningless. This causes confusion.

This CL makes the compiler keep trace of which argument stack
slots are meaningful. If it is meaningful, it will be printed in
stack traces as before. If it may not be meaningful, it will be
printed as the stack value with a question mark ("?"). In general,
the value could be meaningful on some code paths but not others
depending on the execution, and the compiler couldn't know
statically, so we still print the stack value, instead of not
printing it at all. Also note that if the argument variable is
updated in the function body the printed value may be stale (like
before register ABI) but still considered meaningful.

Arguments passed on stack are always meaningful therefore always
printed without a question mark. Results are never printed, as
before.

(Due to a bug in the compiler we sometimes don't spill args into
their dedicated spill slots (as we should), causing it having
fewer meaningful values than it should be.)

This increases binary sizes a bit:
            old       new
hello      1129760   1142080  +1.09%
cmd/go    13932320  14088016  +1.12%
cmd/link   6267696   6329168  +0.98%

Fixes #45728.

Change-Id: I308a0402e5c5ab94ca0953f8bd85a56acd28f58c
Reviewed-on: https://go-review.googlesource.com/c/go/+/352057
Trust: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, for stack traces (e.g. at panic or when runtime.Stack
is called), we print argument values from the stack. With register
ABI, we may never store the argument to stack therefore the
argument value on stack may be meaningless. This causes confusion.

This CL makes the compiler keep trace of which argument stack
slots are meaningful. If it is meaningful, it will be printed in
stack traces as before. If it may not be meaningful, it will be
printed as the stack value with a question mark ("?"). In general,
the value could be meaningful on some code paths but not others
depending on the execution, and the compiler couldn't know
statically, so we still print the stack value, instead of not
printing it at all. Also note that if the argument variable is
updated in the function body the printed value may be stale (like
before register ABI) but still considered meaningful.

Arguments passed on stack are always meaningful therefore always
printed without a question mark. Results are never printed, as
before.

(Due to a bug in the compiler we sometimes don't spill args into
their dedicated spill slots (as we should), causing it having
fewer meaningful values than it should be.)

This increases binary sizes a bit:
            old       new
hello      1129760   1142080  +1.09%
cmd/go    13932320  14088016  +1.12%
cmd/link   6267696   6329168  +0.98%

Fixes #45728.

Change-Id: I308a0402e5c5ab94ca0953f8bd85a56acd28f58c
Reviewed-on: https://go-review.googlesource.com/c/go/+/352057
Trust: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd, runtime: eliminate runtime.no_pointers_stackmap</title>
<updated>2021-10-04T22:45:17+00:00</updated>
<author>
<name>Josh Bleecher Snyder</name>
<email>josharian@gmail.com</email>
</author>
<published>2021-10-01T23:25:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=c2483a5c034152fcdfbb2e6dbcf48b0103d8db6a'/>
<id>c2483a5c034152fcdfbb2e6dbcf48b0103d8db6a</id>
<content type='text'>
runtime.no_pointers_stackmap is an odd beast.
It is defined in a Go file, populated by assembly,
used by the GC, and its address is magic used
by async pre-emption to ascertain whether a
routine was implemented in assembly.

A subsequent change will force all GC data into the go.func.* linker symbol.
runtime.no_pointers_stackmap is GC data, so it must go there.
Yet it also needs to go into rodata, for the runtime address trick.

This change eliminates it entirely.

Replace the runtime address check with the newly introduced asm funcflag.

Handle the assembly macro as magic, similarly to our handling of go_args_stackmap.
This allows the no_pointers_stackmap to be identical in all ways
to other gclocals stackmaps, including content-addressability.

Change-Id: Id2f20a262cfab0719beb88e6342984ec4b196268
Reviewed-on: https://go-review.googlesource.com/c/go/+/353672
Trust: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
runtime.no_pointers_stackmap is an odd beast.
It is defined in a Go file, populated by assembly,
used by the GC, and its address is magic used
by async pre-emption to ascertain whether a
routine was implemented in assembly.

A subsequent change will force all GC data into the go.func.* linker symbol.
runtime.no_pointers_stackmap is GC data, so it must go there.
Yet it also needs to go into rodata, for the runtime address trick.

This change eliminates it entirely.

Replace the runtime address check with the newly introduced asm funcflag.

Handle the assembly macro as magic, similarly to our handling of go_args_stackmap.
This allows the no_pointers_stackmap to be identical in all ways
to other gclocals stackmaps, including content-addressability.

Change-Id: Id2f20a262cfab0719beb88e6342984ec4b196268
Reviewed-on: https://go-review.googlesource.com/c/go/+/353672
Trust: Josh Bleecher Snyder &lt;josharian@gmail.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile, runtime: add metadata for argument printing in traceback</title>
<updated>2021-04-22T17:47:59+00:00</updated>
<author>
<name>Cherry Zhang</name>
<email>cherryyz@google.com</email>
</author>
<published>2021-01-15T22:58:41+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=537cde0b4b411f1dc3016cac430b9494cf91caf0'/>
<id>537cde0b4b411f1dc3016cac430b9494cf91caf0</id>
<content type='text'>
Currently, when the runtime printing a stack track (at panic, or
when runtime.Stack is called), it prints the function arguments
as words in memory. With a register-based calling convention,
the layout of argument area of the memory changes, so the
printing also needs to change. In particular, the memory order
and the syntax order of the arguments may differ. To address
that, this CL lets the compiler to emit some metadata about the
memory layout of the arguments, and the runtime will use this
information to print arguments in syntax order.

Previously we print the memory contents of the results along with
the arguments. The results are likely uninitialized when the
traceback is taken, so that information is rarely useful. Also,
with a register-based calling convention the results may not
have corresponding locations in memory. This CL changes it to not
print results.

Previously the runtime simply prints the memory contents as
pointer-sized words. With a register-based calling convention,
as the layout changes, arguments that were packed in one word
may no longer be in one word. Also, as the spill slots are not
always initialized, it is possible that some part of a word
contains useful informationwhile the rest contains garbage.
Instead of letting the runtime recreating the ABI0 layout and
print them as words, we now print each component separately.
Aggregate-typed argument/component is surrounded by "{}".

For example, for a function

F(int, [3]byte, byte) int

when called as F(1, [3]byte{2, 3, 4}, 5), it used to print

F(0x1, 0x5040302, 0xXXXXXXXX) // assuming little endian, 0xXXXXXXXX is uninitilized result

Now prints

F(0x1, {0x2, 0x3, 0x4}, 0x5).

Note: the liveness tracking of the spill splots has not been
implemented in this CL. Currently the runtime just assumes all
the slots are live and print them all.

Increase binary sizes by ~1.5%.

                     old          new
hello (println)    1171328      1187712 (+1.4%)
hello (fmt)        1877024      1901600 (+1.3%)
cmd/compile       22326928     22662800 (+1.5%)
cmd/go            13505024     13726208 (+1.6%)

Updates #40724.

Change-Id: I351e0bf497f99bdbb3f91df2fb17e3c2c5c316dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/304470
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Run-TryBot: Cherry Zhang &lt;cherryyz@google.com&gt;
TryBot-Result: Go Bot &lt;gobot@golang.org&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, when the runtime printing a stack track (at panic, or
when runtime.Stack is called), it prints the function arguments
as words in memory. With a register-based calling convention,
the layout of argument area of the memory changes, so the
printing also needs to change. In particular, the memory order
and the syntax order of the arguments may differ. To address
that, this CL lets the compiler to emit some metadata about the
memory layout of the arguments, and the runtime will use this
information to print arguments in syntax order.

Previously we print the memory contents of the results along with
the arguments. The results are likely uninitialized when the
traceback is taken, so that information is rarely useful. Also,
with a register-based calling convention the results may not
have corresponding locations in memory. This CL changes it to not
print results.

Previously the runtime simply prints the memory contents as
pointer-sized words. With a register-based calling convention,
as the layout changes, arguments that were packed in one word
may no longer be in one word. Also, as the spill slots are not
always initialized, it is possible that some part of a word
contains useful informationwhile the rest contains garbage.
Instead of letting the runtime recreating the ABI0 layout and
print them as words, we now print each component separately.
Aggregate-typed argument/component is surrounded by "{}".

For example, for a function

F(int, [3]byte, byte) int

when called as F(1, [3]byte{2, 3, 4}, 5), it used to print

F(0x1, 0x5040302, 0xXXXXXXXX) // assuming little endian, 0xXXXXXXXX is uninitilized result

Now prints

F(0x1, {0x2, 0x3, 0x4}, 0x5).

Note: the liveness tracking of the spill splots has not been
implemented in this CL. Currently the runtime just assumes all
the slots are live and print them all.

Increase binary sizes by ~1.5%.

                     old          new
hello (println)    1171328      1187712 (+1.4%)
hello (fmt)        1877024      1901600 (+1.3%)
cmd/compile       22326928     22662800 (+1.5%)
cmd/go            13505024     13726208 (+1.6%)

Updates #40724.

Change-Id: I351e0bf497f99bdbb3f91df2fb17e3c2c5c316dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/304470
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Run-TryBot: Cherry Zhang &lt;cherryyz@google.com&gt;
TryBot-Result: Go Bot &lt;gobot@golang.org&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/internal/objabi, runtime: compact FUNCDATA indices</title>
<updated>2020-10-30T21:14:09+00:00</updated>
<author>
<name>Cherry Zhang</name>
<email>cherryyz@google.com</email>
</author>
<published>2020-10-27T23:47:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=f96b62be2edd8acc08b79777d692937e8ed79b4a'/>
<id>f96b62be2edd8acc08b79777d692937e8ed79b4a</id>
<content type='text'>
As we deleted register maps, move FUNCDATA indices of stack
objects, inline trees, and open-coded defers earlier.

Change-Id: If73797b8c11fd207655c9498802fca9f6f9ac338
Reviewed-on: https://go-review.googlesource.com/c/go/+/265761
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As we deleted register maps, move FUNCDATA indices of stack
objects, inline trees, and open-coded defers earlier.

Change-Id: If73797b8c11fd207655c9498802fca9f6f9ac338
Reviewed-on: https://go-review.googlesource.com/c/go/+/265761
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>runtime: remove go115ReduceLiveness and go115RestartSeq</title>
<updated>2020-10-30T21:13:24+00:00</updated>
<author>
<name>Cherry Zhang</name>
<email>cherryyz@google.com</email>
</author>
<published>2020-10-22T00:43:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=8414b1a5a40e5ef19508e4895b4c12a91fa498e7'/>
<id>8414b1a5a40e5ef19508e4895b4c12a91fa498e7</id>
<content type='text'>
Make them always true. Delete code that are only executed when
they are false.

Change-Id: I6194fa00de23486c2b0a0c9075fe3a09d9c52762
Reviewed-on: https://go-review.googlesource.com/c/go/+/264339
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make them always true. Delete code that are only executed when
they are false.

Change-Id: I6194fa00de23486c2b0a0c9075fe3a09d9c52762
Reviewed-on: https://go-review.googlesource.com/c/go/+/264339
Trust: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>runtime/internal/atomic: drop package prefixes</title>
<updated>2020-10-16T17:31:16+00:00</updated>
<author>
<name>Austin Clements</name>
<email>austin@google.com</email>
</author>
<published>2020-10-15T20:11:10+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=afba990169f41d9026c923da5235584db32cab67'/>
<id>afba990169f41d9026c923da5235584db32cab67</id>
<content type='text'>
This drops package prefixes from the assembly code on 386 and arm. In
addition to just being nicer, this allows the assembler to
automatically pick up the argument stack map from the Go signatures of
these functions. This doesn't matter right now because these functions
never call back out to Go, but prepares us for the next CL.

Change-Id: I90fed7d4dd63ad49274529c62804211b6390e2e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/262777
Trust: Austin Clements &lt;austin@google.com&gt;
Run-TryBot: Austin Clements &lt;austin@google.com&gt;
TryBot-Result: Go Bot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This drops package prefixes from the assembly code on 386 and arm. In
addition to just being nicer, this allows the assembler to
automatically pick up the argument stack map from the Go signatures of
these functions. This doesn't matter right now because these functions
never call back out to Go, but prepares us for the next CL.

Change-Id: I90fed7d4dd63ad49274529c62804211b6390e2e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/262777
Trust: Austin Clements &lt;austin@google.com&gt;
Run-TryBot: Austin Clements &lt;austin@google.com&gt;
TryBot-Result: Go Bot &lt;gobot@golang.org&gt;
Reviewed-by: Cherry Zhang &lt;cherryyz@google.com&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata</title>
<updated>2019-10-24T13:54:11+00:00</updated>
<author>
<name>Dan Scales</name>
<email>danscales@google.com</email>
</author>
<published>2019-06-24T19:59:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=be64a19d99918c843f8555aad580221207ea35bc'/>
<id>be64a19d99918c843f8555aad580221207ea35bc</id>
<content type='text'>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"</title>
<updated>2019-10-16T20:59:53+00:00</updated>
<author>
<name>Bryan C. Mills</name>
<email>bcmills@google.com</email>
</author>
<published>2019-10-16T20:41:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=b76e6f88251712a21071d4ea11573bd8cdfa21de'/>
<id>b76e6f88251712a21071d4ea11573bd8cdfa21de</id>
<content type='text'>
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills &lt;bcmills@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Dan Scales &lt;danscales@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills &lt;bcmills@google.com&gt;
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
Reviewed-by: Dan Scales &lt;danscales@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata</title>
<updated>2019-10-16T18:27:16+00:00</updated>
<author>
<name>Dan Scales</name>
<email>danscales@google.com</email>
</author>
<published>2019-06-24T19:59:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/go-git.git/commit/?id=dad616375f054d0644e006d7794724186d7a6720'/>
<id>dad616375f054d0644e006d7794724186d7a6720</id>
<content type='text'>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).

Cost of defer statement  [ go test -run NONE -bench BenchmarkDefer$ runtime ]
  With normal (stack-allocated) defers only:         35.4  ns/op
  With open-coded defers:                             5.6  ns/op
  Cost of function call alone (remove defer keyword): 4.4  ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers:  0.09%

The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
  Without open-coded defers:        62.0 ns/op
  With open-coded defers:           255  ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:

CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
  Without open-coded defers:        443 ns/op
  With open-coded defers:           347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements &lt;austin@google.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
