path: root/src/amd/compiler/aco_insert_exec_mask.cpp
Commit message | Author | Age | Files | Lines
* aco: Don't use nir_selection_control in aco_ir. | Timur Kristóf | 2023-04-10 | 1 | -4/+4
    We don't want to rely on any NIR structures in ACO, because we would like to avoid the need to include nir.h in aco_ir.
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
* aco/insert_exec_mask: allow for disconnected CFG | Daniel Schürmann | 2023-03-12 | 1 | -8/+7
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20853>
* aco: Enable constant exec mask based optimization on compute shaders. | Timur Kristóf | 2023-01-26 | 1 | -0/+6
    We know for sure exec is initially -1 when the shader always has full subgroups.
    Fossil DB stats on GFX11:
    Totals from 3884 (2.88% of 134913) affected shaders:
    SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11%
    SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39%
    CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04%
    Scratch: 217088 -> 216832 (-0.12%)
    Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03%
    Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03%
    InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07%
    VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01%
    SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04%
    Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48%
    Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03%
    PreSGPRs: 155637 -> 149491 (-3.95%)
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
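The fossil-db lines in this log follow the pattern `before -> after (change%)`. As a minimal sketch of that arithmetic, the PreSGPRs figure of the commit above (155637 -> 149491) can be checked as follows. The helper names `percent_change` and `round2` are hypothetical and are not part of Mesa or fossil-db.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Mesa or fossil-db): relative change of a
// metric in percent, negative when the value shrinks.
double percent_change(double before, double after)
{
   assert(before != 0.0); // a zero baseline has no meaningful relative change
   return (after - before) / before * 100.0;
}

// Rounds to two decimals, the precision used in the report lines.
double round2(double value)
{
   return std::round(value * 100.0) / 100.0;
}
```

Calling `round2(percent_change(155637, 149491))` reproduces the reported PreSGPRs change for that commit.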
* aco: allow Builder::Result to be dereferenced | Rhys Perry | 2023-01-10 | 1 | -2/+2
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
* aco: add p_dual_src_export_gfx11 for dual source blending on GFX11 | Samuel Pitoiset | 2022-11-16 | 1 | -1/+2
    Dual source blending must be in strict WQM mode.
    Cc: 22.3 mesa-stable
    Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
* aco: Allow explicitly removing jumps on GFX10+ when beneficial. | Timur Kristóf | 2022-10-11 | 1 | -4/+8
    "Removing jumps" in ACO means skipping the jump instruction at the beginning of a divergent branch (while still modifying exec).
    ACO already supports implicitly removing jumps when it decides that executing a branch with an empty exec mask is more beneficial than a jump.
    This commit adds the possibility to use this explicitly through nir_selection_control. ACO will respect this setting and remove the branch instructions when this is specified, unless it decides that this would cause bugs (e.g. exp instruction).
    There are two cases that benefit from the new change:
    1. When the application requests to "flatten" a branch (i.e. remove control flow), we now respect that.
    2. When the compiler stack determines that a divergent branch is always taken.
    v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
    Fossil DB stats on Navi 21:
    Totals from 13 (0.01% of 134906) affected shaders:
    CodeSize: 136616 -> 136496 (-0.09%)
    Instrs: 26196 -> 26166 (-0.11%)
    Latency: 417928 -> 417889 (-0.01%)
    Branches: 1241 -> 1211 (-2.42%)
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
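The decision described in the commit above can be pictured roughly as follows. This is a sketch under stated assumptions, not ACO's code: `SelectionControl`, `BranchInfo` and `can_remove_jump` are hypothetical names, and the cost heuristic is reduced to a single boolean.

```cpp
#include <cassert>

// Hypothetical stand-in for the selection-control hint, loosely modeled on
// the two cases listed in the commit message.
enum class SelectionControl { None, Flatten, DivergentAlwaysTaken };

// Hypothetical summary of one divergent branch.
struct BranchInfo {
   SelectionControl hint;
   bool contains_unsafe_instr;     // e.g. an export that must not run with empty exec
   bool heuristic_prefers_no_jump; // stands in for the compiler's own cost decision
};

// Returns true if the jump at the start of the branch may be skipped.
// exec is modified either way; only the jump instruction is dropped.
bool can_remove_jump(const BranchInfo& info)
{
   if (info.contains_unsafe_instr)
      return false; // removing the jump here could cause bugs
   if (info.hint != SelectionControl::None)
      return true; // explicit request: respect it
   return info.heuristic_prefers_no_jump; // implicit removal
}

// Usage: an explicit flatten request drops the jump unless an unsafe
// instruction is present in the branch.
inline void usage_example()
{
   assert(can_remove_jump({SelectionControl::Flatten, false, false}));
   assert(!can_remove_jump({SelectionControl::Flatten, true, false}));
}
```

The safety check comes first so that an explicit hint can never override it, which matches the "unless it decides that this would cause bugs" wording.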
* aco: requires Exact for p_jump_to_epilog | Samuel Pitoiset | 2022-07-19 | 1 | -1/+5
    Otherwise, in the presence of p_exit_early_if, the main FS will always jump to the PS epilog regardless of the exact mask.
    This fixes dEQP-VK.draw.renderpass.shader_invocation.helper_invocation and a few vkd3d-proton regressions when PS epilogs are forced.
    Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
    Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17617>
* aco: fix assertion in insert_exec_mask | Daniel Schürmann | 2022-07-19 | 1 | -1/+2
    The exec mask might also be of type mask_type_loop.
    Fixes: d068eb53e84ca1e44ad96c31dab63476880b3c72 ('aco/insert_exec_mask: optimize top-level transition to exact before demote')
    Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17402>
* aco: Avoid live-range splits in Exact mode | Daniel Schürmann | 2022-07-19 | 1 | -1/+30
    Because the data register of atomic VMEM instructions is shared between src and dst, it might be necessary to create live-range splits during RA. Make the live-range splits explicit in WQM mode.
    Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
    Latency: 17209 -> 17210 (+0.01%)
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347>
* aco: initialize scratch base registers on GFX9-GFX10.3 | Rhys Perry | 2022-07-08 | 1 | -1/+7
    fossil-db (navi21):
    Totals from 1142 (0.70% of 162293) affected shaders:
    Instrs: 271636 -> 271974 (+0.12%)
    CodeSize: 1532020 -> 1533792 (+0.12%)
    Latency: 7484066 -> 7485698 (+0.02%)
    InvThroughput: 4048824 -> 4049579 (+0.02%)
    SClause: 4171 -> 4212 (+0.98%)
    PreSGPRs: 11203 -> 12276 (+9.58%)
    fossil-db (vega10):
    Totals from 3327 (2.06% of 161355) affected shaders:
    Instrs: 257413 -> 257601 (+0.07%)
    CodeSize: 1424244 -> 1425372 (+0.08%)
    Latency: 8598402 -> 8600466 (+0.02%)
    InvThroughput: 7906335 -> 7908234 (+0.02%)
    SClause: 4932 -> 4973 (+0.83%)
    PreSGPRs: 22010 -> 25405 (+15.42%)
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17079>
* aco/insert_exec_mask: optimize top-level transition to exact before demote | Rhys Perry | 2022-03-08 | 1 | -3/+10
    fossil-db (Sienna Cichlid):
    Totals from 5767 (3.55% of 162293) affected shaders:
    Instrs: 3264949 -> 3257527 (-0.23%); split: -0.23%, +0.00%
    CodeSize: 17835692 -> 17806004 (-0.17%); split: -0.17%, +0.00%
    Latency: 45990060 -> 45987924 (-0.00%); split: -0.00%, +0.00%
    InvThroughput: 7643850 -> 7643835 (-0.00%); split: -0.00%, +0.00%
    Copies: 193641 -> 186219 (-3.83%); split: -3.84%, +0.01%
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
* aco/insert_exec_mask: use get_exec_op | Rhys Perry | 2022-03-08 | 1 | -5/+4
    No fossil-db changes.
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
* aco/insert_exec_mask: fix top-level to-exact with non-global exact mask | Rhys Perry | 2022-03-08 | 1 | -4/+6
    After transitioning to exact after a discard, the exec stack might be: [exact|global, wqm, exact]
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
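To make the stack notation `[exact|global, wqm, exact]` concrete, here is a small model of an exec stack whose entries carry type flags. It is only a sketch: the flag names echo the commit messages in this log (global, exact, wqm, loop), but the numeric values, the `ExecStack` alias and both helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flag values are assumptions for this sketch; only the names come from
// the commit messages.
enum MaskType : uint8_t {
   mask_global = 1 << 0,
   mask_exact = 1 << 1,
   mask_wqm = 1 << 2,
   mask_loop = 1 << 3,
};

using ExecStack = std::vector<uint8_t>; // one flag byte per saved exec mask

// True if the innermost (last) entry is an exact mask that is not the
// global one, which is the situation the fix has to handle.
bool top_is_non_global_exact(const ExecStack& stack)
{
   assert(!stack.empty());
   uint8_t top = stack.back();
   return (top & mask_exact) && !(top & mask_global);
}

// The stack from the commit message: [exact|global, wqm, exact].
ExecStack stack_after_discard()
{
   return {uint8_t(mask_exact | mask_global), mask_wqm, mask_exact};
}
```

In this model, `top_is_non_global_exact(stack_after_discard())` is true: the top entry is exact, but the global exact mask sits at the bottom of the stack.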
* aco: remove vcc hint from branch definitions | Rhys Perry | 2022-03-03 | 1 | -7/+7
    This doesn't seem to have much benefit anymore.
    fossil-db (Sienna Cichlid):
    Totals from 198 (0.15% of 134913) affected shaders:
    CodeSize: 2610536 -> 2610872 (+0.01%); split: -0.01%, +0.02%
    Instrs: 479001 -> 479085 (+0.02%); split: -0.01%, +0.03%
    Latency: 7310684 -> 7300735 (-0.14%); split: -0.16%, +0.02%
    InvThroughput: 2439084 -> 2437446 (-0.07%); split: -0.07%, +0.00%
    SClause: 14760 -> 14722 (-0.26%)
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
* aco/insert_exec_mask: refactor and remove some unnecessary WQM handling code | Daniel Schürmann | 2022-02-11 | 1 | -83/+21
    Some cases cannot happen and don't need to be handled anymore.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco/insert_exec_mask: refactor and simplify get_block_needs() | Daniel Schürmann | 2022-02-11 | 1 | -32/+12
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco/insert_exec_mask: remove ever_again_needs and Exact_Branch | Daniel Schürmann | 2022-02-11 | 1 | -35/+14
    This information is not required anymore.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco/insert_exec_mask: remove some unnecessary WQM loop handling code | Daniel Schürmann | 2022-02-11 | 1 | -102/+4
    These workarounds were necessary to prevent infinite loops with helper lane registers containing wrong data.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco/insert_exec_mask: remove Preserve_WQM flag | Daniel Schürmann | 2022-02-11 | 1 | -26/+6
    If WQM is needed anywhere after discard_if(), it will also be flagged as WQM. We can rely on that to preserve the WQM mask.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco: don't propagate WQM for p_as_uniform | Daniel Schürmann | 2022-02-11 | 1 | -2/+1
    This was needed, so that in case of active helper lanes, these contain the correct value. It is now handled implicitly.
    Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
    CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
    Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
    Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
    InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
    PreSGPRs: 47498 -> 47273 (-0.47%)
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
* aco/insert_exec_mask: stay in WQM while helper lanes are still needed | Daniel Schürmann | 2022-02-11 | 1 | -28/+13
    This patch flags all instructions WQM which don't require Exact mode, but depend on the exec mask as long as WQM is needed on any control flow path afterwards.
    This will mostly prevent accidental copies of WQM values within Exact mode, and also makes a lot of other workarounds unnecessary.
    Totals from 17374 (12.88% of 134913) affected shaders: (GFX10.3)
    VGPRs: 526952 -> 527384 (+0.08%); split: -0.01%, +0.09%
    CodeSize: 33740512 -> 33766636 (+0.08%); split: -0.06%, +0.14%
    MaxWaves: 488166 -> 488108 (-0.01%); split: +0.00%, -0.02%
    Instrs: 6254240 -> 6260557 (+0.10%); split: -0.08%, +0.18%
    Latency: 66497580 -> 66463472 (-0.05%); split: -0.15%, +0.10%
    InvThroughput: 13265741 -> 13264036 (-0.01%); split: -0.03%, +0.01%
    VClause: 122962 -> 122975 (+0.01%); split: -0.01%, +0.02%
    SClause: 334805 -> 334405 (-0.12%); split: -0.51%, +0.39%
    Copies: 275728 -> 282341 (+2.40%); split: -0.91%, +3.31%
    Branches: 92546 -> 90990 (-1.68%); split: -1.68%, +0.00%
    PreSGPRs: 504119 -> 504352 (+0.05%); split: -0.00%, +0.05%
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
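The flagging rule from the commit above can be sketched for the simplest case, a straight-line instruction list with no control flow. The real pass works on a CFG; `Instr` and `flag_wqm` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instruction summary.
struct Instr {
   bool requires_exact;  // must run without helper lanes
   bool requires_wqm;    // needs helper lanes itself
   bool depends_on_exec; // result depends on which lanes are active
};

// Walks backwards over the list. Once a later instruction needs WQM, every
// earlier instruction that depends on exec and does not require Exact is
// flagged WQM as well, so no WQM value gets produced in Exact mode.
std::vector<bool> flag_wqm(const std::vector<Instr>& instrs)
{
   std::vector<bool> wqm(instrs.size(), false);
   bool wqm_needed_later = false;
   for (std::size_t i = instrs.size(); i-- > 0;) {
      const Instr& instr = instrs[i];
      assert(!(instr.requires_exact && instr.requires_wqm));
      if (instr.requires_wqm)
         wqm_needed_later = true;
      wqm[i] = instr.requires_wqm ||
               (wqm_needed_later && !instr.requires_exact && instr.depends_on_exec);
   }
   return wqm;
}
```

With a CFG, "later" would mean "on any control flow path afterwards", which the single backward scan here does not attempt to model.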
* aco: optimize discard_if when WQM is not needed afterwards | Daniel Schürmann | 2022-02-08 | 1 | -14/+24
    Totals from 11560 (8.57% of 134913) affected shaders: (GFX10.3)
    CodeSize: 12092560 -> 11997652 (-0.78%)
    Instrs: 2205325 -> 2181598 (-1.08%)
    Latency: 15376048 -> 15356958 (-0.12%); split: -0.12%, +0.00%
    InvThroughput: 3526105 -> 3525120 (-0.03%); split: -0.03%, +0.00%
    Copies: 98543 -> 87601 (-11.10%)
    Branches: 16919 -> 16873 (-0.27%)
    PreSGPRs: 291584 -> 291532 (-0.02%)
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
* aco: merge block_kind_uses_[demote|discard_if] | Daniel Schürmann | 2022-02-08 | 1 | -4/+3
    These serve the same purpose. The new name is block_kind_uses_discard.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
* aco: make Preserve_WQM independent from block_kind_uses_discard_if | Daniel Schürmann | 2022-02-08 | 1 | -5/+4
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
* aco: remove block_kind_discard | Daniel Schürmann | 2022-02-08 | 1 | -40/+3
    This case doesn't seem to happen in practice. No need to micro-optimize it.
    This patch merges instruction selection for discard/discard_if.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
* aco: emit nir_intrinsic_discard() as p_discard_if() | Daniel Schürmann | 2022-02-08 | 1 | -9/+18
    This simplifies the code and emits a slightly better sequence in some cases.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
* aco: Allow elect to take advantage of knowing when all lanes are active. | Timur Kristóf | 2021-07-16 | 1 | -0/+14
    Implement elect using a pseudo-op which is lowered during the insert_exec_mask pass. This makes it possible to emit a more optimal sequence when the exec mask is constant.
    Fossil DB results on Sienna Cichlid:
    Totals from 211 (0.16% of 128647) affected shaders:
    CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
    Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
    Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
    InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
    Copies: 40058 -> 40162 (+0.26%)
    Branches: 31209 -> 31211 (+0.01%)
    PreSGPRs: 9927 -> 10125 (+1.99%)
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
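Why a constant exec mask helps elect can be shown on plain integers. Elect picks the first active lane, so in general it has to find the lowest set bit of exec; when every lane is known to be active, that lane is always lane 0 and the result is a constant. The sketch below models only these semantics with hypothetical function names. It does not show the instruction sequence ACO emits.

```cpp
#include <cassert>
#include <cstdint>

// Generic case: isolate the lowest set bit of exec, i.e. the first active lane.
uint64_t elect_mask(uint64_t exec)
{
   assert(exec != 0); // at least one lane must be active
   return exec & (~exec + 1);
}

// Constant case: with every lane active, lane 0 is always the elected one,
// so no bit scan is needed and the result is the constant 1.
constexpr uint64_t elect_mask_full_exec()
{
   return 1;
}
```

For a full mask the two functions agree, which is what allows the generic computation to be replaced by a constant.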
* aco: Remove use of deprecated Operand constructors | Tony Wasserka | 2021-07-13 | 1 | -6/+6
    This migration was done with libclang-based automatic tooling, which performed these replacements:
    * Operand(uint8_t) -> Operand::c8
    * Operand(uint16_t) -> Operand::c16
    * Operand(uint32_t, false) -> Operand::c32
    * Operand(uint32_t, bool) -> Operand::c32_or_c64
    * Operand(uint64_t) -> Operand::c64
    * Operand(0) -> Operand::zero(num_bytes)
    Casts that were previously used for constructor selection have automatically been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>... 
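The replacement list above swaps overloaded constructors, selected by the argument's type, for named factory functions. The following sketch shows the idea on a minimal stand-in type. This `Operand` is written only for illustration and is not `aco::Operand`; only the factory names `c8`, `c16`, `c32`, `c64` and `zero` are taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for an operand holding a constant and its size in bytes.
struct Operand {
   uint64_t value;
   unsigned bytes;

   // Named factories: the size is visible at the call site, so no cast is
   // needed to pick the right overload.
   static Operand c8(uint8_t v) { return {v, 1}; }
   static Operand c16(uint16_t v) { return {v, 2}; }
   static Operand c32(uint32_t v) { return {v, 4}; }
   static Operand c64(uint64_t v) { return {v, 8}; }
   static Operand zero(unsigned num_bytes)
   {
      assert(num_bytes == 1 || num_bytes == 2 || num_bytes == 4 || num_bytes == 8);
      return {0, num_bytes};
   }
};
```

A call such as `Operand::c16(1)` states the operand size by name, where the old form `Operand((uint16_t)1)` relied on a cast to steer overload resolution.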
* aco: Format. | Daniel Schürmann | 2021-07-12 | 1 | -98/+129
    Manually adjusted some comments for more intuitive line breaks.
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>
* aco: reorder and cleanup #includes | Daniel Schürmann | 2021-07-12 | 1 | -1/+4
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
* aco: remove condition operand from branch in invert block | Daniel Schürmann | 2021-05-20 | 1 | -1/+1
    As value numbering only handles logical blocks, this could lead to invalid IR until insert_exec_mask().
    No fossil-db changes.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10894>
* aco/insert_exec_mask: Fixed unused variable warning in release build. | Timur Kristóf | 2021-05-20 | 1 | -2/+1
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10806>
* aco: Don't use s_and_saveexec with branches when exec is constant. | Timur Kristóf | 2021-05-18 | 1 | -4/+7
    When exec is constant, we can remember the constant as the old exec, and just copy the condition and use it as the new exec. There is no need to save the constant.
    Due to using p_parallelcopy which is lowered to s_mov_b64 (or 32), many exec restores now become copies, hence the increase in the copy stats.
    Fossil DB changes on Sienna Cichlid:
    Totals from 73969 (49.37% of 149839) affected shaders:
    SpillSGPRs: 1768 -> 1610 (-8.94%)
    CodeSize: 99053892 -> 99047884 (-0.01%); split: -0.02%, +0.01%
    Instrs: 19372852 -> 19370398 (-0.01%); split: -0.02%, +0.01%
    VClause: 515154 -> 515142 (-0.00%); split: -0.00%, +0.00%
    SClause: 719236 -> 718395 (-0.12%); split: -0.14%, +0.02%
    Copies: 1109770 -> 1254634 (+13.05%); split: -0.07%, +13.12%
    Branches: 374338 -> 374348 (+0.00%); split: -0.00%, +0.00%
    PreSGPRs: 1776481 -> 1653761 (-6.91%)
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
* aco: Remember when exec mask is const, and restore the const then. | Timur Kristóf | 2021-05-18 | 1 | -5/+10
    Previously, we would store even the constant -1 exec mask from the beginning of every merged shader. With this change it is no longer necessary because we can restore to constant exec mask directly.
    Hence, this frees up a register pair (single register for Wave32) in every merged shader.
    No Fossil DB changes.
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
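The register saving described above follows from what an exec-stack entry has to keep alive. A rough model, assuming a hypothetical `ExecEntry` that is either a saved temporary or a known constant (all names here are illustrative and are not ACO's types):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical exec-stack entry: either a constant mask or a saved temporary.
struct ExecEntry {
   bool is_constant;
   int64_t constant; // meaningful when is_constant is true
   uint32_t temp_id; // meaningful when is_constant is false
};

ExecEntry constant_entry(int64_t value) { return {true, value, 0}; }
ExecEntry temp_entry(uint32_t id) { return {false, 0, id}; }

// Number of SGPRs an entry keeps live while it sits on the stack.
unsigned sgprs_held(const ExecEntry& entry, unsigned wave_size)
{
   assert(wave_size == 32 || wave_size == 64);
   if (entry.is_constant)
      return 0; // the constant can simply be written to exec again on restore
   return wave_size / 32; // one SGPR for Wave32, a pair for Wave64
}
```

In this model a constant -1 entry holds no registers, while a saved temporary holds a pair in Wave64 and a single register in Wave32, matching the saving the commit message describes.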
* aco: Use Operand instead of Temp for the exec mask stack. | Timur Kristóf | 2021-05-18 | 1 | -31/+31
    This will enable us to store non-temporary values, such as constant operands there.
    No Fossil DB changes.
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
* aco: ensure loops nested in a WQM loop are in WQM | Rhys Perry | 2021-04-08 | 1 | -20/+39
    Fixes a potential empty exec mask in this situation:
        enter_wqm()
        loop {
           ... wqm code ...
           enter_exact()
           loop {
              ... no wqm code ...
           }
        }
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Fixes: f0074a6f053 ("aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM")
    Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4546
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10075>
* aco: Use ASSERTED to avoid unused variable warning. | Timur Kristóf | 2021-03-16 | 1 | -1/+1
    Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9632>
* aco: calculate all p_as_uniform and v_readfirstlane_b32 sources in WQM | Rhys Perry | 2021-02-26 | 1 | -1/+2
    We should avoid a situation where a v_readfirstlane_b32 is in WQM but its source is calculated in Exact.
    Fixes hang when running Assassin's Creed: Valhalla benchmark.
    fossil-db (GFX10.3):
    Totals from 1021 (0.70% of 146267) affected shaders:
    CodeSize: 7835228 -> 7842992 (+0.10%); split: -0.00%, +0.10%
    Instrs: 1519208 -> 1521149 (+0.13%); split: -0.00%, +0.13%
    SClause: 78921 -> 78920 (-0.00%)
    Copies: 44456 -> 45421 (+2.17%); split: -0.05%, +2.22%
    Branches: 12987 -> 13933 (+7.28%)
    PreSGPRs: 47599 -> 47813 (+0.45%)
    Cycles: 10037540 -> 10045304 (+0.08%); split: -0.00%, +0.08%
    VMEM: 538381 -> 538777 (+0.07%); split: +0.11%, -0.03%
    SMEM: 84553 -> 84554 (+0.00%); split: +0.01%, -0.01%
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Cc: mesa-stable
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9288>
* aco: remove special handling of load_helper_invocation | Daniel Schürmann | 2021-02-17 | 1 | -17/+2
    These should now behave the same as is_helper_invocation.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9058>
* aco: fix assertion in insert_exec_mask pass | Daniel Schürmann | 2021-02-15 | 1 | -2/+5
    Fixes: a56ddca4e80a6ef7bb0c44edb4e5b6169510aaca ('aco: make all exec accesses non-temporaries')
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9047>
* aco: fix transition_to_{WQM,Exact} if exec.back() is not in exec | Rhys Perry | 2021-02-15 | 1 | -13/+22
    This can happen at merge blocks.
    fossil-db (GFX10.3):
    Totals from 25229 (17.25% of 146267) affected shaders:
    CodeSize: 58575920 -> 58571376 (-0.01%); split: -0.01%, +0.00%
    Instrs: 10979245 -> 10978109 (-0.01%); split: -0.01%, +0.00%
    SClause: 591817 -> 591816 (-0.00%)
    Copies: 604987 -> 603851 (-0.19%); split: -0.19%, +0.00%
    Cycles: 96088796 -> 96084252 (-0.00%); split: -0.00%, +0.00%
    VMEM: 10470372 -> 10470368 (-0.00%)
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Fixes: a56ddca4e80 ("aco: make all exec accesses non-temporaries")
    Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4299
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9047>
* aco: remove dead code for the handling of exec temporaries | Daniel Schürmann | 2021-02-12 | 1 | -25/+2
    Totals from 26026 (18.67% of 139391) affected shaders (Navi10):
    PreSGPRs: 370993 -> 326177 (-12.08%)
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870>
* aco: make all exec accesses non-temporaries | Daniel Schürmann | 2021-02-12 | 1 | -82/+76
    So that they are not counted into the register demand.
    Totals from 107336 (77.00% of 139391) affected shaders (Navi10):
    VGPRs: 4023452 -> 4023248 (-0.01%); split: -0.01%, +0.01%
    SpillSGPRs: 14088 -> 12571 (-10.77%); split: -11.03%, +0.26%
    CodeSize: 266816164 -> 266765528 (-0.02%); split: -0.04%, +0.02%
    MaxWaves: 1553339 -> 1553374 (+0.00%); split: +0.00%, -0.00%
    Instrs: 50977701 -> 50973093 (-0.01%); split: -0.02%, +0.01%
    Cycles: 1733911128 -> 1733605320 (-0.02%); split: -0.05%, +0.03%
    VMEM: 40867650 -> 40900204 (+0.08%); split: +0.13%, -0.05%
    SMEM: 6835980 -> 6829073 (-0.10%); split: +0.10%, -0.20%
    VClause: 1032783 -> 1032788 (+0.00%); split: -0.01%, +0.01%
    SClause: 2103705 -> 2104115 (+0.02%); split: -0.09%, +0.11%
    Copies: 3195658 -> 3193656 (-0.06%); split: -0.30%, +0.24%
    Branches: 1140213 -> 1140120 (-0.01%); split: -0.05%, +0.04%
    PreSGPRs: 3603785 -> 3437064 (-4.63%); split: -5.13%, +0.50%
    PreVGPRs: 3321996 -> 3321850 (-0.00%)
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870>
* aco: don't create unnecessary exec phi on merge blocks | Daniel Schürmann | 2021-02-12 | 1 | -7/+1
    No fossil-db changes.
    Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870>
* aco: remove loop to flag loop blocks as WQM | Rhys Perry | 2021-02-09 | 1 | -24/+1
    This is no longer necessary.
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446>
* aco: rewrite setting of Exact_Branch | Rhys Perry | 2021-02-09 | 1 | -11/+65
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446>
* aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM | Rhys Perry | 2021-02-09 | 1 | -2/+0
    This should no longer be necessary since the mark_block_wqm() we use to flag break conditions as WQM now adds the block to the worklist. With them added to the worklist, get_block_needs() will add WQM to block_needs.
    Adding WQM to block_needs here without adding the block to the worklist (like we do here) can cause issues because it does not ensure that the predecessors' branches are in WQM (needed for it to be possible to transition to WQM in the block). This happened in an Overwatch shader.
    No fossil-db changes.
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Fixes: 661922f6ac9 ("aco: add block to worklist in mark_block_wqm()")
    Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4066
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446>
* aco: return references in instruction cast methods | Rhys Perry | 2021-01-22 | 1 | -9/+9
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
* aco: use format-check methods | Rhys Perry | 2021-01-22 | 1 | -7/+7
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>
* aco: use instruction cast methods | Rhys Perry | 2021-01-22 | 1 | -9/+5
    Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
    Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8595>