path: root/src/amd/compiler/aco_lower_to_hw_instr.cpp
Commit message | Author | Age | Files | Lines
* aco: get scratch addr from symbol for radeonsi | Qiang Yu | 2023-04-28 | 1 | -1/+8
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Signed-off-by: Qiang Yu <yuq825@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22727>
* amd: fix typos | Harri Nieminen | 2023-04-13 | 1 | -2/+2
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22432>
* aco: Don't use nir_selection_control in aco_ir. | Timur Kristóf | 2023-04-10 | 1 | -3/+1
  We don't want to rely on any NIR structures in ACO, because we would like to avoid the need to include nir.h in aco_ir.

  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
* aco: Consider p_cbranch_nz as divergent branch too. | Timur Kristóf | 2023-04-03 | 1 | -1/+2
  A p_cbranch_nz instruction that reads exec is divergent too.

  Fixes: f030b75b7d2c359b90c18ee4ed83fa05265c12e0
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21493>
* aco/to_hw_instr: use VOP1 opsel for v_mov_b16 | Georg Lehmann | 2023-03-30 | 1 | -12/+4
  Foz-DB GFX1100:
  Totals from 4661 (3.46% of 134864) affected shaders:
  CodeSize: 36500568 -> 36391704 (-0.30%)

  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22069>
* aco: remove aco::rt_stack variable | Daniel Schürmann | 2023-03-16 | 1 | -1/+1
  Since we initialize scratch in the RT prolog, there is no need for this variable anymore.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21780>
* aco: create hw_init_scratch() function for p_init_scratch lowering | Daniel Schürmann | 2023-03-16 | 1 | -29/+34
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21780>
* radv/rt: use terminate() when returning from raygen shaders | Daniel Schürmann | 2023-03-08 | 1 | -2/+3
  Q2RTX stats:
  Totals from 7 (0.01% of 134913) affected shaders:
  CodeSize: 204712 -> 204744 (+0.02%); split: -0.06%, +0.07%
  Instrs: 37526 -> 37522 (-0.01%); split: -0.07%, +0.06%
  Latency: 950563 -> 956024 (+0.57%)
  InvThroughput: 187915 -> 188977 (+0.57%)
  Copies: 4829 -> 4763 (-1.37%)
  Branches: 1570 -> 1583 (+0.83%)
  PreSGPRs: 407 -> 400 (-1.72%)
  PreVGPRs: 614 -> 617 (+0.49%)

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21736>
* aco: remove VOP[123C]P? structs | Georg Lehmann | 2023-03-07 | 1 | -11/+11
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
* aco: treat VINTERP_INREG as VALU | Georg Lehmann | 2023-03-07 | 1 | -1/+1
  It's just v_fma with fixed DPP8 and builtin s_waitcnt_expcnt, so it can mostly be handled as a pure VALU instruction.

  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21023>
* radv: unconditionally enable scratch for RT shaders | Daniel Schürmann | 2023-02-16 | 1 | -1/+1
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21159>
* aco: don't modify exec in p_interp_gfx11 | Rhys Perry | 2023-02-08 | 1 | -8/+0
  The RDNA3 ISA docs say that lds_param_load writes the entire quad regardless of exec, so this isn't needed.

  fossil-db (gfx1100):
  Totals from 5291 (3.93% of 134574) affected shaders:
  Instrs: 4891396 -> 4789628 (-2.08%)
  CodeSize: 25519032 -> 25111960 (-1.60%)
  Latency: 36122982 -> 36074300 (-0.13%); split: -0.14%, +0.00%
  InvThroughput: 4162436 -> 4161424 (-0.02%); split: -0.02%, +0.00%
  Copies: 263862 -> 263838 (-0.01%)
  PreSGPRs: 225012 -> 224179 (-0.37%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21171>
* aco: use s_pack_ll_b32_b16 for constant copies | Georg Lehmann | 2023-02-01 | 1 | -2/+10
  Totals from 2 (0.00% of 134913) affected shaders:
  CodeSize: 28636 -> 28628 (-0.03%)

  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20970>
* aco: use s_bfm_64 for constant copies | Georg Lehmann | 2023-02-01 | 1 | -0/+9
  Foz-DB Navi21:
  Totals from 1025 (0.76% of 134913) affected shaders:
  CodeSize: 1436752 -> 1432412 (-0.30%)

  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20970>
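The idea behind materializing a 64-bit constant with a bitfield-mask instruction can be sketched as follows. This is a minimal software model, not the code from the commit: `bfm64` and `find_bfm64_operands` are hypothetical names, the semantics (a run of `size` one-bits shifted left by `offset`, both taken modulo 64) are an assumption about how the scalar bitfield-mask operation behaves, and the conditions ACO actually checks may differ.

```cpp
#include <cstdint>

// Assumed model of a 64-bit bitfield-mask operation:
// `size` consecutive one-bits starting at bit `offset`.
constexpr uint64_t bfm64(uint32_t size, uint32_t offset)
{
   return ((uint64_t{1} << (size & 63)) - 1) << (offset & 63);
}

// Hypothetical helper: if `value` is a single contiguous run of one-bits,
// report the (size, offset) pair that reproduces it and return true.
bool find_bfm64_operands(uint64_t value, uint32_t& size, uint32_t& offset)
{
   if (value == 0)
      return false;

   // Position of the lowest set bit.
   offset = 0;
   while (((value >> offset) & 1) == 0)
      ++offset;

   // After shifting, a contiguous run looks like 0b0...01...1,
   // so adding one must clear every set bit.
   const uint64_t run = value >> offset;
   if ((run & (run + 1)) != 0)
      return false;

   // Length of the run.
   size = 0;
   for (uint64_t r = run; r != 0; r >>= 1)
      ++size;

   // A full 64-bit run cannot be expressed with a 6-bit size field.
   return size < 64 && bfm64(size, offset) == value;
}
```

A constant such as 0x0000ff0000000000 would then be described by size 8 and offset 40, two small operands instead of a wide literal, which is consistent with the code-size reduction the commit reports.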
* aco: allow Builder::Result to be dereferenced | Rhys Perry | 2023-01-10 | 1 | -6/+5
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
* aco/gfx11: fix discard early exit removal optimization | Rhys Perry | 2023-01-10 | 1 | -2/+3
  This optimization never happened because the NULL target was removed in GFX11.

  fossil-db (gfx1100):
  Totals from 5439 (4.04% of 134574) affected shaders:
  Instrs: 407865 -> 387123 (-5.09%)
  CodeSize: 2163340 -> 2060644 (-4.75%)
  Latency: 3432378 -> 3327802 (-3.05%)
  InvThroughput: 270133 -> 262980 (-2.65%)
  Branches: 8524 -> 3085 (-63.81%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20513>
* aco: Use v_mov_b16 on GFX11. | Georg Lehmann | 2023-01-03 | 1 | -3/+32
  Foz-DB GFX1100:
  Totals from 4684 (3.47% of 134913) affected shaders:
  CodeSize: 41086444 -> 41043476 (-0.10%)
  Instrs: 8176019 -> 8175995 (-0.00%)
  Latency: 83792071 -> 83792023 (-0.00%)
  InvThroughput: 10311371 -> 10311369 (-0.00%)

  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20369>
* aco/gfx11: export mrtz in discard early exit for non-color shaders | Rhys Perry | 2022-12-16 | 1 | -2/+5
  If a shader doesn't export any color targets and instead only exports mrtz, the discard early exit block should match.

  Fixes artifacts on Lara in Rise of the Tomb Raider benchmark and hair in The Witcher 3 (classic).

  https://reviews.llvm.org/D128185

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Fixes: bc8da20dda6 ("aco: export MRT0 instead of NULL on GFX11")
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20345>
* aco: Emulate Wave64 bpermute on GFX11. | Timur Kristóf | 2022-12-14 | 1 | -0/+65
  Similar to emit_gfx10_wave64_bpermute, but uses the new v_permlane64_b32 instruction to swap data between wave halves.

  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
* aco: Stylistic changes to emit_gfx10_wave64_bpermute. | Timur Kristóf | 2022-12-14 | 1 | -3/+4
  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
* aco: Split opcodes for GFX6 and GFX10 emulated bpermute. | Timur Kristóf | 2022-12-14 | 1 | -7/+6
  Different sequences are emitted for these, so it makes sense to have different opcodes too.

  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20293>
* aco: Don't use v_lshrrev_b64 for moves on GFX11. | Bas Nieuwenhuizen | 2022-12-02 | 1 | -2/+4
  With VOPD, shifts are not very likely to get dual-issued, but plain moves are. On RDNA2, v_lshrrev_b64 has half the performance of v_mov_b32 (but you need twice as many moves), so on GFX11 this likely reaches the threshold where moves are faster.

  Totals from 68400 (50.70% of 134906) affected shaders:
  CodeSize: 275489516 -> 275459536 (-0.01%); split: -0.01%, +0.00%
  Instrs: 51775474 -> 51991286 (+0.42%)
  Latency: 589884847 -> 589066439 (-0.14%); split: -0.15%, +0.01%
  InvThroughput: 127154986 -> 126037619 (-0.88%); split: -0.88%, +0.00%
  Copies: 3756157 -> 3976193 (+5.86%)
  Branches: 1259604 -> 1260072 (+0.04%)

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19633>
* aco: improve do_pack_2x16() with zero constants | Rhys Perry | 2022-12-01 | 1 | -6/+8
  We can skip the v_or_b32 or use an instruction smaller than v_alignbyte_b32.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
* aco/gfx11: use v_cvt_i32_i16/v_cvt_u32_u16 | Rhys Perry | 2022-12-01 | 1 | -0/+5
  fossil-db (gfx1100):
  Totals from 52753 (39.07% of 135032) affected shaders:
  CodeSize: 153603860 -> 153163384 (-0.29%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19933>
* aco: fix emitting DEALLOC_VGPRS in the discard block | Samuel Pitoiset | 2022-11-22 | 1 | -2/+2
  It should be emitted right before s_endpgm.

  Cc: 22.3 mesa-stable
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19931>
* aco: add p_dual_src_export_gfx11 for dual source blending on GFX11 | Samuel Pitoiset | 2022-11-16 | 1 | -0/+79
  Dual source blending must be in strict WQM mode.

  Cc: 22.3 mesa-stable
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
* aco: move statistics enum to aco_shader_info.h | Daniel Schürmann | 2022-11-15 | 1 | -2/+3
  to make it accessible from the driver.

  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19721>
* aco: fix p_interp_gfx11 to not overwrite SCC | Samuel Pitoiset | 2022-11-15 | 1 | -1/+4
  s_wqm_b64 clobbers SCC.

  Found this while working on dual source blending.

  Fixes: 6113ee650a2 ("aco/gfx11: fix FS input loads in quad-divergent control flow")
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19747>
* aco: Use s_pack_ll_b32_b16 for scalar zero extend. | Georg Lehmann | 2022-11-01 | 1 | -0/+2
  Foz-DB Navi21:
  Totals from 2403 (1.78% of 134913) affected shaders:
  CodeSize: 25329156 -> 25311244 (-0.07%)

  Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19413>
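Why a pack instruction can serve as a zero extend can be seen from a small software model. The sketch below assumes that the pack operation places the low 16 bits of its first source in the low half of the result and the low 16 bits of its second source in the high half. `pack_ll_b32_b16` and `zero_extend_16` are illustrative names, not ACO functions. The code-size motivation (a zero operand is cheaper to encode than a 0xffff mask) is also an assumption; the commit only reports the measured reduction.

```cpp
#include <cstdint>

// Assumed model of "pack low halves":
// result[15:0] = lo_src[15:0], result[31:16] = hi_src[15:0].
constexpr uint32_t pack_ll_b32_b16(uint32_t lo_src, uint32_t hi_src)
{
   return (lo_src & 0xffffu) | ((hi_src & 0xffffu) << 16);
}

// Packing a value with zero in the high half clears bits 31:16,
// which is exactly a 16-bit to 32-bit zero extension.
constexpr uint32_t zero_extend_16(uint32_t value)
{
   return pack_ll_b32_b16(value, 0u);
}
```

Under these assumptions, `zero_extend_16(x)` is equivalent to `x & 0xffff` for every 32-bit `x`.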
* aco/gfx11: fix FS input loads in quad-divergent control flow | Rhys Perry | 2022-11-01 | 1 | -0/+48
  This is not ideal and it would be great to somehow make it better some day.

  fossil-db (gfx1100):
  Totals from 5208 (3.86% of 135032) affected shaders:
  MaxWaves: 127058 -> 126962 (-0.08%); split: +0.01%, -0.09%
  Instrs: 3983440 -> 4072736 (+2.24%); split: -0.00%, +2.24%
  CodeSize: 21872468 -> 22230852 (+1.64%); split: -0.00%, +1.64%
  VGPRs: 206688 -> 206984 (+0.14%); split: -0.05%, +0.20%
  Latency: 37447383 -> 37491197 (+0.12%); split: -0.05%, +0.17%
  InvThroughput: 6421955 -> 6422348 (+0.01%); split: -0.03%, +0.03%
  VClause: 71579 -> 71545 (-0.05%); split: -0.09%, +0.04%
  SClause: 148289 -> 147146 (-0.77%); split: -0.84%, +0.07%
  Copies: 259011 -> 258084 (-0.36%); split: -0.61%, +0.25%
  Branches: 101366 -> 101314 (-0.05%); split: -0.10%, +0.05%
  PreSGPRs: 223482 -> 223460 (-0.01%); split: -0.21%, +0.20%
  PreVGPRs: 184448 -> 184744 (+0.16%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19370>
* aco: fix typo in branch lowering | Rhys Perry | 2022-11-01 | 1 | -1/+1
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Fixes: aadb7aef019 ("aco: add VINTERP instruction format")
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19370>
* aco: swap v_perm_b32 operands | Rhys Perry | 2022-10-24 | 1 | -7/+8
  I misread the ISA doc and got the order wrong.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Fixes: dae1629778d ("aco: disable sdwa on gfx11")
  Fixes: e68e6c75ca1 ("aco: use v_perm_b32 to copy 0xff00/0x00ff/0xff/0x00")
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19223>
* aco: Allow explicitly removing jumps on GFX10+ when beneficial. | Timur Kristóf | 2022-10-11 | 1 | -12/+27
  "Removing jumps" in ACO means skipping the jump instruction at the beginning of a divergent branch (while still modifying exec).

  ACO already supports implicitly removing jumps when it decides that executing a branch with an empty exec mask is more beneficial than a jump. This commit adds the possibility to use this explicitly through nir_selection_control. ACO will respect this setting and remove the branch instructions when this is specified, unless it decides that this would cause bugs (e.g. an exp instruction).

  There are two cases that benefit from the new change:
  1. When the application requests to "flatten" a branch (i.e. remove control flow), we now respect that.
  2. When the compiler stack determines that a divergent branch is always taken.

  v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks

  Fossil DB stats on Navi 21:
  Totals from 13 (0.01% of 134906) affected shaders:
  CodeSize: 136616 -> 136496 (-0.09%)
  Instrs: 26196 -> 26166 (-0.11%)
  Latency: 417928 -> 417889 (-0.01%)
  Branches: 1241 -> 1211 (-2.42%)

  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
* aco/gfx11: deallocate VGPRs at the end of the shader | Rhys Perry | 2022-09-30 | 1 | -0/+4
  fossil-db (gfx1100):
  Totals from 65987 (40.81% of 161689) affected shaders:
  Instrs: 57123207 -> 57199947 (+0.13%)
  CodeSize: 308402500 -> 308709460 (+0.10%)
  Latency: 680527139 -> 680527160 (+0.00%)
  InvThroughput: 131620026 -> 131620045 (+0.00%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17710>
* aco: add VINTERP instruction format | Rhys Perry | 2022-09-26 | 1 | -1/+1
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
* aco: add LDSDIR instruction format | Rhys Perry | 2022-09-26 | 1 | -1/+1
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17333>
* aco: Fix p_init_scratch for task shaders. | Timur Kristóf | 2022-09-01 | 1 | -1/+1
  Fixes: d2d94b62f2a4f8686c17b7c33ae02aa2b2029a27
  Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18339>
* aco: use std::vector::reserve() more often | Daniel Schürmann | 2022-08-30 | 1 | -1/+2
  This removes the majority of vector re-allocations.

  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18105>
* aco: fix long-jump version of discard early exit | Rhys Perry | 2022-08-25 | 1 | -2/+2
  It isn't safe to modify the exec mask before the discard block, and the definition interferes with GFX11 NOP insertion.

  Just use s[0:1] instead, since we won't be using it.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18125>
* aco: add new pseudo instruction p_jump_to_epilog | Samuel Pitoiset | 2022-07-18 | 1 | -0/+4
  The first operand of this new pseudo-instruction is a 64-bit SGPR for the continue PC, followed by a variable list of fixed VGPRs for the color exports, which are the PS epilog inputs.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17485>
* aco: initialize scratch base registers on GFX9-GFX10.3 | Rhys Perry | 2022-07-08 | 1 | -0/+41
  fossil-db (navi21):
  Totals from 1142 (0.70% of 162293) affected shaders:
  Instrs: 271636 -> 271974 (+0.12%)
  CodeSize: 1532020 -> 1533792 (+0.12%)
  Latency: 7484066 -> 7485698 (+0.02%)
  InvThroughput: 4048824 -> 4049579 (+0.02%)
  SClause: 4171 -> 4212 (+0.98%)
  PreSGPRs: 11203 -> 12276 (+9.58%)

  fossil-db (vega10):
  Totals from 3327 (2.06% of 161355) affected shaders:
  Instrs: 257413 -> 257601 (+0.07%)
  CodeSize: 1424244 -> 1425372 (+0.08%)
  Latency: 8598402 -> 8600466 (+0.02%)
  InvThroughput: 7906335 -> 7908234 (+0.02%)
  SClause: 4932 -> 4973 (+0.83%)
  PreSGPRs: 22010 -> 25405 (+15.42%)

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17079>
* aco: don't use 32-bit fp inline constants for fp16 vop3p literals | Rhys Perry | 2022-07-05 | 1 | -4/+1
  If we're applying the literal 0x3f800000 to a fp16 vop3p instruction, we shouldn't use the 1.0 inline constant, because the hardware will use the 16-bit 1.0: 0x00003c00.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16296>
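The mismatch described above comes down to the two encodings of 1.0. The following sketch derives both bit patterns from the IEEE 754 exponent biases and shows why a width-unaware substitution would change the value. `inline_one_bits` and `literal_matches_inline_one` are hypothetical helpers written for illustration, not ACO's actual check.

```cpp
#include <cstdint>

// 1.0 in binary32: sign 0, biased exponent 127, mantissa 0.
constexpr uint32_t fp32_one_bits = 127u << 23; // 0x3f800000
// 1.0 in binary16: sign 0, biased exponent 15, mantissa 0.
constexpr uint32_t fp16_one_bits = 15u << 10;  // 0x00003c00

// Bits the hardware would supply for the "1.0" inline constant,
// depending on whether the instruction operates on fp16 data
// (behavior as described in the commit message above).
constexpr uint32_t inline_one_bits(bool is_fp16_instr)
{
   return is_fp16_instr ? fp16_one_bits : fp32_one_bits;
}

// Hypothetical check: a literal may only be replaced by the inline
// constant if the inline constant expands to the very same bits.
constexpr bool literal_matches_inline_one(uint32_t literal, bool is_fp16_instr)
{
   return literal == inline_one_bits(is_fp16_instr);
}
```

With these definitions, the literal 0x3f800000 matches the inline constant only for a 32-bit instruction; for an fp16 instruction the substitution would silently turn it into 0x00003c00.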
* aco: fix single-alignbyte do_pack_2x16() path with fp inline constants | Rhys Perry | 2022-07-05 | 1 | -1/+5
  We were using a 16-bit inline constant with a 32-bit instruction and the test would have created "v1: %_:v[0] = v_alignbyte_b32 0.5, %_:v[1][16:32], 2" instead.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16296>
* aco: use v_perm_b32 to copy 0xff00/0x00ff/0xff/0x00 | Rhys Perry | 2022-05-31 | 1 | -0/+10
  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>
* aco: disable sdwa on gfx11 | Rhys Perry | 2022-05-31 | 1 | -20/+142
  Instead of SDWA v_mov_b32/v_xor_b32, we can use a combination of v_add_u16/v_sub_u16 (add/sub swap, similar to xor swap) and v_perm_b32 with a literal.

  I don't know yet if GFX11 adds any new instructions that make this easier, but this approach should have full functionality.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>
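The add/sub swap mentioned above works like the xor swap: it exchanges two values without a temporary, relying on wrap-around arithmetic. The sketch below models only that arithmetic on 16-bit values in plain C++. `addsub_swap_u16` is an illustrative name, and the v_perm_b32 part of the lowering is not modeled here.

```cpp
#include <cstdint>

// Swap two 16-bit values using only add and subtract.
// All results are reduced modulo 2^16 by the casts, mirroring
// wrap-around 16-bit integer arithmetic.
// Note: like the xor swap, this breaks if `a` and `b` refer to the
// same storage, because the first step would double the value.
void addsub_swap_u16(uint16_t& a, uint16_t& b)
{
   a = static_cast<uint16_t>(a + b); // a = a0 + b0
   b = static_cast<uint16_t>(a - b); // b = (a0 + b0) - b0 = a0
   a = static_cast<uint16_t>(a - b); // a = (a0 + b0) - a0 = b0
}
```

Because every step is computed modulo 2^16, the swap stays correct even when the intermediate sum overflows, for example with 0xffff and 2.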
* aco: clarify a portion of do_pack_2x16 | Rhys Perry | 2022-05-31 | 1 | -1/+3
  This confused me a bit when I first saw it.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16595>
* aco: only add/subtract low bits of program addresses | Rhys Perry | 2022-05-23 | 1 | -2/+1
  fossil-db (Sienna Cichlid):
  Totals from 4007 (2.47% of 162293) affected shaders:
  Instrs: 3733239 -> 3728018 (-0.14%)
  CodeSize: 20770340 -> 20749456 (-0.10%)
  Latency: 46883958 -> 46872764 (-0.02%); split: -0.02%, +0.00%
  InvThroughput: 10550392 -> 10548698 (-0.02%); split: -0.02%, +0.00%

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16460>
* aco: fix p_constaddr with a non-zero offset | Rhys Perry | 2022-05-23 | 1 | -1/+1
  Seems this broke a while ago and we never noticed.

  Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
  Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
  Fixes: 0af7ff49fde ("aco: lower p_constaddr into separate instructions earlier")
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16460>
* amd: change chip_class naming to "enum amd_gfx_level gfx_level" | Marek Olšák | 2022-05-13 | 1 | -66/+66
  This aligns the naming with PAL.

  Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Acked-by: Pierre-Eric Pellou-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16469>
* aco: export MRT0 instead of NULL on GFX11 | Samuel Pitoiset | 2022-05-12 | 1 | -1/+2
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
  Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16369>