opcodes/arm: add disassembler styling for arm

This commit adds disassembler styling for the ARM architecture.

The ARM disassembler is driven by several instruction tables,
e.g. cde_opcodes, coprocessor_opcodes, neon_opcodes, etc.

The type for elements in each table can vary, but they all have one
thing in common: a 'const char *assembler' field.  This field
contains a string that describes the assembler syntax of the
instruction.

Embedded within that assembler syntax are various escape characters,
prefixed with a '%'.  Here's an example of a very simple instruction
from the arm_opcodes table:

  "pld\t%a"

The '%a' indicates a particular type of operand.  The function
print_insn_arm processes the arm_opcodes table, and includes a switch
statement that handles the '%a' operand, and takes care of printing
the correct value for that instruction operand.

It is worth noting that there are many print_* functions; each
function handles a single *_opcodes table, and includes its own
switch statement for operand handling.  As a result, every *_opcodes
table uses a different mapping for the operand escape sequences.
This means that '%a' might print an address for one *_opcodes table,
but in a different *_opcodes table '%a' might print a register
operand.

Notice as well that in our example above, the instruction mnemonic
'pld' is embedded within the assembler string.  Some instructions
also include comments within the assembler string, for example, also
from the arm_opcodes table:

  "nop\t\t\t@ (mov r0, r0)"

Here, everything after the '@' is a comment that is displayed at the
end of the instruction disassembly.

The next complexity is that the meaning of some escape sequences is
not necessarily fixed.  Consider these two examples from arm_opcodes:

  "ldrex%c\tr%12-15d, [%16-19R]"
  "setpan\t#%9-9d"

Here, the '%d' escape is used with a bitfield modifier, '%12-15d' in
the first instruction, and '%9-9d' in the second instruction, but
both of these are the '%d' escape.

However, in the first instruction, the '%d' is used to print a
register number; notice the 'r' immediately before the '%d'.  In the
second instruction the '%d' is used to print an immediate; notice
the '#' just before the '%d'.

We have two problems here: first, the '%d' needs to know if it
should use register style or immediate style, and second, the 'r'
and '#' characters also need to be styled appropriately.

The final thing we must consider is that some escape codes result in
more than just a single operand being printed.  For example, the '%q'
operand as used in arm_opcodes ends up calling arm_decode_shift,
which can print a register name, a shift type, and a shift amount.
This could end up using register, sub-mnemonic, and immediate styles,
as well as the text style for things like ',' between the different
parts.

I propose a three layer approach to adding styling:

(1) Basic state machine:

    When we start printing an instruction we should maintain the idea
    of a 'base_style'.  Every character from the assembler string
    will be printed using the base_style.

    The base_style will start as mnemonic, as each instruction starts
    with an instruction mnemonic.  When we encounter the first '\t'
    character, the base_style will change to text.  When we encounter
    the first '@' the base_style will change to comment_start.

    This simple state machine ensures that for simple instructions
    the basic parts, except for the operands themselves, will be
    printed in the correct style.

(2) Simple operand styling:

    For operands that only have a single meaning, or which expand to
    multiple parts, all of which have a consistent meaning, I will
    simply update the operand printing code to print the operand with
    the correct style.  This will cover a large number of the
    operands, and is the most consistent with how styling has been
    added to previous architectures.

(3) New styling syntax in assembler strings:

    For cases like the '%d' that I describe above, I propose adding a
    new extension to the assembler syntax.  This extension will allow
    me to temporarily change the base_style.  Operands like '%d' will
    then print using the base_style rather than using a fixed style.

    Here are the two examples from above that use '%d', updated with
    the new syntax extension:

      "ldrex%c\t%{R:r%12-15d%}, [%16-19R]"
      "setpan\t%{I:#%9-9d%}"

    The syntax has the general form '%{X:....%}' where the 'X'
    character changes to indicate a different style.  In the first
    instruction I use '%{R:...%}' to change base_style to the
    register style, and in the second '%{I:...%}' changes base_style
    to immediate style.

    Notice that the 'r' and '#' characters are included within the
    new style group; this ensures that these characters are printed
    with the correct style rather than as text.

    The function decode_base_style maps from character to style.
    I've included a character for each style for completeness, though
    only a small number of styles are currently used.

I have updated arm-dis.c to the above scheme, and checked all of the
tests in gas/testsuite/gas/arm/, and the styling looks reasonable.
There are no regressions on the ARM gas/binutils/ld tests that I can
see, so I don't believe I've changed the output layout at all.  There
were two binutils tests for which I needed to force the disassembler
styling off.

I can't guarantee that I've not missed some untested corners of the
disassembler, or some incorrectly styled output when reviewing the
test results, but I don't believe I've introduced any changes that
could break the disassembler; at worst, some aspect of the output
will not be styled correctly.
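
As an illustration only, layers (1) and (3) can be sketched as a small
standalone C program fragment.  This is not the code in arm-dis.c: the
enum, decode_style_char, tag_for and style_literals are hypothetical
names, only the 'R' and 'I' style characters are taken from the text
above, operand escapes are skipped rather than decoded, and restoring
the previous style at '%}' is an assumption about what "temporarily"
means.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the disassembler styles.  */
enum style { STYLE_TEXT, STYLE_MNEMONIC, STYLE_REGISTER,
             STYLE_IMMEDIATE, STYLE_COMMENT_START };

/* Hypothetical analogue of decode_base_style: map the 'X' in "%{X:"
   to a style.  Unknown characters leave the style unchanged.  */
static enum style
decode_style_char (char c, enum style fallback)
{
  switch (c)
    {
    case 'R': return STYLE_REGISTER;
    case 'I': return STYLE_IMMEDIATE;
    default:  return fallback;
    }
}

/* One-letter tag per style, used only to make the result visible.  */
static char
tag_for (enum style s)
{
  switch (s)
    {
    case STYLE_MNEMONIC:      return 'm';
    case STYLE_REGISTER:      return 'r';
    case STYLE_IMMEDIATE:     return 'i';
    case STYLE_COMMENT_START: return 'c';
    default:                  return 't';
    }
}

/* Walk TMPL and copy each literal character to TEXT, recording the
   base style it would be printed with in TAGS.  Operand escapes such
   as "%c" or "%12-15d" are skipped; only the simple form "optional
   bitfield, then one letter" is recognised.  Returns the number of
   literal characters written.  */
static size_t
style_literals (const char *tmpl, char *text, char *tags, size_t cap)
{
  enum style base = STYLE_MNEMONIC;  /* Templates start with a mnemonic.  */
  enum style saved = base;           /* Style to restore at "%}".  */
  size_t n = 0;

  for (const char *c = tmpl; *c != '\0'; ++c)
    {
      if (*c == '%')
        {
          ++c;
          if (*c == '\0')
            break;
          if (*c == '{')
            {
              if (c[1] == '\0' || c[2] != ':')
                break;               /* Malformed group, stop.  */
              saved = base;
              base = decode_style_char (c[1], base);
              c += 2;                /* Now at ':', the loop steps past it.  */
              continue;
            }
          if (*c == '}')
            {
              base = saved;
              continue;
            }
          if (*c != '%')
            {
              /* Operand escape: skip the bitfield, then the letter.  */
              while (*c != '\0' && !isalpha ((unsigned char) *c))
                ++c;
              if (*c == '\0')
                break;
              continue;
            }
          /* "%%" falls through and is emitted as a literal '%'.  */
        }
      else if (*c == '\t' && base == STYLE_MNEMONIC)
        base = STYLE_TEXT;           /* First tab ends the mnemonic.  */
      else if (*c == '@')
        base = STYLE_COMMENT_START;  /* Rest of the line is a comment.  */

      assert (n + 1 < cap);
      text[n] = *c;
      tags[n] = tag_for (base);
      ++n;
    }

  text[n] = '\0';
  tags[n] = '\0';
  return n;
}
```

Calling style_literals on "setpan\t%{I:#%9-9d%}" records the tags
"mmmmmmti": the mnemonic, the tab as text, and the '#' in immediate
style.  For "ldrex%c\t%{R:r%12-15d%}, [%16-19R]" the 'r' is tagged as
register while the ',' and brackets stay text.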
author: Andrew Burgess <aburgess@redhat.com> 2022-07-07 13:43:45 +0100
committer: Andrew Burgess <aburgess@redhat.com> 2022-11-01 09:32:13 +0000
commit: 6576bffe6cbbb53c5756b2fccd2593ba69b74cdf (patch)
tree: c3337c121d91e60706e07a0b1fc847c30e751229 /opcodes/disassemble.c
parent: 8cb6e17571f3fb66ccd4fa19f881602542cd06fc (diff)
download: binutils-gdb-6576bffe6cbbb53c5756b2fccd2593ba69b74cdf.tar.gz
1 files changed, 1 insertions, 0 deletions
diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index 79a2f3dabe5..0a8f2da629f 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -622,6 +622,7 @@ disassemble_init_for_target (struct disassemble_info * info)
     case bfd_arch_arm:
       info->symbol_is_valid = arm_symbol_is_valid;
       info->disassembler_needs_relocs = true;
+      info->created_styled_output = true;
       break;
 #endif
 #ifdef ARCH_avr