Capabilities pointing to symbols in SEC_CODE sections are given the
bounds of the entire PCC. We ensure that the PCC bounds are padded and
aligned as needed in the linker.
Capabilities pointing to other symbols (e.g. in data sections) are given
the bounds of the symbol that they point to. It is the responsibility
of the assembly generator (i.e. usually the compiler) to ensure these
bounds are correctly aligned and padded as necessary.
We emit a warning for imprecise bounds in the second case; until this
patch that warning also covered the first case. That was a mistake
and is rectified in this commit.
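As a rough illustration of why data-section bounds may need padding and
alignment, here is a sketch of a simplified CHERI-style compressed-bounds
model (the mantissa width and encoding are illustrative assumptions, not
the actual Morello capability format):

```python
def required_alignment(length, mantissa_bits=14):
    """Smallest power-of-two alignment for base and limit so that
    [base, base + length) is representable in a simplified
    compressed-bounds model: small objects get exact bounds, larger
    ones need base/limit aligned to 2**e where e grows with length.
    The mantissa width of 14 is an illustrative assumption."""
    if length < (1 << mantissa_bits):
        return 1
    e = length.bit_length() - mantissa_bits
    return 1 << e

def bounds_precise(base, length, mantissa_bits=14):
    """True if a capability covering exactly [base, base + length)
    is representable in this toy model."""
    a = required_alignment(length, mantissa_bits)
    return base % a == 0 and (base + length) % a == 0
```

In this model a 1 MiB object needs 128-byte alignment and padding of its
limit to a 128-byte boundary, which is the kind of adjustment the linker
(for PCC) or the compiler (for data symbols) must make.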
1) Enable having a CAPINIT relocation against an IFUNC.
We update the `final_link_relocate` switch case around IFUNCs to
also handle CAPINIT relocations. The handling of CAPINIT relocations
differs slightly from that of AARCH64_NN (i.e. ABS64) relocations,
since we generally need to emit a dynamic relocation.
Handling this relocation must also cover the PDE case where a
hard-coded address has been placed in code to satisfy something like
an `adrp`. In that case the canonical address of the IFUNC becomes
its PLT stub rather than the result of the resolver, so we need to
use a RELATIVE relocation rather than an IRELATIVE one.
N.b. unlike the ABS64 relocation, a CAPINIT always emits a dynamic
relocation, so seeing a CAPINIT does not by itself require pointer
equality adjustments on a symbol. That means we do not need to
request that the PLT stub of an IFUNC be treated as the canonical
address just from having seen a CAPINIT relocation.
A CAPINIT relocation against an IFUNC needs to be recorded internally
so that _bfd_elf_allocate_ifunc_dyn_relocs does not garbage collect
the PLT stub and associated IRELATIVE relocation.
See changes in the CAPINIT case of the IFUNC switch of
elfNN_aarch64_final_link_relocate, and in the CAPINIT case of
elfNN_aarch64_check_relocs.
2) Ensure that GOT relocations against an IFUNC have their fragment
populated with the LSB set.
For GOT relocations against a capability IFUNC we need to introduce a
relocation for the runtime to provide us with a valid capability.
See changes in the GOT cases of the IFUNC switch of
elfNN_aarch64_final_link_relocate, changes in the
elfNN_aarch64_allocate_ifunc_dynrelocs function, and changes around
handling an IFUNC GOT entry in elfNN_aarch64_finish_dynamic_symbol.
3) Ensure that mapping symbols are emitted for the .iplt. Without this
many of the testcases here are disassembled incorrectly.
See changes in elfNN_aarch64_output_arch_local_syms.
4) IRELATIVE relocations are against symbols which are not in the
dynamic symbol table, hence they need their fragment populated to
inform the dynamic linker of the bounds and permissions with which
to call the associated resolver.
See part of the CAPINIT IFUNC handling in
elfNN_aarch64_final_link_relocate, and the IRELATIVE handling in
elfNN_aarch64_create_small_pltn_entry.
5) Disallow an ABS64 relocation against a purecap IFUNC. Such a
relocation expects a 64-bit value, but the function will return a
capability. This could be handled with some means of communicating
to the dynamic linker that this particular value should be 64-bit
(perhaps by emitting an AARCH64_IRELATIVE relocation rather than a
MORELLO_IRELATIVE one), but as yet GCC doesn't generate such a
relocation and we believe it's unlikely to be needed.
See new error check in AARCH64_NN clause of
elfNN_aarch64_final_link_relocate.
6) Ensure that for statically linked PDEs, we segregate IRELATIVE and
RELATIVE relocations: IRELATIVE relocs should be in the .rela.iplt
section, while RELATIVE relocs should be in the .rela.dyn section.
Correspondingly all RELATIVE relocations should be between the
__rela_dyn_{start,end} symbols, and all IRELATIVE relocations should
be between the __rela_iplt_{start,end} symbols.
This segregation is made based on the dynamic relocation type rather
than the static relocation that generates it. The segregation allows
the static libc to handle relocations more easily.
Update testcases accordingly.
We introduce some new testcases. morello-ifunc.s contains uses of an
IFUNC which is referenced directly in code. When linking a PDE this
triggers the pointer equality requirement, and hence the canonical
address for this symbol becomes the PLT stub rather than the result of
the resolver.
morello-ifunc1.s does not use the IFUNC directly in code so that the
address used everywhere is the result of the resolver.
Both of these are assembled and linked for static, dynamically linked
PDE, and PIE. The source without a hard-coded access is also linked
with -shared.
morello-ifunc2.s is written to check that a CAPINIT relocation does
indeed stop the garbage collection of an IFUNC's PLT and IRELATIVE
relocation.
morello-ifunc3.s tests that we error on an ABS64 relocation against a
C64 IFUNC.
morello-ifunc-dynlink.s tests that a CAPINIT relocation against an IFUNC
symbol defined in a shared library behaves the same way as one against a
FUNC symbol defined in a shared library.
Implementation note:
When segregating IRELATIVE and RELATIVE relocs the change for
relocations against IFUNC symbols populated in the GOT is
straightforward.
For CAPINIT relocations the change is less straightforward. The
problem is that, on seeing a CAPINIT relocation in check_relocs, we
immediately allocate space in the srelcaps section.
satisfy the above we need to know whether we're going to be emitting an
IRELATIVE relocation or RELATIVE one in order to know which section it
should go in. The determining factor between these two kinds of
relocations is whether there is a text relocation to this IFUNC symbol,
since that determines whether we need to make this CAPINIT relocation
a RELATIVE relocation pointing to the PLT stub (in order to satisfy
pointer equality) or an IRELATIVE relocation pointing to the resolver.
Whether such a relocation occurs is recorded against each symbol in the
pointer_equality_needed member. This can only be known after all
relocations have been seen in check_relocs. Hence, when coming across a
CAPINIT relocation in check_relocs we do not in general know whether
this CAPINIT relocation should end up as an IRELATIVE or RELATIVE
relocation.
This patch postpones the decision by recording the number of CAPINIT
relocations against a given symbol in a hash table while going through
check_relocs and allocating the relevant space in the required section
in size_dynamic_sections.
N.b. this is similar in purpose to the dyn_relocs linked list on a
symbol. We do not use that existing member, which is present on every
symbol, since the structure does not allow any indication of which
kind of relocation triggered the need. Moreover the structure is used
for different purposes throughout the linker, and disentangling the
new meaning from the existing ones seems overly confusing.
Overall, the decisions about which sections relocations against an IFUNC
should go in are:
CAPINIT relocations:
If this is a static PDE link, and the symbol does not need pointer
equality handling, then this should emit an IRELATIVE relocation and
that should go in the .rela.iplt section.
If this is a PIC link, then this should go in the .rela.ifunc
section (along with all other dynamic relocations against the IFUNC,
as commented in _bfd_elf_allocate_ifunc_dyn_relocs).
Otherwise this relocation should go in the srelcaps section (which
goes in .rela.dyn).
GOT relocations:
If this is a static PDE link, and the symbol does not need pointer
equality, then this should emit an IRELATIVE relocation into the
.rela.iplt section.
If this is a static PDE link, then this should emit a RELATIVE
relocation and that should go in the srelcaps section (which is in
.rela.dyn).
Otherwise this should go in .rela.got section.
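The section-selection rules above can be restated as a small decision
function. This is an illustrative Python model of the logic, not the
actual BFD code; the string names stand in for the linker's section
structures:

```python
def reloc_section_for_ifunc(reloc, static_pde, pic, pointer_equality_needed):
    """Which output relocation section a dynamic relocation against an
    IFUNC lands in, per the rules above."""
    assert reloc in ("CAPINIT", "GOT")
    if reloc == "CAPINIT":
        if static_pde and not pointer_equality_needed:
            return ".rela.iplt"    # emitted as IRELATIVE
        if pic:
            return ".rela.ifunc"   # with the other IFUNC dyn relocs
        return ".rela.dyn"         # via srelcaps
    else:  # GOT relocation
        if static_pde and not pointer_equality_needed:
            return ".rela.iplt"    # emitted as IRELATIVE
        if static_pde:
            return ".rela.dyn"     # RELATIVE, via srelcaps
        return ".rela.got"
```

For example, a CAPINIT in a static PDE against a symbol needing pointer
equality ends up as a RELATIVE reloc in .rela.dyn, while one without
that requirement becomes an IRELATIVE reloc in .rela.iplt.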
This patch deals with the interaction between the code that attempts to
make bounds precise (for both the PCC bounds and for some individual
sections) and the code that adds stubs (e.g. long-branch veneers and
interworking stubs) in the AArch64 backend.
We aim to set precise bounds for the PCC span and some individual
sections in elfNN_c64_resize_sections. However, it transpires that
elfNN_aarch64_size_stubs can change the layout in ways that extend
sections that should be covered under the PCC span outside of the bounds
set in elfNN_c64_resize_sections. The introduction of stubs can also
change (even reduce) the amount of padding required to make the bounds
on any given section precise.
To address this problem, we move the core logic from
elfNN_c64_resize_sections into a new function, c64_resize_sections,
that is safe to call repeatedly. Similarly, we move the core logic
from elfNN_aarch64_size_stubs into a new function, aarch64_size_stubs,
which again can be called repeatedly.
We then adjust elfNN_aarch64_size_stubs to call aarch64_size_stubs and
c64_resize_sections in a loop, stopping when c64_resize_sections no
longer makes any changes to the layout.
An important observation made above is that the introduction of stubs
can change the amount of padding needed to make bounds precise. Likewise,
introducing padding can in theory necessitate the introduction of stubs
(e.g. if the change in layout necessitates a long-branch veneer). This
is why we run the resizing/stubs code in a loop until no further changes
are necessary.
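The iteration described above can be sketched as a simple fixpoint
loop (a Python model; the two callables stand in for aarch64_size_stubs
and c64_resize_sections):

```python
def size_stubs_then_resize(size_stubs, resize_sections):
    """Alternate stub sizing and C64 resizing until the layout is
    stable: resize_sections returns True while it still changes the
    layout, and the loop stops once it reports no further changes."""
    while True:
        size_stubs()               # may grow sections with new stubs
        if not resize_sections():  # may adjust padding; False = stable
            break
```

Termination relies on the layout eventually settling, i.e. a round of
stub insertion no longer perturbing any section's padding.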
Since the amount of padding needed to achieve precise bounds for a
section can change (indeed reduce) with the introduction of stubs, we
need a mechanism to update the amount of padding applied to a section in
a subsequent iteration of c64_resize_sections. We achieve this by
introducing a new interface in ld/emultempl/aarch64elf.em. We have the
functions:
static void
c64_set_section_padding (asection *osec, bfd_vma padding, void **cookie);
static bfd_vma
c64_get_section_padding (void *cookie);
Here, the "cookie" value is, to consumers of this interface (i.e.
bfd/elfnn-aarch64.c), an opaque handle used to refer to the padding that
was introduced for a given section. The consuming code then passes back
the cookie to later query the amount of padding already installed or to
update the amount of padding.
Internally, within aarch64elf.em, the "cookie" is just a pointer to the
node in the ldexp tree containing the integer amount of padding
inserted.
In the AArch64 ELF backend, we then maintain a (lazily-allocated)
mapping between output sections and cookies in order to be able to
update the padding we installed in subsequent iterations of
c64_resize_sections.
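A minimal sketch of the cookie mechanism, assuming the semantics
described above (the Python class stands in for the ldexp tree node;
the names mirror the .em functions but the model is otherwise invented
for illustration):

```python
class PaddingNode:
    """Stand-in for the ldexp tree node holding the padding amount."""
    def __init__(self):
        self.value = 0

_cookies = {}  # lazily-built map: output section name -> cookie

def c64_set_section_padding(osec, padding):
    """Install or update the padding for `osec`; returns the opaque
    cookie the consumer keeps to query/update the padding later."""
    cookie = _cookies.get(osec)
    if cookie is None:
        cookie = _cookies[osec] = PaddingNode()
    cookie.value = padding
    return cookie

def c64_get_section_padding(cookie):
    """Query the amount of padding previously installed."""
    return cookie.value
```

A later iteration of the resizing code can then shrink or grow the
padding through the same cookie without re-walking the ldexp tree.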
While working on this patch, an edge case became apparent: the case
where pcc_high_sec requires precise bounds (i.e. where we call
ensure_precisely_bounded_section on pcc_high_sec). As it stands, in this
case, the code to ensure precise PCC bounds may in fact make the bounds
on pcc_high_sec itself no longer representable (even if we previously
ensured this by calling ensure_precisely_bounded_section). In general,
it is not always possible to choose an amount of padding to add to the
end of pcc_high_sec to make both pcc_high_sec and the PCC span itself
have precise bounds (without introducing an unreasonably large alignment
requirement on pcc_high_sec).
To handle the edge case above, we decouple these two problems by adding
a separate amount of padding *after* pcc_high_sec to make the PCC bounds
precise. If pcc_high_sec is required to have precise bounds, then that
can be done in the usual way by adding padding to pcc_high_sec in
ensure_precisely_bounded_section. The new mechanism for adding padding
after an output section is implemented in
aarch64elf.em:c64_pad_after_section.
To avoid having to add yet another mechanism to update the padding
*after* pcc_high_sec, we avoid adding this padding until all other
resizing / bounds-setting work is done. This is not possible for
individual sections since padding introduced there may have a knock-on
effect requiring further work, but we believe this isn't the case for
the padding added after pcc_high_sec to make the PCC bounds precise.
This patch also reveals a pre-existing issue whereby we end up calling
ensure_precisely_bounded_section on the *ABS* section. Without a further
change to prevent this, it can lead to a null pointer dereference in
ensure_precisely_bounded_section, since the "owner" field on the *ABS*
section is NULL, and we use this field to obtain a pointer to the
output BFD in the new c64_get_section_padding_info function.
Of course, it doesn't make sense for ensure_precisely_bounded_section to
be called on the *ABS* section in the first place. This can happen when
there are relocations against ldscript-defined symbols which are defined
at the top level of the ldscript (i.e. not in a particular output
section). Those symbols initially have their output section set to the
*ABS* section. Later, we resolve such symbols to their correct output
section in ldexp_finalize_syms, but the code in c64_resize_sections is
running in ldemul_after_allocation, which comes before the call to
ldexp_finalize_syms in the lang_process flow.
For now, we just skip such symbols when looking for sections that need
precise bounds in c64_resize_sections, but this issue will later need
fixing properly. We choose to avoid fixing the pre-existing issue in
this patch to avoid over-complicating an already complex change.
We add the following extra error checking:
1) That TLS relocations (including SIZE relocations, but excluding
Local-Exec relocations) are not requested against a symbol plus
addend.
2) That SIZE relocations are requested against a defined symbol in the
current binary (i.e. one that the static linker knows the size of).
3) A TLS Local-Exec relocation must be against a symbol in the current
binary.
All the above also have error messages that describe the problem so
that the user can fix it.
Treating a relocation against a "symbol plus addend" as an error is due
to a combination of factors.
- The linker implementation does not have any way to represent a GOT
  entry of "symbol plus addend". Hence we currently just have silent
  bugs if asked to implement a relocation which requires a GOT entry
  when given a "symbol plus addend" relocation.
- It would be wasteful anyway to have multiple entries in the GOT for
e.g. sym+off1, sym+off2.
- Morello size relocations don't support "symbol plus addend" since
the meaning would have to be defined (is this the *remaining* size
of the symbol?) and there is no known use for this.
We allow local-exec relocations on "symbol plus addend" since there
the addend just implements an offset into the object we're accessing
(rather than a new GOT entry for the location of "symbol plus addend").
There is also an existing testcase in the BFD linker to allow such
relocations. The compiler can always avoid emitting these if it wants.
Notes on implementation:
- We choose to check errors in final_link_relocate rather than
check_relocs since this is where most existing error checking is
done.
- We check for errors around addends in relocate_section rather than
final_link_relocate or check_relocs since final_link_relocate does
not get told the *original* relocation (before TLS relaxation) and
check_relocs does not know about addends coming from the result of
previous relocations on the same code.
N.b. in order to emit multiple errors when there are multiple
relocations with an addend, we change relocate_section to store a
"return value" in a local variable and set it to false if any problem
is seen, rather than returning early.
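The error-accumulation pattern in relocate_section can be sketched
like this (illustrative Python; `check` stands in for the per-relocation
addend checks described above):

```python
def relocate_section(relocs, check):
    """Validate every relocation, recording failure in a local
    'return value' instead of returning at the first error, so that
    all problems are reported in one link attempt."""
    ret = True
    messages = []
    for reloc in relocs:
        problem = check(reloc)
        if problem is not None:
            messages.append(problem)  # report the error...
            ret = False               # ...but keep scanning
    return ret, messages
```

This way a user with several bad relocations sees them all at once
rather than fixing them one relink at a time.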
This was originally the first place that a function in
bfd/elfnn-aarch64.c was given a reference to
gldaarch64_layout_sections_again, and hence was the natural place to
store the function onto the elf hash table.
Ever since the introduction of elfNN_c64_resize_sections we have been
performing this operation in that function, before this size_stubs
function.
Hence it seems sensible to remove the argument, and the now-superfluous
operation, from elfNN_aarch64_size_stubs.
The majority of the code change here is around TLS data stubs. The new
TLS ABI requires that when relaxing a General Dynamic or Initial Exec
access to a variable to a Local Exec access, the linker emits data stubs
in a read-only section.
We do this with the below approach:
- check_relocs notices that we need TLS data stubs by recognising that
some relocation will need to be relaxed to a local-exec relocation.
- check_relocs then records a hash table entry mapping the symbol that
we are relocating against to a position in some data stub section.
It also ensures that this data stub section has been created, and
increments our data stub section size.
- This section is placed in the resulting binary using the standard
subsection and wildcard matching implemented by the generic linker.
- In elfNN_aarch64_size_dynamic_sections we allocate the actual buffer
for our data stub section.
- When it comes to actually relaxing the TLS sequence,
  relocate_section directly populates the data stub using the address
  and size of the TLS object that have already been calculated; it
  then uses final_link_relocate to handle adjusting the text so that
  it points to this data stub.
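The bookkeeping in the first two steps can be sketched as follows (a
Python model; the entry size and names are illustrative, not the actual
Morello stub layout):

```python
class TlsStubTable:
    """Records, per symbol, the offset of its read-only TLS data stub,
    and totals the stub section size as relocations needing relaxation
    are seen (as check_relocs does in the description above)."""
    def __init__(self, entry_size=16):
        self.entry_size = entry_size   # illustrative stub size
        self.offsets = {}              # symbol -> offset in stub section
        self.section_size = 0          # grows as stubs are reserved

    def record(self, symbol):
        """Reserve (or reuse) a stub slot for `symbol`; one per symbol."""
        if symbol not in self.offsets:
            self.offsets[symbol] = self.section_size
            self.section_size += self.entry_size
        return self.offsets[symbol]
```

Mapping symbols to offsets up front is what lets relocate_section later
write each stub's contents directly at a known position.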
Notes on implementation:
Mechanism by which we create and populate a TLS data section:
There are currently three different ways by which the AArch64 backend
creates and populates a special section. These are
- The method by which the .got (and related sections) are populated.
- The method by which the interworking stubs are populated.
- The method by which erratum 843419 stub sections are populated.
We have gone with an approach that mostly follows that used to
populate the .got. Here we give an outline of the approaches and
provide the reasoning by which the approach used by the .got was
chosen.
Handling the .got section:
- Create a section on an existing BFD.
- Mark that section as SEC_LINKER_CREATED.
- Record the existing BFD as the `dynobj`.
- bfd_elf_final_link still calls elf_link_input_bfd on the object.
- elf_link_input_bfd avoids emitting the section (because of
SEC_LINKER_CREATED).
- bfd_elf_final_link then emits the special sections on `dynobj` after
all the non-special sections on all objects have been relocated.
- Allows updating the .got input section in relocate_section &
final_link_relocate knowing that its contents will be output once all
relocations on standard input sections have been processed.
Handling interworking stub sections:
- Create a special stub file.
- Create sections on that stub file for each input section we need a
stub for.
- Manually populate the sections in build_stubs (which is called
through `ldemul_finish` *before* `ldwrite` and hence before any
other files are relocated).
Handling erratum 843419 stub sections:
- Create a special stub file.
- Create sections on that stub file for each input section we need a
stub for.
- Ensure that the stub file is marked with class ELFCLASSNONE.
- Ensure that the list of input sections for the relevant output
section statement has the veneered input section *directly before*
the stub section which has the veneer.
- When relocating and outputting sections, having ELFCLASSNONE means
that we output sections on the stub_file only when we see the
corresponding input statement. Without that class marker
bfd_elf_final_link calls elf_link_input_bfd which writes out the
data for all input sections on the relevant BFD.
- Since we have ensured the input statement for our stub section is
directly after the input statement for the section we are emitting
veneers for, we know that the veneered section will be relocated
and output before we output our stub section.
- Hence we can copy relocated data from the veneered section into our
stub section and know that our stub section will be output after
this modification has been made.
In deciding what to do with the read-only TLS data stubs we noticed
the following problems with each approach:
- The ABI requires that the read-only TLS data stubs are emitted into
a read-only section. This will necessarily be a different output
section to .text where the requirement for these stubs is found.
The temporal order in which output sections are written to the
output file is tied to the order in which the in-memory linker
statements are kept, and that is tied to the linker script provided
by the user. Hence we cannot rely on ordering and ELFCLASSNONE to
ensure that our data stub section is emitted after the relevant TLS
sequences have been relaxed. (We need to know our data stub section
is written to the output after we have populated it as otherwise the
data would not propagate to the resulting binary).
- I think it is easier and simpler to find the data needed for the TLS
data stubs in relocate_section just as we relax the relevant TLS
sequences. Hence I don't want to use the approach used for
interworking stubs of populating the entire section beforehand.
- Adding a section to `dynobj` would mean that we're adding a section
to a user input BFD, which is not quite as clear as having a
separate BFD for our special stub section. It also means we treat
this particular section as a "dynamic" section. This is a little
confusing nomenclature-wise.
Based on the above trade-offs we chose the .got approach (accepting
the negative that this section will be stored on a user BFD). N.b.
using the .got approach and requiring the section to be allocated in
`size_dynamic_sections` is not problematic for static executables
despite the nomenclature: this function always gets called.
One difference between how we handle the data stubs and how the .got
is handled is that we do not count the number of data stubs required
in size_dynamic_sections, but rather total it as we see relocations
needing these stubs in check_relocs.
We do this largely to avoid requiring another data member on all
symbols to indicate information about whether this symbol needs a data
stub and where that data stub is. The number of TLS symbols is
expected to be much smaller than the number of symbols with an entry
in the GOT, and hence a separate hash table containing entries only
for those symbols which need such information is likely to be smaller.
N.b. it is interesting to mention that, for every relocation which
needs a data stub, we would have made an input section on `dynobj` in
`check_relocs` had the relaxation not been performed, since if we did
not realise they could be relaxed these relocations would have needed
a .got entry.
N.b. we must use make_section_anyway to create our TLS data stubs
section in order to avoid any problems with our linker defined section
having the same name as a section already defined by the user.
We do not use local stub symbols:
The TLS ABI examples describe data stubs using specially named
symbols, but these names are not mandated by the ABI. We could have
associated the position of a data stub with a particular symbol by
internally generating a symbol, using a name-mangling scheme matching
that in the TLS ABI examples, that points to the data stub for a
particular symbol. We take the current approach in the belief that it
is "neater" to avoid relying on such a name-mangling scheme and the
associated sprintf calls.
final_link_relocate handling the adjusted relocation for data stubs:
final_link_relocate does not actually use the `h` or `sym` arguments
to decide anything for the two relocations we need to handle once we
have relaxed an IE or GD TLS access sequence to an LE one.
The relocations we need are BFD_RELOC_MORELLO_ADR_HI20_PCREL and
BFD_RELOC_AARCH64_ADD_LO12. For both (and in fact for most
relocations) we only use `h` and `sym` for catching and reporting
errors.
This means we don't actually have to update the `h` and/or `sym`
variables before calling elfNN_aarch64_final_link_relocate.
Allocate TLS data stubs on dynobj
This uses the same approach that the linker uses to ensure that the
.got sections are emitted after all relocations have been processed.
elf_link_input_bfd avoids sections with SEC_LINKER_CREATED, and
bfd_elf_final_link emits all SEC_LINKER_CREATED sections on the dynobj
*after* standard sections have been relocated.
This means that we can populate the contents of the TLS data stub
section while performing relocations on all our other sections (i.e.
in the same place as we perform the relaxations on the TLS sequences
that we recognise need these data stubs).
Assert that copy_indirect_symbol is not a problem
copy_indirect_symbol takes information from one symbol and puts it
onto another. The point is to ensure that any symbol which simply
refers to another has all its cached information on that symbol to
which it refers rather than on itself. If we could ever call this
function on a symbol which we have found needs an associated data
stub, then we would have to handle adjusting the hash table
associating a symbol with a data stub. We do not believe this is
needed, and add an assert instead.
The proof that this is not a problem is a little tricky. However it
*shouldn't* be a problem given what the function handles: moving
cached information from an indirected symbol to the symbol it
represents. That is needed when the information was originally put on
the indirected symbol, which happens when the indirection was
originally the other way around. The two ways this reversal of
indirection can happen are through resolving dynamic weak symbols and
versioned symbols. Neither of these is something we can see with
SYMBOL_REFERENCES_LOCAL TLS symbols (see below).
We only need to worry about copy_indirect_symbol transferring
information *from* a symbol which we have generated a TLS relaxation
against to LE.
In order for a TLS relaxation hash entry to have been generated
against a symbol, we must already have run check_relocs. This means
that, of the ways in which copy_indirect_symbol can be called, we have
eliminated all but _bfd_elf_fix_symbol_flags and
bfd_elf_record_link_assignment.
bfd_elf_record_link_assignment handles symbol assignments in a linker
script. Such assignments cannot be made on TLS symbols (we end up
generating a non-TLS symbol).
_bfd_elf_fix_symbol_flags only calls copy_indirect_symbol on symbols
which have is_weakalias set. These are symbols "from a dynamic
object", and we only ever call the hook when the real definition is in
a non-shared object. Hence we would not have performed this
relaxation on the symbol (because it is not SYMBOL_REFERENCES_LOCAL).
Hence I don't believe this is something that we can trigger and we add
an assertion here rather than add code to handle the case.
Some notes on the implementation decisions:
Use _bfd_aarch64_elf_resolve_relocation on :size: relocations
This is unnecessary, since all that function does in the case of
:size: relocations is to return the value it was given as an argument.
For the analogous MOVW_G0 relocations this function adds the addend
and emits a warning in the case of a weak undefined TLS symbol.
TPREL128/TLSDESC relocs now add the size of the symbol to the fragment
to satisfy the ABI requirement.
This only happens when we know the size of the relevant symbol; we
also emit the location of the symbol in a TPREL128 fragment when that
is known.
See the documentation PR: https://github.com/ARM-software/abi-aa/pull/80
Implementation note:
Handling the size of a symbol according to whether the static linker
knows what it is was very slightly tricky. Using the macro
`SYMBOL_REFERENCES_LOCAL` to check whether we know the size of a
symbol is a problem: that macro treats PROTECTED visibility symbols as
*not* local. This is in order to handle the case where a reference to
a protected function symbol could end up having the value of an
executable's PLT entry (in order to handle pointer equality and
hard-coded addresses in an executable).
Since TLS symbols cannot be function symbols (n.b. this refers to the
TLS object and not the resolver), this requirement does not apply.
That means we should check this property with something like
`SYMBOL_CALLS_LOCAL` (the existing macro that treats protected symbols
differently).
Given the confusing nomenclature here, we add a new AArch64 backend
macro called `TLS_SYMBOL_REFERENCES_LOCAL` so that we have a nice name
for it.
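A simplified model of the distinction (an illustrative sketch of the
visibility logic, not the actual macro bodies in the ELF linker code):

```python
def symbol_refs_local(is_dynamic, visibility, protected_is_local):
    """Rough model of whether a symbol reference resolves locally:
    non-dynamic and hidden/internal symbols always do; PROTECTED
    symbols do only if the caller opts in."""
    if not is_dynamic:
        return True
    if visibility in ("hidden", "internal"):
        return True
    if visibility == "protected":
        return protected_is_local
    return False

# SYMBOL_REFERENCES_LOCAL must treat protected as non-local (a
# protected function's value may end up being an executable's PLT
# entry), but a TLS object can never be a function, so the TLS
# variant can safely opt in, like SYMBOL_CALLS_LOCAL does:
def tls_symbol_references_local(is_dynamic, visibility):
    return symbol_refs_local(is_dynamic, visibility, protected_is_local=True)
```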
N.b. in this patch we adjust all uses of `SYMBOL_REFERENCES_LOCAL` which
are known to be acting on TLS symbols. This includes some places where
it does not matter whether a symbol is protected or not because the
condition also requires that we're in an executable (like in deciding
whether a relocation can be relaxed). This was done simply for
conformity and neatness.
Also add the ability to disassemble these relocations correctly.
Include checks that many different sizes work with different
instructions, and error checking that the `size` relocation is not
allowed in A64 mode. Ensure that the size relocation is not allowed
on instructions other than mov[kz].
See the Arm ABI aaelf64-morello document for the definition of these
new relocations.
Regenerate bfd/bfd-in2.h and bfd/libbfd.h from bfd/reloc.c.
In aarch64_tls_transition_without_check and elfNN_aarch64_tls_relax we
choose whether to perform a relaxation to an IE access model or an LE
access model based on whether the symbol itself is marked as local (i.e.
`h == NULL`).
This is problematic in two ways. The first is that sometimes a global
dynamic access can be relaxed to an initial exec access when creating
a shared library, and if that happens on a local symbol then we
currently relax it to a local exec access instead. This usually does
not happen, since we only relax an access if aarch64_can_relax_tls
returns true, and aarch64_can_relax_tls does not have the same
problem. However, it can happen when we have seen both an IE and a GD
access on the same symbol.
This case is exercised in the newly added testcase tls-relax-gd-ie-2.
The second problem is that deciding based on whether the symbol is
local misses the case where the symbol is global but still
non-interposable and known to be located in the executable. This is
the case for all global symbols in executables.
This case is exercised in the newly added testcase tls-relax-ie-le-4.
Here we adjust the condition we base our relaxation on so that we relax
to local-exec if we are creating an executable and the relevant symbol
we're accessing is stored inside that executable.
Alongside that general fix, we adjust the existing exclusion parameters
for Morello relaxations. Patches are in-flight to replace the existing
Morello TLS relocation handling with the more recent TLS ABI. This
patch simply adjusts the existing handling to use a more robust method
to determine the case when a GD -> LE relaxation can be performed.
-- Updating tests for new relaxation criteria
Many of the tests added to check our relaxation to IE were implemented
by taking advantage of the fact that we did not relax a global symbol
defined in an executable.
Since a global symbol defined in an executable is still not
interposable, we know that a TLS version of such a symbol will be in the
main TLS block. This means that we can perform a stronger relaxation on
such symbols and relax their accesses to a local-exec access.
Hence we have to update all tests that relied on the older suboptimal
decision making.
The two cases when we still would want to relax a general dynamic access
to an initial exec one are:
1) When in a shared library and accessing a symbol which we have already
seen accessed with an initial exec access sequence.
2) When in an executable and accessing a symbol defined in a shared
library.
Both of these require shared library support, which means that these
tests are now only run on targets with that support.
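For reference, the three access models correspond to code sequences
roughly like the following (standard AArch64 gas syntax; an
illustrative sketch of the sequences the relaxation rewrites, not the
exact output of these testcases):

```asm
// General dynamic (TLSDESC): fully general, resolved via a call.
adrp  x0, :tlsdesc:var
ldr   x1, [x0, #:tlsdesc_lo12:var]
add   x0, x0, #:tlsdesc_lo12:var
blr   x1                            // x0 = offset of var from TP

// Initial exec: TP-relative offset loaded from the GOT at runtime.
adrp  x0, :gottprel:var
ldr   x0, [x0, #:gottprel_lo12:var]

// Local exec: TP-relative offset known at static link time.
movz  x0, #:tprel_g1:var
movk  x0, #:tprel_g0_nc:var
```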
I have chosen to switch the existing testcases from a plain executable
to one dynamically linked to a shared object as that doesn't require
changing the testcases quite so much (just requires accessing a
different variable rather than requiring adding another code sequence).
The tls-relax-all testcase was an outlier to the above approach, since
it included a general dynamic access to both a local and global symbol
and inspected for the difference accordingly.
This is the same logical change as
https://sourceware.org/pipermail/binutils/2022-July/121660.html
| |
We have hit multiple problems where the check for a static non-PIE
binary used incorrect conditions, and while looking into a TLS
relaxation that should not have happened we found another. To help
avoid this problem in the future (and to make the code much easier to
read for someone who isn't familiar with the BFD linker flags) we now
perform the check for a static PDE with a macro called `static_pde`.
N.b. this macro is only valid once any needed dynamic sections have
been created. That happens when loading symbols, which is very early on
and hence before any of the places we want to use this macro, but it is
still worth noting that the check is not valid everywhere.
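A minimal sketch of the predicate in question, modelling the relevant
linker state as plain booleans (the real macro inspects BFD `link_info`
and the hash table; only the name `static_pde` comes from this commit):

```python
def static_pde(is_pic: bool, is_executable: bool,
               has_dynamic_sections: bool) -> bool:
    """Sketch: a static position-dependent executable is an executable
    that is not position independent and has no dynamic sections."""
    return is_executable and not is_pic and not has_dynamic_sections

# A dynamically linked PDE is *not* a static PDE, even though it is an
# executable and not PIC -- this is exactly the distinction that the
# old `executable && !pic` style checks got wrong.
```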
| |
Originally this clause included lines checking `!bfd_link_pic &&
bfd_link_executable`. I left these in to ensure that the new Morello
part to the clause did not interfere with the original stock AArch64
part of the clause.
Now we have split the condition into multiple if statements for clarity,
we can remove the confusing parts of the clause. This clause is to
catch any symbols that go in the GOT but would not be otherwise given a
relocation by finish_dynamic_symbol. We can express that check better
with a modified condition.
What we want to do in this clause is to account for all GOT entries
which would not get a dynamic relocation otherwise, but need a RELATIVE
dynamic relocation for Morello. This is any symbol for which
c64_should_not_relocate is false and WILL_CALL_FINISH_DYNAMIC_SYMBOL is
false. Changing the clause to only mention these two predicates (plus
ensuring that we do not mess around with such relocations when creating
a relocatable object file rather than a final binary) explains the
purpose of this condition much better.
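The reworked condition can be sketched as a boolean predicate.  The
helper name below is hypothetical and merely mirrors the BFD predicates
mentioned above; this illustrates the logic rather than quoting the
source:

```python
def needs_morello_relative_got_reloc(relocatable_link: bool,
                                     c64_should_not_relocate: bool,
                                     will_call_finish_dynamic_symbol: bool) -> bool:
    # Emit a RELATIVE dynamic relocation for a capability GOT entry only
    # when producing a final binary (not `ld -r` output) and no other
    # code path (finish_dynamic_symbol) will emit a relocation for it.
    return (not relocatable_link
            and not c64_should_not_relocate
            and not will_call_finish_dynamic_symbol)
```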
N.b. see the commit message of 8f5baae3d15 for the reasoning for the
original decision to not change the conditional.
| |
morello-binutils: Use global GOT type to determine GOT action
In final_link_relocate we currently use whether the relocation we're
looking at is a Morello relocation to decide whether we should treat the
GOT entry as a Morello GOT entry or not.
This is problematic since we can have an AArch64 relocation against a
capability GOT entry (even if it isn't a very useful thing to have).
The current patch decides whether we need to emit a MORELLO RELATIVE
relocation in the GOT based on whether the GOT as a whole contains
capabilities rather than based on whether the first relocation against
this GOT is a Morello relocation.
Until now we did not see any problem from this. Here we add a testcase
that triggers the problem.
| |
Originally we believed we had accounted for these symbols within the
existing if conditional. It turns out that with `-pie
--no-dynamic-linker` on the command line (which causes the `link_info`
member `dynamic_undefined_weak` to be set to 0) such symbols can bypass
the code in elfNN_aarch64_allocate_dynrelocs that puts them into the
dynamic symbol table. Hence we can have such symbols without a dynamic
index, and our existing conditionals need to be adjusted.
On further inspection we notice that GOT entries for *hidden* undefined
weak symbols were still getting RELATIVE relocations. This is quite
unnecessary since it's known that the entry should be the NULL
capability, but on top of that it relies on the runtime to have a
special case to not add the load displacement to RELATIVE relocations
with completely zero fragments.
We make two logical adjustments.
The first is that in our handling of CAPINIT relocations we add a
clause to avoid emitting a relocation for any undefined weak symbol
which we know for certain should end up with the NULL capability at
runtime. In this clause we ensure that the fragment is completely zero.
The second is around handling GOT entries. For these we ensure that
elfNN_aarch64_allocate_dynrelocs does not allocate a dynamic relocation
for the GOT entry of such symbols and that
elfNN_aarch64_final_link_relocate leaves the GOT entry empty and without
any relocation.
N.b. in implementing this change the conditionals became quite
confusing. We have split them into separate if/else statements for
clarity, at the expense of some verbosity.
We also add tests to check the behaviour of undefined weak symbols for
dynamically linked PDE's/PIE's/static executables/shared objects.
N.b.2 We also add an extra assert in final_link_relocate. This function
deals with GOT entries for symbols both in the internal hash table and
not in the hash table. Binutils decides whether symbols should be in
the hash table or not based on their binding. WEAK binding symbols are
put in the hash table. That said, final_link_relocate has a
`weak_undef_p` local flag to describe whether a given symbol is weak
undefined or not. This flag is defined for both symbols in the hash
table and symbols not in the hash table.
I believe that the only time we have weak_undef_p set in
final_link_relocate when the relevant symbol is not in the hash table is
when we have "removed" a relocation from our work list by modifying it
to be a R_AARCH64_NONE relocation against the STN_UNDEF symbol (e.g.
during TLS relaxation).
Such cases would not fall into the GOT relocation clause. Hence I don't
think we can ever see weak_undef_p symbols which are not in the hash
table in this clause. It's worth an assertion to catch the possibility
that this is wrong.
| |
Until now CAPINIT relocations were only emitted for position independent
code. For a data relocation against a symbol in some other shared
object this was problematic since we don't know the address that said
symbol will be at. We ended up emitting a broken RELATIVE relocation.
This also happened to be problematic for function pointers, since a
CAPINIT relocation did not ensure that a PLT entry was created in this
binary. When a PLT entry was not created we again had a broken RELATIVE
relocation.
We could have fixed the problem with function pointers by ensuring that
a CAPINIT relocation caused a PLT entry to be emitted and the RELATIVE
relocation hence to point to that PLT entry. Here we choose to always
emit a CAPINIT relocation and let the dynamic linker resolve that to a
local PLT entry if one exists, but if one does not exist let the dynamic
linker resolve it to the actual function in some other shared library.
Alongside this change we ensure that we leave 0 as the value in the
fragment for a CAPINIT relocation. The dynamic linker already has to
decide which symbol to use, and it would have the value of the local
symbol available if it chooses to use it. Hence there is no reason for
the static linker to leave the value of one option in the fragment of
this CAPINIT relocation.
This patch also introduces quite a few new testcases.
These are to check that we should only add a special PLT entry as the
canonical address for pointer equality when a function pointer is
accessed via code relocations -- and we ensure this does not happen for
accessing data pointers or accesses via CAPINIT data relocations.
Outside of the new testcases, we also adjust
emit-relocs-morello-3{,-a64c}.d. These testcases checked for a CAPINIT
relocation in a shared object. Now that we no longer populate that
fragment, we need to adjust the testcases accordingly.
| |
Each symbol that has a reference in the GOT has an associated got_type.
For capabilities we currently have a new got_type of GOT_CAP for
capability entries in the GOT.
We do not allow capability entries in the GOT for an A64 (or hybrid)
binary, and only allow capability entries in the GOT for a purecap
binary. Hence there is no need to maintain a per-symbol indication of
whether the associated GOT entry for this symbol is a capability or not.
There is already an existing flag on the hash table to indicate whether
the GOT contains capabilities or addresses. We can replace every use of
the existing GOT_CAP with a check of this flag.
Doing such a transformation means we cannot express an invalid state
(there is no longer any way to express a GOT which contains some
addresses and some capabilities). It also solves a bug where we
introduce a PLT to be the canonical address of a function after having
seen a R_AARCH64_LDST128_ABS_LO12_NC relocation. The existing manner of
deciding whether an entry in the GOT should be a capability or address
based on the relocation we generated it from could not work in a binary
when we only have this relocation. It should be determined based on the
flags of the input object files we saw (i.e. are these purecap object
files or not).
N.b. this also fixes an observed problem that could have been fixed in
the existing regime. In this case the JUMP_SLOT of a PLT entry added to
be the canonical address of a function which was addressed directly in
code (with both a Morello and AArch64 relocation) had got_type of
GOT_UNKNOWN (because it was simply not marked) and hence
elfNN_aarch64_create_small_pltn_entry was generating an AArch64
relocation because the GOT entry was not GOT_CAP.
This patch also adjusts the "should this GOT contain capabilities" flag
to report yes/no based on the EF_AARCH64_CHERI_PURECAP flag of the
inputs rather than based on whether we've seen any morello relocations
pointing into the GOT.
NOTE: We do not remove the existing times where we set this flag based
on MORELLO relocations. This is left for a future patch when we look
into the handling of hybrid code and the GOT.
N.b. this required two changes in the testsuite.
morello-capinit.d required updating since the size of the GOT section
was previously incorrectly calculated. There is no GOT relocation in
this testcase, which meant that the existing method of finding the size
of the dummy first GOT entry was incorrect (gave the size of an AArch64
entry). Since the size of the GOT is now different, the PCC bounds are
now different, and we hence need to update the values checked for the
PCC bounds in this testcase.
We take this opportunity to make the testcase more robust by using the
new record/check testsuite feature. This means the testcase now passes
on other targets (i.e. both bare-metal and for none-linux).
emit-relocs-morello.d had a minor change for the same reason. Since the
alignment requirement of the GOT changed this changed the start position
too. When the start position changed objdump decided not to output an
extra line of 0000000.
| |
For dynamic symbol GOT entries, the linker emits relocations for that
entry in finish_dynamic_symbol.
Since Morello capabilities always need dynamic relocations to initialise
GOT entries at runtime, we need to emit relocations for any capability
GOT entries. Two examples which are not needed for non-Morello linking
are static linking and for global symbols defined and referenced in a
PDE.
In order to ensure we emit those relocations we catch them in the
existing clause of final_link_relocate that looks for GOT entries that
require relocations which are not handled by finish_dynamic_symbol.
Before this patch, the clause under which those relocations were emitted
would include dynamic GOT entries in dynamically linked position
dependent executables.
These symbols hence had RELATIVE relocations emitted by
final_link_relocate to initialise them in the executable's GOT, and
GLOB_DAT relocations emitted by finish_dynamic_symbol.
The RELATIVE relocation is incorrect to use, since the static linker
does not know the value of this symbol at runtime (i.e. it does not
know the location in memory at which the shared library will be
loaded).
This patch ensures that the clause in final_link_relocate does not
catch such dynamic GOT entries by ensuring that we only catch symbols
when we would not otherwise call finish_dynamic_symbol.
N.b. we also add an assertion in the condition-guarded block, partly to
catch similar problems earlier, but mainly to make it clear that
`relative_reloc` should not be set when finish_dynamic_symbol will be
called.
N.b.2 The bfd_link_pic check is a little awkward to understand.
Due to the definition of WILL_CALL_FINISH_DYNAMIC_SYMBOL, the only time
that `!bfd_link_pic (info) && !WILL_CALL_FINISH_DYNAMIC_SYMBOL` is false
and
`!WILL_CALL_FINISH_DYNAMIC_SYMBOL (is_dynamic, bfd_link_pic (info), h)`
is true is when the below holds:
is_dynamic && !h->forced_local && h->dynindx == -1
This clause is looking for local GOT relocations that are not in the
dynamic symbol table, in a binary that will have dynamic sections.
This situation is the case that this clause was originally added to
handle (before the Morello specific code was added). It is the case
when we need a RELATIVE relocation because we have a PIC object, but
finish_dynamic_symbol would not be called on the symbol.
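The claim above can be checked mechanically. Using a paraphrase of the
WILL_CALL_FINISH_DYNAMIC_SYMBOL definition (an assumption-laden sketch
of the real macro, which takes the dynamic-object flag, the pic flag
and the hash entry), enumerating the remaining flags confirms that,
given dynamic sections and bfd_link_pic, the described state requires
`h->dynindx == -1 && !h->forced_local`:

```python
from itertools import product

def wcfds(dyn, pic, dynindx_set, forced_local):
    # Paraphrase of WILL_CALL_FINISH_DYNAMIC_SYMBOL (assumption: this
    # matches the macro's logic in elfNN-aarch64.c).
    return dyn and (pic or not forced_local) and (dynindx_set or forced_local)

for dynindx_set, forced_local in product([False, True], repeat=2):
    dyn, pic = True, True          # dynamic sections exist; PIC link
    w = wcfds(dyn, pic, dynindx_set, forced_local)
    first_false = not ((not pic) and (not w))  # `!bfd_link_pic && !WCFDS` is false
    second_true = not w                        # `!WCFDS` is true
    # The combination holds exactly when dynindx == -1 and !forced_local.
    assert (first_false and second_true) == (not dynindx_set and not forced_local)
```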
Since all capability GOT entries need relocations to initialise them it
would seem unnecessary to include the bfd_link_pic check in our Morello
clause. However the existing clause handling these relocations for
AArch64 specifically avoids adding a relocation for
bfd_link_hash_undefweak symbols. By keeping the `!bfd_link_pic (info)`
clause in the Morello part of this condition we ensure such undefweak
symbols are still avoided.
I do not believe it is possible to trigger the above case that requires
this `bfd_link_pic` clause (where we have a GOT relocation against a
symbol satisfying):
h->dynindx == -1 && !h->forced_local
&& h->root.type == bfd_link_hash_undefweak
&& bfd_link_pic (info) && bfd_link_executable (info)
I believe this is because when creating an undefweak symbol that has a
GOT reference we hit the clause in elfNN_aarch64_allocate_dynrelocs
which ensures that such symbols are put in `.dynsym` (and hence have a
`h->dynindx != -1`). A useful commit to reference for understanding
this is ff07562f1e.
Hence there is no testcase for this part. We do add some code that
exercises the relevant case (but does not exercise this particular
clause) into the morello-dynamic-link-rela-dyn testcase.
| |
This patch clears up some confusing checks around where to place
capability relocations initialising GOT entries.
Our handling of capability entries for the GOT had a common mistake in
the predicates that we used. Statically linked executables need to have
all capability relocations contiguous in order to be able to mark their
start and end with __rela_dyn_{start,end} symbols. These symbols are
used by the runtime to find dynamic capability relocations that must be
performed. They are not needed when dynamically linking as then it is
the responsibility of the dynamic loader to perform these relocations.
We generally used `bfd_link_executable (info) && !bfd_link_pic (info)`
to check for statically linked executables. This predicate includes
dynamically linked PDE's. In most cases seen we do not want to include
dynamically linked PDE's.
This problem manifested in a few different ways. When the srelcaps
section was non-empty we would generate the __rela_dyn_{start,end}
symbols -- which meant that these would be unnecessarily emitted for
dynamically linked PDE's. In one case we erroneously increased the size
of this section on seeing non-capability relocations, and since no
relocations were actually added we would see a set of uninitialised
relocations.
Here we inspected all places in the code handling the srelcaps section
and identified 5 problems. We add tests for those problems which can
be seen (some of the problems are only problems once others have been
fixed) and fix them all.
Below we describe what was happening for each of the problems in turn:
---
Avoid non-capability relocations during srelcaps sizing
elfNN_aarch64_allocate_dynrelocs increases the size for relocation
sections based on the number of dynamic symbol relocations.
When increasing the size of the section in which we store capability
relocations (recorded as srelcaps in the link hash table) our
conditional erroneously included non-capability relocations. We were
hence allocating space in a section like .rela.dyn for relocations
populating the GOT with addresses of non-capability symbols in a
statically linked executable (for non-Morello compilation).
This change widens the original if clause so it should catch CAP
relocations that should go in srelgot, and tightens the fallback else if
clause in allocate_dynrelocs to only act on capability entries in the
GOT, since those are the only ones not already caught which still need
relocations to populate.
Implementation notes:
While not necessary, we also stop the fallback conditional checking
!bfd_link_pic and instead add an assertion that we only ever enter the
condition's block in the case of !bfd_link_pic && !dynamic.
This is done to emphasise that this condition is there to account for
all the capability GOT entries for the hash table which need relocations
and are not caught by the existing code. The fact that this should only
happen when building static executables seems like an emergent property
rather than the thing we would want to check against.
This is tested with no-morello-syms-static.
---
size_dynamic_sections use srelcaps for statically linked executables
and srelgot for dynamically linked binaries.
When creating a statically linked executable the srelcaps section will
always be initialised and that is where we should put all capability
relocations. When creating a dynamically linked executable the srelcaps
may or may not be initialised (depending on if we saw CAPINIT
relocations) and either way we should put GOT relocations into the
srelgot section.
Though there is no functional change to look for, this code path is
exercised with the morello-static-got test and
morello-dynamic-link-rela-dyn for statically linked and dynamically
linked PDE's respectively.
---
Capability GOT relocations go in .rela.got for dynamically linked PDEs
final_link_relocate generates GOT relocations for entries in the GOT
that are not handled by the generic ELF relocation code. For Morello
we require relocations for any entry in the GOT that needs to be a
capability.
For *static* linking we keep track of a section specifically for
capability relocations. This is done in order to be able to emit
__rela_dyn_{start,end} symbols at the start and end of an array of these
relocations (see commit 40bbb79e5a3 for when this was introduced and
commit 8d4edc5f8 for when we ensured that MORELLO_RELATIVE relocations
into the GOT were included in this section).
The clause in final_link_relocate that decides whether we should put
MORELLO_RELATIVE relocations for initialising capability GOT entries
into this special section currently includes dynamically linked PDE's.
This is unnecessary, since for dynamically linked binaries we do not
want to emit such __rela_dyn_{start,end} symbols.
While this behaviour is in general harmless (especially since both
input sections srelcaps and srelgot have the same output section in the
default linker scripts), this commit changes it for clarity of the code.
We now only put these relocations initialising GOT entries into the
srelcaps section if we require it for some reason. The only time we do
require this is when statically linking binaries and we need the
__rela_dyn_* symbols. Otherwise we put these entries into the `srelgot`
section which exists for holding GOT entries together.
Since this diff is not about a functional change we do not include a
testcase. However we do ensure that the testcase
morello-dynamic-link-rela-dyn is written so as to exercise the codepath
which has changed.
---
Only ensure that srelcaps is initialised when required
In commit 8d4edc5f8 we started to ensure that capability relocations for
initialising GOT entries were stored next to dynamic RELATIVE
relocations arising from CAPINIT static relocations.
This was done in order to ensure that all relocations creating a
capability were stored next to each other, allowing us to mark the range
of capability relocations with __rela_dyn_{start,end} symbols.
We only need to do this for statically linked executables, for
dynamically linked executables the __rela_dyn_{start,end} symbols are
unnecessary.
When doing this, and there were no CAPINIT relocations that initialised
the srelcaps section, we set that srelcaps section to the same section
as srelgot. Despite what the comment above this clause claimed we
mistakenly did this action when dynamically linking a PDE (i.e. we did
not *just* do this for static non-PIE binaries).
With recent changes that ensure we do not put anything in this srelcaps
section when not statically linking this makes no difference, but
changing the clause to correctly check for static linking is a nice
cleanup to have.
Since there is no observable change expected this diff has no
testcase, but the code path is exercised with morello-dynamic-got.
---
Only emit __rela_dyn_* symbols for statically linked exes
The intention of the code to emit these symbols in size_dynamic_sections
was only to emit symbols for statically linked executables. We recently
noticed that the condition that has been used for this also included
dynamically linked PDE's.
Here we adjust the condition so that we only emit these symbols for
statically linked executables.
This allows initialisation code in glibc to be written much more
simply, since it does not need to determine whether the relocations
have been handled by the dynamic loader or not -- if the __rela_dyn_*
symbols exist then this is definitely a statically linked executable
and the relocations have not been handled by the dynamic loader.
This is tested with morello-dynamic-link-rela-dyn.
| |
When DT_INIT and/or DT_FINI point to C64 functions they should have
their LSB set. I.e. these entries should contain the address of the
relevant functions and not a slight variation on them.
This is already done by Morello clang, and we want GNU ld updated to
match.
Here we account for these LSB's for Morello in the same way as the Arm
backend accounts for the Thumb LSB. This is done in the
finish_dynamic_sections hook by checking the two dynamic section
entries, looking up the relevant functions, and adding that LSB onto the
entry value.
In our testcase we simply check that the INIT and FINI section entries
have the same address as the _init and _fini symbols.
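The fixup amounts to OR-ing the low bit into the dynamic entry value
when the target is a C64 function, analogous to the Thumb bit on Arm.
A minimal sketch:

```python
def dt_entry_value(sym_value: int, is_c64_function: bool) -> int:
    # DT_INIT/DT_FINI should hold the *address* of the function, which
    # for a C64 (capability-mode) function has the LSB set.
    return sym_value | 1 if is_c64_function else sym_value
```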
| |
In standard AArch64 linking by the BFD linker, dynamic symbols in PIC
code have their dynamic relocations created by
elfNN_aarch64_finish_dynamic_symbol. Any required information in the
relevant fragment is added by elfNN_aarch64_final_link_relocate.
Non-dynamic symbols that are supposed to go in the GOT have their
RELATIVE relocations created in elfNN_aarch64_final_link_relocate next
to the place where the fragment is populated.
The code in elfNN_aarch64_finish_dynamic_symbol was not updated when we
ensured that RELATIVE relocations against function symbols were
generated with the PCC base stored in their fragment and an addend
defined to make up the difference so that the relocation pointed at the
relevant function.
On top of this, elfNN_aarch64_final_link_relocate was never written to
include the size and permission information in the GOT fragment for
RELATIVE relocations that will be generated by
elfNN_aarch64_finish_dynamic_symbol.
This patch resolves both issues by adding code to
elfNN_aarch64_final_link_relocate to handle setting up the fragment of a
RELATIVE relocation that elfNN_aarch64_finish_dynamic_symbol will
create, and adding code in elfNN_aarch64_finish_dynamic_symbol to use
the correct addend for the RELATIVE relocation that it generates.
Implementation choices:
The check in elfNN_aarch64_final_link_relocate for "cases where we would
generate a RELATIVE relocation through
elfNN_aarch64_finish_dynamic_symbol" is believed to handle undefined
weak symbols by checking SYMBOL_REFERENCES_LOCAL on the belief that the
latter would not return true on undefined weak symbols. This is not
as clearly correct as the rest of the condition, so it seems reasonable
to bring it to the attention of anyone interested.
We add an assertion that this is the case so we get alerted if it is
not. We could instead include !UNDEFWEAK_NO_DYNAMIC_RELOC in the
condition, but believe that would lead to confusion in the code
(i.e. why check something that will always be false).
Similarly, when we check against SYMBOL_REFERENCES_LOCAL to decide
whether to populate the fragment for this relocation this does not
directly correspond to `h->dynindx == -1` (which would indicate that
this symbol is not in the dynamic symbol table).
This means that our clause catches symbols which would appear in the
dynamic symbol table as long as SYMBOL_REFERENCES_LOCAL returns true.
The only case in which we know this can happen is for PROTECTED
visibility data when GNU_PROPERTY_NO_COPY_ON_PROTECTED is set.
When this happens a RELATIVE relocation is generated (since this is
an object we know will resolve to the current binary) and the static
linker provides the permissions and size of the associated object in the
relevant fragment.
This behaviour matches all other RELATIVE relocations and allows the
dynamic loader to assume that all RELATIVE relocations should have their
associated permissions and size provided.
We mention this behaviour since the symbol for this object will appear
in the dynamic symbol table and hence the dynamic loader *could*
determine the size and permissions itself.
In our condition to decide whether to update this relocation we include
a check that we `WILL_CALL_FINISH_DYNAMIC_SYMBOL`. This is not
necessary, since the combination of conditions implies it, however it
makes things much clearer as to what we're checking for.
Testsuite notes:
When testing our change here we check:
1) The addend and base of the RELATIVE relocation gives the required
address of the hidden function.
2) The bounds of the RELATIVE relocation are non-zero.
3) The permissions of the RELATIVE relocation are executable.
Lacking in this particular test is a check that the PCC bounds are
calculated correctly, and that the base we define is the base of the
PCC. We rely on existing tests to check our calculation of the PCC
bounds.
| |
This symbol is defined in a binary when there is a segment which
contains both the file header and the program header. The symbol points
at the file header. The point of this symbol is to allow the program to
robustly examine its own ELF headers.
Glibc uses this symbol. This symbol is currently not marked as a
linker or linker script defined symbol, and hence does not get its
bounds adjusted. The symbol is given zero size, and consequently any
capability initialised as a relocation to this symbol is given zero
bounds.
In order to allow reading the headers this symbol points at, this
patch adds a size to the symbol.
We do not believe that the size of this symbol is used for anything
other than CHERI bounds, so we believe that this is a safe change to
make. Setting the size of the symbol means that c64_fixup_frag uses
that size as the bounds to apply to a capability relocation pointing at
that symbol. This allows access to the file and program headers loaded
into memory.
An alternative approach would be to *not* set the size of the symbol,
but only change the bounds of the relocation generated. This would be
done by checking for the `__ehdr_start' name in c64_fixup_frag and
setting the size according to the `sizeof_ehdr' and
`elf_program_header_size' values stored on the output BFD object.
We chose the approach to set the size on the symbol for code-aesthetic
reasons under the belief that having this size on the symbol in the
final binary is a slight benefit in readability for a user and causes no
downside.
I do not believe that Morello lld sets the bounds of a capability to this
symbol correctly. That issue has been raised separately.
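Assuming a standard ELF64 layout (64-byte ELF header, 56-byte program
header entries, program headers immediately following the ELF header,
as in the usual case where both share a segment), the size given to
`__ehdr_start` would be computed roughly as:

```python
ELF64_EHDR_SIZE = 64   # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 56   # sizeof(Elf64_Phdr)

def ehdr_start_size(phnum: int) -> int:
    # Bounds must span the file header plus the program header table so
    # that a capability to __ehdr_start can read both.
    return ELF64_EHDR_SIZE + phnum * ELF64_PHDR_SIZE
```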
| |
There is special handling to ensure that symbols which look like they
are supposed to point at the start of a section are given a size to span
that entire section.
GNU ld has special `start_stop` symbols which are automatically provided
by the linker for sections where the output section and input section
share a name and that name is representable as a C identifier.
(see commit cbd0eecf2)
These special symbols represent the start and end address of the output
section. These special symbols are used in much the same way in source
code as section-start symbols provided by the linker script. Glibc uses
these for the __libc_atexit section containing pointers for functions to
run at exit.
This change accounts for these `start_stop` symbols by giving them the
size of the "remaining" range of the output section in the same way as
linker script defined symbols. This means that the `start` symbols get
section-spanning bounds and the `stop` symbols get bounds of zero.
N.b. We will have to also account for these symbols in the
`resize_sections` function, but that's not done yet.
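The bounds assignment described above can be sketched as follows: a
symbol is given the "remaining" size of the output section from its
value, so a `__start_<sec>` symbol at the section start spans the whole
section while a `__stop_<sec>` symbol at the section end gets zero
bounds (a sketch of the rule, not the BFD code):

```python
def start_stop_symbol_size(sym_value: int, sec_start: int,
                           sec_size: int) -> int:
    # "Remaining" range of the output section from the symbol's value
    # to the end of the section.
    return (sec_start + sec_size) - sym_value
```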
| |
The LSB on STT_FUNC symbols was missed in a few different places.
1) Absolute relocations coming from .xword, .word, and .hword
directives and the lowest bit MOVW relocations did not account for
the LSB at all.
2) Relocations for the ADR instruction only added the LSB on local
symbols.
Here we account for these by adding the LSB in each clause in
elfNN_aarch64_final_link_relocate.
The change under the BFD_RELOC_AARCH64_NN clause handles absolute 64-bit
relocations, the change for BFD_RELOC_AARCH64_ADR_LO21_PCREL handles the
relocation on ADR instructions, and the extra relocations checked
against in the clause including BFD_RELOC_AARCH64_ADD_LO12 are the
remaining items.
N.b. we noticed the MOVW relocation problem because glibc's start.S was
using these direct MOV relocations to access the value of `main`. Since
`main` is a function we need to include the LSB in the resulting
relocation value. These relocations did not include the LSB from
STT_FUNC symbols.
Others were found from inspection of each relocation in turn.
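The core of the fix can be sketched in a few lines of C (hypothetical helper and parameter names; the real change lives in elfNN_aarch64_final_link_relocate):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the fix: when the relocation target is a C64 (Morello)
   STT_FUNC symbol, the computed value must carry the interworking LSB
   so that branches through the result stay in C64 mode.  `is_c64_func`
   stands in for the linker's per-symbol state (h->target_internal).  */
static uint64_t
relocation_value (uint64_t sym_value, uint64_t addend, int is_c64_func)
{
  uint64_t value = sym_value + addend;
  if (is_c64_func)
    value |= 1;
  return value;
}
```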
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code was not actually using the size of a symbol when the
symbol was in the hash table. This meant that our TLS relaxations
created an instruction sequence with bounds of zero, so that the GCC TLS
instruction sequence eventually ended up producing a length-zero
capability.
Also handle extra size of pointers in TCB for c64. For purecap we have
16 byte pointers. Hence the TCB is 32 bytes. This was not yet handled
in our relaxations.
Here we determine whether to use a 32 or 16 byte TCB based on the flags
of the current BFD (i.e. whether this is a purecap binary that we're
creating).
Testcases are updated to account for the fact that the length
of the capability to the symbol itself is now sometimes non-zero and for
the different offset required into the TLS block for modules loaded at
startup time.
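The TCB-size decision described above amounts to a one-line check of the output BFD's flags; a minimal sketch (the flag value below is illustrative, not the actual EF_* constant):

```c
#include <assert.h>

/* Illustrative flag value only; the real check inspects the purecap
   ABI flag on the output BFD's ELF header.  */
#define PURECAP_FLAG 0x10000

/* Purecap capabilities are 16 bytes, so the two-pointer TCB occupies
   32 bytes; plain AArch64 keeps the usual 16-byte TCB.  */
static unsigned int
tcb_size (unsigned long e_flags)
{
  return (e_flags & PURECAP_FLAG) ? 32 : 16;
}
```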
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change we would ensure the ability for precise bounds on any
section which had a linker defined symbol pointing in it *unless* that
linker defined symbol looked like a section starting symbol.
In that case we would adjust the *next* section if there was no padding
between the current section and the next.
I believe this was a mistake. The testcase we add here is a simple case
of having a `__data_relro_start` symbol in the .data.rel.ro section, and
that would not ensure that the .data.rel.ro section was precisely
padded. The change we make here is to perform padding for precise
bounds on all sections with linker defined symbols in them *and* the
next section if there is no padding between this and the next section.
This is a very conservative over-approximation, and we do it for the
reason described in the existing comment (that we have no information on
the offset of this symbol within the output section).
In the future we may want to remove this padding for linker script
defined symbols which do not look like section start symbols. We would
do this in conjunction with changing the bounds we put on such linker
script defined symbols. This would be for another patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of iterating through relocations recording changes to make,
sorting those changes according to the section VMA, and then making the
changes in section VMA order, we now modify the sections as we iterate
through relocations.
This change can be done now that the alignment is always done, even if
the VMA of the start and end was good for precise bounds and now that
padding is added via an expression rather than setting the point to a
specific location. Having these two things means that changing the
layout of earlier sections does not affect the precise bounds property
of later sections.
It also makes things easier that we keep the padding of a section inside
that section, so we can tell whether a section has the correct size just
from the size of the section rather than recording out-of-band which
sections have had their padding assigned.
This patch makes no functional change, but simply changes the code to be
more readable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change individual sections would not be padded or aligned if
there was no C64 code or if there were no capability GOT relocations in
the binary. This meant that if we had a data-only PURECAP shared
library with a CAPINIT relocation in it pointing at something, then the
section that relocation pointed into would not be padded accordingly.
This patch changes this so that we look for sections which may need
individual padding if we see the PURECAP elf header flag, and if there
is a `srelcaps` section (i.e. the RELATIVE capability relocations).
We keep the behaviour that we do not adjust the size of sections unless
there is a static relocation pointing at a zero-sized symbol in that
section. That is, we do not make any adjustment to try to handle
section padding in the case where other binaries would dynamically link
against such symbols. We do this since the "symbol pointing to section
start implies spanning entire section" decision is a hack to enable some
linker script uses, and we don't want to extend it without a known
motivating example.
Finally, this patch skips padding the PCC bounds on PURECAP binaries if
there is no C64 code in this binary, but ensures that the PCC bounds are
made precise even if there are no static relocations in the file.
We could still get the current PCC and offset it using `adr` if there
are no static relocations, but without there being any C64 code there
will be no PCC to bound and hence we don't need the bounds on a
hypothetical PCC to be precise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mechanism by which we were ensuring the PCC bounds were made precise
for Morello happened to only work if there were some sections which
needed to be made precisely representable for Morello individually.
This was because we included the PCC bounds calculation in the `queue`
iteration that only iterated over adjustments to individual sections.
Here we move the PCC bounds calculation to after the `queue` iteration
so that we always perform this operation.
We suspect the original implementation was chosen to ensure that padding
was added in sequential section order. That ordering seems to have been
intended to ensure that padding at one position would not adjust
sections that had already been adjusted (because padding one section
changes the location of the sections after it).
We have already found in previous patches that this approach was not
sufficient to ensure an adjustment being permanent. The alignment
change to the first section that the PCC should span can change the
location of all sections after it, or the linker can simply have extra
space before .text that it removes on a call to layout_sections_again.
In the patches to fix those problems, we have adjusted the code here to
represent the padding in a way that stays stable across changes. That
has meant that the iteration in VMA order is no longer necessary, and
that means that our movement of the PCC bounds calculation to outside of
the `queue` iteration loop can be performed.
One interesting part of this adjustment is that, given a set of
sections, the length of memory that they span can change if the first
section's alignment is adjusted.
For example, if we have the below:
sectionA VMA 0xf size 0x1 alignment 0x1
sectionB VMA 0x10 size 0x10 alignment 0x10
Then aligning sectionA to 0x10 gives the below:
sectionA VMA 0x10 size 0x1 alignment 0x10
sectionB VMA 0x20 size 0x10 alignment 0x10
The total range of the first case is [0xf -> 0x20] for a size of 0x11
and of the second case it is [0x10 -> 0x30] for a size of 0x20.
This means that we should handle the alignment adjustment for the PCC
bounds first, and must handle it in a loop to ensure that we handle the
case that this change in length requires an extra alignment.
Only then do we know the size that we want to add into the last section
in the range so that the entire bounds are correct.
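The worked example above can be checked mechanically; a minimal sketch (`span_size` is a hypothetical helper, not linker code):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
align_up (uint64_t v, uint64_t a)
{
  return (v + a - 1) & ~(a - 1);
}

/* Size of the range spanned by sectionA followed by sectionB, where
   sectionB must start on a `b_align` boundary.  */
static uint64_t
span_size (uint64_t a_vma, uint64_t a_size,
           uint64_t b_align, uint64_t b_size)
{
  uint64_t b_vma = align_up (a_vma + a_size, b_align);
  return b_vma + b_size - a_vma;
}
```

Realigning sectionA from 0xf to 0x10 grows the span from 0x11 to 0x20, which is why the alignment adjustment must be iterated before the final size is added to the last section in the range.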
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this patch we would only change the alignment of a section if it
did not have properly aligned start and end addresses.
This meant that there was nothing stopping the alignment of this section
degrading in the future. On first glance this looks like it would not
be a problem since this function only adjusts sections in order of
increasing VMA (hence it would seem that the alignment of the current
section can not be reduced).
However, in some cases layout_sections_again can be seen to reduce the
alignment of sections if there was some initial space before the .text
section that it shrinks for some reason. This led to a degradation of
the alignment of all sections after that point (until another highly
aligned section).
The testcase added for this change (in the final "testsuite" commit of
this patch series) is a good example of this, on first entry to the
elfNN_c64_resize_sections function .text happened to have a start
address of 0xb0 (which meant that .data.rel.ro was also aligned to such
a boundary and the function did not believe there was a need to align
.data.rel.ro to a 16 byte boundary). However after the first call to
layout_sections_again this changed to 0x78, reducing the alignment of
.data.rel.ro in the process.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When adding padding to ensure section bounds do not overlap we were
implementing the padding using `lang_add_newdot`. This interacts with
what is essentially an in-memory linker script that the linker will
use at the very end to emit its sections according to those rules that
have been built up.
`lang_add_newdot` is essentially the same as defining the position to be
the given address in a linker script. This means that all sections
after this point in the linker script will be at an address starting
from this known address.
I.e. the method by which we add padding is essentially changing the
description of how we will lay a binary out from:
<sections before the padded one>
<section to be padded>
<sections after the padded one>
to the description:
<sections before the padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>
This works fine in most cases. The address we calculate is a known-good
value and sections after this "point" are moved to after the known-good
value.
However, the fact that we choose a specific value when we call
`c64_pad_section` means that adjusting sections which occur *before* the
current point will not change anything that occurs after it.
I.e. a description of
<sections before the padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>
being changed to a description of
<sections before section X>
<New padding>
<sections before padded one>
<section to be padded>
<current position must be 0x[number calculated now]>
<sections after the padded one>
leaves the `<sections after the padded one>` with the same address.
This can lead to the padded section and the section after it
overlapping.
This rarely happens, because our padding always happens after a section
and we iterate over sections in memory order. However, when we align
the very start of the PCC range in order to produce precise bounds
across this range that can change the start position of the first
section that should be spanned by the PCC range.
Since it can change the start position we can hit the problem described
above. This happens when attempting to build glibc. It causes an error
message like the one below.
section .got LMA [000000000053c0c0,000000000053cfff] overlaps section .data.rel.ro LMA [0000000000525fe0,000000000053d08f]
This patch solves this problem by adding an entry into this in-memory
linker script that describes padding without specifying a given address.
I.e. the outline of the script we produce becomes
<sections before the padded one>
<section to be padded>
<current position goes from P to P+0x[padding calculated now]>
<sections after the padded one>
This is safe w.r.t. adjustments occurring before the padding we have
inserted, and it avoids the warning we noticed when trying to build
glibc.
We also fix up some other bugs in this area around double-padding
sections.
First, the calculation of the padding required was based on the
output section VMA and size. The calculation was done by taking the
current start and end VMA then finding the resulting start and end VMA
that we want using c64_valid_cap_range. Then we calculated the padding
we wanted by finding the difference between the current and requested
end VMAs. This ignored the fact that the output section was also
getting aligned, which would change the start VMA -- hence the resulting
end VMA would not end up where we wanted.
Here we do the calculation of how much padding to add based on the size
we want rather than based on the ending VMA we want.
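The difference between the two calculations can be sketched as follows (hypothetical helpers; the first form goes wrong when the section's start VMA later moves):

```c
#include <assert.h>
#include <stdint.h>

/* Old approach: pick a target end VMA up front.  If the section's
   start VMA later changes (e.g. because the section is also getting
   aligned), padding derived from this fixed end address is wrong.  */
static uint64_t
padding_from_end_vma (uint64_t cur_end, uint64_t wanted_end)
{
  return wanted_end - cur_end;
}

/* New approach: derive the padding from the size we want, which stays
   correct even if the output section's start is realigned.  */
static uint64_t
padding_from_size (uint64_t cur_size, uint64_t wanted_size)
{
  return wanted_size > cur_size ? wanted_size - cur_size : 0;
}
```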
Second, the reported size of the output section was not changing after
adding our padding. This meant that the second time around this loop
(if for example a relocation into a given section was used in more than
one place and hence this section was enqueued twice) we would again find
that the section size was not padded and try again. We fix this by
introducing the padding statement to the output section statement
children rather than to the main statement list. This means that the
padding will be accounted for in the output section size and hence the
loop will avoid padding this section again.
Just to note: LLD does not report the sizes of sections including their
padding. This is so that programs which read binary information (such
as readelf and objdump) do not need to read the padded zeros in the
file. We choose to include this padding in the section size information
on the premise that it is usually quite small and that the output from
these programs is then more readable. The bug that we fixed by
including this padding in the size of the output section could be fixed
in another way.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change the PCC bounds would only span sections which are
SEC_CODE or some specific known sections like the GOT and PLT.
This is not enough, since the compiler can want to access .rodata via
relative offsets to PCC. Hence we need to include READONLY sections.
Similarly, we want to include .data.rel.ro sections in the PCC bounds so
that they can be accessed via PCC -- this allows the capability
indirection table to be accessed.
We have not been noticing this until now because the default linker
script happens to order sections such that the PCC being required to
span .got and .text happens to end up including these problematic
sections.
RELRO sections are a bit interesting since the fact they are RELRO is
not recorded anywhere on the section itself. Rather it is stored in the
fact that the section is covered by the RELRO segment.
This means that we need to check whether the section's VMA is within the
relevant range rather than just look at the section. This turns out to
be pretty easy since we have a structure containing the RELRO range,
however we do need to ensure that we don't mix up the uses of the
section VMA and the RELRO start and end around calls of
layout_sections_again since this call can change both.
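Since RELRO-ness is a property of the segment rather than the section, the membership test reduces to a VMA range check; a sketch (names hypothetical, and the stored range must be re-read after any call to layout_sections_again):

```c
#include <assert.h>
#include <stdint.h>

/* A section is covered by the RELRO segment iff its whole VMA range
   falls inside the recorded segment bounds.  */
static int
section_in_relro (uint64_t vma, uint64_t size,
                  uint64_t relro_start, uint64_t relro_end)
{
  return vma >= relro_start && vma + size <= relro_end;
}
```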
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were specifying section alignment requirements based on the alignment
that the section base happened to have. This sometimes resulted in very
strange alignment requests that were much greater than actually
required.
That is not usually a problem, but it does give unnecessary padding upon
re-adjustments due to changing the PCC bounds after individual sections
have been padded.
This patch adds an interface such that we return the alignment actually
required for exact capability bounds from c64_valid_cap_range. We then
use that alignment as our alignment requirement on the sections which
have a section-sized symbol associated with them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The permissions that a capability to an object should end up with are
based on the section it should point into. With symbols that point into
SHN_ABS sections we have nothing to base the permissions on (since these
sections don't have associated permission flags).
For the moment we are making a default of choosing Read-Write
permissions and warning the user about it. The permissions match what
Morello LLD currently does (from observation).
When Morello linkers use the symbol type to determine whether a
capability should have executable permissions or not, this should end up
being able to handle all uses (since STT_FUNC would get RX perms while
everything else gets RW perms).
In the only case we know of in the GNU team the symbol ends up with
zero-size anyway, so the choice of Read-Write doesn't seem too lax.
(Having zero-size is fine for the use-case we know of in glibc, since
that use case simply checks if the address of the symbol is non-zero.
Hence we have no need as yet to dereference the symbol).
The use cases we know about are the `_nl_current_<LANG>_used` symbols
defined with `_NL_CURRENT_DEFINE` in the locale/lc-<lang>.c files in
statically linked glibc. If any case that requires non-zero size or
different permissions becomes important then something more will be
required across the toolchain.
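The symbol-type-based scheme anticipated above would look roughly like this (a sketch of a possible future policy, not current linker behaviour):

```c
#include <assert.h>

#define STT_OBJECT 1  /* ELF symbol type for data objects.  */
#define STT_FUNC 2    /* ELF symbol type for functions.  */

enum cap_perms { PERM_RW, PERM_RX };

/* For SHN_ABS symbols there is no section to take permissions from,
   so fall back on the symbol type: functions get RX, all else RW.  */
static enum cap_perms
abs_symbol_perms (int st_type)
{
  return st_type == STT_FUNC ? PERM_RX : PERM_RW;
}
```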
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is partly changing two existing (believed buggy) behaviours
in elfNN_aarch64_merge_private_bfd_data and partly accounting for a
capability-specific requirement.
The existing behaviours in elfNN_aarch64_merge_private_bfd_data were:
1) It returned `TRUE` by default. This effectively ignored the ELF
flags on the binaries, despite there being code looking at them.
2) We do not mark the output BFD as initialised until we see flags with
non-default architecture and flags. This can't tell the difference
between linking default objects to non-default objects if the default
objects are given first on the command line.
The capability-specific requirement is:
- This function originally returned early if the object file getting
merged into the existing output object file is not dynamic and has no
code sections. The code reasoned that differing ELF flags did not
matter in this case since there was no code that would be expecting
it.
For capabilities the binary compatibility is still important.
Data sections now contain capabilities as pointers, got sections now
have a different got element size.
Hence we avoid this short-circuit if the flags we're checking include
the CHERI_PURECAP flag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The reasoning behind only warning for symbols which have a size which
cannot be precisely bounded is that there is nothing *requiring* precise
bounds, GCC knowingly avoids changing the size of some symbols for
precise bounds (TLS and symbols with user-specified alignment and
user-specified section), and LLD only warns on imprecise bounds rather
than erroring.
N.b. the reasoning for GCC avoiding padding in these cases is explained
in the commit message of b302420cb55 in the GCC branch
vendors/ARM/heads/morello.
All in all it's not something that we want in our toolchain as a
requirement, and it's not something that other toolchains have as a
requirement, so there doesn't seem to be much of a reason to include it.
In order to make this warning a little nicer for anyone reading it, we
add the name of the symbol to the warning. Update the testsuite to
account for this.
Co-Author: Alex Coplan <alex.coplan@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It had two problems:
1) The linker was storing permission flags in the bottom byte and the
size in the top 56 bits. Newlib was looking for the permission flags
in the top byte and the length in the bottom 56 bits of a uint64_t
stored as bytes 8:16 of the fragment.
N.b. The ABI requires a given storage order between the size and
permission flags (as opposed to requiring a given uint64_t value be
stored in the relevant position). This means that our current
implementation would not work for a hypothetical big-endian Morello.
2) The linker prioritised SEC_READONLY flags over SEC_CODE ones on the
   section; this meant that function symbols pointing into the .text
   section (which has both flags on it) would be given read-only
   permissions rather than executable permissions.
This patch must also update all tests to account for this change.
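The corrected layout of the metadata word can be sketched as follows (helper name hypothetical; note that the ABI fixes the byte order in storage, so this exact uint64_t packing only holds for little-endian Morello):

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian packing of the fragment's second word as Newlib
   expects it: permission flags in the top byte, length in the low
   56 bits.  */
static uint64_t
encode_frag_meta (uint64_t length, uint8_t perms)
{
  return ((uint64_t) perms << 56) | (length & ((1ULL << 56) - 1));
}
```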
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch has two main goals:
- Relax an existing diagnostic to permit the linker to accept
capability relocations against symbols without size information.
- Adjust the capability base and bounds for symbols which point into
sections which may be accessed via the PCC.
The Morello ABI accesses global data using ADR and ADRP, and has no
special indirection to jump to other functions. Given this, the PCC
must maintain its bounds and base so that during execution loading
global data and jumping to other functions can be done without worrying
about the current PCC permissions and bounds.
To implement this, all capabilities that could be loaded into the PCC
(via BLR or similar) must have a bounds and base according to the PCC.
This must span all global data and text sections (i.e. .got, .text,
.got.plt and the like).
There is already code finding the range that the PCC should span, this
patch records the information in a variable that we can query later.
There are two places where we create a relocation requesting a
capability to be initialised at runtime. When handling relocations
which request a capability from the GOT, and when handling a CAPINIT
relocation. This patch adjusts both.
We can't tell from inspection which symbols would be loaded into the
PCC, but we know that those symbols must point into a section which is
executable. For now, we do this operation for all symbols which point
into an executable section.
Most RELATIVE relocations don't use the addend. Rather the VA and size
we want are put in the relative fragment and the addend is zero.
This is because the *base* of the capability usually matches the VA we
want that capability initialised to.
In these possibly-code symbols we want the base of the capability bounds
to be the base of the PCC, and the VA to be something very different.
Hence we make use of the addend in the RELA relocations to encode this
offset.
Note on implementation:
c64_fixup_frag takes the base and size of a capability we want to
request from the runtime and checks that these are exactly representable
in a capability. This patch changes many of the capabilities we request
from the runtime to have the same bounds (those of the PCC). We leave
the check to look at the bounds requested by the symbol rather than to
check the PCC bounds multiple times. That means that if a symbol that
points into an executable section has incorrect bounds then this will
trigger a linker error even though it will cause no security problem
when this executes. This is a trade-off between getting extra checks
that the compiler is handling object bounds sizes and erroring on
non-problematic code.
We have a compatibility hack that if a symbol is defined in the linker
script to be directly after a given section but is *named* something
like __.*_start or __start_.* then we treat it as if it is defined at
the very start of the next section. The new behaviour introduced in
this patch needs to take account of the above compatibility hack.
This patch also updates the testsuite according to these changes.
In some places the original test no longer checks what it wanted, since
the base of all symbols pointing into executable sections are now the
same. There we add extra symbols and things to check so we ensure that
this behaviour of PCC bounds is seen and that the original behaviour is
still seen on non-executable sections.
This commit also includes a few tidy-ups:
We adjust the base and limit that are checked in c64_fixup_frag.
Originally this would calculate the base as value + addend. As
discussed above the way we treat capabilities in Morello is such that
the value determines the base and the addend determines the initial
value pointing from that base. Hence the check that these capabilities
had correct bounds was not correct.
We add an extra assertion in final_link_relocate for robustness
purposes. There is an existing bug in the assembler where GOT
relocations against local symbols can be turned into relocations against
the relevant section symbol plus an addend. This is problematic for
multiple reasons, one being that the linker implementation does not have
any way to associate different GOT entries with the same symbol but
multiple offsets. In fact the linker ignores any offset. Here we
simply add an assertion that this never happens. It turns a silent
pre-existing error into a noisy one.
2022-02-03 Alex Coplan <alex.coplan@arm.com>
Matthew Malcomson <matthew.malcomson@arm.com>
bfd/ChangeLog:
* elfnn-aarch64.c (pcc_low): New.
(pcc_high): New.
(elfNN_c64_resize_sections): Update new global variables
pcc_{low,high} instead of local variables to track PCC span.
(enum c64_section_perm_type): New.
(c64_symbol_section_adjustment): New.
(c64_fixup_frag): Rework to calculate size appropriately for
symbols that need adjustment.
(c64_symbol_adjust): New. Use it ...
(elfNN_aarch64_final_link_relocate): ... here.
ld/ChangeLog:
* testsuite/ld-aarch64/aarch64-elf.exp: Add new tests.
* testsuite/ld-aarch64/emit-relocs-morello-6.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-6.s: Assembly.
* testsuite/ld-aarch64/emit-relocs-morello-6b.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-7.d: New test.
* testsuite/ld-aarch64/emit-relocs-morello-7.ld: Linker script thereof.
* testsuite/ld-aarch64/emit-relocs-morello-7.s: Assembly.
* testsuite/ld-aarch64/morello-capinit.d: New test.
* testsuite/ld-aarch64/morello-capinit.ld: Linker script.
* testsuite/ld-aarch64/morello-capinit.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-global-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-global-syms.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-got-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-got-syms.s: Assembly.
* testsuite/ld-aarch64/morello-sizeless-local-syms.d: New test.
* testsuite/ld-aarch64/morello-sizeless-local-syms.s: Assembly.
Co-authored-by: Matthew Malcomson <matthew.malcomson@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trying to link code against newlib with the current BFD Morello linker
we get quite a lot of cases of the error below.
"relocation truncated to fit: R_MORELLO_LD128_GOT_LO12_NC against symbol
`<whatever>' defined in .text.<whatever> section in <filename>"
This happens because the relocation gets transformed into a relocation
pointing into the GOT in elfNN_aarch64_final_link_relocate, but the
h->target_internal flag that indicates whether this is a C64 function
symbol or not is then added to the *end* value rather than the value
that is stored in the GOT.
This then correctly falls foul of a check in _bfd_aarch64_elf_put_addend
that ensures the value we get from this relocation is 8-byte aligned
since it must be pointing to the start of a valid entry in the GOT.
Here we ensure that this LSB is set on the value newly added into the
GOT rather than on the offset pointing into the GOT. This both means
that loading function symbols from the GOT will have the LSB correctly
set (hence we stay in C64 mode when branching to this function as we
should) and it means that the error about a misaligned GOT address is
fixed.
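The essence of the fix can be sketched as follows (hypothetical names; the real change is in elfNN_aarch64_final_link_relocate):

```c
#include <assert.h>
#include <stdint.h>

struct got_fixup
{
  uint64_t entry_value;  /* What gets stored in the GOT slot.  */
  uint64_t got_offset;   /* How the instruction addresses the slot.  */
};

/* The C64 interworking LSB belongs on the value stored *in* the GOT
   entry, not on the offset used to reach it: GOT offsets must remain
   8-byte aligned.  */
static struct got_fixup
fix_got_entry (uint64_t sym_value, uint64_t got_offset, int is_c64_func)
{
  struct got_fixup r;
  r.entry_value = sym_value | (is_c64_func ? 1 : 0);
  r.got_offset = got_offset;
  return r;
}
```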
In this patch we also ensure that we add a dynamic relocation to
initialise the correct GOT entry when we are resolving a MORELLO
relocation that requires an entry in the GOT.
This was already handled in the case of a global symbol, but had not
been handled in the case of a local symbol. This is why we set
`relative_reloc` to TRUE when resolving a MORELLO GOT relocation
against a static executable.
In writing the testcase for this patch we found an existing bug to do
with static relocations of this kind (of this kind meaning that are
handled in this case statement). The assembler often chooses to create
the relocation against the section symbol rather than the original
symbol, and makes up for that by giving the relocation an addend. The
linker does not have any mechanism to create "symbol plus addend"
entries in the GOT -- it indexes into the GOT based on the symbol only.
Hence all relocations which are a section symbol plus addend end up
pointing at one value in the GOT just containing the value of the
symbol.
We do not fix this existing bug, but just note it given that this is in the
same area.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The name has been changed in LLVM, so we adjust it in binutils to match.
We also move where these symbols are created. Previously they were
created in elfNN_aarch64_always_size_sections, but we move this to
elfNN_aarch64_size_dynamic_sections.
We do the moving since these symbols are supposed to span all dynamic
capability relocations stored in the .rela.dyn section for static
executables. In the case of a static binary we place relocations for
the GOT into this section as well as internal relocations.
These relocations for the GOT are handled in
elfNN_aarch64_size_dynamic_sections, which is called *after*
elfNN_aarch64_always_size_sections. The size of this section is only
fully known after those GOT relocations are managed, so the position
these symbols should be placed in is only known at that point. Hence we
only initialise the __rela_dyn* symbols at that point.
2021-10-06 Matthew Malcomson <matthew.malcomson@arm.com>
ChangeLog:
* bfd/elfnn-aarch64.c (elfNN_aarch64_always_size_sections): Move
initialisation of __rela_dyn* symbols ...
(elfNN_aarch64_size_dynamic_sections): ... to here.
* ld/testsuite/ld-aarch64/aarch64-elf.exp: Run new tests.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-1.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-1.s: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-2.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-2.s: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-3.d: New test.
* ld/testsuite/ld-aarch64/emit-morello-reloc-markers-3.s: New test.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The behaviour of weak undef thread-local variables is not well defined.
TLS relocations against weak undef symbols are not handled properly by
the linker, and in some cases cause the linker to crash (notably when
linking glibc for purecap Morello). This patch simply ignores these and
emits a warning to that effect. This is a compromise to enable progress
for Morello.
bfd/ChangeLog:
2022-01-17 Alex Coplan <alex.coplan@arm.com>
* elfnn-aarch64.c (elfNN_aarch64_relocate_section): Skip over TLS
relocations against weak undef symbols.
(elfNN_aarch64_check_relocs): Likewise, but also warn.
ld/ChangeLog:
2022-01-17 Alex Coplan <alex.coplan@arm.com>
* testsuite/ld-aarch64/aarch64-elf.exp: Add morello-weak-tls test.
* testsuite/ld-aarch64/morello-weak-tls.d: New test.
* testsuite/ld-aarch64/morello-weak-tls.s: New test.
* testsuite/ld-aarch64/weak-tls.d: Update test wrt new behaviour.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function had a buggy implementation of rounding a value up to a
given power of 2. Aligning to a multiple of 16 would align to a
multiple of 32 and so on.
This was observable when linking object files that had very large
objects in them. The compiler would ensure that these objects are large
enough that they are exactly representable, but the linker would
complain that they are not because the linker asserted extra alignment
than the compiler.
Here we fix the bug, add a few testcases, and adjust an existing
testcase in the area.
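The fix is the standard round-up-to-power-of-two idiom. A standalone sketch (the exact shape of the original bug is an assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative buggy round-up: adding the full alignment instead of
   alignment - 1 bumps values that are already aligned, so a request
   for 16-byte alignment behaves like 32-byte alignment on aligned
   inputs.  */
static uint64_t
align_up_buggy (uint64_t value, unsigned pow)
{
  uint64_t align = (uint64_t) 1 << pow;
  return (value + align) & -align;
}

/* Correct round-up of VALUE to a multiple of 2^POW.  */
static uint64_t
align_up (uint64_t value, unsigned pow)
{
  uint64_t align = (uint64_t) 1 << pow;
  return (value + align - 1) & -align;
}
```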
|
| |
This is just to help anyone trying to build Morello binutils with a very
old system compiler.
At the time we branched, binutils aimed to remain buildable with C89.
These changes are what is needed to compile with the `-ansi` flag (i.e.
in C89 mode).
|
| |
Enable core file dumping through the gcore command and enable reading of
kernel-generated core files for Morello.
This patch enables writing and reading Morello core files containing
dumps of the C register set and of the capability tags from memory.
The C register dumps are stored in an NT_ARM_MORELLO note, while the
capability tag dumps are stored in multiple NT_MEMTAG notes.
The NT_MEMTAG notes have the following format:
NT_MEMTAG:
<header>
<tag data>
The header has the following format:
/* Header for NT_MEMTAG notes. */
struct __attribute__ ((packed)) tag_dump_header
{
  uint16_t format;
  uint64_t start_vma;
  uint64_t end_vma;
  union
  {
    struct tag_dump_fmt
    {
      uint16_t granule_byte_size;
      uint16_t tag_bit_size;
      uint16_t __unused;
    } cheri;
  } u;
  // Other formats may be added here later.
};
There is a speed limitation while saving capability tags, because GDB
has access to only one capability per ptrace call. In the future there
may be a ptrace request to read capability tags in bulk, which would
make this much faster.
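Given the header above, the amount of tag data following it is fixed by the granule and tag-bit sizes. A standalone sketch assuming the tags are packed eight to a byte (helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define MORELLO_TAG_GRANULE_SIZE 16  /* One tag bit covers 16 bytes.  */
#define MORELLO_TAG_BIT_SIZE 1

/* Number of tag granules covered by [START_VMA, END_VMA).  */
static uint64_t
morello_tag_granules (uint64_t start_vma, uint64_t end_vma)
{
  return (end_vma - start_vma) / MORELLO_TAG_GRANULE_SIZE;
}

/* Bytes of tag data following the header, assuming tags are packed
   eight to a byte.  */
static uint64_t
morello_tag_data_bytes (uint64_t start_vma, uint64_t end_vma)
{
  uint64_t bits = morello_tag_granules (start_vma, end_vma)
                  * MORELLO_TAG_BIT_SIZE;
  return (bits + 7) / 8;
}
```

So a single 4KiB page contributes 256 tag bits, i.e. 32 bytes of tag data.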
Tested by writing a gcore-based core file and reading it back, and also
exercised by reading a kernel-generated core file.
bfd/ChangeLog:
2021-05-24 Luis Machado <luis.machado@arm.com>
* elf-bfd.h (elfcore_write_aarch_morello): New prototype.
* elf.c (elfcore_grok_aarch_morello): New function.
(elfcore_grok_note): Handle NT_ARM_MORELLO.
(elfcore_write_aarch_morello): New function.
(elfcore_write_register_note): Handle reg-aarch-morello.
(elfcore_make_memtag_note_section): New function.
(elfcore_grok_note): Handle NT_MEMTAG note types.
binutils/ChangeLog:
2021-05-24 Luis Machado <luis.machado@linaro.org>
* readelf.c (get_note_type): Handle NT_MEMTAG note types.
include/ChangeLog:
2021-05-24 Luis Machado <luis.machado@linaro.org>
* elf/common.h (NT_MEMTAG): New constant.
(ELF_CORE_TAG_CHERI): New constant.
gdb/ChangeLog:
2021-05-24 Luis Machado <luis.machado@arm.com>
* aarch64-linux-tdep.c (aarch64_linux_cregmap): Update to match
Morello's register layout in the core file.
(aarch64_linux_iterate_over_regset_sections): Update to handle
Morello's register set.
(aarch64_linux_init_abi): Likewise.
Register core file hooks.
(aarch64_linux_decode_memtag_note)
(aarch64_linux_create_memtag_notes_from_range)
(morello_get_tag_granules): New functions.
(MAX_TAGS_TO_TRANSFER): New constant.
* arch/aarch64-cap-linux.h (MORELLO_TAG_GRANULE_SIZE)
(MORELLO_TAG_BIT_SIZE): New constants.
(tag_dump_header): New struct.
* corelow.c (core_target <read_capability>: New method overrides.
(core_target::read_capability): New methods.
* gdbarch.sh (create_memtag_notes_from_range)
(decode_memtag_note): New hooks.
* gdbarch.c: Regenerate.
* gdbarch.h: Regenerate.
* linux-tdep.c (linux_make_memtag_corefile_notes): New function.
(linux_make_corefile_notes): Call linux_make_memtag_corefile_notes.
(linux_address_in_memtag_page): Removed.
|
| |
This change adds basic support for TLS descriptors. Relaxation of
TLSDESC_GD to other relocations is limited to TLS_LE, other cases end
up retaining TLSDESC_GD.
There is one key difference from A64 for TLSDESC_GD -> LE transition
and that is in the case of static non-pie binaries. Morello
TLSDESC_GD relocations are relaxed to LE for static non-pie binaries
since it ought to be safe to do so and it aligns with LLVM behaviour.
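The resulting transition rule can be modelled as a small decision function; this is a sketch of the behaviour described above, not the actual aarch64_tls_transition code:

```c
#include <assert.h>
#include <stdbool.h>

enum tls_access { TLSDESC_GD, TLS_LE };

/* Sketch of the Morello relaxation rule: a TLSDESC_GD sequence is
   rewritten to the local-exec (TLS_LE) form only when producing a
   static non-PIE executable; every other case keeps TLSDESC_GD.  */
static enum tls_access
morello_tlsdesc_transition (bool is_static, bool is_pie)
{
  return (is_static && !is_pie) ? TLS_LE : TLSDESC_GD;
}
```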
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (IS_AARCH64_TLSDESC_RELOC): Add Morello
relocations.
(elfNN_aarch64_tlsdesc_small_plt_c64_entry): New Morello
tlsdesc PLT entry.
(elfNN_aarch64_howto_table): Add TLSDESC_ADR_PAGE20,
TLSDESC_LD128_LO12, TLSDESC_CALL, TLSDESC relocations for
Morello.
(aarch64_tls_transition_without_check): Add INFO and
MORELLO_RELOC arguments. Add morello TLSDESC relocations.
(aarch64_reloc_got_type, elfNN_aarch64_final_link_relocate,
elfNN_aarch64_tls_relax, elfNN_aarch64_check_relocs,
aarch64_can_relax_tls): Add morello TLSDESC relocations.
(aarch64_tls_transition): Add transitions for morello TLSDESC
relocations.
(elfNN_aarch64_tls_relax): Add relaxations for morello
TLSDESC.
(elfNN_aarch64_relocate_section): Emit dynamic relocation for
Morello static relocations.
(elfNN_aarch64_allocate_dynrelocs): Allocate dynamic
relocation space for Morello TLSDESC.
(elfNN_aarch64_finish_dynamic_sections): Emit Morello tlsdesc
PLT entry.
* elfxx-aarch64.c (_bfd_aarch64_elf_put_addend,
_bfd_aarch64_elf_resolve_relocation): Add Morello relocations.
* reloc.c: Add Morello relocations.
* bfd-in2.h: Regenerate.
* libbfd.h: Regenerate.
gas/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* config/tc-aarch64.c (s_tlsdesccall): Emit Morello
TLSDESC_CALL in C64 code.
(reloc_table): Add Morello relocation.
(md_apply_fix): Emit Morello TLSDESC_LD128_LO12 in C64 code.
(aarch64_force_relocation): Add Morello TLSDESC relocations.
* testsuite/gas/aarch64/morello-tlsdesc-c64.d: New file.
* testsuite/gas/aarch64/morello-tlsdesc.d: New file.
* testsuite/gas/aarch64/morello-tlsdesc.s: New file.
include/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elf/aarch64.h: New Morello TLSDESC relocations.
ld/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* testsuite/ld-aarch64/morello-tlsdesc.s: New file.
* testsuite/ld-aarch64/morello-tlsdesc.d: New test.
* testsuite/ld-aarch64/morello-tlsdesc-static.d: New test.
* testsuite/ld-aarch64/morello-tlsdesc-staticpie.d: New test.
* testsuite/ld-aarch64/aarch64-elf.exp: Add them.
|
| |
The capability format limits the alignment and length of capability
bounds, which are subject to rounding. Add alignment and
padding at the boundaries of such long (typically >16M) sections so
that any capabilities referencing these sections do not end up
overlapping into neighbouring sections.
There are two cases where this is in use. The first and most
important due to the current implementation is the range for PCC,
which needs to span all executable sections and all PLT and GOT
sections. The other case is for linker and ldscript defined symbols
that may be used in dynamic relocations.
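The rounding behaviour can be illustrated with a simplified model of compressed bounds; the mantissa width and exponent rule below are assumptions for illustration, not the exact Morello encoding:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: bounds are exact while the length fits in
   MANTISSA_BITS; beyond that, base and limit must both be aligned to
   1 << exponent, which is why long sections need padding and
   alignment.  */
#define MANTISSA_BITS 14

static unsigned
required_exponent (uint64_t length)
{
  unsigned exp = 0;
  while ((length >> exp) >= ((uint64_t) 1 << MANTISSA_BITS))
    exp++;
  return exp;
}

static bool
valid_cap_range (uint64_t base, uint64_t length)
{
  uint64_t align = (uint64_t) 1 << required_exponent (length);
  return (base & (align - 1)) == 0
         && ((base + length) & (align - 1)) == 0;
}
```

In this model a 32MiB (0x2000000) region needs 4KiB alignment at both ends, which is why it is the long (typically >16M) sections that get padded.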
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (elf_aarch64_link_hash_table): New member.
(section_start_symbol, c64_valid_cap_range, exponent): Move
up.
(sec_change_queue): New structure.
(queue_section_padding, record_section_change,
elfNN_c64_resize_sections): New functions.
(bfd_elfNN_aarch64_init_maps): Add info argument. Adjust
callers.
* elfxx-aarch64.h (bfd_elf64_aarch64_init_maps,
bfd_elf32_aarch64_init_maps): Add info argument.
(elf64_c64_resize_sections, elf32_c64_resize_sections): New
function declarations.
ld/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* emultempl/aarch64elf.em (elf64_c64_pad_section): New
function.
(gld${EMULATION_NAME}_after_allocation): Resize C64 sections.
* ldlang.c (lang_add_newdot): New function.
* ldlang.h (lang_add_newdot): New function declaration.
* testsuite/ld-aarch64/aarch64-elf.exp: Add new test.
* testsuite/ld-aarch64/morello-sec-round.d: New file.
* testsuite/ld-aarch64/morello-sec-round.ld: New file.
* testsuite/ld-aarch64/morello-sec-round.s: New file.
|
| |
- Identify and mark C64 frames
- Identify C64 registers including DDC.
- Identify 'purecap' argument to .cfi_startproc for C64 frames
- Emit 'C' in augmentation string for C64 frames
- Recognise the 'C' in the CIE augmentation string when parsing
exception headers
Difference from LLVM: the LLVM assembler only uses purecap to add C to
the augmentation string. The GNU assembler on the other hand uses
-march and validates that purecap is passed to .cfi_startproc only for
-morello+c64. This means that for code compiled for A64, if llvm sees
`.cfi_startproc purecap`, it sets 'C' whereas the GNU assembler flags
an error.
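Recognising the new augmentation character is a matter of scanning the CIE augmentation string; a minimal sketch (the real parser walks the characters in order so it can consume the data associated with 'z', 'P', 'L' and 'R' as it goes):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of recognising the 'C' (purecap) marker in a CIE
   augmentation string such as "zRC".  Treating 'C' as marking a
   Morello/C64 frame is the extension described above.  */
static bool
cie_is_purecap (const char *augmentation)
{
  return strchr (augmentation, 'C') != NULL;
}
```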
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elf-bfd.h (elf_backend_data): New callback
elf_backend_eh_frame_augmentation_char.
* elf-eh-frame.c (_bfd_elf_parse_eh_frame): Use it.
* elfnn-aarch64.c (elf64_aarch64_eh_frame_augmentation_char):
New function.
(elf_backend_eh_frame_augmentation_char): New macro.
* elfxx-target.h [!elf_backend_eh_frame_augmentation_char]:
Set elf_backend_eh_frame_augmentation_char to NULL.
(elfNN_bed): Initialise
elf_backend_eh_frame_augmentation_char.
binutils/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* dwarf.c (dwarf_regnames_aarch64): Add capability registers.
gas/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* config/tc-aarch64.c (REG_DW_CSP, REG_DW_CLR): New macros.
(s_aarch64_cfi_b_key_frame): Adjust for new entry_extras
struct.
(tc_aarch64_frame_initial_instructions): Adjust for C64.
(tc_aarch64_fde_entry_init_extra,
tc_aarch64_cfi_startproc_exp): New functions.
(tc_aarch64_regname_to_dw2regnum): Support capability
registers.
* config/tc-aarch64.h (fde_entry): Forward declaration.
(eh_entry_extras): New struct.
(tc_fde_entry_extras, tc_cie_entry_extras): Use it.
(tc_fde_entry_init_extra): Set to
tc_aarch64_fde_entry_init_extra.
(tc_output_cie_extra): Emit 'C' for C64.
(tc_cie_fde_equivalent_extra): Adjust for C64.
(tc_cie_entry_init_extra): Likewise.
(tc_cfi_startproc_exp): New macro.
(tc_aarch64_cfi_startproc_exp,
tc_aarch64_fde_entry_init_extra): New function declarations.
* dw2gencfi.c (tc_cfi_startproc_exp): New macro.
(dot_cfi_startproc): Use it.
* testsuite/gas/aarch64/morello-eh.d: New test.
* testsuite/gas/aarch64/morello-eh.s: New test.
|
| |
Add veneers to branch from A64 to C64 and vice versa and for range
extension from C64 to C64. The veneers are named as
__foo_a64c64_veneer, __foo_c64a64_veneer or simply __foo_veneer
(where foo is the target function) based on whether the branch is from
A64 to C64, the other way around or for extended range.
A64 to C64 needs an additional BX, since the ADRP in the veneer does
not generate a valid capability until the state switch. As a result,
the addend LSB is no longer important for the A64 -> C64 switch, but
we keep it anyway so that we can use the same veneer for long-range
C64 to C64 branches.
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (STUB_ENTRY_NAME): Add format specifier for
veneer type.
(C64_MAX_ADRP_IMM, C64_MIN_ADRP_IMM): New macros.
(aarch64_branch_reloc_p, c64_valid_for_adrp_p,
aarch64_interwork_stub): New functions.
(aarch64_c64_branch_stub, c64_aarch64_branch_stub): New stubs.
(elf_aarch64_stub_type): New members.
(aarch64_type_of_stub): Support C64 stubs.
(aarch64_lookup_stub_type_suffix): New function.
(elfNN_aarch64_stub_name): Use it.
(elfNN_aarch64_get_stub_entry): Add stub_type argument.
Adjust callers. Support C64 stubs.
(aarch64_build_one_stub): Likewise.
(aarch64_size_one_stub): Likewise.
(elfNN_aarch64_size_stubs): Likewise.
(elfNN_aarch64_build_stubs): Save and return error if stub
building failed.
(elfNN_aarch64_final_link_relocate): Emit stubs based on
whether source and target of a branch are different.
(aarch64_map_one_stub): Emit mapping symbol for C64 stubs.
ld/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* testsuite/ld-aarch64/aarch64-elf.exp: Add test.
* testsuite/ld-aarch64/morello-stubs-static.d: New file.
* testsuite/ld-aarch64/morello-stubs.d: New file.
* testsuite/ld-aarch64/morello-stubs.ld: New file.
* testsuite/ld-aarch64/morello-stubs.s: New file.
Veneer jump targets have limited range (bounded by the reach of ADRP)
and hence cannot be used for very long jumps; the linker reports an
error for such out-of-range jumps.
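The range restriction comes from ADRP's 21-bit signed page immediate, roughly ±4GiB. A standalone sketch of the validity check (mirroring the intent of c64_valid_for_adrp_p; constants and names are illustrative, cf. C64_MAX_ADRP_IMM/C64_MIN_ADRP_IMM):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADRP encodes a 21-bit signed immediate counting 4KiB pages, so a
   veneer built around ADRP reaches about +/-4GiB of the PC's page.  */
#define ADRP_MAX_IMM ((1 << 20) - 1)
#define ADRP_MIN_IMM (-(1 << 20))

static bool
valid_for_adrp (uint64_t pc, uint64_t target)
{
  int64_t pages = (int64_t) (target >> 12) - (int64_t) (pc >> 12);
  return pages >= ADRP_MIN_IMM && pages <= ADRP_MAX_IMM;
}
```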
|
| |
This implements the following static relocations:
- R_MORELLO_CALL26, R_MORELLO_JUMP26
- R_MORELLO_TSTBR14, R_MORELLO_CONDBR19
and the following dynamic relocations:
- R_MORELLO_JUMP_SLOT and R_MORELLO_IRELATIVE
Some notes on the implementation:
- The linker selects morello PLT stubs when it finds at least one
static relocation that needs a capability GOT slot.
- It is assumed that C64 is not compatible with BTI/PAC, so the latter
gets overridden. To allow this, the call to setup_plt_values is
delayed to take into account htab->c64_plt.
- If the caller is A64, the linker emits R_AARCH64_JUMP_SLOT;
otherwise it emits R_MORELLO_JUMP_SLOT.
- The PLT stub is A64-compatible, in that it should do the right thing
when the execution state is A64.
- If the slots are 16 bytes (this happens when there is at least one
Morello relocation on the GOT), the references in .plt.got and in
.got are always capabilities; the dynamic linker will take care of
that. For PLT, the default trampoline is a capability. This is
true for A64 as well as C64.
- At present it is assumed that there is no interworking between A64
and C64 functions.
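For the new branch relocations, reachability follows the usual B/BL encoding: a 26-bit signed immediate counting 32-bit instructions, i.e. ±128MiB. A sketch of the range check the linker effectively performs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* B and BL carry a 26-bit signed immediate counting 4-byte
   instructions, so CALL26/JUMP26-style relocations reach +/-128MiB;
   a further or misaligned target needs a veneer or fails to link.  */
static bool
branch26_in_range (int64_t offset)
{
  return (offset & 3) == 0
         && offset >= -((int64_t) 1 << 27)
         && offset < ((int64_t) 1 << 27);
}
```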
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (elfNN_c64_small_plt0_entry,
elfNN_c64_small_plt_entry): New variables.
(elfNN_aarch64_howto_table): Add relocations.
(setup_plt_values): Choose C64 PLT when appropriate.
(bfd_elfNN_aarch64_set_options): Defer setup_plt_values
call...
(elfNN_aarch64_link_setup_gnu_properties) ... from here as
well...
(elfNN_aarch64_size_dynamic_sections): ... to here.
(elfNN_aarch64_final_link_relocate,
elfNN_aarch64_check_relocs, elfNN_aarch64_reloc_type_class):
Support new relocations.
(map_symbol_type): New member AARCH64_MAP_C64.
(elfNN_aarch64_output_arch_local_syms): Use it.
(aarch64_update_c64_plt_entry): New function.
(elfNN_aarch64_create_small_pltn_entry): Use it.
(elfNN_aarch64_init_small_plt0_entry): Emit C64 PLT when
appropriate.
* elfxx-aarch64.c (_bfd_aarch64_elf_put_addend,
_bfd_aarch64_elf_resolve_relocation): Add new relocations.
* libbfd.h (bfd_reloc_code_real_names): Likewise.
* reloc.c: New relocations BFD_RELOC_MORELLO_TSTBR14,
BFD_RELOC_MORELLO_BRANCH19, BFD_RELOC_MORELLO_JUMP26,
BFD_RELOC_MORELLO_CALL26, BFD_RELOC_MORELLO_JUMP_SLOT and
BFD_RELOC_MORELLO_IRELATIVE.
* bfd-in2.h: Regenerate.
gas/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* config/tc-aarch64.c (parse_operands): Choose C64 branch
relocations when appropriate.
(md_apply_fix, aarch64_force_relocation,
aarch64_fix_adjustable): Support C64 branch relocations.
include/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elf/aarch64.h: New relocations R_MORELLO_TSTBR14,
R_MORELLO_CONDBR19, R_MORELLO_JUMP26, R_MORELLO_CALL26,
R_MORELLO_JUMP_SLOT and R_MORELLO_IRELATIVE.
ld/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* testsuite/ld-aarch64/aarch64-elf.exp: Add new tests.
* testsuite/ld-aarch64/c64-ifunc-2-local.d: New file.
* testsuite/ld-aarch64/c64-ifunc-2.d: New file.
* testsuite/ld-aarch64/c64-ifunc-3a.d: New file.
* testsuite/ld-aarch64/c64-ifunc-3b.d: New file.
* testsuite/ld-aarch64/c64-ifunc-4.d: New file.
* testsuite/ld-aarch64/c64-ifunc-4a.d: New file.
* testsuite/ld-aarch64/ifunc-2-local.s: Support capabilities.
* testsuite/ld-aarch64/ifunc-2.s: Likewise.
opcodes/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* aarch64-dis.c (get_sym_code_type): Fix C64 PLT disassembly.
|
| |
Add symbols __cap_dynrelocs_start and __cap_dynrelocs_end to mark the
start and end of the .rela.dyn section when building a static
executable without PIE. This allows the runtime startup to traverse
the section and initialise capabilities without having to read the ELF
headers.
All relocations must be of type R_C64_RELATIVE and have the following
properties:
- Frag contains the base of the capability to be initialised
- Frag + 8 has the size and permissions encoded into 56 and 8 bits
respectively
- Addend is the offset from the capability base
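Under that layout, a startup routine can decode each entry without reading the ELF headers; a hedged sketch of the fragment decoding (struct and helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one R_C64_RELATIVE fragment: the relocated
   location holds the capability base, and the following 64-bit word
   packs the length in the low 56 bits and the permissions in the top
   8 bits.  */
struct cap_frag
{
  uint64_t base;
  uint64_t size_and_perms;
};

static uint64_t
frag_size (const struct cap_frag *f)
{
  return f->size_and_perms & (((uint64_t) 1 << 56) - 1);
}

static uint64_t
frag_perms (const struct cap_frag *f)
{
  return f->size_and_perms >> 56;
}
```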
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (elfNN_aarch64_final_link_relocate): Emit
relative C64 relocations for static binaries early.
(aarch64_elf_init_got_section): Add capability relocations to
SRELCAPS for non-PIE static binaries.
(elfNN_aarch64_allocate_dynrelocs): Likewise.
(elfNN_aarch64_always_size_sections): Emit
__cap_dynrelocs_start and __cap_dynrelocs_end.
|
| |
- Implement R_MORELLO_LD128_GOT_LO12_NC and emit the correct
relocation based on the target register size.
- Add R_MORELLO_GLOB_DAT and R_MORELLO_RELATIVE dynamic relocations for GOT
entries
- Add support for capabilities in GOT
GOT slots need to be 16 bytes to accommodate capabilities. For this
purpose, we delay initialising the size and alignment of the GOT
sections until we have walked all relocs in check_relocs. If we
encounter capability relocations during the walk, we set the GOT entry
size and alignment to account for capabilities; otherwise we leave
them pointer sized.
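The sizing rule reduces to a small helper; a sketch of the behaviour described above (names are illustrative, not the bfd callback):

```c
#include <assert.h>
#include <stdbool.h>

/* Once any capability relocation has been seen while walking relocs
   in check_relocs, every GOT slot becomes 16 bytes wide (and the
   section 16-byte aligned); otherwise slots stay pointer-sized.  */
static unsigned
got_entry_size (bool saw_capability_reloc, unsigned ptr_size)
{
  return saw_capability_reloc ? 16 : ptr_size;
}
```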
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elfnn-aarch64.c (GOT_ENTRY_SIZE): Adjust for C64
relocations. Adjust callers.
(GOT_RESERVED_HEADER_SLOTS, GOT_CAP): New macros.
(elfNN_aarch64_howto_table): Add R_MORELLO_LD128_GOT_LO12_NC
and R_MORELLO_GLOB_DAT.
(elf_aarch64_link_hash_table): New member c64_rel.
(bfd_elfNN_aarch64_set_options): Initialise it.
(cap_meta, c64_get_capsize): New functions.
(aarch64_reloc_got_type): Use GOT_CAP.
(elfNN_aarch64_final_link_relocate): Add
R_MORELLO_LD128_GOT_LO12_NC and R_MORELLO_GLOB_DAT.
(aarch64_elf_create_got_section): Move section initialisation
into a...
(aarch64_elf_init_got_section): ... New function.
(elfNN_aarch64_size_dynamic_sections): Call it.
(elfNN_aarch64_check_relocs): Add R_MORELLO_LD128_GOT_LO12_NC
and R_MORELLO_GLOB_DAT.
(elfNN_aarch64_finish_dynamic_symbol): Emit C64 relocations
when appropriate.
(elfNN_aarch64_got_elt_size): New function.
(elfNN_aarch64_got_header_size): Return GOT entry size based
on c64_rel.
(elf_backend_got_elt_size): New macro.
* elfxx-aarch64.c (_bfd_aarch64_elf_put_addend,
_bfd_aarch64_elf_resolve_relocation): Add
BFD_RELOC_MORELLO_LD128_GOT_LO12_NC.
* libbfd.h (bfd_reloc_code_real_names): Add
BFD_RELOC_MORELLO_GLOB_DAT and
BFD_RELOC_MORELLO_LD128_GOT_LO12_NC.
* reloc.c: Likewise.
* bfd-in2.h: Regenerate.
gas/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* config/tc-aarch64.c (parse_operands): Emit C64 relocations
for got_lo12. Move old relocation checks from...
(md_apply_fix): ... here.
* testsuite/gas/aarch64/morello-ldst-reloc.d: Add tests.
* testsuite/gas/aarch64/morello-ldst-reloc.s: Likewise.
include/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* elf/aarch64.h: New relocations R_MORELLO_LD128_GOT_LO12_NC
and R_MORELLO_GLOB_DAT.
ld/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
* testsuite/ld-aarch64/emit-relocs-morello-1.d: New file.
* testsuite/ld-aarch64/emit-relocs-morello-1.s: New test file.
* testsuite/ld-aarch64/aarch64-elf.exp: Add it to test runner.
|
| |
Expand GOT slots based on whether we are emitting C64 relocations.
This patch contains only infrastructure changes: it makes
got_header_size a function and adjusts its users across architectures.
bfd/ChangeLog:
2020-10-20 Siddhesh Poyarekar <siddesh.poyarekar@arm.com>
Tamar Christina <tamar.christina@arm.com>
* elf-bfd.h (elf_backend_data): Make got_header_size a
function. Add callbacks to all targets that use it.
* elflink.c (_bfd_elf_create_got_section,
bfd_elf_gc_common_finalize_got_offsets,
_bfd_elf_common_section): Adjust got_header_size usage.
|