summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* ada: Get name from entity if that's what's passed to Subprogram_NameRichard Kenner2023-05-161-0/+3
| | | | | | | gcc/ada/ * sem_util.adb (Subprogram_Name): If what's passed is already an entity, use that for the name.
* ada: Document examples of No_Dependence restriction for code generationEric Botcazou2023-05-162-2/+43
| | | | | | | | gcc/ada/ * doc/gnat_rm/standard_and_implementation_defined_restrictions.rst (No_Dependence): Give examples of new No_Dependence restrictions. * gnat_rm.texi: Regenerate.
* ada: Bad handling of ASCII with -gnatynArnaud Charlet2023-05-162-3/+3
| | | | | | | | | | ASCII is special cased but this wasn't taking into account all cases such as Standard.ASCII. gcc/ada/ * snames.ads-tmpl (Name_ASCII): New. * style.adb (Check_Identifier): Fix handling of ASCII.
* ada: Introduce Cannot_Be_Superflat flag on N_Range nodesEric Botcazou2023-05-164-9/+22
| | | | | | | | | | | | | | | | | | | The support of superflat arrays in the language generates an overhead that the code generator attempts to minimize, but it cannot handle too complex cases and it would be helpful if the front-end could lend a hand. This change introduces the Cannot_Be_Superflat flag on N_Range nodes for this purpose, and sets it on the result of string concatenations when it is guaranteed to be nonnull. gcc/ada/ * gen_il-fields.ads (Opt_Field_Enum): Add Cannot_Be_Superflat. * gen_il-gen-gen_nodes.adb (N_Range): Add Cannot_Be_Superflat as semantical flag and change Includes_Infinities to semantical. * sinfo.ads (Cannot_Be_Superflat): Document it for N_Range. * exp_ch4.adb (Expand_Concatenate): Set Cannot_Be_Superflat on the range of the result if the result cannot be null.
* ada: Change Present_Expr field type to UintRichard Kenner2023-05-161-1/+1
| | | | | | | | | | We want the field to be initialized to No_Uint because we want to be able to test in GNAT LLVM whether we've already set it so we can be sure we only set it once. gcc/ada/ * gen_il-gen-gen_nodes.adb (Present_Expr): Type is now Uint.
* ada: Simplify dramatically ghost code for proof of System.Arith_DoubleYannick Moy2023-05-162-384/+56
| | | | | | | | | | | | | | | | Using Inline_For_Proof annotation on key expression functions makes it possible to remove hundreds of lines of ghost code that were previously needed to guide provers. gcc/ada/ * libgnat/s-aridou.adb (Big3, Is_Mult_Decomposition) (Is_Scaled_Mult_Decomposition): Add annotation for inlining. (Double_Divide, Scaled_Divide): Simplify and remove ghost code. (Prove_Multiplication): Add calls to lemmas to make proof go through. * libgnat/s-aridou.ads (Big, In_Double_Int_Range): Add annotation for inlining.
* ada: Add intermediate assertions for proof of Super_TailYannick Moy2023-05-161-0/+6
| | | | | | | | Proof of Superbounded internal unit requires a little more help. gcc/ada/ * libgnat/a-strsup.adb: Add intermediate assertions.
* ada: Missing dependency with -gnatcArnaud Charlet2023-05-161-11/+11
| | | | | | | | | | When using -gnatc, dependencies on preprocessor and config files were not recorded. gcc/ada/ * gnat1drv.adb: Ensure all dependencies are recorded even when not generating code.
* ada: Set Loop_Variant assertion policy to Ignore in bothYannick Moy2023-05-161-1/+2
| | | | | | | | Set Loop_Variant assertion policy to Ignore in both. gcc/ada/ * libgnat/a-strsup.adb: Set assertion policy for Loop_Variant.
* ada: Trivial refactoring in Instantiate_*_BodyMarc Poulhiès2023-05-161-10/+6
| | | | | | | | | | Factor out Par_Vis/Install_Parent/Par_Installed in Instantiate_Package_Body and Instantiate_Subprogram_Body. gcc/ada/ * sem_ch12.adb (Instantiate_Package_Body): Simplify if/then/else. (Instantiate_Subprogram_Body): Likewise.
* ada: Restore proof of System.Arith_DoubleYannick Moy2023-05-161-31/+119
| | | | | | | | | | | | | | | | | | Use Assert_And_Cut to simplify proof of second part of the Scaled_Divide. Add intermediate assertions and simplify where necessary. gcc/ada/ * libgnat/s-aridou.adb: (Big3): Remove override made useless. (Lemma_Quot_Rem): Add new lemma and justify it, as no prover manages to prove it. (Lemma_Div_Pow2): Use new lemma Lemma_Quot_Rem. (Prove_Scaled_Mult_Decomposition_Regroup3): Retype for simplification. (Scaled_Divide): Remove useless assertions.Decompose some assertions with cut operations. Use Assert_And_Cut for second half. Add assertions.
* RISC-V: Adjust stdint.h to stdint-gcc.h for rvv testsPan Li2023-05-1615-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch would like to align the stdint.h to the stdint-gcc.h for all the RVV test files. Aka: stdint.h => stdint-gcc.h Signed-off-by: Pan Li <pan2.li@intel.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h: Replace stdint.h with stdint-gcc.h. * gcc.target/riscv/rvv/autovec/binop/shift-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vand-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vor-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vrem-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto. * gcc.target/riscv/rvv/autovec/binop/vxor-template.h: Ditto. * gcc.target/riscv/rvv/autovec/series-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.
* s390: Refactor block operation setmemStefan Schulze Frielinghaus2023-05-164-20/+132
| | | | | | | | | | | | | | | | | | | | | | | Vectorize memset with a constant length of less than or equal to 64 bytes. Do not perform a libc function call into memset in case the size is not a compile-time constant but bounded and the upper bound is less than or equal to 256 bytes. gcc/ChangeLog: * config/s390/s390-protos.h (s390_expand_setmem): Change function signature. * config/s390/s390.cc (s390_expand_setmem): For memset's less than or equal to 256 byte do not perform a libc call. * config/s390/s390.md: Change expander into a version which takes 8 operands. gcc/testsuite/ChangeLog: * gcc.target/s390/memset-1.c: Test case memset1 makes use of vst, now.
* s390: Add block operation movmemStefan Schulze Frielinghaus2023-05-163-0/+124
| | | | | | | | | | gcc/ChangeLog: * config/s390/s390-protos.h (s390_expand_movmem): New. * config/s390/s390.cc (s390_expand_movmem): New. * config/s390/s390.md (movmem<mode>): New. (*mvcrl): New. (mvcrl): New.
* s390: Refactor block operation cpymemStefan Schulze Frielinghaus2023-05-163-22/+74
| | | | | | | | | | | | | | | | | Do not perform a libc function call into memcpy in case the size is not a compile-time constant but bounded and the upper bound is less than or equal to 256 bytes. gcc/ChangeLog: * config/s390/s390-protos.h (s390_expand_cpymem): Change function signature. * config/s390/s390.cc (s390_expand_cpymem): For memcpy's less than or equal to 256 byte do not perform a libc call. (s390_expand_insv): Adapt new function signature of s390_expand_cpymem. * config/s390/s390.md: Change expander into a version which takes 8 operands.
* Fortran: Fix an assortment of bugsPaul Thomas2023-05-1610-54/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2023-05-16 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/105152 * interface.cc (gfc_compare_actual_formal): Emit an error if an unlimited polymorphic actual is not matched either to an unlimited or assumed type formal argument. PR fortran/100193 * resolve.cc (resolve_ordinary_assign): Emit an error if the var expression of an ordinary assignment is a proc pointer component. PR fortran/87496 * trans-array.cc (gfc_walk_array_ref): Provide assumed shape arrays coming from interface mapping with a viable arrayspec. PR fortran/103389 * trans-expr.cc (gfc_conv_intrinsic_to_class): Tidy up flagging of unlimited polymorphic 'class_ts'. (gfc_conv_gfc_desc_to_cfi_desc): Assumed type is unlimited polymorphic and should accept any actual type. PR fortran/104429 (gfc_conv_procedure_call): Replace dreadful kludge with a call to gfc_finalize_tree_expr. Avoid dereferencing a void pointer by giving it the pointer type of the actual argument. PR fortran/82774 (alloc_scalar_allocatable_subcomponent): Shorten the function name and replace the symbol argument with the se string length. If a deferred length character length is either not present or is not a variable, give the typespec a variable and assign the string length to that. Use gfc_deferred_strlen to find the hidden string length component. (gfc_trans_subcomponent_assign): Convert the expression before the call to alloc_scalar_allocatable_subcomponent so that a good string length is provided. (gfc_trans_structure_assign): Remove the unneeded derived type symbol from calls to gfc_trans_subcomponent_assign. gcc/testsuite/ PR fortran/105152 * gfortran.dg/pr105152.f90 : New test PR fortran/100193 * gfortran.dg/pr100193.f90 : New test PR fortran/87946 * gfortran.dg/pr87946.f90 : New test PR fortran/103389 * gfortran.dg/pr103389.f90 : New test PR fortran/104429 * gfortran.dg/pr104429.f90 : New test PR fortran/82774 * gfortran.dg/pr82774.f90 : New test
* Skip -fdelete-null-pointer-check tests if target keeps_null_pointer_checksSenthil Kumar Selvaraj2023-05-169-1/+10
| | | | | | | | | | | | | | | | | | | A bunch of tests explicitly pass in -fdelete-null-pointer-checks and fail if the target keeps null pointer checks. Skip such tests by adding a dg-skip-if for keeps_null_pointer_checks. gcc/testsuite/ChangeLog: * gcc.dg/attr-returns-nonnull.c: Skip if keeps_null_pointer_checks. * gcc.dg/init-compare-1.c: Likewise. * gcc.dg/ipa/pr85734.c: Likewise. * gcc.dg/ipa/propmalloc-1.c: Likewise. * gcc.dg/ipa/propmalloc-2.c: Likewise. * gcc.dg/ipa/propmalloc-3.c: Likewise. * gcc.dg/ipa/propmalloc-4.c: Likewise. * gcc.dg/tree-ssa/evrp11.c: Likewise. * gcc.dg/tree-ssa/pr83648.c: Likewise.
* MATCH: [PR109424] Simplify min/max of boolean argumentsAndrew Pinski2023-05-165-0/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html which does not depend on adding gimple_truth_valued_p at this point. Instead will use zero_one_valued_p which is already used for mult simplifications to make sure that we only have [0,1] rather having the mistake of maybe having [-1,0] as the range for signed bools. This shows up in a few places in GCC itself but only at -O1, we miss the min/max conversion because of PR 107888 (which I will be testing seperately). OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski PR tree-optimization/109424 gcc/ChangeLog: * match.pd: Add patterns for min/max of zero_one_valued values to `&`/`|`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bool-12.c: New test. * gcc.dg/tree-ssa/bool-13.c: New test. * gcc.dg/tree-ssa/minmax-20.c: New test. * gcc.dg/tree-ssa/minmax-21.c: New test.
* RISC-V: Add FRM and rounding mode operand into floating point intrinsicsJuzhe-Zhong2023-05-167-55/+251
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is adding rounding mode operand and FRM_REGNUM dependency into floating-point instructions. The floating-point instructions we added FRM and rounding mode operand: 1. vfadd/vfsub 2. vfwadd/vfwsub 3. vfmul 4. vfdiv 5. vfwmul 6. vfwmacc/vfwnmacc/vfwmsac/vfwnmsac 7. vfsqrt 8. floating-point conversions. 9. floating-point reductions. 10. floating-point ternary. The floating-point instructions we did NOT add FRM and rounding mode operand: 1. vfabs/vfneg/vfsqrt7/vfrec7 2. vfmin/vfmax 3. comparisons 4. vfclass 5. vfsgnj/vfsgnjn/vfsgnjx 6. vfmerge 7. vfmv.v.f gcc/ChangeLog: * config/riscv/riscv-protos.h (enum frm_field_enum): New enum. * config/riscv/riscv-vector-builtins.cc (function_expander::use_ternop_insn): Add default rounding mode. (function_expander::use_widen_ternop_insn): Ditto. * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add FRM REGNUM. (riscv_hard_regno_mode_ok): Ditto. (riscv_conditional_register_usage): Ditto. * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto. (FRM_REG_P): Ditto. (RISCV_DWARF_FRM): Ditto. * config/riscv/riscv.md: Ditto. * config/riscv/vector-iterators.md: split no frm and has frm operations. * config/riscv/vector.md (@pred_<optab><mode>_scalar): New pattern. (@pred_<optab><mode>): Ditto. Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
* Daily bump.GCC Administrator2023-05-169-1/+447
|
* c: Ignore _Atomic on function return type for C2xJoseph Myers2023-05-153-2/+49
| | | | | | | | | | | | | | | | | For C2x it was decided that _Atomic would be completely ignored on function return types (just as was done for qualifiers in C11 DR#423), to eliminate the potential for an rvalue returned by a function having _Atomic-qualified type when an rvalue resulting from lvalue-to-rvalue conversion could not have such a type. Implement this for GCC. Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/c/ * c-decl.cc (grokdeclarator): Ignore _Atomic on function return type for C2x. gcc/testsuite/ * gcc.dg/qual-return-9.c, gcc.dg/qual-return-10.c: New tests.
* c: Update __has_c_attribute values for C2xJoseph Myers2023-05-152-24/+20
| | | | | | | | | | | | | | | | | | | | | WG14 decided that __has_c_attribute should return the same value (equal to the intended __STDC_VERSION__ value) for all standard attributes in C2x, with values associated with when an attribute was added to the working draft (or had semantics added or changed in the working draft) only being used in earlier stages of development of that draft. The intent is that the values for existing attributes increase in future standard versions only if there are new features / semantic changes for those attributes. Implement this change for GCC. Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/c-family/ * c-lex.cc (c_common_has_attribute): Use 202311 as __has_c_attribute return for all C2x attributes. gcc/testsuite/ * gcc.dg/c2x-has-c-attribute-2.c: Expect 202311L return value from __has_c_attribute for all C2x attributes.
* Fortran: CLASS pointer function result in variable definition context [PR109846]Harald Anlauf2023-05-152-1/+40
| | | | | | | | | | | | | | gcc/fortran/ChangeLog: PR fortran/109846 * expr.cc (gfc_check_vardef_context): Check appropriate pointer attribute for CLASS vs. non-CLASS function result in variable definition context. gcc/testsuite/ChangeLog: PR fortran/109846 * gfortran.dg/ptr-func-5.f90: New test.
* Add auto-resizing capability to irange's [PR109695]Aldy Hernandez2023-05-152-30/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <tldr> We can now have int_range<N, RESIZABLE=false> for automatically resizable ranges. int_range_max is now int_range<3, true> for a 69X reduction in size from current trunk, and 6.9X reduction from GCC12. This incurs a 5% performance penalty for VRP that is more than covered by our > 13% improvements recently. </tldr> int_range_max is the temporary range object we use in the ranger for integers. With the conversion to wide_int, this structure bloated up significantly because wide_ints are huge (80 bytes a piece) and are about 10 times as big as a plain tree. Since the temporary object requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word. This means the structure grew from 4112 bytes to 40912 bytes. This patch adds the ability to resize ranges as needed, defaulting to no resizing, while int_range_max now defaults to 3 sub-ranges (instead of 255) and grows to 255 when the range being calculated does not fit. For example: int_range<1> foo; // 1 sub-range with no resizing. int_range<5> foo; // 5 sub-ranges with no resizing. int_range<5, true> foo; // 5 sub-ranges with resizing. I ran some tests and found that 3 sub-ranges cover 99% of cases, so I've set the int_range_max default to that: typedef int_range<3, /*RESIZABLE=*/true> int_range_max; We don't bother growing incrementally, since the default covers most cases and we have a 255 hard-limit. This hard limit could be reduced to 128, since my tests never saw a range needing more than 124, but we could do that as a follow-up if needed. With 3-subranges, int_range_max is now 592 bytes versus 40912 for trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP and 3.02% for threading, with no noticeable change in overall compilation (0.27%). This is more than covered by our 13.26% improvements for the legacy removal + wide_int conversion. I think this approach is a good alternative, while providing us with flexibility going forward. For example, we could try defaulting to a 8 sub-ranges for a noticeable improvement in VRP. We could also use large sub-ranges for switch analysis to avoid resizing. Another approach I tried was always resizing. With this, we could drop the whole int_range<N> nonsense, and have irange just hold a resizable range. This simplified things, but incurred a 7% penalty on ipa_cp. This was hard to pinpoint, and I'm not entirely convinced this wasn't some artifact of valgrind. However, until we're sure, let's avoid massive changes, especially since IPA changes are coming up. For the curious, a particular hot spot for IPA in this area was: ipcp_vr_lattice::meet_with_1 (const value_range *other_vr) { ... ... value_range save (m_vr); m_vr.union_ (*other_vr); return m_vr != save; } The problem isn't the resizing (since we do that at most once) but the fact that for some functions with lots of callers we end up a huge range that gets copied and compared for every meet operation. Maybe the IPA algorithm could be adjusted somehow??. Anywhooo... for now there is nothing to worry about, since value_range still has 2 subranges and is not resizable. But we should probably think what if anything we want to do here, as I envision IPA using infinite ranges here (well, int_range_max) and handling frange's, etc. gcc/ChangeLog: PR tree-optimization/109695 * value-range.cc (irange::operator=): Resize range. (irange::union_): Same. (irange::intersect): Same. (irange::invert): Same. (int_range_max): Default to 3 sub-ranges and resize as needed. * value-range.h (irange::maybe_resize): New. (~int_range): New. (int_range::int_range): Adjust for resizing. (int_range::operator=): Same.
* Only return changed=true in union_nonzero when appropriate.Aldy Hernandez2023-05-152-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irange::union_ was being overly pessimistic in its return value. It was returning false when the nonzero mask was possibly the same. The reason for this is because the nonzero mask is not entirely kept up to date. We avoid setting it up when a new range is set (from a set, intersect, union, etc), because calculating a mask from a range is measurably expensive. However, irange::get_nonzero_bits() will always return the correct mask because it will calculate the nonzero mask inherit in the mask on the fly and bitwise or it with the saved mask. This was an optimization because last release it was a big penalty to keep the mask up to date. This may not necessarily be the case with the conversion to wide_int's. We should investigate. Just to be clear, the result from get_nonzero_bits() is always correct as seen by the user, but the wide_int in the irange does not contain all the information, since part of the nonzero bits can be determined by the range itself, on the fly. The fix here is to make sure that the result the user sees (callers of get_nonzero_bits()) changed when unioning bits. This allows ipcp_vr_lattice::meet_with_1 to avoid unnecessary copies when determining if a range changed. This patch yields an 6.89% improvement to the ipa_cp pass. I'm including the IPA changes in this patch, as it's a testcase of sorts for the change. gcc/ChangeLog: * ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Avoid unnecessary range copying * value-range.cc (irange::union_nonzero_bits): Return TRUE only when range changed.
* c++: add feature-test macro for auto(x)Patrick Palka2023-05-152-0/+7
| | | | | | | | | | | | | | This adds the feature-test macro for PR0849R8, as per https://github.com/cplusplus/CWG/issues/281. gcc/c-family/ChangeLog: * c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast for C++23. gcc/testsuite/ChangeLog: * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.
* RISC-V: Add rounding mode operand for fixed-point patternsJuzhe-Zhong2023-05-156-23/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we are going to have fixed-point intrinsics that are modeling rounding mode https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222 We should have operand to specify rounding mode in fixed-point instructions. We don't support these modeling rounding mode intrinsics yet but we will definetely support them later. This is the preparing patch for new coming intrinsics. gcc/ChangeLog: * config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum. * config/riscv/riscv-vector-builtins.cc (function_expander::use_exact_insn): Add default rounding mode operand. * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_REGNUM. (riscv_hard_regno_mode_ok): Ditto. (riscv_conditional_register_usage): Ditto. * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto. (VXRM_REG_P): Ditto. (RISCV_DWARF_VXRM): Ditto. * config/riscv/riscv.md: Ditto. * config/riscv/vector.md: Ditto Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
* OPTABS: Extend the number of expanding instructions patternPan Li2023-05-151-0/+5
| | | | | | | | | | | | | | | | | | | We (RVV) is going to add a rounding mode operand into floating-point instructions which have 11 operands. Since we are going have intrinsic that is adding rounding mode argument: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226 This is the patch that is adding rounding mode operand in RISC-V port: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618573.html You can see there are 11 operands in these patterns. gcc/ChangeLog: * optabs.cc (maybe_gen_insn): Add case to generate instruction that has 11 operands. Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
* fix assert in non-atomic pathThomas Neumann2023-05-151-1/+3
| | | | | | | | The non-atomic path does not have range information, we have to adjust the assert handle that case, too. libgcc/ChangeLog: * unwind-dw2-fde.c: Fix assert in non-atomic path.
* aarch64: Cost vector comparisons more accuratelyKyrylo Tkachov2023-05-152-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are missing cases for combining of FACGE/FACGT instructions. In the testcase of the patch we generate: foo: fabs v3.4s, v0.4s fabs v0.4s, v1.4s fabs v1.4s, v2.4s fcmgt v0.4s, v3.4s, v0.4s fcmgt v1.4s, v3.4s, v1.4s b g This is because combine is rejecting the pattern due to costs: Successfully matched this instruction: (set (reg:V4SI 106) (neg:V4SI (lt:V4SI (abs:V4SF (reg:V4SF 113)) (abs:V4SF (reg:V4SF 111))))) rejecting combination of insns 8, 9 and 10 original costs 8 + 8 + 12 = 28 replacement costs 8 + 28 = 36 It is obviously recursing in the various arms of the RTX and such. This patch teaches the aarch64 rtx costs routine that our vector comparisons are represented as a NEG of compare operators, with the FACGE/FAGT operations in particular having ABS on each arm. With this patch we get the much more reasonable dump: original costs 8 + 8 + 8 = 24 replacement costs 8 + 8 = 16 and generate the optimal assembly: foo: mov v31.16b, v0.16b facgt v0.4s, v0.4s, v1.4s facgt v1.4s, v31.4s, v2.4s b g Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_rtx_costs, NEG case): Add costing logic for vector modes. gcc/testsuite/ChangeLog: * gcc.target/aarch64/facg_1.c: New test.
* Support parallel testing in libgomp, part II [PR66005]Thomas Schwinge2023-05-158-6/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ..., and enable if 'flock' is available for serializing execution testing. Regarding the default of 19 parallel slots, this turned out to be a local minimum for wall time when testing this on: $ uname -srvi Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 $ grep '^model name' < /proc/cpuinfo | uniq -c 32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz ... in two configurations: case (a) standard configuration, no offloading configured, case (b) offloading for GCN and nvptx configured but no devices available. For both cases, default plus '-m32' variant. $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}" Case (a), baseline: 6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k 6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k This is what people have been complaining about, rightly so, in <https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and elsewhere. Case (a), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=10 3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k -j15 GCC_TEST_PARALLEL_SLOTS=15 3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k -j17 GCC_TEST_PARALLEL_SLOTS=17 3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k -j18 GCC_TEST_PARALLEL_SLOTS=18 3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k -j19 GCC_TEST_PARALLEL_SLOTS=19 3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k -j23 GCC_TEST_PARALLEL_SLOTS=23 4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k -j26 GCC_TEST_PARALLEL_SLOTS=26 3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k -j32 GCC_TEST_PARALLEL_SLOTS=32 4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k Yay! Case (b), baseline; 2+ h: 7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k Case (b), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=10 7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k -j15 GCC_TEST_PARALLEL_SLOTS=15 8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k -j17 GCC_TEST_PARALLEL_SLOTS=17 8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k -j18 GCC_TEST_PARALLEL_SLOTS=18 8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k -j19 GCC_TEST_PARALLEL_SLOTS=19 9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k -j23 GCC_TEST_PARALLEL_SLOTS=23 10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k -j26 GCC_TEST_PARALLEL_SLOTS=26 11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k -j32 GCC_TEST_PARALLEL_SLOTS=32 11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k On my Dell Precision 7530 laptop: $ uname -srvi Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 $ grep '^model name' < /proc/cpuinfo | uniq -c 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz $ nvidia-smi -L GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea) ... in two configurations: case (c) standard configuration, no offloading configured, case (d) offloading for nvptx configured and device available. For both cases, only default variant, no '-m32'. $ \time make check-target-libgomp Case (c), baseline; roughly half of case (a) (just one variant): 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k Case (c), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=2 1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=6 1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k 1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=8 2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k 2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=10 2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k 2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=12 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe] 2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k Case (d), baseline (compared to case (b): only nvptx offloading compilation, but also nvptx offloading execution); ~1 h: 2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k 2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k Case (d), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=2 2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=6 3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k 3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=8 3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=10 4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=12 4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe] 4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k Worth noting is that with nvptx offloading, there is one execution test case that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively stalls progress for almost 5 min: quickly other executions test cases queue up on the lock for all parallel slots. That's working as expected; just noting this as it accordingly does skew the wall time numbers. PR testsuite/66005 libgomp/ * configure.ac: Look for 'flock'. * testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing. * testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here... * testsuite/lib/libgomp.exp: ... but here, instead. (libgomp_load): Override for parallel testing. * testsuite/libgomp-site-extra.exp.in (FLOCK): Set. * configure: Regenerate. * Makefile.in: Regenerate. * testsuite/Makefile.in: Regenerate.
* Support parallel testing in libgomp, part I [PR66005]Rainer Orth2023-05-153-23/+138
| | | | | | | | | | | | | | | | | | | | | | ..., while still hard-coding the number of parallel slots to one. PR testsuite/66005 libgomp/ * testsuite/Makefile.am (PWD_COMMAND): New variable. (%/site.exp): New target. (check_p_numbers0, check_p_numbers1, check_p_numbers2) (check_p_numbers3, check_p_numbers4, check_p_numbers5) (check_p_numbers6, check_p_numbers, gcc_test_parallel_slots) (check_p_subdirs) (check_DEJAGNU_libgomp_targets): New variables. ($(check_DEJAGNU_libgomp_targets)): New target. ($(check_DEJAGNU_libgomp_targets)): New dependency. (check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets. * testsuite/Makefile.in: Regenerate. * testsuite/lib/libgomp.exp: For parallel testing, 'load_file ../libgomp-test-support.exp'. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
* libgomp testsuite: As appropriate, use the 'gcc', 'g++', 'gfortran' driver ↵Thomas Schwinge2023-05-1510-65/+71
| | | | | | | | | | | | | | | | | | | | | | | | [PR91884] ..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++' for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.) PR testsuite/91884 libgomp/ * configure.ac: 'AC_SUBST(CXX)'. * configure: Regenerate. * Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/libgomp-site-extra.exp.in (GXX_UNDER_TEST) (GFORTRAN_UNDER_TEST): Set. * testsuite/lib/libgomp.exp (libgomp_init): Adjust. * testsuite/libgomp.c++/c++.exp: Use 'GXX_UNDER_TEST'. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.fortran/fortran.exp: Use 'GFORTRAN_UNDER_TEST'. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
* libgomp testsuite: Have each '*.exp' file specify the compiler to use [PR91884]Thomas Schwinge2023-05-158-15/+17
| | | | | | | | | | | | | | | | | | ..., which is still 'GCC_UNDER_TEST' for all of them; no change in behavior. PR testsuite/91884 libgomp/ * testsuite/lib/libgomp.exp (libgomp_target_compile): Don't specify compiler. * testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Specify compiler. * testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.
* fix assert in __deregister_frame_info_basesSören Tempel2023-05-151-1/+3
| | | | | | | | | | | | | | | The assertion in __deregister_frame_info_bases assumes that for every frame something was inserted into the lookup data structure by __register_frame_info_bases. Unfortunately, this does not necessarily hold true as the btree_insert call in __register_frame_info_bases will not insert anything for empty ranges. Therefore, we need to explicitly account for such empty ranges in the assertion as `ob` will be a null pointer for such ranges, hence causing the assertion to fail. Signed-off-by: Sören Tempel <soeren@soeren-tempel.net> libgcc/ChangeLog: * unwind-dw2-fde.c: Accept empty ranges when deregistering frames.
* ada: Fix typo in commentMarc Poulhiès2023-05-151-1/+1
| | | | | | gcc/ada/ * exp_ch3.adb (Make_Allocator_For_Return): Fix typo in comment.
* ada: Add annotations for proof of termination of runtime unitsYannick Moy2023-05-157-0/+66
| | | | | | | | | | | | | | | | String-manipulating functions should always terminate. Add justification for the termination of Mapping function parameter, and loop variants where needed. This is needed for GNATprove to prove termination. gcc/ada/ * libgnat/a-strbou.ads: Add justifications for Mapping. * libgnat/a-strfix.adb: Same. * libgnat/a-strfix.ads: Same. * libgnat/a-strsea.adb: Same. * libgnat/a-strsea.ads: Same. * libgnat/a-strsup.adb: Same and add loop variants. * libgnat/a-strsup.ads: Same and add specification of termination.
* ada: Recover proof of runtime unitsYannick Moy2023-05-152-5/+7
| | | | | | | | | | | Changes needed to make proof go through, after some change in GNAT and SPARK. gcc/ada/ * libgnat/a-strsup.adb (Super_Slice): Reorder component assignment to avoid failing predicate check related to initialization. * libgnat/s-expmod.adb (Exp_Modular): Add intermediate assertion.
* ada: Recover proof of Interfaces.C for terminationYannick Moy2023-05-151-1/+9
| | | | | | | | | | GNATprove reports possible non-terminating loops in functions marked as terminating. Add loop variants to prove loop termination. gcc/ada/ * libgnat/i-c.adb: Add loop variants. Remove useless initialization.
* ada: Fix comment related to inliningBob Duff2023-05-151-3/+1
| | | | | | | | | Correction to previous check-in: Remove comment about Proc_Next_... procedures, which were deleted. gcc/ada/ * einfo-utils.ads: Remove comment.
* ada: Use Inline aspect instead of pragma in Einfo.UtilsBob Duff2023-05-152-299/+176
| | | | | | | | | | | | | | | | | | | This package was using the Ada 83 renaming idiom for inlining Next_Component and other Next_... procedures without inlining the same-named functions. Using the Inline aspect avoids that sort of horsing around. We change all the other pragmas Inline in this package to aspects as well, which is a more-minor improvement. Fix too-long lines without wrapping lines. gcc/ada/ * einfo-utils.ads, einfo-utils.adb: Get rid of the Proc_Next_... procedures. Use Inline aspect instead of pragma Inline. Is_Discrete_Or_Fixed_Point_Type did not have pragma Inline, but now has the aspect; this was probably an oversight (which illustrates why aspects are better).
* ada: Fix formatting inconsistency in User's GuideRonan Desplanques2023-05-151-2/+2
| | | | | | | gcc/ada/ * doc/gnat_ugn/gnat_utility_programs.rst: Fix formatting inconsistency.
* ada: Remove duplicated code in Proc_Next_Component_Or_DiscriminantBob Duff2023-05-151-5/+1
| | | | | | | | | | | Proc_Next_Component_Or_Discriminant was duplicating the code in Next_Component_Or_Discriminant. gcc/ada/ * einfo-utils.adb: (Proc_Next_Component_Or_Discriminant): Call Next_Component_Or_Discriminant.
* ada: Improve comment on First_EntityBob Duff2023-05-151-5/+6
| | | | | | | | | | Clarify that "act as scope" overlaps with "[sub]type". gcc/ada/ * einfo.ads: (First_Entity): Update comment explaining why this exists on all [sub]types, as opposed to just the ones with associated entities.
* ada: Clean up vanishing entity fieldsBob Duff2023-05-159-96/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix all the failures caused by enabling Check_Vanishing_Fields on entities in all cases except the case of converting to or from E_Void. But leave Check_Vanishing_Fields disabled by default (controlled by -gnatd_v flag), because it might be too slow even for assertions-on mode, and we should deal with the E_Void cases eventually. The failures are fixed either by adding calls to Reinit_Field_To_Zero, or by changing which entities have which fields. Note that in a series of Reinit_Field_To_Zero calls, the optional Old_Ekind parameter is only useful on the first such call. gcc/ada/ * atree.adb (Check_Vanishing_Fields): Disable the check for "root/base type only" fields. This is a bug fix -- if we're checking some subtype S, we don't want to reach over to the root or base type and Reinit_Field_To_Zero of that, thus modifying the field for lots of subtypes other than S. Disable in the to/from E_Void cases. Misc cleanup. * gen_il-gen-gen_entities.adb: Define First_Entity, Last_Entity, and Stored_Constraint for all type entities, because there are too many cases where Reinit_Field_To_Zero would otherwise be needed. In any case, it seems cleaner to have First_Entity and Last_Entity defined in the same entity kinds. * einfo.ads: (First_Entity, Last_Entity, Stored_Constraint): Update comments to reflect gen_il-gen-gen_entities.adb changes. (Lit_Hash): Add missing "[root type only]" comment. * exp_ch5.adb: Add Reinit_Field_To_Zero calls for vanishing fields. * sem_ch10.adb: Likewise. * sem_ch6.adb: Likewise. * sem_ch7.adb: Likewise. * sem_ch8.adb: Likewise. * sem_ch3.adb: Likewise. Also remove now-unnecessary Reinit_Field_To_Zero calls.
* ada: Fix internal error on instance in package body with -gnatnEric Botcazou2023-05-151-0/+4
| | | | | | | | | | | This plugs a small loophole in the procedure responsible for attempting to hide entities that have been previously made public by the semantic analyzer in package bodies. gcc/ada/ * sem_ch7.adb (Hide_Public_Entities): Use the same condition for subprogram bodies without specification as for those with one.
* ada: Remove redundant protection against empty listsPiotr Trojanek2023-05-151-9/+6
| | | | | | | | | | Calls to First on No_List intentionally return Empty node, so explicit guards against No_List are unnecessary. Code cleanup; semantics is unaffected. gcc/ada/ * sem_util.adb (New_Copy_Tree): Remove redundant calls to Present.
* ada: Simplify lookup of predecessor in homonym chainRonan Desplanques2023-05-151-2/+1
| | | | | | | gcc/ada/ * sem_ch8.adb (End_Scope): Simplify lookup of predecessor in homonym chain.
* ada: Accept aggregates with OTHERS clause in unchecked type conversionsPiotr Trojanek2023-05-151-0/+1
| | | | | | | | | | | | | | | | When inlining subprogram calls in GNATprove mode, the actual parameter is wrapped in an unchecked conversion. If this actual parameter is an aggregate OTHERS clause, then the type of unchecked conversion allows us to resolve this clause (just like for aggregates wrapped in a qualified expression). Previously such aggregates were rejected, which caused spurious and cryptic errors; now they are accepted. gcc/ada/ * sem_aggr.adb (Resolve_Aggregate): Accept aggregates with OTHERS appearing inside unchecked conversions.
* ada: Emit warnings for (some) ineffective static predicate testsSteve Baird2023-05-155-26/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | Generate a warning if a static predicate tests for a value that does not belong to the parent subtype. For example, in subtype S is Positive with Static_Predicate => S not in 0 | 11 | 222; the 0 is ineffective because Positive already excludes that value. Generation of this new warning is controlled by the -gnatw_s switch, which can also be enabled via -gnatwa. gcc/ada/ * warnsw.ads: Add a new element, Warn_On_Ineffective_Predicate_Test, to the Opt_Warnings_Enum enumeration type. * warnsw.adb: Bind "-gnatw_s" to the new Warn_On_Ineffective_Predicate_Test switch. Add the new switch to the set of switches enabled by -gnata . * sem_ch13.adb (Build_Discrete_Static_Predicate): Declare new local procedure, Warn_If_Test_Ineffective, which conditionally generates new warning. Call this new procedure when building a new element of an RList. * doc/gnat_ugn/building_executable_programs_with_gnat.rst: Document the -gnatw_s switch (and the corresponding -gnatw_S switch). * gnat_ugn.texi: Regenerate.