Since r12-5072 made _Safe_container::operator=(const _Safe_container&)
protected, the debug containers no longer compile in C++98 mode. They
have user-provided copy assignment operators in C++98 mode, and they
assign each base class in turn. The 'this->_M_safe() = __x' expressions
fail, because calling a protected member function is only allowed via
'this'. They could be fixed by using this->_Safe::operator=(__x) but a
simpler solution is to just remove the user-provided assignment
operators and let the compiler define them (as we do for C++11 and
later, by defining them as defaulted).
The only change needed for that to work is to define the _Safe_vector
copy assignment operator in C++98 mode, so that the implicit
__gnu_debug::vector::operator= definition will call it, instead of
needing to call _M_update_guaranteed_capacity() manually.
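As a minimal illustration of the access rule involved (the class and member names here are illustrative, not the real libstdc++ ones), the hand-written C++98 assignment operators ran into something like:
struct Safe
{
protected:
  Safe& operator= (const Safe&) { return *this; }
};
struct Container : Safe
{
  Safe& safe () { return *this; }
  Container& operator= (const Container& x)
  {
    this->safe () = x;           // error: Safe::operator= is protected here
    this->Safe::operator= (x);   // OK: accessed through *this
    return *this;
  }
};
The implicitly-defined copy assignment operator assigns the Safe base subobject of *this directly, so it does not hit the protected-access restriction.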
libstdc++-v3/ChangeLog:
* include/debug/deque (deque::operator=(const deque&)): Remove
definition.
* include/debug/list (list::operator=(const list&)): Likewise.
* include/debug/map.h (map::operator=(const map&)): Likewise.
* include/debug/multimap.h (multimap::operator=(const multimap&)):
Likewise.
* include/debug/multiset.h (multiset::operator=(const multiset&)):
Likewise.
* include/debug/set.h (set::operator=(const set&)): Likewise.
* include/debug/string (basic_string::operator=(const basic_string&)):
Likewise.
* include/debug/vector (vector::operator=(const vector&)):
Likewise.
(_Safe_vector::operator=(const _Safe_vector&)): Define for
C++98 as well.
|
All users of path_range_query are currently allocating a gimple_ranger
only to pass it to the query object. It's tidier to just allocate one
from within path_range_query itself when no ranger is passed.
Tested on x86-64 Linux.
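A rough sketch of the ownership pattern this introduces (m_alloced_ranger is an illustrative name; only path_range_query, gimple_ranger and m_ranger come from the change itself):
class path_range_query
{
public:
  path_range_query (gimple_ranger &ranger)
    : m_ranger (&ranger), m_alloced_ranger (false) {}
  // New ctor: no ranger supplied, so allocate (and later free) our own.
  path_range_query ()
    : m_ranger (new gimple_ranger), m_alloced_ranger (true) {}
  ~path_range_query ()
  {
    if (m_alloced_ranger)
      delete m_ranger;   // free the ranger only if we own it
  }
private:
  gimple_ranger *m_ranger;   // now used through a pointer
  bool m_alloced_ranger;
};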
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::path_range_query): New
ctor without a ranger.
(path_range_query::~path_range_query): Free ranger if necessary.
(path_range_query::range_on_path_entry): Adjust m_ranger for pointer.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::compute_imports): Same.
(path_range_query::compute_ranges): Same.
(path_range_query::range_of_stmt): Same.
(path_range_query::compute_outgoing_relations): Same.
* gimple-range-path.h (class path_range_query): New ctor.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger
as path_range_query allocates one.
* tree-ssa-threadbackward.c (class back_threader): Remove m_ranger.
(back_threader::~back_threader): Same.
|
We have much more thorough restrictions, shared between both threader
implementations, in the registry. I've been meaning to remove the
backward threader's own restriction, since its only purpose was reducing
the search space. Previously there was a small time penalty for its
removal, but with the various patches in the past month, it looks like
the removal is a wash performance-wise.
This catches 8 more jump threads in the backward threader in my suite,
presumably because we previously disallowed all loop crossing, whereas the
registry restrictions allow some crossing (if we exit the loop, etc.).
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Remove loop
crossing restriction.
|
2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com>
gcc/testsuite/
* gcc.target/powerpc/test_mffsl.c: Require Power9.
|
Fix the Create_func_descriptors pass to traverse the subexpressions of
the function in a Call_expression. There are no subexpressions in the
normal case of calling a function or a method directly, but there are
subexpressions in code like F().M(), where F returns an interface type.
Forgetting to traverse the function subexpressions was almost entirely
hidden by the fact that we also created the necessary thunks in
Bound_method_expression::do_flatten and
Interface_field_reference_expression::do_get_backend. However, when
the thunks were created there, they did not go through the
order_evaluations pass. This almost always worked, but failed in the
case in which the function being thunked returned multiple results, as
order_evaluations takes the necessary step of moving the
Call_expression into its own statement, and that would not happen when
order_evaluations was not called. Avoid hiding errors like this by
changing those methods to only look up the previously created thunk,
rather than creating it if it does not already exist.
The test case for this is https://golang.org/cl/363156.
Fixes https://golang.org/issue/49512
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
|
Calling the placement version of ::operator new "implicitly creates
objects in the returned region of storage" as per [intro.object]. This
allows the returned memory to be used as storage for implicit-lifetime
types (including arrays) without additional action by the caller. This
is required by the proposed resolution of LWG 3147.
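A hedged usage sketch of what the wording change permits (caller-side code, not the library change itself):
#include <memory_resource>

int
sum_three (std::pmr::memory_resource &mr)
{
  void *p = mr.allocate (3 * sizeof (int), alignof (int));
  // An int array is implicitly created in the returned storage, so the
  // caller may use it without placement-new.
  int *a = static_cast<int *> (p);
  a[0] = 1; a[1] = 2; a[2] = 3;
  int s = a[0] + a[1] + a[2];
  mr.deallocate (p, 3 * sizeof (int), alignof (int));
  return s;
}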
libstdc++-v3/ChangeLog:
* include/std/memory_resource (memory_resource::allocate):
Implicitly create objects in the returned storage.
|
This function only exists to avoid an error in the debug mode vector, so
it doesn't need to be public.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector<bool>::data()): Give
protected access, and delete for C++11 and later.
|
As discussed on the mailing list, the template actually tests for a missed
optimization where we fail to propagate the size of an array. We no longer
miss this after the modref improvements.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* gfortran.dg/inline_matmul_17.f90: Fix template.
|
|
We can now handle some extra cases, for example:
struct a {int a,b,c;};
__attribute__ ((noinline))
int init (struct a *a)
{
a->a=1;
a->b=2;
a->c=3;
}
int const_fn ()
{
struct a a;
init (&a);
return a.a + a.b + a.c;
}
Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.
I ended up reordering the passes so that early modref is done after early
pure-const, mostly to avoid the need to change the testsuite, which greps
for const functions being detected in pure-const. Still, some testsuite
compensation is needed.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.
gcc/testsuite/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.
|
gcc/ChangeLog:
PR other/103129
* diagnostic-show-locus.c (def_policy): Use def_tabstop.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
Fortran part to commit r12-5146-g48d7327f2aaf65
gcc/fortran/ChangeLog:
* gfortran.h (struct gfc_omp_clauses): Rename num_teams to
num_teams_upper, add num_teams_lower.
* dump-parse-tree.c (show_omp_clauses): Update to handle
lower-bound num_teams clause.
* frontend-passes.c (gfc_code_walker): Likewise.
* openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
resolve_omp_clauses): Likewise.
* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
gfc_trans_omp_target): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/teams-1.f90: New test.
|
Declare unsigned and polynomial type-qualified builtins for
vcombine_* Neon intrinsics. Using these builtins removes the need for
many casts in arm_neon.h.
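As a hedged before/after illustration of the kind of change made throughout arm_neon.h (the exact builtin names, including the qualified-builtin suffix, are illustrative, not taken from the patch):
/* Before: the unsigned intrinsic wraps the signed builtin, with casts.  */
__extension__ extern __inline uint8x16_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
{
  return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a, (int8x8_t) __b);
}

/* After: a type-qualified (unsigned) builtin makes the casts unnecessary.  */
__extension__ extern __inline uint8x16_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
{
  return __builtin_aarch64_combinev8qi_uuu (__a, __b);
}
The same shape of change applies to the other unsigned and polynomial intrinsics updated in the AArch64 commits below.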
gcc/ChangeLog:
2021-11-10 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete.
(TYPES_COMBINEP): Delete.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for vcombine_* intrinsics.
* config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary
cast.
(vcombine_s16): Likewise.
(vcombine_s32): Likewise.
(vcombine_f32): Likewise.
(vcombine_u8): Use type-qualified builtin and remove casts.
(vcombine_u16): Likewise.
(vcombine_u32): Likewise.
(vcombine_u64): Likewise.
(vcombine_p8): Likewise.
(vcombine_p16): Likewise.
(vcombine_p64): Likewise.
(vcombine_bf16): Remove unnecessary cast.
* config/aarch64/iterators.md (VD_I): New mode iterator.
(VDC_P): New mode iterator.
|
Declare unsigned and polynomial type-qualified builtins for LD1/ST1
Neon intrinsics. Using these builtins removes the need for many casts
in arm_neon.h.
The new type-qualified builtins are also lowered to gimple - as the
unqualified builtins are already.
gcc/ChangeLog:
2021-11-10 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define.
(TYPES_LOAD1_P): Define.
(TYPES_STORE1_U): Define.
(TYPES_STORE1P): Rename to...
(TYPES_STORE1_P): This.
(get_mem_type_for_load_store): Add unsigned and poly types.
(aarch64_general_gimple_fold_builtin): Add unsigned and poly
type-qualified builtin declarations.
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for LD1/ST1.
* config/aarch64/arm_neon.h (vld1_p8): Use type-qualified
builtin and remove cast.
(vld1_p16): Likewise.
(vld1_u8): Likewise.
(vld1_u16): Likewise.
(vld1_u32): Likewise.
(vld1q_p8): Likewise.
(vld1q_p16): Likewise.
(vld1q_p64): Likewise.
(vld1q_u8): Likewise.
(vld1q_u16): Likewise.
(vld1q_u32): Likewise.
(vld1q_u64): Likewise.
(vst1_p8): Likewise.
(vst1_p16): Likewise.
(vst1_u8): Likewise.
(vst1_u16): Likewise.
(vst1_u32): Likewise.
(vst1q_p8): Likewise.
(vst1q_p16): Likewise.
(vst1q_p64): Likewise.
(vst1q_u8): Likewise.
(vst1q_u16): Likewise.
(vst1q_u32): Likewise.
(vst1q_u64): Likewise.
* config/aarch64/iterators.md (VALLP_NO_DI): New iterator.
|
Declare unsigned type-qualified builtins and use them to implement
the vector reduction Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for vector reduction.
* config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified
builtin and remove casts.
(vaddv_u16): Likewise.
(vaddv_u32): Likewise.
(vaddvq_u8): Likewise.
(vaddvq_u16): Likewise.
(vaddvq_u32): Likewise.
(vaddvq_u64): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
the pairwise addition Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for pairwise addition.
* config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified
builtin and remove casts.
(vpaddq_u16): Likewise.
(vpaddq_u32): Likewise.
(vpaddq_u64): Likewise.
(vpadd_u8): Likewise.
(vpadd_u16): Likewise.
(vpadd_u32): Likewise.
(vpaddd_u64): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-subtract Neon intrinsics. This removes
the need for many casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]subhn[2].
* config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary
cast.
(vsubhn_s32): Likewise.
(vsubhn_s64): Likewise.
(vsubhn_u16): Use type-qualified builtin and remove casts.
(vsubhn_u32): Likewise.
(vsubhn_u64): Likewise.
(vrsubhn_s16): Remove unnecessary cast.
(vrsubhn_s32): Likewise.
(vrsubhn_s64): Likewise.
(vrsubhn_u16): Use type-qualified builtin and remove casts.
(vrsubhn_u32): Likewise.
(vrsubhn_u64): Likewise.
(vrsubhn_high_s16): Remove unnecessary cast.
(vrsubhn_high_s32): Likewise.
(vrsubhn_high_s64): Likewise.
(vrsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vrsubhn_high_u32): Likewise.
(vrsubhn_high_u64): Likewise.
(vsubhn_high_s16): Remove unnecessary cast.
(vsubhn_high_s32): Likewise.
(vsubhn_high_s64): Likewise.
(vsubhn_high_u16): Use type-qualified builtin and remove
casts.
(vsubhn_high_u32): Likewise.
(vsubhn_high_u64): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-narrowing-add Neon intrinsics. This removes the
need for many casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
builtins for [r]addhn[2].
* config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary
cast.
(vaddhn_s32): Likewise.
(vaddhn_s64): Likewise.
(vaddhn_u16): Use type-qualified builtin and remove casts.
(vaddhn_u32): Likewise.
(vaddhn_u64): Likewise.
(vraddhn_s16): Remove unnecessary cast.
(vraddhn_s32): Likewise.
(vraddhn_s64): Likewise.
(vraddhn_u16): Use type-qualified builtin and remove casts.
(vraddhn_u32): Likewise.
(vraddhn_u64): Likewise.
(vaddhn_high_s16): Remove unnecessary cast.
(vaddhn_high_s32): Likewise.
(vaddhn_high_s64): Likewise.
(vaddhn_high_u16): Use type-qualified builtin and remove
casts.
(vaddhn_high_u32): Likewise.
(vaddhn_high_u64): Likewise.
(vraddhn_high_s16): Remove unnecessary cast.
(vraddhn_high_s32): Likewise.
(vraddhn_high_s64): Likewise.
(vraddhn_high_u16): Use type-qualified builtin and remove
casts.
(vraddhn_high_u32): Likewise.
(vraddhn_high_u64): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
halving-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uhsub builtins.
* config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary
cast.
(vhsub_s16): Likewise.
(vhsub_s32): Likewise.
(vhsub_u8): Use type-qualified builtin and remove casts.
(vhsub_u16): Likewise.
(vhsub_u32): Likewise.
(vhsubq_s8): Remove unnecessary cast.
(vhsubq_s16): Likewise.
(vhsubq_s32): Likewise.
(vhsubq_u8): Use type-qualified builtin and remove casts.
(vhsubq_u16): Likewise.
(vhsubq_u32): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
(rounding) halving-add Neon intrinsics. This removes the need for
many casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for u[r]hadd builtins.
* config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary
cast.
(vhadd_s16): Likewise.
(vhadd_s32): Likewise.
(vhadd_u8): Use type-qualified builtin and remove casts.
(vhadd_u16): Likewise.
(vhadd_u32): Likewise.
(vhaddq_s8): Remove unnecessary cast.
(vhaddq_s16): Likewise.
(vhaddq_s32): Likewise.
(vhaddq_u8): Use type-qualified builtin and remove casts.
(vhaddq_u16): Likewise.
(vhaddq_u32): Likewise.
(vrhadd_s8): Remove unnecessary cast.
(vrhadd_s16): Likewise.
(vrhadd_s32): Likewise.
(vrhadd_u8): Use type-qualified builtin and remove casts.
(vrhadd_u16): Likewise.
(vrhadd_u32): Likewise.
(vrhaddq_s8): Remove unnecessary cast.
(vrhaddq_s16): Likewise.
(vrhaddq_s32): Likewise.
(vrhaddq_u8): Use type-qualified builtin and remove casts.
(vrhaddq_u16): Likewise.
(vrhaddq_u32): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
widening-subtract Neon intrinsics. This removes the need for many
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for usub[lw][2] builtins.
* config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary
cast.
(vsubl_s16): Likewise.
(vsubl_s32): Likewise.
(vsubl_u8): Use type-qualified builtin and remove casts.
(vsubl_u16): Likewise.
(vsubl_u32): Likewise.
(vsubl_high_s8): Remove unnecessary cast.
(vsubl_high_s16): Likewise.
(vsubl_high_s32): Likewise.
(vsubl_high_u8): Use type-qualified builtin and remove casts.
(vsubl_high_u16): Likewise.
(vsubl_high_u32): Likewise.
(vsubw_s8): Remove unnecessary casts.
(vsubw_s16): Likewise.
(vsubw_s32): Likewise.
(vsubw_u8): Use type-qualified builtin and remove casts.
(vsubw_u16): Likewise.
(vsubw_u32): Likewise.
(vsubw_high_s8): Remove unnecessary cast.
(vsubw_high_s16): Likewise.
(vsubw_high_s32): Likewise.
(vsubw_high_u8): Use type-qualified builtin and remove casts.
(vsubw_high_u16): Likewise.
(vsubw_high_u32): Likewise.
|
Declare unsigned type-qualified builtins and use them to implement
widening-add Neon intrinsics. This removes the need for many casts in
arm_neon.h.
gcc/ChangeLog:
2021-11-09 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
qualifiers in generator macros for uadd[lw][2] builtins.
* config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary
cast.
(vaddl_s16): Likewise.
(vaddl_s32): Likewise.
(vaddl_u8): Use type-qualified builtin and remove casts.
(vaddl_u16): Likewise.
(vaddl_u32): Likewise.
(vaddl_high_s8): Remove unnecessary cast.
(vaddl_high_s16): Likewise.
(vaddl_high_s32): Likewise.
(vaddl_high_u8): Use type-qualified builtin and remove casts.
(vaddl_high_u16): Likewise.
(vaddl_high_u32): Likewise.
(vaddw_s8): Remove unnecessary cast.
(vaddw_s16): Likewise.
(vaddw_s32): Likewise.
(vaddw_u8): Use type-qualified builtin and remove casts.
(vaddw_u16): Likewise.
(vaddw_u32): Likewise.
(vaddw_high_s8): Remove unnecessary cast.
(vaddw_high_s16): Likewise.
(vaddw_high_s32): Likewise.
(vaddw_high_u8): Use type-qualified builtin and remove casts.
(vaddw_high_u16): Likewise.
(vaddw_high_u32): Likewise.
|
Declare unsigned type-qualified builtins and use them for [R]SHRN[2]
Neon intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtins for [R]SHRN[2].
* config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified
builtin and remove casts.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
|
Declare unsigned type-qualified builtins and use them for XTN[2] Neon
intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare unsigned
type-qualified builtins for XTN[2].
* config/aarch64/arm_neon.h (vmovn_high_u16): Use type-
qualified builtin and remove casts.
(vmovn_high_u32): Likewise.
(vmovn_high_u64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.
|
Declare poly type-qualified builtins and use them for PMUL[L] Neon
intrinsics. This removes the need for casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Use poly type
qualifier in builtin generator macros.
* config/aarch64/arm_neon.h (vmul_p8): Use type-qualified
builtin and remove casts.
(vmulq_p8): Likewise.
(vmull_high_p8): Likewise.
(vmull_p8): Likewise.
|
Declare type-qualified builtins and use them for MLA/MLS Neon
intrinsics that operate on unsigned types. This eliminates lots of
casts in arm_neon.h.
gcc/ChangeLog:
2021-11-08 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Declare type-
qualified builtin generators for unsigned MLA/MLS intrinsics.
* config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified
builtin.
(vmla_n_u32): Likewise.
(vmla_u8): Likewise.
(vmla_u16): Likewise.
(vmla_u32): Likewise.
(vmlaq_n_u16): Likewise.
(vmlaq_n_u32): Likewise.
(vmlaq_u8): Likewise.
(vmlaq_u16): Likewise.
(vmlaq_u32): Likewise.
(vmls_n_u16): Likewise.
(vmls_n_u32): Likewise.
(vmls_u8): Likewise.
(vmls_u16): Likewise.
(vmls_u32): Likewise.
(vmlsq_n_u16): Likewise.
(vmlsq_n_u32): Likewise.
(vmlsq_u8): Likewise.
(vmlsq_u16): Likewise.
(vmlsq_u32): Likewise.
|
At the end of the backtrace stream _Unwind_Find_FDE() may not be able
to find the frame unwind info and will later call the backtrace fallback
instead of finishing. This occurs when using an old libc on ppc64 due to
dl_iterate_phdr() not being able to set the fde in the last trace.
When this occurs, the cfa of the trace will be behind the context's cfa.
Also, libgo's probestackmaps() calls the backtrace with a null pointer
and can reach the backchain fallback with the same problem; in this case
we are only interested in finding a stack map, and we neither need nor can
do a backchain.
_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses
uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.
libgcc/ChangeLog:
PR libgcc/103044
* config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's
called with a null argument or at the end of the backtrace and return.
* unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.
|
I wrote a script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends. This plugs the modref differences in handling
looping pure/consts, which were previously missed due to early exits on
ECF_CONST | ECF_PURE. Those early exits are a bit annoying, and I think as
a cleanup I may just drop some of them as premature optimizations dating from
the time when modref was very simplistic about what it propagates.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc): Update.
|
This makes sure to only assert we don't run into an asm goto when
inserting a stmt in reassoc, matching the condition in
can_reassociate_p. We can handle EH edges from an asm just like
EH edges from any other stmt.
2021-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/103190
* tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.
|
Imports are our nomenclature for external SSA names to a block that
are used to calculate the outgoing edges for said block. For example,
in the following snippet:
<bb 2> :
_1 = b_10 == block_11;
_2 = b_10 != -1;
_3 = _1 & _2;
if (_3 != 0)
goto <bb 3>; [INV]
else
goto <bb 5>; [INV]
...the imports to the block are b_10 and block_11 since they are both
needed to calculate _3.
The path solver takes a bitmap of imports in addition to the path
itself. This sets up the number of SSA names to be on the lookout
for, while resolving the final conditional.
Calculating these imports was initially done in the threader, since it
was the only user of the path solver. With new clients, it has become
obvious that populating the imports should be a task for the path
solver, so it can be shared among the clients.
This patch moves the import code to the solver, making both the solver
and the threader simpler in the process, because the intent is clearer
and some duplicate code was removed.
This reshuffling had the net effect of giving us a handful of new
threads through my suite of .ii files (125). This was unexpected, but
welcome nevertheless. There is no performance difference in callgrind
over the same suite.
Regstrapped on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::add_copies_to_imports):
Rename to...
(path_range_query::compute_imports): ...this. Adapt it so it can
be passed the imports bitmap instead of working on m_imports.
(path_range_query::compute_ranges): Call compute_imports in all
cases unless an imports bitmap is passed.
* gimple-range-path.h (path_range_query::compute_imports): New.
(path_range_query::add_copies_to_imports): Remove.
* tree-ssa-threadbackward.c (back_threader::resolve_def): Remove.
(back_threader::find_paths_to_names): Inline resolve_def.
(back_threader::find_paths): Call compute_imports.
(back_threader::resolve_phi): Adjust comment.
|
2021-11-11 Sandra Loosemore <sandra@codesourcery.com>
gcc/testsuite/
* g++.dg/warn/Wmismatched-new-delete-5.C: Add
-fdelete-null-pointer-checks.
* gcc.dg/attr-returns-nonnull.c: Likewise.
* gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2.
* gcc.dg/ifcvt-4.c: Skip on nios2.
* gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.
|
The following splits loop header copying into an analysis phase
that uses ranger and a transform phase that can do without it, to avoid
running ranger on IL whose SSA form has not been updated.
2021-11-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/103188
* tree-ssa-loop-ch.c (should_duplicate_loop_header_p):
Remove query parameter, split out check for size
optimization.
(ch_base::m_ranger, ch_base::m_query): Remove.
(ch_base::copy_headers): Split processing loop into
analysis around which we allocate and use ranger and
transform where we do not.
(pass_ch::execute): Do not allocate/free ranger here.
(pass_ch_vect::execute): Likewise.
* gcc.dg/torture/pr103188.c: New testcase.
|
We mark self-recursive functions as looping for fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in the
callgraph, but for trivial SCCs we miss the flag.
I think it is a bad decision, since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch fixes the current behaviour to be consistent.
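An illustrative example of the kind of function affected (hypothetical code, not from the patch):
/* No memory side effects, so local analysis considers this const, but the
   self-recursive call means it may never terminate, so it must also be
   marked as looping.  */
__attribute__ ((noinline)) int
count_down (int n)
{
  return n > 0 ? count_down (n - 1) : 0;
}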
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* ipa-pure-const.c (propagate_pure_const): Self recursion is
a side effect.
|
Fix the ipa-pure-const handling of the noreturn flag. It is not safe to set
it for interposable symbols, and we should also set it for aliases (just like
we do for other flags). This patch merely copies the other flag handling and
implements it here.
gcc/ChangeLog:
2021-11-11 Jan Hubicka <hubicka@ucw.cz>
* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function.
* cgraph.h (cgraph_node::set_noreturn_flag): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.
|
gcc/cp/ChangeLog:
* parser.c (cp_parser_template_argument_list): Use auto_vec
instead of manual memory management.
|
When thinking about GOMP_teams3, I've realized that using global variables
for the values returned by omp_get_num_teams()/omp_get_team_num() calls
is incorrect even with our currently dumb way of implementing host teams.
There are two problems. One is if host teams is used from multiple
pthread_create-created threads - the spec says that host teams can't be nested
inside of explicit parallel or other teams constructs, but with pthread_create
the standard obviously says nothing about it. Another, more important, issue
is host fallback: right now we don't do anything for omp_get_num_teams()
or omp_get_team_num(), which was fine before host teams was introduced and
before the 5.1 requirement that the num_teams clause specifies a minimum number
of teams, but with the global vars it means that inside of target teams
num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams
is inside of host teams with num_teams(4). With target fallback being invoked
from parallel regions, global vars simply can't work right on the host.
So, this patch moves them to struct gomp_thread and propagates them for
parallel to child threads. For host fallback, the implicit zeroing of
*thr results in us returning omp_get_num_teams () == 1 and
omp_get_team_num () == 0, which is fine for target teams without a num_teams
clause; for target teams with a num_teams clause it is something to work on,
and for target without teams nested in it I've asked on omp-lang what should
be done.
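A small illustrative reduction of the host-fallback case (this test code is hypothetical, not part of the patch):
#include <omp.h>
#include <stdio.h>

int
main ()
{
  /* If this target region falls back to the host, the reported values
     previously came from globals that host fallback never set up; after the
     patch they come from the implicitly zeroed struct gomp_thread fields.  */
  #pragma omp target teams num_teams (2)
  if (omp_get_team_num () == 0)
    printf ("num_teams: %d\n", omp_get_num_teams ());
  return 0;
}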
2021-11-11 Jakub Jelinek <jakub@redhat.com>
* libgomp.h (struct gomp_thread): Add num_teams and team_num members.
* team.c (struct gomp_thread_start_data): Likewise.
(gomp_thread_start): Initialize thr->num_teams and thr->team_num.
(gomp_team_start): Initialize start_data->num_teams and
start_data->team_num. Update nthr->num_teams and nthr->team_num.
* teams.c (gomp_num_teams, gomp_team_num): Remove.
(GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num
instead of gomp_num_teams and gomp_team_num.
(omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams.
(omp_get_team_num): Use thr->team_num instead of gomp_team_num.
* testsuite/libgomp.c/teams-4.c: New test.
|
There is a known failure for gfortran.dg/vector_subscript_1.f90. It
was previously failing for all optimization levels except -Os.
Getting the loop header copying right now makes it fail for all
levels :-).
Tested on x86-64 Linux.
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:
* tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
statically to the edge remaining in the loop.
|
For integer vector division we only checked for all-zero vector
constants rather than checking whether any element of the constant
vector is zero.
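An illustrative example of the class of code involved (hypothetical and reduced; the actual testcase is the one added below):
typedef int v4si __attribute__ ((vector_size (16)));

v4si
f (v4si x)
{
  /* Only one lane of the divisor is zero, so an "is the whole vector zero?"
     check does not notice that this division may trap.  */
  return x / (v4si) { 1, 2, 0, 4 };
}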
2021-11-11 Richard Biener <rguenther@suse.de>
PR middle-end/103181
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division. Separate floating point and integer division code.
* gcc.dg/torture/pr103181.c: New testcase.
|
For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code
to deal with it since many years ago (see it e.g. in GCC 3.2, although it
used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints).
But that code apparently isn't able to cope with members with empty class
types with [[no_unique_address]] attribute, because the empty classes have
non-zero type size but zero decl size, and so the computation can end up
with a negative offset or an offset 1 byte smaller than it should be.
For !PCC_BITFIELD_TYPE_MATTERS, we just use
tree_result = byte_position (decl);
which seems exactly right even for the empty classes or anything which is
not a bitfield (and for which we don't add DW_AT_bit_offset attribute).
So, instead of trying to handle those no_unique_address members in the
current already very complicated code, this limits it to bitfields.
The stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only
bitfields; twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE.
As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because
DECL_BIT_FIELD might be cleared for some bitfields with bitsizes
multiple of BITS_PER_UNIT and e.g.
struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s;
struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t;
int
main ()
{
s.c = 0x55;
s.d = 0xaaaa;
t.c = 0x55;
t.d = 0xaaaa;
s.e++;
}
has different debug info with DECL_BIT_FIELD check.
2021-11-11 Jakub Jelinek <jakub@redhat.com>
PR debug/101378
* dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS
handling only for DECL_BIT_FIELD_TYPE decls.
* g++.dg/debug/dwarf2/pr101378.C: New test.
|
gcc/ChangeLog:
PR target/102376
* config/aarch64/aarch64.c (aarch64_process_target_attr): Check if
the token is an arch extension without a leading '+' and emit an
appropriate diagnostic.
gcc/testsuite/ChangeLog:
PR target/102376
* gcc.target/aarch64/pr102376.c: New test.
|
In OpenMP 5.1, the num_teams clause can accept either one expression as before,
but in that case its meaning has changed: rather than creating <= expression
teams, it now creates == expression teams. Or it accepts two expressions
separated by ':', with the meaning that the first is a lower bound and the
second an upper bound on how many teams should be created. The other ways to
set the number of teams are upper bounds with a lower bound of 1.
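For example, under the new syntax (illustrative snippets only):
void
example (void)
{
  #pragma omp teams num_teams (4)      /* exactly 4 teams (changed meaning) */
  { }
  #pragma omp teams num_teams (2 : 8)  /* at least 2 and at most 8 teams */
  { }
}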
The following patch does the parsing of this for C/C++. For host teams, we
actually don't need to do anything further right now: we always create
(pretend to create) exactly the requested number of teams, so we can just
evaluate and throw away the lower bound for now.
For teams nested in target, we don't guarantee that, though, and further
work will be needed.
In particular, omplower now turns the teams part of:
struct S { S (); S (const S &); ~S (); int s; };
void bar (S &, S &);
int baz ();
_Pragma ("omp declare target to (baz)");
void
foo (void)
{
S a, b;
#pragma omp target private (a) map (b)
{
#pragma omp teams firstprivate (b) num_teams (baz ())
{
bar (a, b);
}
}
}
into:
retval.0 = baz ();
retval.1 = retval.0;
{
unsigned int retval.3;
struct S * D.2549;
struct S b;
retval.3 = (unsigned int) retval.1;
D.2549 = .omp_data_i->b;
S::S (&b, D.2549);
#pragma omp teams num_teams(retval.1) firstprivate(b) shared(a)
__builtin_GOMP_teams (retval.3, 0);
{
bar (&a, &b);
}
S::~S (&b);
#pragma omp return(nowait)
}
IMHO we want a new API, say GOMP_teams3 which will take 3 arguments
instead of 2 (the lower and upper bounds from num_teams and thread_limit)
and will return a bool whether it should do the teams body or not.
And, we should add right before outermost {} above
while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0))
and remove the __builtin_GOMP_teams call. The current function performs the
equivalent of exit (at least on NVPTX), which seems bad because it means
the destructors of e.g. private variables on the target aren't invoked, and
at the current placement neither are the destructors of the already
constructed privatized variables in teams.
I'll do this next on the compiler side, but I'm afraid I'll need help
with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be
able to use %ctaid.x. I think the ideal would be to use a .shared
integer variable for the omp_get_team_num value, but I don't have any
experience with that: are .shared variables zero-initialized by default,
or do they have a random value at start? The PTX docs say they aren't
initializable.
2021-11-11 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ...
(OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this.
(OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define.
* tree.c (omp_clause_num_ops): Increase num ops for
OMP_CLAUSE_NUM_TEAMS to 2.
* tree-pretty-print.c (dump_omp_clause): Print optional lower bound
for OMP_CLAUSE_NUM_TEAMS.
* gimplify.c (gimplify_scan_omp_clauses): Gimplify
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL.
(optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead
of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
* omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR
instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
* omp-expand.c (expand_teams_call, get_target_arguments): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/cp/
* parser.c (cp_parser_omp_clause_num_teams): Parse optional
lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR.
Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of
OMP_CLAUSE_NUM_TEAMS_EXPR.
(cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
* semantics.c (finish_omp_clauses): Handle
OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause.
* pt.c (tsubst_omp_clauses): Likewise.
(tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before
combined target teams even lower-bound expression.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use
OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression
to half of the num_teams clauses.
* c-c++-common/gomp/num-teams-1.c: New test.
* c-c++-common/gomp/num-teams-2.c: New test.
* g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression
to half of the num_teams clauses.
* g++.dg/gomp/attrs-2.C (bar): Likewise.
* g++.dg/gomp/num-teams-1.C: New test.
* g++.dg/gomp/num-teams-2.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/teams-1.c: New test.
|
This removes the now-useless wrappers around get_immediate_dominator.
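A hedged sketch of the resulting call sites (illustrative fragment only, not code from the patch):
static void
example (basic_block bb)
{
  /* Instead of going through local find_pdom ()/find_dom () wrappers,
     callers now use the dominance API directly.  */
  basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb);
  basic_block dom = get_immediate_dominator (CDI_DOMINATORS, bb);
  (void) pdom;
  (void) dom;
}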
2021-11-11 Richard Biener <rguenther@suse.de>
* cfganal.c (find_pdom): Remove.
(control_dependences::find_control_dependence): Remove
special-casing of entry block, call get_immediate_dominator
directly.
* gimple-predicate-analysis.cc (find_pdom): Remove.
(find_dom): Likewise.
(find_control_equiv_block): Call get_immediate_dominator
directly.
(compute_control_dep_chain): Likewise.
(predicate::init_from_phi_def): Likewise.
|
This makes the control dependence computation avoid a find_edge call
and optimizes allocation by embedding the bitmap heads into the
vector of control dependences instead of allocating all of them
separately. It also uses a local bitmap obstack.
The bitmap changes make it necessary to shuffle some includes.
2021-11-10 Richard Biener <rguenther@suse.de>
* cfganal.h (control_dependences::control_dependence_map):
Embed bitmap_head.
(control_dependences::m_bitmaps): New.
* cfganal.c (control_dependences::set_control_dependence_map_bit):
Adjust.
(control_dependences::clear_control_dependence_bitmap):
Likewise.
(control_dependences::find_control_dependence): Do not
find_edge for the abnormal edge test.
(control_dependences::control_dependences): Instead do not
add abnormal edges to the edge list. Adjust.
(control_dependences::~control_dependences): Likewise.
(control_dependences::get_edges_dependent_on): Likewise.
* function-tests.c: Include bitmap.h.
gcc/analyzer/
* supergraph.cc: Include bitmap.h.
gcc/c/
* gimple-parser.c: Shuffle bitmap.h include.
|
Commit 5d9d0c94588 renamed "future" to "power10", and ace60939fd2
updated the documentation for the "future" renaming. This patch
renames the remaining "future architecture" references in the
documentation and polishes the wording for float128.
gcc/ChangeLog:
* doc/invoke.texi: Change references to "future cpu" to "power10",
"-mcpu=future" to "-mcpu=power10". Adjust words for float128.
|
Update mtune for alderlake. Alder Lake Intel Hybrid Technology will not support
Intel® AVX-512; ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and
UMONITOR/UMWAIT/TPAUSE are supported.
gcc/ChangeLog:
* config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake
from m_CORE_AVX2.
(processor_cost_table): Use alderlake_cost for Alderlake.
* config/i386/i386.c (ix86_sched_init_global): Handle Alderlake.
* config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake
cost.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake
issue rate to 4.
(ix86_adjust_cost): Handle Alderlake.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise.
(X86_TUNE_MEMORY_MISMATCH_STALL): Likewise.
(X86_TUNE_USE_LEAVE): Likewise.
(X86_TUNE_PUSH_MEMORY): Likewise.
(X86_TUNE_USE_INCDEC): Likewise.
(X86_TUNE_INTEGER_DFMODE_MOVES): Likewise.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise.
(X86_TUNE_USE_SAHF): Likewise.
(X86_TUNE_USE_BT): Likewise.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise.
(X86_TUNE_ONE_IF_CONV_INSN): Likewise.
(X86_TUNE_AVOID_MFENCE): Likewise.
(X86_TUNE_USE_SIMODE_FIOP): Likewise.
(X86_TUNE_EXT_80387_CONSTANTS): Likewise.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
(X86_TUNE_SSE_TYPELESS_STORES): Likewise.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise.
(X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise.
(X86_TUNE_USE_GATHER): Disable for Alderlake.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise.
(X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
|
gcc/ChangeLog:
PR target/103151
* config/i386/sse.md (V_128_256): Extend to V8HF/V16HF.
(avxsizesuffix): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr103151.c: New test.
|
This issue caused zifencei to never be correctly appended to the ISA string.
gcc/ChangeLog:
* common/config/riscv/riscv-common.c (riscv_subset_list::to_string): Fix
wrong macro checking.
|
As discussed in the PR, the loop header copying pass avoids copying headers
when optimizing for size. However, sometimes we can determine the
loop entry conditional statically for the first iteration of the loop.
This patch uses the path solver to determine the outgoing edge
out of preheader->header->xx. If that edge can be resolved statically,
header copying is allowed. Doing
this in the loop optimizer saves us from doing gymnastics in the
threader which doesn't have the context to determine if a loop
transformation is profitable.
I am only returning true in entry_loop_condition_is_static for
a true conditional. Technically a false conditional is also
provably static, but allowing any boolean value causes a regression
in gfortran.dg/vector_subscript_1.f90.
I would have preferred not passing around the query object, but the
layout of pass_ch and should_duplicate_loop_header_p make it a bit
awkward to get it right without an outright refactor to the
pass.
Tested on x86-64 Linux.
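An illustrative example of a loop whose entry condition is static on the first iteration (hypothetical and reduced, not the added testcase):
void
f (int *a, int n)
{
  if (n > 0)
    /* On the path preheader->header, i == 0 and n > 0 are known, so the
       header condition i < n is statically true on the first iteration and
       copying the header is worthwhile even when optimizing for size.  */
    for (int i = 0; i < n; i++)
      a[i] = 0;
}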
gcc/ChangeLog:
PR tree-optimization/102906
* tree-ssa-loop-ch.c (entry_loop_condition_is_static): New.
(should_duplicate_loop_header_p): Call entry_loop_condition_is_static.
(class ch_base): Add m_ranger and m_query.
(ch_base::copy_headers): Pass m_query to
entry_loop_condition_is_static.
(pass_ch::execute): Allocate and deallocate m_ranger and
m_query.
(pass_ch_vect::execute): Same.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr102906.c: New test.
|
The problem here is that aarch64_simd_dup<mode> uses
the vw iterator rather than the vwcore iterator. This causes
problems for the V4SF and V2DF modes. I changed both
aarch64_simd_dup<mode> patterns to be consistent.
Committed as obvious after a bootstrap/test on aarch64-linux-gnu.
PR target/103170
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>):
Use vwcore iterator for the r constraint output string.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-dup-1.c: New test.
|