Commit message | Author | Age | Files | Lines
* Daily bump.GCC Administrator2021-11-1210-1/+736
* libstdc++: Fix debug containers for C++98 modeJonathan Wakely2021-11-118-79/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since r12-5072 made _Safe_container::operator=(const _Safe_container&) protected, the debug containers no longer compile in C++98 mode. They have user-provided copy assignment operators in C++98 mode, and they assign each base class in turn. The 'this->_M_safe() = __x' expressions fail, because calling a protected member function is only allowed via 'this'. They could be fixed by using this->_Safe::operator=(__x) but a simpler solution is to just remove the user-provided assignment operators and let the compiler define them (as we do for C++11 and later, by defining them as defaulted). The only change needed for that to work is to define the _Safe_vector copy assignment operator in C++98 mode, so that the implicit __gnu_debug::vector::operator= definition will call it, instead of needing to call _M_update_guaranteed_capacity() manually. libstdc++-v3/ChangeLog: * include/debug/deque (deque::operator=(const deque&)): Remove definition. * include/debug/list (list::operator=(const list&)): Likewise. * include/debug/map.h (map::operator=(const map&)): Likewise. * include/debug/multimap.h (multimap::operator=(const multimap&)): Likewise. * include/debug/multiset.h (multiset::operator=(const multiset&)): Likewise. * include/debug/set.h (set::operator=(const set&)): Likewise. * include/debug/string (basic_string::operator=(const basic_string&)): Likewise. * include/debug/vector (vector::operator=(const vector&)): Likewise. (_Safe_vector::operator=(const _Safe_vector&)): Define for C++98 as well.
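A minimal sketch (not the real libstdc++ code) of the access rule at play here: a protected base-class operator= can only be reached through an object of the derived type, so assigning through a plain base reference fails, while a qualified call through 'this' (or the implicitly-defined operator=) is fine.

    struct Safe                          // stand-in for _Safe_container
    {
    protected:
      Safe& operator= (const Safe&) { return *this; }
    };

    struct Cont : Safe                   // stand-in for a debug container
    {
      Safe& safe () { return *this; }    // like this->_M_safe()

      Cont& operator= (const Cont& x)
      {
        // safe () = x;                  // error: protected operator= is not
        //                               // accessible through a plain Safe&
        this->Safe::operator= (x);       // OK: access is through *this
        return *this;
      }
    };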
* Make ranger optional in path_range_query.Aldy Hernandez2021-11-114-24/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | All users of path_range_query are currently allocating a gimple_ranger only to pass it to the query object. It's tidier to just do it from path_range_query if no ranger was passed. Tested on x86-64 Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::path_range_query): New ctor without a ranger. (path_range_query::~path_range_query): Free ranger if necessary. (path_range_query::range_on_path_entry): Adjust m_ranger for pointer. (path_range_query::ssa_range_in_phi): Same. (path_range_query::compute_ranges_in_block): Same. (path_range_query::compute_imports): Same. (path_range_query::compute_ranges): Same. (path_range_query::range_of_stmt): Same. (path_range_query::compute_outgoing_relations): Same. * gimple-range-path.h (class path_range_query): New ctor. * tree-ssa-loop-ch.c (ch_base::copy_headers): Remove gimple_ranger as path_range_query allocates one. * tree-ssa-threadbackward.c (class back_threader): Remove m_ranger. (back_threader::~back_threader): Same.
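A minimal sketch (stub types, not the actual GCC classes) of the ownership scheme described above: when no ranger is supplied, the query allocates its own and frees it on destruction.

    struct gimple_ranger { };            // stand-in for the real class

    class path_query_sketch
    {
    public:
      explicit path_query_sketch (gimple_ranger &r)
        : m_ranger (&r), m_owns_ranger (false) { }
      path_query_sketch ()                          // no ranger passed in
        : m_ranger (new gimple_ranger), m_owns_ranger (true) { }
      ~path_query_sketch ()
      {
        if (m_owns_ranger)
          delete m_ranger;               // free only what we allocated
      }

    private:
      gimple_ranger *m_ranger;           // now a pointer, per the ChangeLog
      bool m_owns_ranger;
    };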
* Remove loop crossing restriction from the backward threader.Aldy Hernandez2021-11-111-30/+6
We have much more thorough restrictions in the registry, shared between both threader implementations. I've been meaning to remove the backward threader's own restriction, since its only purpose was reducing the search space. Previously there was a small time penalty for its removal, but with the various patches in the past month, it looks like the removal is a wash performance-wise.

This catches 8 more jump threads in the backward threader in my suite, presumably because we disallowed all loop crossing, whereas the registry restrictions allow some crossing (if we exit the loop, etc).

Tested on x86-64 Linux.

gcc/ChangeLog:

        * tree-ssa-threadbackward.c
        (back_threader_profitability::profitable_path_p): Remove loop
        crossing restriction.
* rs6000: Fix test_mffsl.c to require Power9 supportBill Schmidt2021-11-111-1/+2
| | | | | | | 2021-11-11 Bill Schmidt <wschmidt@linux.ibm.com> gcc/testsuite/ * gcc.target/powerpc/test_mffsl.c: Require Power9.
* compiler: traverse func subexprs when creating func descriptorsIan Lance Taylor2021-11-114-11/+61
Fix the Create_func_descriptors pass to traverse the subexpressions of the function in a Call_expression. There are no subexpressions in the normal case of calling a function or a method directly, but there are subexpressions in code like F().M() when F returns an interface type.

Forgetting to traverse the function subexpressions was almost entirely hidden by the fact that we also created the necessary thunks in Bound_method_expression::do_flatten and Interface_field_reference_expression::do_get_backend. However, when the thunks were created there, they did not go through the order_evaluations pass. This almost always worked, but failed in the case in which the function being thunked returned multiple results, as order_evaluations takes the necessary step of moving the Call_expression into its own statement, and that would not happen when order_evaluations was not called.

Avoid hiding errors like this by changing those methods to only look up the previously created thunk, rather than creating it if it was not already created.

The test case for this is https://golang.org/cl/363156.

Fixes https://golang.org/issue/49512

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/363274
* libstdc++: Make pmr::memory_resource::allocate implicitly create objectsJonathan Wakely2021-11-111-1/+2
| | | | | | | | | | | | | Calling the placement version of ::operator new "implicitly creates objects in the returned region of storage" as per [intro.object]. This allows the returned memory to be used as storage for implicit-lifetime types (including arrays) without additional action by the caller. This is required by the proposed resolution of LWG 3147. libstdc++-v3/ChangeLog: * include/std/memory_resource (memory_resource::allocate): Implicitly create objects in the returned storage.
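A hedged sketch of what the change buys callers, assuming the LWG 3147 behaviour described above: storage returned by a memory_resource can be used directly as an array of an implicit-lifetime type, with no per-element placement-new.

    #include <cstddef>
    #include <memory_resource>

    int* zeroed_ints (std::pmr::memory_resource& mr, std::size_t n)
    {
      void* p = mr.allocate (n * sizeof (int), alignof (int));
      int* ints = static_cast<int*> (p);  // an int array now lives here
      for (std::size_t i = 0; i < n; ++i)
        ints[i] = 0;
      return ints;                        // caller later returns the storage
    }                                     // via mr.deallocate, same size/align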
* libstdc++: Remove public std::vector<bool>::data() memberJonathan Wakely2021-11-111-9/+13
| | | | | | | | | | This function only exists to avoid an error in the debug mode vector, so doesn't need to be public. libstdc++-v3/ChangeLog: * include/bits/stl_bvector.h (vector<bool>::data()): Give protected access, and delete for C++11 and later.
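A small illustration of why the member need not be public: std::vector<bool> is a packed specialization with no contiguous bool storage to expose, unlike the primary template.

    #include <vector>

    void f ()
    {
      std::vector<int> vi (8);
      int* p = vi.data ();               // fine: contiguous int storage
      std::vector<bool> vb (8);
      // auto* q = vb.data ();           // no usable data() for vector<bool>
      (void) p;
      (void) vb;
    }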
* Fix gfortran.dg/inline_matmul_17.f90 template.Jan Hubicka2021-11-111-1/+1
As discussed on the mailing list, the template actually tests for a missed optimization where we fail to propagate the size of an array. We no longer miss this after the modref improvements.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

        * gfortran.dg/inline_matmul_17.f90: Fix template.
* Enable pure-const discovery in modref.Jan Hubicka2021-11-1110-145/+204
We can newly handle some extra cases, for example:

    struct a {int a,b,c;};
    __attribute__ ((noinline))
    int init (struct a *a)
    {
      a->a=1;
      a->b=2;
      a->c=3;
    }
    int const_fn ()
    {
      struct a a;
      init (&a);
      return a.a + a.b + a.c;
    }

Here pure/const stops on the fact that const_fn calls the non-const init, while modref knows that the memory it initializes is local to const_fn.

I ended up reordering the passes so early modref is done after early pure-const, mostly to avoid the need to change the testsuite, which greps for const functions being detected in pure-const. Still some testsuite compensation is needed.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

        * ipa-modref.c (analyze_function): Do pure/const discovery, return
        true on success.
        (pass_modref::execute): If pure/const is discovered fixup cfg.
        (ignore_edge): Do not ignore pure/const edges.
        (modref_propagate_in_scc): Do pure/const discovery, return true if
        cdtor was promoted pure/const.
        (pass_ipa_modref::execute): If needed remove unreachable functions.
        * ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
        (warn_function_cold): Likewise.
        (skip_function_for_local_pure_const): Move earlier.
        (ipa_make_function_const): Break out from ...
        (ipa_make_function_pure): Break out from ...
        (propagate_pure_const): ... here.
        (pass_local_pure_const::execute): Use it.
        * ipa-utils.h (ipa_make_function_const): Declare.
        (ipa_make_function_pure): Declare.
        * passes.def: Move early modref after pure-const.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

        * c-c++-common/tm/inline-asm.c: Disable pure-const.
        * g++.dg/ipa/modref-1.C: Update template.
        * gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
        * gcc.dg/tree-ssa/modref-14.c: New test.
        * gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
        * gfortran.dg/do_subscript_3.f90: Add -O0.
* diagnostic: fix unused variable 'def_tabstop' [PR103129]David Malcolm2021-11-111-1/+1
| | | | | | | | gcc/ChangeLog: PR other/103129 * diagnostic-show-locus.c (def_policy): Use def_tabstop. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
* Fortran/openmp: Add support for 2 argument num_teams clauseTobias Burnus2021-11-118-19/+175
Fortran part of commit r12-5146-g48d7327f2aaf65.

gcc/fortran/ChangeLog:

        * gfortran.h (struct gfc_omp_clauses): Rename num_teams to
        num_teams_upper, add num_teams_lower.
        * dump-parse-tree.c (show_omp_clauses): Update to handle lower-bound
        num_teams clause.
        * frontend-passes.c (gfc_code_walker): Likewise.
        * openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses,
        resolve_omp_clauses): Likewise.
        * trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses,
        gfc_trans_omp_target): Likewise.

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/teams-1.f90: New test.
* aarch64: Use type-qualified builtins for vcombine_* Neon intrinsicsJonathan Wright2021-11-114-30/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned and polynomial type-qualified builtins for vcombine_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete. (TYPES_COMBINEP): Delete. * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for vcombine_* intrinsics. * config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary cast. (vcombine_s16): Likewise. (vcombine_s32): Likewise. (vcombine_f32): Likewise. (vcombine_u8): Use type-qualified builtin and remove casts. (vcombine_u16): Likewise. (vcombine_u32): Likewise. (vcombine_u64): Likewise. (vcombine_p8): Likewise. (vcombine_p16): Likewise. (vcombine_p64): Likewise. (vcombine_bf16): Remove unnecessary cast. * config/aarch64/iterators.md (VD_I): New mode iterator. (VDC_P): New mode iterator.
* aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsicsJonathan Wright2021-11-114-79/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned and polynomial type-qualified builtins for LD1/ST1 Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. The new type-qualified builtins are also lowered to gimple - as the unqualified builtins are already. gcc/ChangeLog: 2021-11-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define. (TYPES_LOAD1_P): Define. (TYPES_STORE1_U): Define. (TYPES_STORE1P): Rename to... (TYPES_STORE1_P): This. (get_mem_type_for_load_store): Add unsigned and poly types. (aarch64_general_gimple_fold_builtin): Add unsigned and poly type-qualified builtin declarations. * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for LD1/ST1. * config/aarch64/arm_neon.h (vld1_p8): Use type-qualified builtin and remove cast. (vld1_p16): Likewise. (vld1_u8): Likewise. (vld1_u16): Likewise. (vld1_u32): Likewise. (vld1q_p8): Likewise. (vld1q_p16): Likewise. (vld1q_p64): Likewise. (vld1q_u8): Likewise. (vld1q_u16): Likewise. (vld1q_u32): Likewise. (vld1q_u64): Likewise. (vst1_p8): Likewise. (vst1_p16): Likewise. (vst1_u8): Likewise. (vst1_u16): Likewise. (vst1_u32): Likewise. (vst1q_p8): Likewise. (vst1q_p16): Likewise. (vst1q_p64): Likewise. (vst1q_u8): Likewise. (vst1q_u16): Likewise. (vst1q_u32): Likewise. (vst1q_u64): Likewise. * config/aarch64/iterators.md (VALLP_NO_DI): New iterator.
* aarch64: Use type-qualified builtins for ADDV Neon intrinsicsJonathan Wright2021-11-112-7/+8
| | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement the vector reduction Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for vector reduction. * config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified builtin and remove casts. (vaddv_u16): Likewise. (vaddv_u32): Likewise. (vaddvq_u8): Likewise. (vaddvq_u16): Likewise. (vaddvq_u32): Likewise. (vaddvq_u64): Likewise.
* aarch64: Use type-qualified builtins for ADDP Neon intrinsicsJonathan Wright2021-11-112-15/+10
| | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement the pairwise addition Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: * config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified builtin and remove casts. (vpaddq_u16): Likewise. (vpaddq_u32): Likewise. (vpaddq_u64): Likewise. (vpadd_u8): Likewise. (vpadd_u16): Likewise. (vpadd_u32): Likewise. (vpaddd_u64): Likewise.
* aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsicsJonathan Wright2021-11-112-42/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement (rounding) halving-narrowing-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]subhn[2]. * config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary cast. (vsubhn_s32): Likewise. (vsubhn_s64): Likewise. (vsubhn_u16): Use type-qualified builtin and remove casts. (vsubhn_u32): Likewise. (vsubhn_u64): Likewise. (vrsubhn_s16): Remove unnecessary cast. (vrsubhn_s32): Likewise. (vrsubhn_s64): Likewise. (vrsubhn_u16): Use type-qualified builtin and remove casts. (vrsubhn_u32): Likewise. (vrsubhn_u64): Likewise. (vrsubhn_high_s16): Remove unnecessary cast. (vrsubhn_high_s32): Likewise. (vrsubhn_high_s64): Likewise. (vrsubhn_high_u16): Use type-qualified builtin and remove casts. (vrsubhn_high_u32): Likewise. (vrsubhn_high_u64): Likewise. (vsubhn_high_s16): Remove unnecessary cast. (vsubhn_high_s32): Likewise. (vsubhn_high_s64): Likewise. (vsubhn_high_u16): Use type-qualified builtin and remove casts. (vsubhn_high_u32): Likewise. (vsubhn_high_u64): Likewise.
* aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsicsJonathan Wright2021-11-112-42/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement (rounding) halving-narrowing-add Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]addhn[2]. * config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary cast. (vaddhn_s32): Likewise. (vaddhn_s64): Likewise. (vaddhn_u16): Use type-qualified builtin and remove casts. (vaddhn_u32): Likewise. (vaddhn_u64): Likewise. (vraddhn_s16): Remove unnecessary cast. (vraddhn_s32): Likewise. (vraddhn_s64): Likewise. (vraddhn_u16): Use type-qualified builtin and remove casts. (vraddhn_u32): Likewise. (vraddhn_u64): Likewise. (vaddhn_high_s16): Remove unnecessary cast. (vaddhn_high_s32): Likewise. (vaddhn_high_s64): Likewise. (vaddhn_high_u16): Use type-qualified builtin and remove casts. (vaddhn_high_u32): Likewise. (vaddhn_high_u64): Likewise. (vraddhn_high_s16): Remove unnecessary cast. (vraddhn_high_s32): Likewise. (vraddhn_high_s64): Likewise. (vraddhn_high_u16): Use type-qualified builtin and remove casts. (vraddhn_high_u32): Likewise. (vraddhn_high_u64): Likewise.
* aarch64: Use type-qualified builtins for UHSUB Neon intrinsicsJonathan Wright2021-11-112-19/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement halving-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uhsub builtins. * config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary cast. (vhsub_s16): Likewise. (vhsub_s32): Likewise. (vhsub_u8): Use type-qualified builtin and remove casts. (vhsub_u16): Likewise. (vhsub_u32): Likewise. (vhsubq_s8): Remove unnecessary cast. (vhsubq_s16): Likewise. (vhsubq_s32): Likewise. (vhsubq_u8): Use type-qualified builtin and remove casts. (vhsubq_u16): Likewise. (vhsubq_u32): Likewise.
* aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsicsJonathan Wright2021-11-112-38/+26
Declare unsigned type-qualified builtins and use them to implement (rounding) halving-add Neon intrinsics. This removes the need for many casts in arm_neon.h.

gcc/ChangeLog:

2021-11-09  Jonathan Wright  <jonathan.wright@arm.com>

        * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type
        qualifiers in generator macros for u[r]hadd builtins.
        * config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary cast.
        (vhadd_s16): Likewise.
        (vhadd_s32): Likewise.
        (vhadd_u8): Use type-qualified builtin and remove casts.
        (vhadd_u16): Likewise.
        (vhadd_u32): Likewise.
        (vhaddq_s8): Remove unnecessary cast.
        (vhaddq_s16): Likewise.
        (vhaddq_s32): Likewise.
        (vhaddq_u8): Use type-qualified builtin and remove casts.
        (vhaddq_u16): Likewise.
        (vhaddq_u32): Likewise.
        (vrhadd_s8): Remove unnecessary cast.
        (vrhadd_s16): Likewise.
        (vrhadd_s32): Likewise.
        (vrhadd_u8): Use type-qualified builtin and remove casts.
        (vrhadd_u16): Likewise.
        (vrhadd_u32): Likewise.
        (vrhaddq_s8): Remove unnecessary cast.
        (vrhaddq_s16): Likewise.
        (vrhaddq_s32): Likewise.
        (vrhaddq_u8): Use type-qualified builtin and remove casts.
        (vrhaddq_u16): Likewise.
        (vrhaddq_u32): Likewise.
* aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsicsJonathan Wright2021-11-112-40/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement widening-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for usub[lw][2] builtins. * config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary cast. (vsubl_s16): Likewise. (vsubl_s32): Likewise. (vsubl_u8): Use type-qualified builtin and remove casts. (vsubl_u16): Likewise. (vsubl_u32): Likewise. (vsubl_high_s8): Remove unnecessary cast. (vsubl_high_s16): Likewise. (vsubl_high_s32): Likewise. (vsubl_high_u8): Use type-qualified builtin and remove casts. (vsubl_high_u16): Likewise. (vsubl_high_u32): Likewise. (vsubw_s8): Remove unnecessary casts. (vsubw_s16): Likewise. (vsubw_s32): Likewise. (vsubw_u8): Use type-qualified builtin and remove casts. (vsubw_u16): Likewise. (vsubw_u32): Likewise. (vsubw_high_s8): Remove unnecessary cast. (vsubw_high_s16): Likewise. (vsubw_high_s32): Likewise. (vsubw_high_u8): Use type-qualified builtin and remove casts. (vsubw_high_u16): Likewise. (vsubw_high_u32): Likewise.
* aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsicsJonathan Wright2021-11-112-40/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them to implement widening-add Neon intrinsics. This removes the need for many casts in arm_neon.h. gcc/ChangeLog: 2021-11-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uadd[lw][2] builtins. * config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary cast. (vaddl_s16): Likewise. (vaddl_s32): Likewise. (vaddl_u8): Use type-qualified builtin and remove casts. (vaddl_u16): Likewise. (vaddl_u32): Likewise. (vaddl_high_s8): Remove unnecessary cast. (vaddl_high_s16): Likewise. (vaddl_high_s32): Likewise. (vaddl_high_u8): Use type-qualified builtin and remove casts. (vaddl_high_u16): Likewise. (vaddl_high_u32): Likewise. (vaddw_s8): Remove unnecessary cast. (vaddw_s16): Likewise. (vaddw_s32): Likewise. (vaddw_u8): Use type-qualified builtin and remove casts. (vaddw_u16): Likewise. (vaddw_u32): Likewise. (vaddw_high_s8): Remove unnecessary cast. (vaddw_high_s16): Likewise. (vaddw_high_s32): Likewise. (vaddw_high_u8): Use type-qualified builtin and remove casts. (vaddw_high_u16): Likewise. (vaddw_high_u32): Likewise.
* aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsicsJonathan Wright2021-11-112-20/+18
| | | | | | | | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them for [R]SHRN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for [R]SHRN[2]. * config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified builtin and remove casts. (vshrn_n_u32): Likewise. (vshrn_n_u64): Likewise. (vrshrn_high_n_u16): Likewise. (vrshrn_high_n_u32): Likewise. (vrshrn_high_n_u64): Likewise. (vrshrn_n_u16): Likewise. (vrshrn_n_u32): Likewise. (vrshrn_n_u64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise.
* aarch64: Use type-qualified builtins for XTN[2] Neon intrinsicsJonathan Wright2021-11-112-10/+9
| | | | | | | | | | | | | | | | | | | Declare unsigned type-qualified builtins and use them for XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare unsigned type-qualified builtins for XTN[2]. * config/aarch64/arm_neon.h (vmovn_high_u16): Use type- qualified builtin and remove casts. (vmovn_high_u32): Likewise. (vmovn_high_u64): Likewise. (vmovn_u16): Likewise. (vmovn_u32): Likewise. (vmovn_u64): Likewise.
* aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsicsJonathan Wright2021-11-112-11/+7
| | | | | | | | | | | | | | | | | Declare poly type-qualified builtins and use them for PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use poly type qualifier in builtin generator macros. * config/aarch64/arm_neon.h (vmul_p8): Use type-qualified builtin and remove casts. (vmulq_p8): Likewise. (vmull_high_p8): Likewise. (vmull_p8): Likewise.
* aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsicsJonathan Wright2021-11-112-60/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare type-qualified builtins and use them for MLA/MLS Neon intrinsics that operate on unsigned types. This eliminates lots of casts in arm_neon.h. gcc/ChangeLog: 2021-11-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtin generators for unsigned MLA/MLS intrinsics. * config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified builtin. (vmla_n_u32): Likewise. (vmla_u8): Likewise. (vmla_u16): Likewise. (vmla_u32): Likewise. (vmlaq_n_u16): Likewise. (vmlaq_n_u32): Likewise. (vmlaq_u8): Likewise. (vmlaq_u16): Likewise. (vmlaq_u32): Likewise. (vmls_n_u16): Likewise. (vmls_n_u32): Likewise. (vmls_u8): Likewise. (vmls_u16): Likewise. (vmls_u32): Likewise. (vmlsq_n_u16): Likewise. (vmlsq_n_u32): Likewise. (vmlsq_u8): Likewise. (vmlsq_u16): Likewise. (vmlsq_u32): Likewise.
* libgcc: Fix backtrace fallback on PowerPC Big-endianRaphael Moreira Zinsly2021-11-112-3/+10
At the end of the backtrace stream _Unwind_Find_FDE() may not be able to find the frame unwind info and will later call the backtrace fallback instead of finishing. This occurs when using an old libc on ppc64 due to dl_iterate_phdr() not being able to set the fde in the last trace. When this occurs the cfa of the trace will be behind the context's cfa.

Also, libgo's probestackmaps() calls the backtrace with a null pointer and can get to the backchain fallback with the same problem; in this case we are only interested in finding a stack map, and we neither need nor are able to do a backchain.

_Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP.

libgcc/ChangeLog:

        PR libgcc/103044
        * config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if
        it's called with a null argument or at the end of the backtrace and
        return.
        * unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP.
* Fix some side cases of side effects discoveryJan Hubicka2021-11-111-39/+61
I wrote a script comparing modref pure/const discovery with ipa-pure-const and found mistakes on both ends. This plugs the modref differences in handling looping pure/consts, which were previously missed due to early exits on ECF_CONST | ECF_PURE. Those early exits are a bit annoying, and I think as a cleanup I may just drop some of them as premature optimizations dating from the time when modref was very simplistic about what it propagates.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

        * ipa-modref.c (modref_summary::useful_p): Check also for
        side-effects with looping const/pure.
        (modref_summary_lto::useful_p): Likewise.
        (merge_call_side_effects): Merge side effects before early exit for
        pure/const.
        (process_fnspec): Also handle pure functions.
        (analyze_call): Do not early exit on looping pure const.
        (propagate_unknown_call): Also handle nontrivial SCC as side-effect.
        (modref_propagate_in_scc): Update.
* tree-optimization/103190 - fix assert in reassoc stmt placement with asmRichard Biener2021-11-111-1/+2
This makes sure to only assert that we don't run into an asm goto when inserting a stmt in reassoc, matching the condition in can_reassociate_p. We can handle EH edges from an asm just like EH edges from any other stmt.

2021-11-11  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/103190
        * tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto.
* Move import population from threader to path solver.Aldy Hernandez2021-11-113-64/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Imports are our nomenclature for external SSA names to a block that are used to calculate the outgoing edges for said block. For example, in the following snippet: <bb 2> : _1 = b_10 == block_11; _2 = b_10 != -1; _3 = _1 & _2; if (_3 != 0) goto <bb 3>; [INV] else goto <bb 5>; [INV] ...the imports to the block are b_10 and block_11 since they are both needed to calculate _3. The path solver takes a bitmap of imports in addition to the path itself. This sets up the number of SSA names to be on the lookout for, while resolving the final conditional. Calculating these imports was initially done in the threader, since it was the only user of the path solver. With new clients, it has become obvious that populating the imports should be a task for the path solver, so it can be shared among the clients. This patch moves the import code to the solver, making both the solver and the threader simpler in the process. This is because intent is clearer and some duplicate code was removed. This reshuffling had the net effect of giving us a handful of new threads through my suite of .ii files (125). This was unexpected, but welcome nevertheless. There is no performance difference in callgrind over the same suite. Regstrapped on x86-64 Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::add_copies_to_imports): Rename to... (path_range_query::compute_imports): ...this. Adapt it so it can be passed the imports bitmap instead of working on m_imports. (path_range_query::compute_ranges): Call compute_imports in all cases unless an imports bitmap is passed. * gimple-range-path.h (path_range_query::compute_imports): New. (path_range_query::add_copies_to_imports): Remove. * tree-ssa-threadbackward.c (back_threader::resolve_def): Remove. (back_threader::find_paths_to_names): Inline resolve_def. (back_threader::find_paths): Call compute_imports. (back_threader::resolve_phi): Adjust comment.
* Testsuite: Various fixes for nios2.Sandra Loosemore2021-11-115-3/+5
| | | | | | | | | | | | 2021-11-11 Sandra Loosemore <sandra@codesourcery.com> gcc/testsuite/ * g++.dg/warn/Wmismatched-new-delete-5.C: Add -fdelete-null-pointer-checks. * gcc.dg/attr-returns-nonnull.c: Likewise. * gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2. * gcc.dg/ifcvt-4.c: Skip on nios2. * gcc.dg/struct-by-value-1.c: Add -G0 option for nios2.
* tree-optimization/103188 - avoid running ranger on not-up-to-date SSARichard Biener2021-11-112-32/+78
The following splits loop header copying into an analysis phase that uses ranger and a transform phase that can do without it, to avoid running ranger on IL whose SSA form is not up to date.

2021-11-11  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/103188
        * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Remove query
        parameter, split out check for size optimization.
        (ch_base::m_ranger, ch_base::m_query): Remove.
        (ch_base::copy_headers): Split processing loop into analysis around
        which we allocate and use ranger and transform where we do not.
        (pass_ch::execute): Do not allocate/free ranger here.
        (pass_ch_vect::execute): Likewise.
        * gcc.dg/torture/pr103188.c: New testcase.
* Fix recursion discovery in ipa-pure-constJan Hubicka2021-11-111-0/+3
We mark self-recursive functions as looping for fear of endless recursion. This is done correctly for local pure/const and for non-trivial SCCs in the callgraph, but for trivial SCCs we miss the flag. I think it is a bad decision since infinite recursion will run out of stack, but changing it upsets some testcases and should be done independently. So this patch fixes the current behaviour to be consistent.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  <hubicka@ucw.cz>

        * ipa-pure-const.c (propagate_pure_const): Self recursion is a side
        effect.
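An illustration (mine, not from the commit) of the trivial-SCC case: a self-recursive function that is otherwise pure must still be marked looping, since the recursion may not terminate.

    __attribute__ ((noinline))
    int count_down (int n)
    {
      if (n <= 0)
        return 0;
      return 1 + count_down (n - 1);   // self call: a trivial SCC
    }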
* Fix noreturn discovery.Jan Hubicka2021-11-113-3/+53
| | | | | | | | | | | | | | | | Fix ipa-pure-const handling of noreturn flags. It is not safe to set it for interposable symbols and we should also set it for aliases (just like we do for other flags). This patch merely copies other flag handling and implements it here. gcc/ChangeLog: 2021-11-11 Jan Hubicka <hubicka@ucw.cz> * cgraph.c (set_noreturn_flag_1): New function. (cgraph_node::set_noreturn_flag): New member function * cgraph.h (cgraph_node::set_noreturn_flags): Declare. * ipa-pure-const.c (pass_local_pure_const::execute): Use it.
* c++: use auto_vec in cp_parser_template_argument_listPatrick Palka2021-11-111-27/+8
| | | | | | | gcc/cp/ChangeLog: * parser.c (cp_parser_template_argument_list): Use auto_vec instead of manual memory management.
* libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() valuesJakub Jelinek2021-11-114-9/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our right now dumb way of implementing host teams. The problems are two, one is if host teams is used from multiple pthread_create created threads - the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but with pthread_create the standard says obviously nothing about it. Another more important thing is host fallback, right now we don't do anything for omp_get_num_teams() or omp_get_team_num() which was fine before host teams was introduced and the 5.1 requirement that num_teams clause specifies minimum of teams, but with the global vars it means inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions global vars simply can't work right on the host. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of *thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0 which is fine for target teams without num_teams clause, for target teams with num_teams clause something to work on and for target without teams nested in it I've asked on omp-lang what should be done. 2021-11-11 Jakub Jelinek <jakub@redhat.com> * libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. (GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test.
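A sketch (not one of the new libgomp tests) of the property the per-thread state is meant to provide: inside the region below, omp_get_num_teams () and omp_get_team_num () describe this teams region, including when the target region executes as host fallback.

    #include <omp.h>
    #include <cstdio>

    int main ()
    {
      #pragma omp target teams num_teams (2)
      {
        std::printf ("num_teams=%d team_num=%d\n",
                     omp_get_num_teams (), omp_get_team_num ());
      }
      return 0;
    }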
* Resolve entry loop condition for the edge remaining in the loop.Aldy Hernandez2021-11-111-1/+15
There is a known failure for gfortran.dg/vector_subscript_1.f90. It was previously failing for all optimization levels except -Os. Getting the loop header copying right now makes it fail for all levels :-).

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

        * tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve
        statically to the edge remaining in the loop.
* middle-end/103181 - fix operation_could_trap_p for vector divisionRichard Biener2021-11-112-5/+45
| | | | | | | | | | | | | | | For integer vector division we only checked for all zero vector constants rather than checking whether any element in the constant vector is zero. 2021-11-11 Richard Biener <rguenther@suse.de> PR middle-end/103181 * tree-eh.c (operation_could_trap_helper_p): Properly check vector constants for a zero element for integer division. Separate floating point and integer division code. * gcc.dg/torture/pr103181.c: New testcase.
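A hedged sketch of the case the fix addresses: the constant divisor vector is not all-zero, but one lane is zero, so the division can still trap and must not be treated as safe to speculate.

    typedef int v4si __attribute__ ((vector_size (16)));

    v4si div_by_const (v4si x)
    {
      v4si d = { 1, 0, 3, 4 };   // a single zero lane is enough to trap
      return x / d;              // element-wise division
    }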
* dwarf2out: Fix up field_byte_offset [PR101378]Jakub Jelinek2021-11-112-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For PCC_BITFIELD_TYPE_MATTERS field_byte_offset has quite large code to deal with it since many years ago (see it e.g. in GCC 3.2, although it used to be on HOST_WIDE_INTs, then on double_ints, now on offset_ints). But that code apparently isn't able to cope with members with empty class types with [[no_unique_address]] attribute, because the empty classes have non-zero type size but zero decl size and so one can end up from the computation with negative offset or offset 1 byte smaller than it should be. For !PCC_BITFIELD_TYPE_MATTERS, we just use tree_result = byte_position (decl); which seems exactly right even for the empty classes or anything which is not a bitfield (and for which we don't add DW_AT_bit_offset attribute). So, instead of trying to handle those no_unique_address members in the current already very complicated code, this limits it to bitfields. stor-layout.c PCC_BITFIELD_TYPE_MATTERS handling also affects only bitfields, twice it checks DECL_BIT_FIELD and once DECL_BIT_FIELD_TYPE. As discussed, this patch uses DECL_BIT_FIELD_TYPE check, because DECL_BIT_FIELD might be cleared for some bitfields with bitsizes multiple of BITS_PER_UNIT and e.g. struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s; struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t; int main () { s.c = 0x55; s.d = 0xaaaa; t.c = 0x55; t.d = 0xaaaa; s.e++; } has different debug info with DECL_BIT_FIELD check. 2021-11-11 Jakub Jelinek <jakub@redhat.com> PR debug/101378 * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS handling only for DECL_BIT_FIELD_TYPE decls. * g++.dg/debug/dwarf2/pr101378.C: New test.
* [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr.Prathamesh Kulkarni2021-11-112-1/+15
| | | | | | | | | | | | gcc/ChangeLog: PR target/102376 * config/aarch64/aarch64.c (aarch64_process_target_attr): Check if token is arch extension without leading '+' and emit appropriate diagnostic for the same. gcc/testsuite/ChangeLog: PR target/102376 * gcc.target/aarch64/pr102376.c: New test.
* openmp: Add support for 2 argument num_teams clauseJakub Jelinek2021-11-1119-134/+582
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In OpenMP 5.1, num_teams clause can accept either one expression as before, but it in that case changed meaning, rather than create <= expression teams it is now create == expression teams. Or it accepts two expressions separated by :, with the meaning that the first is low bound and second upper bound on how many teams should be created. The other ways to set number of teams are upper bounds with lower bound of 1. The following patch does parsing of this for C/C++. For host teams, we actually don't need to do anything further right now, we always create (pretend to create) exactly the requested number of teams, so we can just evaluate and throw away the lower bound for now. For teams nested in target, we don't guarantee that though and further work will be needed. In particular, omplower now turns the teams part of: struct S { S (); S (const S &); ~S (); int s; }; void bar (S &, S &); int baz (); _Pragma ("omp declare target to (baz)"); void foo (void) { S a, b; #pragma omp target private (a) map (b) { #pragma omp teams firstprivate (b) num_teams (baz ()) { bar (a, b); } } } into: retval.0 = baz (); retval.1 = retval.0; { unsigned int retval.3; struct S * D.2549; struct S b; retval.3 = (unsigned int) retval.1; D.2549 = .omp_data_i->b; S::S (&b, D.2549); #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a) __builtin_GOMP_teams (retval.3, 0); { bar (&a, &b); } S::~S (&b); #pragma omp return(nowait) } IMHO we want a new API, say GOMP_teams3 which will take 3 arguments instead of 2 (the lower and upper bounds from num_teams and thread_limit) and will return a bool whether it should do the teams body or not. And, we should add right before outermost {} above while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0)) and remove the __builtin_GOMP_teams call. The current function performs exit equivalent (at least on NVPTX) which seems bad because that means the destructors of e.g. private variables on target aren't invoked, and at the current placement neither destructors of the already constructed privatized variables in teams. I'll do this next on the compiler side, but I'm afraid I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be able to use %ctaid.x . I think ideal would be to use a .shared integer variable for the omp_get_team_num value, but I don't have any experience with that, are .shared variables zero initialized by default, or do they have random value at start? PTX docs say they aren't initializable. 2021-11-11 Jakub Jelinek <jakub@redhat.com> gcc/ * tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ... (OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this. (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define. * tree.c (omp_clause_num_ops): Increase num ops for OMP_CLAUSE_NUM_TEAMS to 2. * tree-pretty-print.c (dump_omp_clause): Print optional lower bound for OMP_CLAUSE_NUM_TEAMS. * gimplify.c (gimplify_scan_omp_clauses): Gimplify OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL. (optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. * omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. * omp-expand.c (expand_teams_call, get_target_arguments): Likewise. 
gcc/c/ * c-parser.c (c_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/cp/ * parser.c (cp_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause. * pt.c (tsubst_omp_clauses): Likewise. (tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. gcc/testsuite/ * c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression to half of the num_teams clauses. * c-c++-common/gomp/num-teams-1.c: New test. * c-c++-common/gomp/num-teams-2.c: New test. * g++.dg/gomp/attrs-1.C (bar): Supply lower-bound expression to half of the num_teams clauses. * g++.dg/gomp/attrs-2.C (bar): Likewise. * g++.dg/gomp/num-teams-1.C: New test. * g++.dg/gomp/num-teams-2.C: New test. libgomp/ * testsuite/libgomp.c-c++-common/teams-1.c: New test.
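A hedged sketch of the clause forms the patch parses on the C/C++ side: a single expression now means exactly that many teams, and lower:upper gives a range; 'work' here is just a placeholder function.

    void work ();

    void f ()
    {
      #pragma omp target teams num_teams (4 : 8)   // between 4 and 8 teams
      work ();

      #pragma omp target teams num_teams (6)       // exactly 6 teams (5.1)
      work ();
    }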
* Remove find_pdom and find_domRichard Biener2021-11-112-57/+7
| | | | | | | | | | | | | | | | | This removes now useless wrappers around get_immediate_dominator. 2021-11-11 Richard Biener <rguenther@suse.de> * cfganal.c (find_pdom): Remove. (control_dependences::find_control_dependence): Remove special-casing of entry block, call get_immediate_dominator directly. * gimple-predicate-analysis.cc (find_pdom): Remove. (find_dom): Likewise. (find_control_equiv_block): Call get_immediate_dominator directly. (compute_control_dep_chain): Likewise. (predicate::init_from_phi_def): Likewise.
* Apply TLC to control dependence computeRichard Biener2021-11-115-16/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the control dependence compute avoid a find_edge and optimizes allocation by embedding the bitmap head into the vector of control dependences instead of allocating all of them. It also uses a local bitmap obstack. The bitmap changes make it necessary to shuffle some includes. 2021-11-10 Richard Biener <rguenther@suse.de> * cfganal.h (control_dependences::control_dependence_map): Embed bitmap_head. (control_dependences::m_bitmaps): New. * cfganal.c (control_dependences::set_control_dependence_map_bit): Adjust. (control_dependences::clear_control_dependence_bitmap): Likewise. (control_dependences::find_control_dependence): Do not find_edge for the abnormal edge test. (control_dependences::control_dependences): Instead do not add abnormal edges to the edge list. Adjust. (control_dependences::~control_dependences): Likewise. (control_dependences::get_edges_dependent_on): Likewise. * function-tests.c: Include bitmap.h. gcc/analyzer/ * supergraph.cc: Include bitmap.h. gcc/c/ * gimple-parser.c: Shuffle bitmap.h include.
* rs6000/doc: Rename future cpu with power10Kewen Lin2021-11-101-12/+12
Commit 5d9d0c94588 renamed future to power10 and ace60939fd2 updated the documentation for the "future" renaming. This patch renames the remaining "future architecture" references in the documentation and polishes the wording for float128.

gcc/ChangeLog:

        * doc/invoke.texi: Change references to "future cpu" to "power10",
        "-mcpu=future" to "-mcpu=power10". Adjust words for float128.
* x86: Update -mtune=alderlakeCui,Lili2021-11-115-30/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update mtune for alderlake, Alder Lake Intel Hybrid Technology will not support Intel® AVX-512. ISA features such as Intel® AVX, AVX-VNNI, Intel® AVX2, and UMONITOR/UMWAIT/TPAUSE are supported. gcc/ChangeLog * config/i386/i386-options.c (m_CORE_AVX2): Remove Alderlake from m_CORE_AVX2. (processor_cost_table): Use alderlake_cost for Alderlake. * config/i386/i386.c (ix86_sched_init_global): Handle Alderlake. * config/i386/x86-tune-costs.h (struct processor_costs): Add alderlake cost. * config/i386/x86-tune-sched.c (ix86_issue_rate): Change Alderlake issue rate to 4. (ix86_adjust_cost): Handle Alderlake. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Enable for Alderlake. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. (X86_TUNE_MEMORY_MISMATCH_STALL): Likewise. (X86_TUNE_USE_LEAVE): Likewise. (X86_TUNE_PUSH_MEMORY): Likewise. (X86_TUNE_USE_INCDEC): Likewise. (X86_TUNE_INTEGER_DFMODE_MOVES): Likewise. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise. (X86_TUNE_USE_SAHF): Likewise. (X86_TUNE_USE_BT): Likewise. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise. (X86_TUNE_ONE_IF_CONV_INSN): Likewise. (X86_TUNE_AVOID_MFENCE): Likewise. (X86_TUNE_USE_SIMODE_FIOP): Likewise. (X86_TUNE_EXT_80387_CONSTANTS): Likewise. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Likewise. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_SSE_TYPELESS_STORES): Likewise. (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise. (X86_TUNE_AVOID_4BYTE_PREFIXES): Likewise. (X86_TUNE_USE_GATHER): Disable for Alderlake. (X86_TUNE_AVX256_MOVE_BY_PIECES): Likewise. (X86_TUNE_AVX256_STORE_BY_PIECES): Likewise.
* Extend vpcmov to handle V8HF/V16HFmode under TARGET_XOP.liuhongt2021-11-112-4/+22
| | | | | | | | | | | | gcc/ChangeLog: PR target/103151 * config/i386/sse.md (V_128_256): Extend to V8HF/V16HF. (avxsizesuffix): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr103151.c: New test.
* RISC-V: Fix wrong zifencei handling in riscv_subset_list::to_stringKito Cheng2021-11-111-1/+1
This issue causes zifencei to never be correctly appended to the ISA string.

gcc/ChangeLog:

        * common/config/riscv/riscv-common.c (riscv_subset_list::to_string):
        Fix wrong macro checking.
* Daily bump.GCC Administrator2021-11-118-1/+704
* Allow loop header copying when first iteration condition is known.Aldy Hernandez2021-11-102-8/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in the PR, the loop header copying pass avoids doing so when optimizing for size. However, sometimes we can determine the loop entry conditional statically for the first iteration of the loop. This patch uses the path solver to determine the outgoing edge out of preheader->header->xx. If so, it allows header copying. Doing this in the loop optimizer saves us from doing gymnastics in the threader which doesn't have the context to determine if a loop transformation is profitable. I am only returning true in entry_loop_condition_is_static for a true conditional. Technically a false conditional is also provably static, but allowing any boolean value causes a regression in gfortran.dg/vector_subscript_1.f90. I would have preferred not passing around the query object, but the layout of pass_ch and should_duplicate_loop_header_p make it a bit awkward to get it right without an outright refactor to the pass. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102906 * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New. (should_duplicate_loop_header_p): Call entry_loop_condition_is_static. (class ch_base): Add m_ranger and m_query. (ch_base::copy_headers): Pass m_query to entry_loop_condition_is_static. (pass_ch::execute): Allocate and deallocate m_ranger and m_query. (pass_ch_vect::execute): Same. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr102906.c: New test.
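A hedged illustration (not the new testcase) of a loop whose entry condition is statically resolvable along the path preheader->header: after the early return, n > 0 holds, so the first evaluation of i < n is known to be true and header copying pays off even when optimizing for size.

    void clear_first_n (int *a, int n)
    {
      if (n <= 0)
        return;                     // on the remaining path n > 0 holds
      for (int i = 0; i < n; i++)   // so the first "i < n" test is true
        a[i] = 0;
    }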
* [COMMITTED] aarch64: [PR103170] Fix aarch64_simd_dup<mode>Andrew Pinski2021-11-102-2/+17
The problem here is that aarch64_simd_dup<mode> uses the vw iterator rather than the vwcore iterator. This causes problems for the V4SF and V2DF modes. I changed both aarch64_simd_dup<mode> patterns to be consistent.

Committed as obvious after a bootstrap/test on aarch64-linux-gnu.

        PR target/103170

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (aarch64_simd_dup<mode>): Use vwcore
        iterator for the r constraint output string.

gcc/testsuite/ChangeLog:

        * gcc.c-torture/compile/vector-dup-1.c: New test.