delta/gcc.git/libcpp/include, branch master

diagnostics: ensure that .sarif files are UTF-8 encoded [PR109098]

2023-03-25T00:52:34+00:00

PR analyzer/109098 notes that the SARIF spec mandates that .sarif
files are UTF-8 encoded, but -fdiagnostics-format=sarif-file naively
assumes that the source files are UTF-8 encoded when quoting source
artefacts in the .sarif output, which can lead to us writing out
.sarif files with non-UTF-8 bytes in them (which break my reporting
scripts).

The root cause is that sarif_builder::maybe_make_artifact_content_object
was using maybe_read_file to load the file content as bytes, and
assuming they were UTF-8 encoded.

This patch reworks both overloads of this function (one used for the
whole file, the other for snippets of quoted lines) so that they go
through input.cc's file cache, which attempts to decode the input files
according to the input charset, and then encode as UTF-8.  They also
check that the result actually is UTF-8, for cases where the input
charset is missing, or incorrectly specified, and omit the quoted
source for such awkward cases.

Doing so fixes all of the cases I've encountered.

The patch adds a new:
  { dg-final { verify-sarif-file } }
directive to all SARIF test cases in the test suite, which verifies
that the output is UTF-8 encoded, and is valid JSON.  In particular
it verifies that when we complain about encoding problems, the .sarif
report we emit is itself correctly encoded.

gcc/ChangeLog:
	PR analyzer/109098
	* diagnostic-format-sarif.cc (read_until_eof): Delete.
	(maybe_read_file): Delete.
	(sarif_builder::maybe_make_artifact_content_object): Use
	get_source_file_content rather than maybe_read_file.
	Reject it if it's not valid UTF-8.
	* input.cc (file_cache_slot::get_full_file_content): New.
	(get_source_file_content): New.
	(selftest::check_cpp_valid_utf8_p): New.
	(selftest::test_cpp_valid_utf8_p): New.
	(selftest::input_cc_tests): Call selftest::test_cpp_valid_utf8_p.
	* input.h (get_source_file_content): New prototype.

gcc/testsuite/ChangeLog:
	PR analyzer/109098
	* c-c++-common/diagnostic-format-sarif-file-1.c: Add
	verify-sarif-file directive.
	* c-c++-common/diagnostic-format-sarif-file-2.c: Likewise.
	* c-c++-common/diagnostic-format-sarif-file-3.c: Likewise.
	* c-c++-common/diagnostic-format-sarif-file-4.c: Likewise.
	* c-c++-common/diagnostic-format-sarif-file-Wbidi-chars.c: New
	test case, adapted from Wbidi-chars-1.c.
	* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c:
	New test case.
	* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-2.c:
	New test case.
	* c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-3.c:
	New test case, adapted from cpp/Winvalid-utf8-1.c.
	* c-c++-common/diagnostic-format-sarif-file-valid-CP850.c: New
	test case, adapted from gcc.dg/diagnostic-input-charset-1.c.
	* gcc.dg/plugin/crash-test-ice-sarif.c: Add verify-sarif-file
	directive.
	* gcc.dg/plugin/crash-test-write-though-null-sarif.c: Likewise.
	* gcc.dg/plugin/diagnostic-test-paths-5.c: Likewise.
	* lib/scansarif.exp (verify-sarif-file): New procedure.
	* lib/verify-sarif-file.py: New support script.

libcpp/ChangeLog:
	PR analyzer/109098
	* charset.cc (cpp_valid_utf8_p): New function.
	* include/cpplib.h (cpp_valid_utf8_p): New prototype.

Signed-off-by: David Malcolm

Update copyright years.

2023-01-16T10:52:17+00:00

pch: Fix streaming of strings with embedded null bytes

2022-10-19T13:26:09+00:00

When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
(whether they live in GC-controlled memory or not) will be marked for PCH
output by the routine gt_pch_note_object in ggc-common.cc. This routine
special-cases plain char* strings, and in particular it uses strlen() to get
their length. Thus it does not handle strings with embedded null bytes, but it
is possible for something PCH cares about (such as a string literal token in a
macro definition) to contain such embedded nulls. To fix that up, add a new
GTY option "string_length" so that gt_pch_note_object can be informed the
actual length it ought to use, and use it in the relevant libcpp structs
(cpp_string and ht_identifier) accordingly.

gcc/ChangeLog:

	* gengtype.cc (output_escaped_param): Add missing const.
	(get_string_option): Add missing check for option type.
	(walk_type): Support new "string_length" GTY option.
	(write_types_process_field): Likewise.
	* ggc-common.cc (gt_pch_note_object): Add optional length argument.
	* ggc.h (gt_pch_note_object): Adjust prototype for new argument.
	(gt_pch_n_S2): Declare...
	* stringpool.cc (gt_pch_n_S2): ...new function.
	* doc/gty.texi: Document new GTY((string_length)) option.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
	* include/symtab.h (struct ht_identifier): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/pch/pch-string-nulls.C: New test.
	* g++.dg/pch/pch-string-nulls.Hs: New test.

preprocessor: C2x identifier rules

2022-10-14T23:07:50+00:00

C2x has, like C++, adopted rules for identifiers based directly on an
unversioned normative reference to Unicode.  Make libcpp follow those
rules for c2x / gnu2x standards (this involves bringing back a flag
separate from the C++ one for whether to use these identifier rules,
but this time enabled for all C++ language versions since that was the
conclusion adopted for C++ identifier handling).

There is one change here that affects C++.  I believe the new
normative requirement for NFC only applies to identifiers, not to the
use of identifier-continue characters in pp-numbers, where there is no
such requirement and so the diagnostic ought to be a warning not a
pedwarn in pp-numbers, and that this is the case for both C and C++.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

libcpp/
	* charset.cc (ucn_valid_in_identifier): Check xid_identifiers not
	cplusplus to determine whether to use CXX23 and NXX23 flags.
	* include/cpplib.h (struct cpp_options): Add xid_identifiers.
	* init.cc (struct lang_flags, lang_defaults): Add xid_identifiers.
	(cpp_set_lang): Set xid_identifiers.
	* lex.cc (warn_about_normalization): Add parameter identifier.
	Only pedwarn about non-NFC for identifiers, not pp-numbers.
	(_cpp_lex_direct): Update calls to warn_about_normalization.

gcc/testsuite/
	* gcc.dg/cpp/c2x-ucnid-1-utf8.c, gcc.dg/cpp/c2x-ucnid-1.c: New
	tests.

middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support

2022-10-14T07:37:01+00:00

Here is a complete patch to add std::bfloat16_t support on
x86 (AArch64 and ARM left for later).  Almost no BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is subset or superset
of the other, while SFmode is superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
By default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
The patch introduces a single __bf16 builtin - __builtin_nansf16b,
because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN,
and uses f16b suffix instead of bf16 because there would be ambiguity on
log vs. logb - __builtin_logbf16 could be either log with bf16 suffix
or logb with f16 suffix.  In other cases libstdc++ should mostly use
__builtin_*f for std::bfloat16_t overloads (we have a problem with
std::nextafter though but that one we have also for std::float16_t).

2022-10-14  Jakub Jelinek  

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	(build_common_tree_nodes): Initialize bfloat16_type_node if
	BFmode is supported.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New.
	* builtins.def (BUILT_IN_NANSF16B): New builtin.
	* fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B.
	* config/i386/i386.cc (classify_argument): Handle E_BCmode.
	(ix86_libgcc_floating_mode_supported_p): Also return true for BFmode
	for -msse2.
	(ix86_mangle_type): Mangle BFmode as DF16b.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node, only create it if still NULL.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/i386/i386.md (cbranchbf4, cstorebf4): New expanders.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.  Predefine bfloat16_type_node related
	macros for -fbuilding-libgcc.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16.
gcc/c/
	* c-typeck.cc (convert_arguments): Don't promote __bf16 to
	double.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_bfloat16,
	check_effective_target_bfloat16_runtime, add_options_for_bfloat16):
	New.
	* gcc.dg/torture/bfloat16-basic.c: New test.
	* gcc.dg/torture/bfloat16-builtin.c: New test.
	* gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test.
	* gcc.dg/torture/bfloat16-complex.c: New test.
	* gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable
	from bfloat16-builtin-issignaling-1.c.
	* gcc.dg/torture/floatn-basic.h: Allow to be includable from
	bfloat16-basic.c.
	* gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected
	diagnostics.
	* gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise.
	* gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise.
	* g++.target/i386/bfloat_cpp_typecheck.C: Likewise.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	(CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c,
	CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add
	-msse2.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DF16b.
	* testsuite/demangle-expected (_Z3xxxDF16b): New test.

Add instruction level discriminator support.

2022-09-28T21:25:18+00:00

This is the first in a series of patches to enable discriminator support
in AutoFDO.

This patch switches to tracking discriminators per statement/instruction
instead of per basic block. Tracking per basic block was problematic since
not all statements in a basic block needed a discriminator and, also, later
optimizations could move statements between basic blocks making correlation
during AutoFDO compilation unreliable. Tracking per statement also allows
us to assign different discriminators to multiple function calls in the same
basic block. A subsequent patch will add that support.

The idea of this patch is based on commit 4c311d95cf6d9519c3c20f641cc77af7df491fdf
by Dehao Chen in vendors/google/heads/gcc-4_8 but uses a slightly different
approach. In Dehao's work special (normally unused) location ids and side tables
were used to keep track of locations with discriminators. Things have changed
since then and I don't think we have unused location ids anymore. Instead,
I made discriminators a part of ad-hoc locations.

The difference from Dehao's work also includes support for discriminator
reading/writing in lto streaming and in modules.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

	* basic-block.h: Remove discriminator from basic blocks.
	* cfghooks.cc (split_block_1): Remove discriminator from basic blocks.
	* final.cc (final_start_function_1): Switch from per-bb to per statement
	discriminator.
	(final_scan_insn_1): Don't keep track of basic block discriminators.
	(compute_discriminator): Switch from basic block discriminators to
	instruction discriminators.
	(insn_discriminator): New function to return instruction discriminator.
	(notice_source_line): Use insn_discriminator.
	* gimple-pretty-print.cc (dump_gimple_bb_header): Remove dumping of
	basic block discriminators.
	* gimple-streamer-in.cc (input_bb): Remove reading of basic block
	discriminators.
	* gimple-streamer-out.cc (output_bb): Remove writing of basic block
	discriminators.
	* input.cc (make_location): Pass 0 discriminator to COMBINE_LOCATION_DATA.
	(location_with_discriminator): New function to combine locus with
	a discriminator.
	(has_discriminator): New function to check if a location has a discriminator.
	(get_discriminator_from_loc): New function to get the discriminator
	from a location.
	* input.h: Declarations of new functions.
	* lto-streamer-in.cc (cmp_loc): Use discriminators in location comparison.
	(apply_location_cache): Keep track of current discriminator.
	(input_location_and_block): Read discriminator from stream.
	* lto-streamer-out.cc (clear_line_info): Set current discriminator to
	UINT_MAX.
	(lto_output_location_1): Write discriminator to stream.
	* lto-streamer.h: Add discriminator to cached_location.
	Add current_discr to lto_location_cache.
	Add current_discr to output_block.
	* print-rtl.cc (print_rtx_operand_code_i): Print discriminator.
	* rtl.h: Add extern declaration of insn_discriminator.
	* tree-cfg.cc (assign_discriminator): New function to assign a unique
	discriminator value to all statements in a basic block that have the given
	line number.
	(assign_discriminators): Assign discriminators to statement locations.
	* tree-pretty-print.cc (dump_location): Dump discriminators.
	* tree.cc (set_block): Preserve discriminator when setting block.
	(set_source_range): Preserve discriminator when setting source range.

gcc/cp/ChangeLog:
	* module.cc (write_location): Write discriminator.
	(read_location): Read discriminator.

libcpp/ChangeLog:

	* include/line-map.h: Add discriminator to location_adhoc_data.
	(get_combined_adhoc_loc): Add discriminator parameter.
	(get_discriminator_from_adhoc_loc): Add external declaration.
	(get_discriminator_from_loc): Add external declaration.
	(COMBINE_LOCATION_DATA): Add discriminator parameter.
	* lex.cc (get_location_for_byte_range_in_cur_line) Pass 0 discriminator
	in a call to COMBINE_LOCATION_DATA.
	(warn_about_normalization): Pass 0 discriminator in a call to
	COMBINE_LOCATION_DATA.
	(_cpp_lex_direct): Pass 0 discriminator in a call to
	COMBINE_LOCATION_DATA.
	* line-map.cc (location_adhoc_data_hash): Use discriminator compute
	location_adhoc_data hash.
	(location_adhoc_data_eq): Use discriminator when comparing
	location_adhoc_data.
	(can_be_stored_compactly_p): Check discriminator to determine
	compact storage.
	(get_combined_adhoc_loc): Add discriminator parameter.
	(get_discriminator_from_adhoc_loc): New function to get the discriminator
	from an ad-hoc location.
	(get_discriminator_from_loc): New function to get the discriminator
	from a location.

gcc/testsuite/ChangeLog:

	* c-c++-common/ubsan/pr85213.c: Pass -gno-statement-frontiers.

c: New C2x keywords

2022-09-07T13:56:46+00:00

C2x follows C++ in making alignas, alignof, bool, false,
static_assert, thread_local and true keywords; implement this
accordingly.  This implementation makes them normal keywords in C2x
mode just like any other keyword (C2x leaves open the possibility of
implementation using predefined macros instead - thus, there aren't
any testcases asserting that they aren't macros).  As in C++ and
previous versions of C, true and false are handled like signed 1 and 0
in #if (there was an intermediate state in some C2x drafts where they
had different macro expansions that were unsigned in #if).

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

As with the removal of unprototyped functions, this change has a high
risk of breaking some old code and people doing GNU/Linux distribution
builds may wish to see how much is broken in a build with a -std=gnu2x
default.

gcc/
	* ginclude/stdalign.h [defined __STDC_VERSION__ &&
	__STDC_VERSION__ > 201710L]: Disable all content.
	* ginclude/stdbool.h [defined __STDC_VERSION__ && __STDC_VERSION__
	> 201710L] (bool, true, false): Do not define.

gcc/c-family/
	* c-common.cc (c_common_reswords): Use D_C2X instead of D_CXXONLY
	for alignas, alignof, bool, false, static_assert, thread_local and
	true.

gcc/c/
	* c-parser.cc (c_parser_static_assert_declaration_no_semi)
	(c_parser_alignas_specifier, c_parser_alignof_expression): Allow
	for C2x spellings of keywords.
	(c_parser_postfix_expression): Handle RID_TRUE and RID_FALSE.

gcc/testsuite/
	* gcc.dg/c11-keywords-1.c, gcc.dg/c2x-align-1.c,
	gcc.dg/c2x-align-6.c, gcc.dg/c2x-bool-2.c,
	gcc.dg/c2x-static-assert-3.c, gcc.dg/c2x-static-assert-4.c,
	gcc.dg/c2x-thread-local-1.c: New tests.
	* gcc.dg/c2x-bool-1.c: Update expectations.

libcpp/
	* include/cpplib.h (struct cpp_options): Add true_false.
	* expr.cc (eval_token): Check true_false not cplusplus to
	determine whether to handle true and false keywords.
	* init.cc (struct lang_flags): Add true_false.
	(lang_defaults): Update.
	(cpp_set_lang): Set true_false.

libcpp: Named universal character escapes and delimited escape sequence tweaks

2022-09-07T06:44:38+00:00

On Tue, Aug 30, 2022 at 09:10:37PM +0000, Joseph Myers wrote:
> I'm seeing build failures of glibc for powerpc64, as illustrated by the
> following C code:
>
> #if 0
> \NARG
> #endif
>
> (the actual sysdeps/powerpc/powerpc64/sysdep.h code is inside #ifdef
> __ASSEMBLER__).
>
> This shows some problems with this feature - and with delimited escape
> sequences - as it affects C.  It's fine to accept it as an extension
> inside string and character literals, because \N or \u{...} would be
> invalid in the absence of the feature (i.e. the syntax for such literals
> fails to match, meaning that the rule about undefined behavior for a
> single ' or " as a pp-token applies).  But outside string and character
> literals, the usual lexing rules apply, the \ is a pp-token on its own and
> the code is valid at the preprocessing level, and with expansion of macros
> appearing before or after the \ (e.g. u defined as a macro in the \u{...}
> case) it may be valid code at the language level as well.  I don't know
> what older C++ versions say about this, but for C this means e.g.
>
> #define z(x) 0
> #define a z(
> int x = a\NARG);
>
> needs to be accepted as expanding to "int x = 0;", not interpreted as
> using the \N feature in an identifier and produce an error.

The following patch changes this, so that:
1) outside of string/character literals, \N without following { is never
   treated as an error nor warning, it is silently treated as \ separate
   token followed by whatever is after it
2) \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} are not handled as
   extension at all outside of string/character literals in the strict
   standard modes (-std=c*) except for -std=c++{23,2b}, only in the
   -std=gnu* modes, because it changes behavior on valid sources, e.g.
   #define z(x) 0
   #define a z(
   int x = a\u{123});
   int y = a\N{LATIN SMALL LETTER A WITH ACUTE});
3) introduces -Wunicode warning (on by default) and warns for cases
   of what looks like invalid delimited escape sequence or named
   universal character escape outside of string/character literals
   and is treated as separate tokens

2022-09-07  Jakub Jelinek  

libcpp/
	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
	(enum cpp_warning_reason): Add CPP_W_UNICODE.
	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, don't
	emit an error but warning about unknown character names and treat as
	separate tokens.  When treating as separate tokens \u{ or \N{, emit
	warnings.
gcc/
	* doc/invoke.texi (-Wno-unicode): Document.
gcc/c-family/
	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
	" in comments" from description.
	(Wunicode): New option.
gcc/testsuite/
	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.

c/c++: new warning: -Wxor-used-as-pow [PR90885]

2022-09-02T22:29:33+00:00

PR c/90885 notes various places in real-world code where people have
written C/C++ code that uses ^ (exclusive or) where presumbably they
meant exponentiation.

For example
  https://codesearch.isocpp.org/cgi-bin/cgi_ppsearch?q=2%5E32&search=Search
currently finds 11 places using "2^32", and all of them appear to be
places where the user means 2 to the power of 32, rather than 2
exclusive-orred with 32 (which is 34).

This patch adds a new -Wxor-used-as-pow warning to the C and C++
frontends to complain about ^ when the left-hand side is the decimal
constant 2 or the decimal constant 10.

This is the same name as the corresponding clang warning:
  https://clang.llvm.org/docs/DiagnosticsReference.html#wxor-used-as-pow

As per the clang warning, the warning suggests converting the left-hand
side to a hexadecimal constant if you really mean xor, which suppresses
the warning (though this patch implements a fix-it hint for that, whereas
the clang implementation only has a fix-it hint for the initial
suggestion of exponentiation).

I initially tried implementing this without checking for decimals, but
this version had lots of false positives.  Checking for decimals
requires extending the lexer to capture whether or not a CPP_NUMBER
token was decimal.  I added a new DECIMAL_INT flag to cpplib.h for this.
Unfortunately, c_token and cp_tokens both have only an unsigned char for
their flags (as captured by c_lex_with_flags), whereas this would add
the 12th flag to cpp_tokens.  Of the first 8 flags, all but BOL are used
in the C or C++ frontends, but BOL is not, so I moved that to a higher
position, using its old value for the new DECIMAL_INT flag, so that it
is representable within an unsigned char.

Example output:

demo.c:5:13: warning: result of '2^8' is 10; did you mean '1 << 8' (256)? [-Wxor-used-as-pow]
    5 | int t2_8 = 2^8;
      |             ^
      |            --
      |            1<<
demo.c:5:12: note: you can silence this warning by using a hexadecimal constant (0x2 rather than 2)
    5 | int t2_8 = 2^8;
      |            ^
      |            0x2
demo.c:21:15: warning: result of '10^6' is 12; did you mean '1e6'? [-Wxor-used-as-pow]
   21 | int t10_6 = 10^6;
      |               ^
      |             ---
      |             1e
demo.c:21:13: note: you can silence this warning by using a hexadecimal constant (0xa rather than 10)
   21 | int t10_6 = 10^6;
      |             ^~
      |             0xa

gcc/c-family/ChangeLog:
	PR c/90885
	* c-common.h (check_for_xor_used_as_pow): New decl.
	* c-lex.cc (c_lex_with_flags): Add DECIMAL_INT to flags as appropriate.
	* c-warn.cc (check_for_xor_used_as_pow): New.
	* c.opt (Wxor-used-as-pow): New.

gcc/c/ChangeLog:
	PR c/90885
	* c-parser.cc (c_parser_string_literal): Clear ret.m_decimal.
	(c_parser_expr_no_commas): Likewise.
	(c_parser_conditional_expression): Likewise.
	(c_parser_binary_expression): Clear m_decimal when popping the
	stack.
	(c_parser_unary_expression): Clear ret.m_decimal.
	(c_parser_has_attribute_expression): Likewise for result.
	(c_parser_predefined_identifier): Likewise for expr.
	(c_parser_postfix_expression): Likewise for expr.
	Set expr.m_decimal when handling a CPP_NUMBER that was a decimal
	token.
	* c-tree.h (c_expr::m_decimal): New bitfield.
	* c-typeck.cc (parser_build_binary_op): Clear result.m_decimal.
	(parser_build_binary_op): Call check_for_xor_used_as_pow.

gcc/cp/ChangeLog:
	PR c/90885
	* cp-tree.h (class cp_expr): Add bitfield m_decimal.  Clear it in
	existing ctors.  Add ctor that allows specifying its value.
	(cp_expr::decimal_p): New accessor.
	* parser.cc (cp_parser_expression_stack_entry::flags): New field.
	(cp_parser_primary_expression): Set m_decimal of cp_expr when
	handling numbers.
	(cp_parser_binary_expression): Extract flags from token when
	populating stack.  Call check_for_xor_used_as_pow.

gcc/ChangeLog:
	PR c/90885
	* doc/invoke.texi (Warning Options): Add -Wxor-used-as-pow.

gcc/testsuite/ChangeLog:
	PR c/90885
	* c-c++-common/Wxor-used-as-pow-1.c: New test.
	* c-c++-common/Wxor-used-as-pow-fixits.c: New test.
	* g++.dg/parse/expr3.C: Convert 2 to 0x2 to suppress
	-Wxor-used-as-pow.
	* g++.dg/warn/Wparentheses-10.C: Likewise.
	* g++.dg/warn/Wparentheses-18.C: Likewise.
	* g++.dg/warn/Wparentheses-19.C: Likewise.
	* g++.dg/warn/Wparentheses-9.C: Likewise.
	* g++.dg/warn/Wxor-used-as-pow-named-op.C: New test.
	* gcc.dg/Wparentheses-6.c: Convert 2 to 0x2 to suppress
	-Wxor-used-as-pow.
	* gcc.dg/Wparentheses-7.c: Likewise.
	* gcc.dg/precedence-1.c: Likewise.

libcpp/ChangeLog:
	PR c/90885
	* include/cpplib.h (BOL): Move macro to 1 << 12 since it is
	not used by C/C++'s unsigned char token flags.
	(DECIMAL_INT): New, using 1 << 6, so that it is visible as
	part of C/C++'s 8 bits of token flags.

Signed-off-by: David Malcolm

libcpp: Add -Winvalid-utf8 warning [PR106655]

2022-09-01T07:56:44+00:00

The following patch introduces a new warning - -Winvalid-utf8 similarly
to what clang now has - to diagnose invalid UTF-8 byte sequences in
comments, but not just in those, but also in string/character literals
and outside of them.

The warning is on by default when explicit -finput-charset=UTF-8 is
used and C++23 compilation is requested and if -{,W}pedantic or
-pedantic-errors it is actually a pedwarn.

The reason it is on by default only for -finput-charset=UTF-8 is
that the sources often are UTF-8, but sometimes could be some ASCII
compatible single byte encoding where non-ASCII characters only
appear in comments.  So having the warning off by default
is IMO desirable.  The C++23 pedantic mode for when the source code
is UTF-8 is -std=c++23 -pedantic-errors -finput-charset=UTF-8.

2022-09-01  Jakub Jelinek  

	PR c++/106655
libcpp/
	* include/cpplib.h (struct cpp_options): Implement C++23
	P2295R6 - Support for UTF-8 as a portable source file encoding.
	Add cpp_warn_invalid_utf8 and cpp_input_charset_explicit fields.
	(enum cpp_warning_reason): Add CPP_W_INVALID_UTF8 enumerator.
	* init.cc (cpp_create_reader): Initialize cpp_warn_invalid_utf8
	and cpp_input_charset_explicit.
	* charset.cc (_cpp_valid_utf8): Adjust function comment.
	* lex.cc (UCS_LIMIT): Define.
	(utf8_continuation): New const variable.
	(utf8_signifier): Move earlier in the file.
	(_cpp_warn_invalid_utf8, _cpp_handle_multibyte_utf8): New functions.
	(_cpp_skip_block_comment): Handle -Winvalid-utf8 warning.
	(skip_line_comment): Likewise.
	(lex_raw_string, lex_string): Likewise.
	(_cpp_lex_direct): Likewise.
gcc/
	* doc/invoke.texi (-Winvalid-utf8): Document it.
gcc/c-family/
	* c.opt (-Winvalid-utf8): New warning.
	* c-opts.cc (c_common_handle_option) :
	Set cpp_opts->cpp_input_charset_explicit.
	(c_common_post_options): If -finput-charset=UTF-8 is explicit
	in C++23, enable -Winvalid-utf8 by default and if -pedantic
	or -pedantic-errors, make it a pedwarn.
gcc/testsuite/
	* c-c++-common/cpp/Winvalid-utf8-1.c: New test.
	* c-c++-common/cpp/Winvalid-utf8-2.c: New test.
	* c-c++-common/cpp/Winvalid-utf8-3.c: New test.
	* g++.dg/cpp23/Winvalid-utf8-1.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-2.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-3.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-4.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-5.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-6.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-7.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-8.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-9.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-10.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-11.C: New test.
	* g++.dg/cpp23/Winvalid-utf8-12.C: New test.