author | Arnold D. Robbins <arnold@skeeve.com> | 2023-04-07 14:40:03 +0300 |
---|---|---|
committer | Arnold D. Robbins <arnold@skeeve.com> | 2023-04-07 14:40:03 +0300 |
commit | 2003b18129d4eb24011f9b39eb35c79598daf546 (patch) | |
tree | 52ac01766fcf312bb49d189b1f35c244a0c61e9d | |
parent | e63e393634006ca2e94f0d0715e486193d82ae66 (diff) | |
download | gawk-2003b18129d4eb24011f9b39eb35c79598daf546.tar.gz |
Improve CSV record handling.
-rw-r--r-- | ChangeLog | 6 |
-rw-r--r-- | doc/ChangeLog | 6 |
-rw-r--r-- | doc/gawk.info | 1057 |
-rw-r--r-- | doc/gawk.texi | 6 |
-rw-r--r-- | doc/gawktexi.in | 3 |
-rw-r--r-- | io.c | 19 |
6 files changed, 556 insertions, 541 deletions
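The ChangeLog entry in this commit says that a carriage return is no longer stripped from CSV input but is instead included in RT when it occurs just before the linefeed. The following is a minimal Python sketch of that rule, not gawk's actual `csvscan()` code. The function name `split_csv_records` is hypothetical, and the quote handling (a line feed inside double quotes does not end a record) is an assumption made only to keep the model self-consistent.

```python
def split_csv_records(data):
    """Return a list of (record, terminator) pairs for CSV text.

    Illustrative model only, not gawk's csvscan(): a line feed outside
    double quotes ends a record, and a carriage return directly before
    that line feed is treated as part of the terminator (gawk's RT).
    """
    records = []
    start = 0          # index where the current record begins
    in_quotes = False  # assumption: quoted line feeds do not end a record
    for i, ch in enumerate(data):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            end = i
            # A CR immediately before the LF belongs to the terminator.
            if end > start and data[end - 1] == "\r":
                end -= 1
            records.append((data[start:end], data[end:i + 1]))
            start = i + 1
    # Trailing text with no final line feed gets an empty terminator.
    if start < len(data):
        records.append((data[start:], ""))
    return records
```

In this model a carriage return that is not directly followed by a line feed stays in the record text, which is the difference from the earlier behavior of stripping every carriage return.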
@@ -1,3 +1,9 @@
+2023-04-07 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * io.c (csvscan): Instead of stripping all carriage returns in the
+ input, simply include the CR in the RT if it occurs just before
+ the linefeed character.
+
 2023-04-07 Arnold D. Robbins <arnold@skeeve.com>

 * array.c (assoc_info): Update to handle additional cases so
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 5225862a..13e2f082 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,9 @@
+2023-04-06 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawktexi.in (Carriage-Return--Line-Feed Line Endings In CSV Files):
+ Revise to explain that carriage-returns will be included in RT
+ when they appear just before a line-feed.
+
 2023-04-04 Arnold D. Robbins <arnold@skeeve.com>

 * gawktexi.in (Controlling Scanning): Fix the logic in the example.
diff --git a/doc/gawk.info b/doc/gawk.info
index 754378a5..e3bcc9e7 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -5332,7 +5332,8 @@ Records::).
 Many CSV files are imported from systems where the line terminator for text files is a carriage-return–line-feed pair (CR-LF, ‘\r’ followed by ‘\n’). For ease of use, when processing CSV files, ‘gawk’ simply
-strips out any carriage-return characters in the input.
+includes the carriage-return character in the record terminator when it
+occurs immediately prior to a line-feed character in the input.
The behavior of the ‘split()’ function (not formally discussed yet, see *note String Functions::) differs slightly when processing CSV @@ -39663,533 +39664,533 @@ Node: Regexp Field Splitting243342 Node: Single Character Fields247171 Node: Comma Separated Fields248260 Ref: table-csv-examples249679 -Node: Command Line Field Separator251609 -Node: Full Line Fields254995 -Ref: Full Line Fields-Footnote-1256575 -Ref: Full Line Fields-Footnote-2256621 -Node: Field Splitting Summary256729 -Node: Constant Size259163 -Node: Fixed width data259907 -Node: Skipping intervening263426 -Node: Allowing trailing data264228 -Node: Fields with fixed data265293 -Node: Splitting By Content266919 -Ref: Splitting By Content-Footnote-1271188 -Node: More CSV271351 -Node: FS versus FPAT273004 -Node: Testing field creation274213 -Node: Multiple Line275991 -Node: Getline282473 -Node: Plain Getline285059 -Node: Getline/Variable287709 -Node: Getline/File288906 -Node: Getline/Variable/File290354 -Ref: Getline/Variable/File-Footnote-1291999 -Node: Getline/Pipe292095 -Node: Getline/Variable/Pipe294908 -Node: Getline/Coprocess296091 -Node: Getline/Variable/Coprocess297414 -Node: Getline Notes298180 -Node: Getline Summary301141 -Ref: table-getline-variants301585 -Node: Read Timeout302490 -Ref: Read Timeout-Footnote-1306454 -Node: Retrying Input306512 -Node: Command-line directories307779 -Node: Input Summary308717 -Node: Input Exercises312097 -Node: Printing312537 -Node: Print314480 -Node: Print Examples315986 -Node: Output Separators318839 -Node: OFMT320950 -Node: Printf322373 -Node: Basic Printf323178 -Node: Control Letters324814 -Node: Format Modifiers330283 -Node: Printf Examples336569 -Node: Redirection339114 -Node: Special FD346188 -Ref: Special FD-Footnote-1349478 -Node: Special Files349564 -Node: Other Inherited Files350193 -Node: Special Network351258 -Node: Special Caveats352146 -Node: Close Files And Pipes353129 -Ref: Close Files And Pipes-Footnote-1359265 -Node: Close Return Value359421 
-Ref: table-close-pipe-return-values360696 -Ref: Close Return Value-Footnote-1361530 -Node: Noflush361686 -Node: Nonfatal363198 -Node: Output Summary365615 -Node: Output Exercises366901 -Node: Expressions367592 -Node: Values368794 -Node: Constants369472 -Node: Scalar Constants370169 -Ref: Scalar Constants-Footnote-1372747 -Ref: Scalar Constants-Footnote-2372997 -Node: Nondecimal-numbers373077 -Node: Regexp Constants376198 -Node: Using Constant Regexps376744 -Node: Standard Regexp Constants377390 -Node: Strong Regexp Constants380690 -Node: Variables384541 -Node: Using Variables385206 -Node: Assignment Options387186 -Node: Conversion389748 -Node: Strings And Numbers390280 -Ref: Strings And Numbers-Footnote-1393499 -Node: Locale influences conversions393608 -Ref: table-locale-affects396458 -Node: All Operators397101 -Node: Arithmetic Ops397742 -Node: Concatenation400572 -Ref: Concatenation-Footnote-1403522 -Node: Assignment Ops403645 -Ref: table-assign-ops408784 -Node: Increment Ops410166 -Node: Truth Values and Conditions413765 -Node: Truth Values414891 -Node: Typing and Comparison415982 -Node: Variable Typing416818 -Ref: Variable Typing-Footnote-1423480 -Ref: Variable Typing-Footnote-2423560 -Node: Comparison Operators423643 -Ref: table-relational-ops424070 -Node: POSIX String Comparison427756 -Ref: POSIX String Comparison-Footnote-1429515 -Ref: POSIX String Comparison-Footnote-2429658 -Node: Boolean Ops429742 -Ref: Boolean Ops-Footnote-1434435 -Node: Conditional Exp434531 -Node: Function Calls436317 -Node: Precedence440267 -Node: Locales444144 -Node: Expressions Summary445826 -Node: Patterns and Actions448489 -Node: Pattern Overview449631 -Node: Regexp Patterns451357 -Node: Expression Patterns451903 -Node: Ranges455812 -Node: BEGIN/END458990 -Node: Using BEGIN/END459803 -Ref: Using BEGIN/END-Footnote-1462713 -Node: I/O And BEGIN/END462823 -Node: BEGINFILE/ENDFILE465304 -Node: Empty468745 -Node: Using Shell Variables469062 -Node: Action Overview471400 -Node: 
Statements473835 -Node: If Statement475733 -Node: While Statement477302 -Node: Do Statement479390 -Node: For Statement480576 -Node: Switch Statement483933 -Node: Break Statement486484 -Node: Continue Statement488676 -Node: Next Statement490608 -Node: Nextfile Statement493105 -Node: Exit Statement495966 -Node: Built-in Variables498499 -Node: User-modified499678 -Node: Auto-set507889 -Ref: Auto-set-Footnote-1525988 -Ref: Auto-set-Footnote-2526206 -Node: ARGC and ARGV526262 -Node: Pattern Action Summary530701 -Node: Arrays533317 -Node: Array Basics534694 -Node: Array Intro535544 -Ref: figure-array-elements537560 -Ref: Array Intro-Footnote-1540429 -Node: Reference to Elements540561 -Node: Assigning Elements543083 -Node: Array Example543578 -Node: Scanning an Array545547 -Node: Controlling Scanning548644 -Ref: Controlling Scanning-Footnote-1555290 -Node: Numeric Array Subscripts555614 -Node: Uninitialized Subscripts557888 -Node: Delete559567 -Ref: Delete-Footnote-1562381 -Node: Multidimensional562438 -Node: Multiscanning565643 -Node: Arrays of Arrays567315 -Node: Arrays Summary572215 -Node: Functions574404 -Node: Built-in575464 -Node: Calling Built-in576653 -Node: Boolean Functions578700 -Node: Numeric Functions579270 -Ref: Numeric Functions-Footnote-1583463 -Ref: Numeric Functions-Footnote-2584147 -Ref: Numeric Functions-Footnote-3584199 -Node: String Functions584475 -Ref: String Functions-Footnote-1610706 -Ref: String Functions-Footnote-2610840 -Ref: String Functions-Footnote-3611100 -Node: Gory Details611187 -Ref: table-sub-escapes613094 -Ref: table-sub-proposed614740 -Ref: table-posix-sub616250 -Ref: table-gensub-escapes617938 -Ref: Gory Details-Footnote-1618872 -Node: I/O Functions619026 -Ref: table-system-return-values625713 -Ref: I/O Functions-Footnote-1627884 -Ref: I/O Functions-Footnote-2628032 -Node: Time Functions628152 -Ref: Time Functions-Footnote-1639308 -Ref: Time Functions-Footnote-2639384 -Ref: Time Functions-Footnote-3639546 -Ref: Time 
Functions-Footnote-4639657 -Ref: Time Functions-Footnote-5639775 -Ref: Time Functions-Footnote-6640010 -Node: Bitwise Functions640292 -Ref: table-bitwise-ops640894 -Ref: Bitwise Functions-Footnote-1647148 -Ref: Bitwise Functions-Footnote-2647327 -Node: Type Functions647524 -Node: I18N Functions651117 -Node: User-defined652860 -Node: Definition Syntax653680 -Ref: Definition Syntax-Footnote-1659508 -Node: Function Example659585 -Ref: Function Example-Footnote-1662564 -Node: Function Calling662586 -Node: Calling A Function663180 -Node: Variable Scope664150 -Node: Pass By Value/Reference667204 -Node: Function Caveats669936 -Ref: Function Caveats-Footnote-1672031 -Node: Return Statement672155 -Node: Dynamic Typing675210 -Node: Indirect Calls677602 -Node: Functions Summary688761 -Node: Library Functions691538 -Ref: Library Functions-Footnote-1695086 -Ref: Library Functions-Footnote-2695229 -Node: Library Names695404 -Ref: Library Names-Footnote-1699198 -Ref: Library Names-Footnote-2699425 -Node: General Functions699521 -Node: Strtonum Function700715 -Node: Assert Function703797 -Node: Round Function707249 -Node: Cliff Random Function708827 -Node: Ordinal Functions709860 -Ref: Ordinal Functions-Footnote-1712969 -Ref: Ordinal Functions-Footnote-2713221 -Node: Join Function713435 -Ref: Join Function-Footnote-1715238 -Node: Getlocaltime Function715442 -Node: Readfile Function719216 -Node: Shell Quoting721245 -Node: Isnumeric Function722701 -Node: Data File Management724113 -Node: Filetrans Function724745 -Node: Rewind Function729039 -Node: File Checking731018 -Ref: File Checking-Footnote-1732390 -Node: Empty Files732597 -Node: Ignoring Assigns734664 -Node: Getopt Function736238 -Ref: Getopt Function-Footnote-1752072 -Node: Passwd Functions752284 -Ref: Passwd Functions-Footnote-1761466 -Node: Group Functions761554 -Ref: Group Functions-Footnote-1769692 -Node: Walking Arrays769905 -Node: Library Functions Summary772953 -Node: Library Exercises774377 -Node: Sample 
Programs774864 -Node: Running Examples775646 -Node: Clones776398 -Node: Cut Program777670 -Node: Egrep Program788111 -Node: Id Program797428 -Node: Split Program807542 -Ref: Split Program-Footnote-1817777 -Node: Tee Program817964 -Node: Uniq Program820873 -Node: Wc Program828738 -Node: Bytes vs. Characters829133 -Node: Using extensions830735 -Node: wc program831515 -Node: Miscellaneous Programs836521 -Node: Dupword Program837750 -Node: Alarm Program839813 -Node: Translate Program844726 -Ref: Translate Program-Footnote-1849467 -Node: Labels Program849745 -Ref: Labels Program-Footnote-1853186 -Node: Word Sorting853278 -Node: History Sorting857472 -Node: Extract Program859747 -Node: Simple Sed868016 -Node: Igawk Program871232 -Ref: Igawk Program-Footnote-1886479 -Ref: Igawk Program-Footnote-2886685 -Ref: Igawk Program-Footnote-3886815 -Node: Anagram Program886942 -Node: Signature Program890038 -Node: Programs Summary891290 -Node: Programs Exercises892548 -Ref: Programs Exercises-Footnote-1896864 -Node: Advanced Features896950 -Node: Nondecimal Data899444 -Node: Boolean Typed Values901074 -Node: Array Sorting903049 -Node: Controlling Array Traversal903778 -Ref: Controlling Array Traversal-Footnote-1912285 -Node: Array Sorting Functions912407 -Ref: Array Sorting Functions-Footnote-1918526 -Node: Two-way I/O918734 -Ref: Two-way I/O-Footnote-1926729 -Ref: Two-way I/O-Footnote-2926920 -Node: TCP/IP Networking927002 -Node: Profiling930182 -Node: Persistent Memory939892 -Ref: Persistent Memory-Footnote-1948850 -Node: Extension Philosophy948981 -Node: Advanced Features Summary950516 -Node: Internationalization952786 -Node: I18N and L10N954492 -Node: Explaining gettext955187 -Ref: Explaining gettext-Footnote-1961340 -Ref: Explaining gettext-Footnote-2961535 -Node: Programmer i18n961700 -Ref: Programmer i18n-Footnote-1966813 -Node: Translator i18n966862 -Node: String Extraction967698 -Ref: String Extraction-Footnote-1968876 -Node: Printf Ordering968974 -Ref: Printf 
Ordering-Footnote-1971836 -Node: I18N Portability971904 -Ref: I18N Portability-Footnote-1974478 -Node: I18N Example974549 -Ref: I18N Example-Footnote-1977949 -Ref: I18N Example-Footnote-2978025 -Node: Gawk I18N978142 -Node: I18N Summary978798 -Node: Debugger980199 -Node: Debugging981223 -Node: Debugging Concepts981672 -Node: Debugging Terms983498 -Node: Awk Debugging986111 -Ref: Awk Debugging-Footnote-1987088 -Node: Sample Debugging Session987228 -Node: Debugger Invocation987780 -Node: Finding The Bug989409 -Node: List of Debugger Commands996095 -Node: Breakpoint Control997472 -Node: Debugger Execution Control1001304 -Node: Viewing And Changing Data1004784 -Node: Execution Stack1008522 -Node: Debugger Info1010203 -Node: Miscellaneous Debugger Commands1014502 -Node: Readline Support1019755 -Node: Limitations1020701 -Node: Debugging Summary1023345 -Node: Namespaces1024648 -Node: Global Namespace1025775 -Node: Qualified Names1027220 -Node: Default Namespace1028255 -Node: Changing The Namespace1029030 -Node: Naming Rules1030724 -Node: Internal Name Management1032639 -Node: Namespace Example1033709 -Node: Namespace And Features1036292 -Node: Namespace Summary1037749 -Node: Arbitrary Precision Arithmetic1039262 -Node: Computer Arithmetic1040781 -Ref: table-numeric-ranges1044598 -Ref: table-floating-point-ranges1045096 -Ref: Computer Arithmetic-Footnote-11045755 -Node: Math Definitions1045814 -Ref: table-ieee-formats1048859 -Node: MPFR features1049433 -Node: MPFR On Parole1049886 -Ref: MPFR On Parole-Footnote-11050730 -Node: MPFR Intro1050889 -Node: FP Math Caution1052579 -Ref: FP Math Caution-Footnote-11053653 -Node: Inexactness of computations1054030 -Node: Inexact representation1055061 -Node: Comparing FP Values1056444 -Node: Errors accumulate1057702 -Node: Strange values1059169 -Ref: Strange values-Footnote-11061835 -Node: Getting Accuracy1061940 -Node: Try To Round1064677 -Node: Setting precision1065584 -Ref: table-predefined-precision-strings1066289 -Node: Setting 
the rounding mode1068174 -Ref: table-gawk-rounding-modes1068556 -Ref: Setting the rounding mode-Footnote-11072614 -Node: Arbitrary Precision Integers1072797 -Ref: Arbitrary Precision Integers-Footnote-11076009 -Node: Checking for MPFR1076165 -Node: POSIX Floating Point Problems1077655 -Ref: POSIX Floating Point Problems-Footnote-11082519 -Node: Floating point summary1082557 -Node: Dynamic Extensions1084821 -Node: Extension Intro1086420 -Node: Plugin License1087728 -Node: Extension Mechanism Outline1088541 -Ref: figure-load-extension1088992 -Ref: figure-register-new-function1090577 -Ref: figure-call-new-function1091687 -Node: Extension API Description1093811 -Node: Extension API Functions Introduction1095540 -Ref: table-api-std-headers1097438 -Node: General Data Types1101902 -Ref: General Data Types-Footnote-11111070 -Node: Memory Allocation Functions1111385 -Ref: Memory Allocation Functions-Footnote-11116110 -Node: Constructor Functions1116209 -Node: API Ownership of MPFR and GMP Values1120114 -Node: Registration Functions1121675 -Node: Extension Functions1122379 -Node: Exit Callback Functions1127955 -Node: Extension Version String1129274 -Node: Input Parsers1129969 -Node: Output Wrappers1144613 -Node: Two-way processors1149461 -Node: Printing Messages1151822 -Ref: Printing Messages-Footnote-11153036 -Node: Updating ERRNO1153191 -Node: Requesting Values1153990 -Ref: table-value-types-returned1154743 -Node: Accessing Parameters1155852 -Node: Symbol Table Access1157136 -Node: Symbol table by name1157652 -Ref: Symbol table by name-Footnote-11160863 -Node: Symbol table by cookie1160995 -Ref: Symbol table by cookie-Footnote-11165276 -Node: Cached values1165340 -Ref: Cached values-Footnote-11168984 -Node: Array Manipulation1169141 -Ref: Array Manipulation-Footnote-11170244 -Node: Array Data Types1170281 -Ref: Array Data Types-Footnote-11173103 -Node: Array Functions1173203 -Node: Flattening Arrays1178232 -Node: Creating Arrays1185284 -Node: Redirection API1190134 -Node: 
Extension API Variables1193155 -Node: Extension Versioning1193880 -Ref: gawk-api-version1194317 -Node: Extension GMP/MPFR Versioning1196105 -Node: Extension API Informational Variables1197811 -Node: Extension API Boilerplate1198972 -Node: Changes from API V11203108 -Node: Finding Extensions1204742 -Node: Extension Example1205317 -Node: Internal File Description1206141 -Node: Internal File Ops1210465 -Ref: Internal File Ops-Footnote-11222023 -Node: Using Internal File Ops1222171 -Ref: Using Internal File Ops-Footnote-11224602 -Node: Extension Samples1224880 -Node: Extension Sample File Functions1226449 -Node: Extension Sample Fnmatch1234587 -Node: Extension Sample Fork1236182 -Node: Extension Sample Inplace1237458 -Node: Extension Sample Ord1241130 -Node: Extension Sample Readdir1242006 -Ref: table-readdir-file-types1242903 -Node: Extension Sample Revout1244041 -Node: Extension Sample Rev2way1244638 -Node: Extension Sample Read write array1245390 -Node: Extension Sample Readfile1248664 -Node: Extension Sample Time1249795 -Node: Extension Sample API Tests1252085 -Node: gawkextlib1252593 -Node: Extension summary1255629 -Node: Extension Exercises1259487 -Node: Language History1260765 -Node: V7/SVR3.11262479 -Node: SVR41264829 -Node: POSIX1266361 -Node: BTL1267786 -Node: POSIX/GNU1268555 -Node: Feature History1275086 -Node: Common Extensions1294653 -Node: Ranges and Locales1296130 -Ref: Ranges and Locales-Footnote-11300931 -Ref: Ranges and Locales-Footnote-21300958 -Ref: Ranges and Locales-Footnote-31301197 -Node: Contributors1301420 -Node: History summary1307625 -Node: Installation1309071 -Node: Gawk Distribution1310035 -Node: Getting1310527 -Node: Extracting1311526 -Node: Distribution contents1313238 -Node: Unix Installation1321318 -Node: Quick Installation1322140 -Node: Compiling with MPFR1324686 -Node: Shell Startup Files1325392 -Node: Additional Configuration Options1326549 -Node: Configuration Philosophy1328936 -Node: Compiling from Git1331438 -Node: Building the 
Documentation1331997 -Node: Non-Unix Installation1333409 -Node: PC Installation1333885 -Node: PC Binary Installation1334758 -Node: PC Compiling1335663 -Node: PC Using1336841 -Node: Cygwin1340569 -Node: MSYS1341825 -Node: OpenVMS Installation1342457 -Node: OpenVMS Compilation1343138 -Ref: OpenVMS Compilation-Footnote-11344621 -Node: OpenVMS Dynamic Extensions1344683 -Node: OpenVMS Installation Details1346319 -Node: OpenVMS Running1348754 -Node: OpenVMS GNV1352891 -Node: Bugs1353646 -Node: Bug definition1354570 -Node: Bug address1358221 -Node: Usenet1361812 -Node: Performance bugs1363043 -Node: Asking for help1366061 -Node: Maintainers1368052 -Node: Other Versions1369079 -Node: Installation summary1378011 -Node: Notes1379395 -Node: Compatibility Mode1380205 -Node: Additions1381027 -Node: Accessing The Source1381972 -Node: Adding Code1383507 -Node: New Ports1390643 -Node: Derived Files1395153 -Ref: Derived Files-Footnote-11401000 -Ref: Derived Files-Footnote-21401035 -Ref: Derived Files-Footnote-31401652 -Node: Future Extensions1401766 -Node: Implementation Limitations1402438 -Node: Extension Design1403680 -Node: Old Extension Problems1404844 -Ref: Old Extension Problems-Footnote-11406420 -Node: Extension New Mechanism Goals1406481 -Ref: Extension New Mechanism Goals-Footnote-11409977 -Node: Extension Other Design Decisions1410178 -Node: Extension Future Growth1412377 -Node: Notes summary1413001 -Node: Basic Concepts1414214 -Node: Basic High Level1414899 -Ref: figure-general-flow1415181 -Ref: figure-process-flow1415888 -Ref: Basic High Level-Footnote-11419289 -Node: Basic Data Typing1419478 -Node: Glossary1422896 -Node: Copying1456018 -Node: GNU Free Documentation License1493779 -Node: Index1519102 +Node: Command Line Field Separator251689 +Node: Full Line Fields255075 +Ref: Full Line Fields-Footnote-1256655 +Ref: Full Line Fields-Footnote-2256701 +Node: Field Splitting Summary256809 +Node: Constant Size259243 +Node: Fixed width data259987 +Node: Skipping 
intervening263506 +Node: Allowing trailing data264308 +Node: Fields with fixed data265373 +Node: Splitting By Content266999 +Ref: Splitting By Content-Footnote-1271268 +Node: More CSV271431 +Node: FS versus FPAT273084 +Node: Testing field creation274293 +Node: Multiple Line276071 +Node: Getline282553 +Node: Plain Getline285139 +Node: Getline/Variable287789 +Node: Getline/File288986 +Node: Getline/Variable/File290434 +Ref: Getline/Variable/File-Footnote-1292079 +Node: Getline/Pipe292175 +Node: Getline/Variable/Pipe294988 +Node: Getline/Coprocess296171 +Node: Getline/Variable/Coprocess297494 +Node: Getline Notes298260 +Node: Getline Summary301221 +Ref: table-getline-variants301665 +Node: Read Timeout302570 +Ref: Read Timeout-Footnote-1306534 +Node: Retrying Input306592 +Node: Command-line directories307859 +Node: Input Summary308797 +Node: Input Exercises312177 +Node: Printing312617 +Node: Print314560 +Node: Print Examples316066 +Node: Output Separators318919 +Node: OFMT321030 +Node: Printf322453 +Node: Basic Printf323258 +Node: Control Letters324894 +Node: Format Modifiers330363 +Node: Printf Examples336649 +Node: Redirection339194 +Node: Special FD346268 +Ref: Special FD-Footnote-1349558 +Node: Special Files349644 +Node: Other Inherited Files350273 +Node: Special Network351338 +Node: Special Caveats352226 +Node: Close Files And Pipes353209 +Ref: Close Files And Pipes-Footnote-1359345 +Node: Close Return Value359501 +Ref: table-close-pipe-return-values360776 +Ref: Close Return Value-Footnote-1361610 +Node: Noflush361766 +Node: Nonfatal363278 +Node: Output Summary365695 +Node: Output Exercises366981 +Node: Expressions367672 +Node: Values368874 +Node: Constants369552 +Node: Scalar Constants370249 +Ref: Scalar Constants-Footnote-1372827 +Ref: Scalar Constants-Footnote-2373077 +Node: Nondecimal-numbers373157 +Node: Regexp Constants376278 +Node: Using Constant Regexps376824 +Node: Standard Regexp Constants377470 +Node: Strong Regexp Constants380770 +Node: Variables384621 
+Node: Using Variables385286 +Node: Assignment Options387266 +Node: Conversion389828 +Node: Strings And Numbers390360 +Ref: Strings And Numbers-Footnote-1393579 +Node: Locale influences conversions393688 +Ref: table-locale-affects396538 +Node: All Operators397181 +Node: Arithmetic Ops397822 +Node: Concatenation400652 +Ref: Concatenation-Footnote-1403602 +Node: Assignment Ops403725 +Ref: table-assign-ops408864 +Node: Increment Ops410246 +Node: Truth Values and Conditions413845 +Node: Truth Values414971 +Node: Typing and Comparison416062 +Node: Variable Typing416898 +Ref: Variable Typing-Footnote-1423560 +Ref: Variable Typing-Footnote-2423640 +Node: Comparison Operators423723 +Ref: table-relational-ops424150 +Node: POSIX String Comparison427836 +Ref: POSIX String Comparison-Footnote-1429595 +Ref: POSIX String Comparison-Footnote-2429738 +Node: Boolean Ops429822 +Ref: Boolean Ops-Footnote-1434515 +Node: Conditional Exp434611 +Node: Function Calls436397 +Node: Precedence440347 +Node: Locales444224 +Node: Expressions Summary445906 +Node: Patterns and Actions448569 +Node: Pattern Overview449711 +Node: Regexp Patterns451437 +Node: Expression Patterns451983 +Node: Ranges455892 +Node: BEGIN/END459070 +Node: Using BEGIN/END459883 +Ref: Using BEGIN/END-Footnote-1462793 +Node: I/O And BEGIN/END462903 +Node: BEGINFILE/ENDFILE465384 +Node: Empty468825 +Node: Using Shell Variables469142 +Node: Action Overview471480 +Node: Statements473915 +Node: If Statement475813 +Node: While Statement477382 +Node: Do Statement479470 +Node: For Statement480656 +Node: Switch Statement484013 +Node: Break Statement486564 +Node: Continue Statement488756 +Node: Next Statement490688 +Node: Nextfile Statement493185 +Node: Exit Statement496046 +Node: Built-in Variables498579 +Node: User-modified499758 +Node: Auto-set507969 +Ref: Auto-set-Footnote-1526068 +Ref: Auto-set-Footnote-2526286 +Node: ARGC and ARGV526342 +Node: Pattern Action Summary530781 +Node: Arrays533397 +Node: Array Basics534774 +Node: 
Array Intro535624 +Ref: figure-array-elements537640 +Ref: Array Intro-Footnote-1540509 +Node: Reference to Elements540641 +Node: Assigning Elements543163 +Node: Array Example543658 +Node: Scanning an Array545627 +Node: Controlling Scanning548724 +Ref: Controlling Scanning-Footnote-1555370 +Node: Numeric Array Subscripts555694 +Node: Uninitialized Subscripts557968 +Node: Delete559647 +Ref: Delete-Footnote-1562461 +Node: Multidimensional562518 +Node: Multiscanning565723 +Node: Arrays of Arrays567395 +Node: Arrays Summary572295 +Node: Functions574484 +Node: Built-in575544 +Node: Calling Built-in576733 +Node: Boolean Functions578780 +Node: Numeric Functions579350 +Ref: Numeric Functions-Footnote-1583543 +Ref: Numeric Functions-Footnote-2584227 +Ref: Numeric Functions-Footnote-3584279 +Node: String Functions584555 +Ref: String Functions-Footnote-1610786 +Ref: String Functions-Footnote-2610920 +Ref: String Functions-Footnote-3611180 +Node: Gory Details611267 +Ref: table-sub-escapes613174 +Ref: table-sub-proposed614820 +Ref: table-posix-sub616330 +Ref: table-gensub-escapes618018 +Ref: Gory Details-Footnote-1618952 +Node: I/O Functions619106 +Ref: table-system-return-values625793 +Ref: I/O Functions-Footnote-1627964 +Ref: I/O Functions-Footnote-2628112 +Node: Time Functions628232 +Ref: Time Functions-Footnote-1639388 +Ref: Time Functions-Footnote-2639464 +Ref: Time Functions-Footnote-3639626 +Ref: Time Functions-Footnote-4639737 +Ref: Time Functions-Footnote-5639855 +Ref: Time Functions-Footnote-6640090 +Node: Bitwise Functions640372 +Ref: table-bitwise-ops640974 +Ref: Bitwise Functions-Footnote-1647228 +Ref: Bitwise Functions-Footnote-2647407 +Node: Type Functions647604 +Node: I18N Functions651197 +Node: User-defined652940 +Node: Definition Syntax653760 +Ref: Definition Syntax-Footnote-1659588 +Node: Function Example659665 +Ref: Function Example-Footnote-1662644 +Node: Function Calling662666 +Node: Calling A Function663260 +Node: Variable Scope664230 +Node: Pass By 
Value/Reference667284 +Node: Function Caveats670016 +Ref: Function Caveats-Footnote-1672111 +Node: Return Statement672235 +Node: Dynamic Typing675290 +Node: Indirect Calls677682 +Node: Functions Summary688841 +Node: Library Functions691618 +Ref: Library Functions-Footnote-1695166 +Ref: Library Functions-Footnote-2695309 +Node: Library Names695484 +Ref: Library Names-Footnote-1699278 +Ref: Library Names-Footnote-2699505 +Node: General Functions699601 +Node: Strtonum Function700795 +Node: Assert Function703877 +Node: Round Function707329 +Node: Cliff Random Function708907 +Node: Ordinal Functions709940 +Ref: Ordinal Functions-Footnote-1713049 +Ref: Ordinal Functions-Footnote-2713301 +Node: Join Function713515 +Ref: Join Function-Footnote-1715318 +Node: Getlocaltime Function715522 +Node: Readfile Function719296 +Node: Shell Quoting721325 +Node: Isnumeric Function722781 +Node: Data File Management724193 +Node: Filetrans Function724825 +Node: Rewind Function729119 +Node: File Checking731098 +Ref: File Checking-Footnote-1732470 +Node: Empty Files732677 +Node: Ignoring Assigns734744 +Node: Getopt Function736318 +Ref: Getopt Function-Footnote-1752152 +Node: Passwd Functions752364 +Ref: Passwd Functions-Footnote-1761546 +Node: Group Functions761634 +Ref: Group Functions-Footnote-1769772 +Node: Walking Arrays769985 +Node: Library Functions Summary773033 +Node: Library Exercises774457 +Node: Sample Programs774944 +Node: Running Examples775726 +Node: Clones776478 +Node: Cut Program777750 +Node: Egrep Program788191 +Node: Id Program797508 +Node: Split Program807622 +Ref: Split Program-Footnote-1817857 +Node: Tee Program818044 +Node: Uniq Program820953 +Node: Wc Program828818 +Node: Bytes vs. 
Characters829213 +Node: Using extensions830815 +Node: wc program831595 +Node: Miscellaneous Programs836601 +Node: Dupword Program837830 +Node: Alarm Program839893 +Node: Translate Program844806 +Ref: Translate Program-Footnote-1849547 +Node: Labels Program849825 +Ref: Labels Program-Footnote-1853266 +Node: Word Sorting853358 +Node: History Sorting857552 +Node: Extract Program859827 +Node: Simple Sed868096 +Node: Igawk Program871312 +Ref: Igawk Program-Footnote-1886559 +Ref: Igawk Program-Footnote-2886765 +Ref: Igawk Program-Footnote-3886895 +Node: Anagram Program887022 +Node: Signature Program890118 +Node: Programs Summary891370 +Node: Programs Exercises892628 +Ref: Programs Exercises-Footnote-1896944 +Node: Advanced Features897030 +Node: Nondecimal Data899524 +Node: Boolean Typed Values901154 +Node: Array Sorting903129 +Node: Controlling Array Traversal903858 +Ref: Controlling Array Traversal-Footnote-1912365 +Node: Array Sorting Functions912487 +Ref: Array Sorting Functions-Footnote-1918606 +Node: Two-way I/O918814 +Ref: Two-way I/O-Footnote-1926809 +Ref: Two-way I/O-Footnote-2927000 +Node: TCP/IP Networking927082 +Node: Profiling930262 +Node: Persistent Memory939972 +Ref: Persistent Memory-Footnote-1948930 +Node: Extension Philosophy949061 +Node: Advanced Features Summary950596 +Node: Internationalization952866 +Node: I18N and L10N954572 +Node: Explaining gettext955267 +Ref: Explaining gettext-Footnote-1961420 +Ref: Explaining gettext-Footnote-2961615 +Node: Programmer i18n961780 +Ref: Programmer i18n-Footnote-1966893 +Node: Translator i18n966942 +Node: String Extraction967778 +Ref: String Extraction-Footnote-1968956 +Node: Printf Ordering969054 +Ref: Printf Ordering-Footnote-1971916 +Node: I18N Portability971984 +Ref: I18N Portability-Footnote-1974558 +Node: I18N Example974629 +Ref: I18N Example-Footnote-1978029 +Ref: I18N Example-Footnote-2978105 +Node: Gawk I18N978222 +Node: I18N Summary978878 +Node: Debugger980279 +Node: Debugging981303 +Node: Debugging 
Concepts981752 +Node: Debugging Terms983578 +Node: Awk Debugging986191 +Ref: Awk Debugging-Footnote-1987168 +Node: Sample Debugging Session987308 +Node: Debugger Invocation987860 +Node: Finding The Bug989489 +Node: List of Debugger Commands996175 +Node: Breakpoint Control997552 +Node: Debugger Execution Control1001384 +Node: Viewing And Changing Data1004864 +Node: Execution Stack1008602 +Node: Debugger Info1010283 +Node: Miscellaneous Debugger Commands1014582 +Node: Readline Support1019835 +Node: Limitations1020781 +Node: Debugging Summary1023425 +Node: Namespaces1024728 +Node: Global Namespace1025855 +Node: Qualified Names1027300 +Node: Default Namespace1028335 +Node: Changing The Namespace1029110 +Node: Naming Rules1030804 +Node: Internal Name Management1032719 +Node: Namespace Example1033789 +Node: Namespace And Features1036372 +Node: Namespace Summary1037829 +Node: Arbitrary Precision Arithmetic1039342 +Node: Computer Arithmetic1040861 +Ref: table-numeric-ranges1044678 +Ref: table-floating-point-ranges1045176 +Ref: Computer Arithmetic-Footnote-11045835 +Node: Math Definitions1045894 +Ref: table-ieee-formats1048939 +Node: MPFR features1049513 +Node: MPFR On Parole1049966 +Ref: MPFR On Parole-Footnote-11050810 +Node: MPFR Intro1050969 +Node: FP Math Caution1052659 +Ref: FP Math Caution-Footnote-11053733 +Node: Inexactness of computations1054110 +Node: Inexact representation1055141 +Node: Comparing FP Values1056524 +Node: Errors accumulate1057782 +Node: Strange values1059249 +Ref: Strange values-Footnote-11061915 +Node: Getting Accuracy1062020 +Node: Try To Round1064757 +Node: Setting precision1065664 +Ref: table-predefined-precision-strings1066369 +Node: Setting the rounding mode1068254 +Ref: table-gawk-rounding-modes1068636 +Ref: Setting the rounding mode-Footnote-11072694 +Node: Arbitrary Precision Integers1072877 +Ref: Arbitrary Precision Integers-Footnote-11076089 +Node: Checking for MPFR1076245 +Node: POSIX Floating Point Problems1077735 +Ref: POSIX Floating 
Point Problems-Footnote-11082599 +Node: Floating point summary1082637 +Node: Dynamic Extensions1084901 +Node: Extension Intro1086500 +Node: Plugin License1087808 +Node: Extension Mechanism Outline1088621 +Ref: figure-load-extension1089072 +Ref: figure-register-new-function1090657 +Ref: figure-call-new-function1091767 +Node: Extension API Description1093891 +Node: Extension API Functions Introduction1095620 +Ref: table-api-std-headers1097518 +Node: General Data Types1101982 +Ref: General Data Types-Footnote-11111150 +Node: Memory Allocation Functions1111465 +Ref: Memory Allocation Functions-Footnote-11116190 +Node: Constructor Functions1116289 +Node: API Ownership of MPFR and GMP Values1120194 +Node: Registration Functions1121755 +Node: Extension Functions1122459 +Node: Exit Callback Functions1128035 +Node: Extension Version String1129354 +Node: Input Parsers1130049 +Node: Output Wrappers1144693 +Node: Two-way processors1149541 +Node: Printing Messages1151902 +Ref: Printing Messages-Footnote-11153116 +Node: Updating ERRNO1153271 +Node: Requesting Values1154070 +Ref: table-value-types-returned1154823 +Node: Accessing Parameters1155932 +Node: Symbol Table Access1157216 +Node: Symbol table by name1157732 +Ref: Symbol table by name-Footnote-11160943 +Node: Symbol table by cookie1161075 +Ref: Symbol table by cookie-Footnote-11165356 +Node: Cached values1165420 +Ref: Cached values-Footnote-11169064 +Node: Array Manipulation1169221 +Ref: Array Manipulation-Footnote-11170324 +Node: Array Data Types1170361 +Ref: Array Data Types-Footnote-11173183 +Node: Array Functions1173283 +Node: Flattening Arrays1178312 +Node: Creating Arrays1185364 +Node: Redirection API1190214 +Node: Extension API Variables1193235 +Node: Extension Versioning1193960 +Ref: gawk-api-version1194397 +Node: Extension GMP/MPFR Versioning1196185 +Node: Extension API Informational Variables1197891 +Node: Extension API Boilerplate1199052 +Node: Changes from API V11203188 +Node: Finding Extensions1204822 +Node: 
Extension Example1205397 +Node: Internal File Description1206221 +Node: Internal File Ops1210545 +Ref: Internal File Ops-Footnote-11222103 +Node: Using Internal File Ops1222251 +Ref: Using Internal File Ops-Footnote-11224682 +Node: Extension Samples1224960 +Node: Extension Sample File Functions1226529 +Node: Extension Sample Fnmatch1234667 +Node: Extension Sample Fork1236262 +Node: Extension Sample Inplace1237538 +Node: Extension Sample Ord1241210 +Node: Extension Sample Readdir1242086 +Ref: table-readdir-file-types1242983 +Node: Extension Sample Revout1244121 +Node: Extension Sample Rev2way1244718 +Node: Extension Sample Read write array1245470 +Node: Extension Sample Readfile1248744 +Node: Extension Sample Time1249875 +Node: Extension Sample API Tests1252165 +Node: gawkextlib1252673 +Node: Extension summary1255709 +Node: Extension Exercises1259567 +Node: Language History1260845 +Node: V7/SVR3.11262559 +Node: SVR41264909 +Node: POSIX1266441 +Node: BTL1267866 +Node: POSIX/GNU1268635 +Node: Feature History1275166 +Node: Common Extensions1294733 +Node: Ranges and Locales1296210 +Ref: Ranges and Locales-Footnote-11301011 +Ref: Ranges and Locales-Footnote-21301038 +Ref: Ranges and Locales-Footnote-31301277 +Node: Contributors1301500 +Node: History summary1307705 +Node: Installation1309151 +Node: Gawk Distribution1310115 +Node: Getting1310607 +Node: Extracting1311606 +Node: Distribution contents1313318 +Node: Unix Installation1321398 +Node: Quick Installation1322220 +Node: Compiling with MPFR1324766 +Node: Shell Startup Files1325472 +Node: Additional Configuration Options1326629 +Node: Configuration Philosophy1329016 +Node: Compiling from Git1331518 +Node: Building the Documentation1332077 +Node: Non-Unix Installation1333489 +Node: PC Installation1333965 +Node: PC Binary Installation1334838 +Node: PC Compiling1335743 +Node: PC Using1336921 +Node: Cygwin1340649 +Node: MSYS1341905 +Node: OpenVMS Installation1342537 +Node: OpenVMS Compilation1343218 +Ref: OpenVMS 
Compilation-Footnote-11344701
+Node: OpenVMS Dynamic Extensions1344763
+Node: OpenVMS Installation Details1346399
+Node: OpenVMS Running1348834
+Node: OpenVMS GNV1352971
+Node: Bugs1353726
+Node: Bug definition1354650
+Node: Bug address1358301
+Node: Usenet1361892
+Node: Performance bugs1363123
+Node: Asking for help1366141
+Node: Maintainers1368132
+Node: Other Versions1369159
+Node: Installation summary1378091
+Node: Notes1379475
+Node: Compatibility Mode1380285
+Node: Additions1381107
+Node: Accessing The Source1382052
+Node: Adding Code1383587
+Node: New Ports1390723
+Node: Derived Files1395233
+Ref: Derived Files-Footnote-11401080
+Ref: Derived Files-Footnote-21401115
+Ref: Derived Files-Footnote-31401732
+Node: Future Extensions1401846
+Node: Implementation Limitations1402518
+Node: Extension Design1403760
+Node: Old Extension Problems1404924
+Ref: Old Extension Problems-Footnote-11406500
+Node: Extension New Mechanism Goals1406561
+Ref: Extension New Mechanism Goals-Footnote-11410057
+Node: Extension Other Design Decisions1410258
+Node: Extension Future Growth1412457
+Node: Notes summary1413081
+Node: Basic Concepts1414294
+Node: Basic High Level1414979
+Ref: figure-general-flow1415261
+Ref: figure-process-flow1415968
+Ref: Basic High Level-Footnote-11419369
+Node: Basic Data Typing1419558
+Node: Glossary1422976
+Node: Copying1456098
+Node: GNU Free Documentation License1493859
+Node: Index1519182
 End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 54ac53b3..55c99d2e 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -8162,7 +8162,8 @@ Many CSV files are imported from systems where the line terminator
 for text files is a carriage-return--line-feed pair
 (CR-LF, @samp{\r} followed by @samp{\n}).
 For ease of use, when processing CSV files, @command{gawk} simply
-strips out any carriage-return characters in the input.
+includes the carriage-return character in the record terminator
+when it occurs immediately prior to a line-feed character in the input.
 @docbook
 </sidebar>
@@ -8183,7 +8184,8 @@ Many CSV files are imported from systems where the line terminator
 for text files is a carriage-return--line-feed pair
 (CR-LF, @samp{\r} followed by @samp{\n}).
 For ease of use, when processing CSV files, @command{gawk} simply
-strips out any carriage-return characters in the input.
+includes the carriage-return character in the record terminator
+when it occurs immediately prior to a line-feed character in the input.
 @end cartouche
 @end ifnotdocbook
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 01c37578..44a7d551 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -7718,7 +7718,8 @@ Many CSV files are imported from systems where the line terminator
 for text files is a carriage-return--line-feed pair
 (CR-LF, @samp{\r} followed by @samp{\n}).
 For ease of use, when processing CSV files, @command{gawk} simply
-strips out any carriage-return characters in the input.
+includes the carriage-return character in the record terminator
+when it occurs immediately prior to a line-feed character in the input.
 @end sidebar
 The behavior of the @code{split()} function (not formally discussed
@@ -3855,14 +3855,6 @@ csvscan(IOBUF *iop, struct recmatch *recm, SCANSTATE *state)
 		while (*bp != rs) {
 			if (*bp == '\"')
 				in_quote = ! in_quote;
-			else if (*bp == '\r') { // strip CRs
-				size_t count = (iop->dataend - bp);
-
-				// shift it all down by one
-				memmove(bp, bp + 1, count);
-				iop->dataend--;
-				bp--; // compensate for the upcoming bp++
-			}
 			bp++;
 		}
 	} while (in_quote && bp < iop->dataend && bp++);
@@ -3871,8 +3863,15 @@ csvscan(IOBUF *iop, struct recmatch *recm, SCANSTATE *state)
 	recm->len = bp - recm->start;
 	if (bp < iop->dataend) { /* found it in the buffer */
-		recm->rt_start = bp;
-		recm->rt_len = 1;
+		if (bp > iop->off && bp[-1] == '\r') {
+			/* handle CR LF conventional CSV record terminator */
+			recm->rt_start = bp - 1;
+			recm->rt_len = 2;
+		}
+		else {
+			recm->rt_start = bp;
+			recm->rt_len = 1;
+		}
 		*state = NOSTATE;
 		return REC_OK;
 	} else {
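
The terminator selection in the second io.c hunk can be illustrated in isolation. The following is a minimal standalone sketch, not gawk's code: `struct csv_record` and `scan_csv_record()` are hypothetical names invented here, the scan works on a plain byte buffer rather than an `IOBUF`, and excluding the CR from the record length is an assumption of this sketch rather than something the hunk shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not gawk's struct recmatch. */
struct csv_record {
	size_t len;      /* record length, excluding the terminator */
	size_t rt_start; /* offset of the record terminator in buf */
	size_t rt_len;   /* 2 for CR LF, 1 for a bare LF, 0 if no LF was found */
};

/*
 * Scan buf[0..n) for the first line feed that is not inside double quotes.
 * A CR directly before that LF is counted as part of the terminator
 * instead of being left at the end of the record (assumption of this sketch).
 * Returns 1 if a terminator was found, 0 if the buffer ended first.
 */
static int
scan_csv_record(const char *buf, size_t n, struct csv_record *rec)
{
	size_t i;
	int in_quote = 0;

	assert(buf != NULL && rec != NULL);

	for (i = 0; i < n; i++) {
		if (buf[i] == '"')
			in_quote = !in_quote;
		else if (buf[i] == '\n' && !in_quote)
			break;
	}

	if (i == n) {	/* no terminator in this buffer */
		rec->len = n;
		rec->rt_start = n;
		rec->rt_len = 0;
		return 0;
	}

	if (i > 0 && buf[i - 1] == '\r') {
		/* CR LF: the terminator starts one byte earlier */
		rec->rt_start = i - 1;
		rec->rt_len = 2;
	} else {
		rec->rt_start = i;
		rec->rt_len = 1;
	}
	rec->len = rec->rt_start;
	return 1;
}
```

Calling `scan_csv_record("a,b\r\nc", 6, &r)` on a `struct csv_record r` reports a two-byte terminator starting at offset 3, while the same input with a bare `\n` reports a one-byte terminator. A line feed inside a quoted field is skipped, and a CR that is not directly followed by the terminating LF stays in the record data.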