path: root/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
Commit message | Author | Age | Files | Lines
* Update copyright for 2019 | Bruce Momjian | 2019-01-02 | 1 | -1/+1
  Backpatch-through: certain files through 9.4
* Restrict vertical tightness to parentheses in Perl code | Andrew Dunstan | 2018-05-09 | 1 | -2/+4
  The vertical tightness settings collapse vertical whitespace between opening and closing brackets (parentheses, square brackets and braces). This can make data structures in particular harder to read, and is not very consistent with our style in non-Perl code. This patch restricts that setting to parentheses only, and reformats all the perl code accordingly.
  Not applying this to parentheses has some unfortunate effects, so the consensus is to keep the setting for parentheses and not for the others.
  The diff for this patch does highlight some places where structures should have trailing commas. They can be added manually, as there is no automatic tool to do so.
  Discussion: https://postgr.es/m/a2f2b87c-56be-c070-bfc0-36288b4b41c1@2ndQuadrant.com
* Update copyright for 2018 | Bruce Momjian | 2018-01-02 | 1 | -1/+1
  Backpatch-through: certain files through 9.3
* Avoid putting build-location-dependent strings into generated files. | Tom Lane | 2017-12-21 | 1 | -1/+1
  Various Perl scripts we use to generate files were in the habit of printing things like "generated by $0" into their output files. That looks like a fine idea at first glance, but it results in non-reproducible output, because in VPATH builds $0 won't be just the name of the script file, but a full path for it. We'd prefer that you get identical results whether using VPATH or not, so this is a bad thing. Some of these places also printed their input file name(s), causing an additional hazard of the same type.
  Hence, establish a policy that thou shalt not print $0, nor input file pathnames, into output files (they're still allowed in error messages, though). Instead just write the script name verbatim. While we are at it, we can make these annotations more useful by giving the script's full relative path name within the PG source tree, eg instead of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
  Not all of the changes made here actually affect any files shipped in finished tarballs today, but it seems best to apply the policy everyplace so that nobody copies unsafe code into places where it could matter.
  Christoph Berg and Tom Lane
  Discussion: https://postgr.es/m/20171215102223.GB31812@msg.df7cb.de
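The policy in the commit above can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the actual scripts, and the function names and the `SCRIPT_RELPATH` constant are hypothetical, introduced only for this example. It contrasts a header derived from the invocation path with one written verbatim.

```python
# Hypothetical constant: the script's path relative to the source tree root,
# written out verbatim instead of being derived from the invocation path.
SCRIPT_RELPATH = "src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl"


def header_from_invocation(argv0: str) -> str:
    """Non-reproducible: the text depends on how the script was invoked."""
    return f"/* generated by {argv0} */\n"


def header_fixed() -> str:
    """Reproducible: the same text regardless of the build directory."""
    return f"/* {SCRIPT_RELPATH} */\n"


# An in-tree build and a VPATH-style build invoke the script differently,
# so only the fixed header is identical across the two.
in_tree = header_from_invocation("UCS_to_EUC_CN.pl")
vpath = header_from_invocation("/home/user/pgsrc/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl")
```

The absolute path used for the VPATH case is made up; the point is only that any invocation-dependent string leaks the build location into the output.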
* Post-PG 10 beta1 pgperltidy run | Bruce Momjian | 2017-05-17 | 1 | -8/+9
* Use radix tree for character encoding conversions. | Heikki Linnakangas | 2017-03-13 | 1 | -3/+7
  Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller.
  The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files.
  Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson.
  Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
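To make the difference between the two lookup strategies concrete, here is a small sketch in Python. It is not the PostgreSQL implementation, which stores the tree in flat C arrays. The nested dictionaries, the function names, and the three sample entries are assumptions chosen only to show a byte-by-byte descent next to a binary search over a sorted table.

```python
from bisect import bisect_left

# Illustrative sample of (two-byte source code, target code point) pairs.
PAIRS = sorted([(0xA1A1, 0x3000), (0xA1A2, 0x3001), (0xB0A1, 0x554A)])
KEYS = [code for code, _ in PAIRS]


def lookup_bsearch(code):
    """Old style: binary search over one sorted table, O(log n) comparisons."""
    i = bisect_left(KEYS, code)
    if i < len(PAIRS) and PAIRS[i][0] == code:
        return PAIRS[i][1]
    return None


def build_radix(pairs):
    """Build a two-level tree keyed on the high byte, then the low byte."""
    root = {}
    for code, target in pairs:
        root.setdefault(code >> 8, {})[code & 0xFF] = target
    return root


def lookup_radix(root, code):
    """New style: one step per input byte, independent of table size."""
    return root.get(code >> 8, {}).get(code & 0xFF)


RADIX = build_radix(PAIRS)
```

Both functions return the same answers; the radix form does a fixed number of steps per input byte, while the binary search grows with the size of the table.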
* Small fixes to the Perl scripts to create unicode conversion tables. | Heikki Linnakangas | 2017-02-01 | 1 | -1/+1
  Add missing semicolons in UCS_to_* perl scripts. For consistency, use "$hashref->{key}" style everywhere.
  Kyotaro Horiguchi
  Discussion: https://www.postgresql.org/message-id/20170130.153738.139030994.horiguchi.kyotaro@lab.ntt.co.jp
* Update copyright via script for 2017 | Bruce Momjian | 2017-01-03 | 1 | -1/+1
* Make all unicode perl scripts to use strict, rearrange logic for clarity. | Heikki Linnakangas | 2016-11-30 | 1 | -9/+9
  The loops were a bit difficult to understand, due to breaking out of them early. Also fix things that perlcritic complained about.
  Daniel Gustafsson
* Rewrite the perl scripts to produce our Unicode conversion tables. | Heikki Linnakangas | 2016-11-30 | 1 | -103/+51
  Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files.
  In passing, fix the Makefile to also download CP932.TXT and CP950.TXT.
  Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson.
  Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi
* Fix comments | Peter Eisentraut | 2016-02-29 | 1 | -5/+1
  Some of these comments were copied and pasted without updating them, some of them were duplicates.
* Update copyright for 2016 | Bruce Momjian | 2016-01-02 | 1 | -1/+1
  Backpatch certain files through 9.1
* Auto-generate file header comments in Unicode mapping files. | Tom Lane | 2015-11-27 | 1 | -0/+4
  Some of the Unicode/*.map files had identification comments added to them, evidently by hand. Others did not. Modify the generating scripts to produce these comments automatically, and update the generated files that lacked them.
  This is just minor cleanup as a by-product of trying to verify that the *.map files can indeed be reproduced from authoritative data. There are a depressingly large number that fail to reproduce from the claimed sources. I have not touched those in this commit, except for the JIS 2004-related files which required only a single comment update to match.
  Since this only affects comments, no need to consider a back-patch.
* Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions. | Tom Lane | 2015-05-14 | 1 | -2/+2
  Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there's not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, ie the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior, just provides infrastructure for fixing it.)
  Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic.
  In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.
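The "table first, callback second" arrangement described above can be sketched as follows. This is a Python illustration, not the C signature of UtfToLocal/LocalToUtf. The table contents, the range 0x1000..0x1FFF, and the offset are invented for the example and do not correspond to any real GB18030 mapping.

```python
from typing import Callable, Dict, Optional

# Hypothetical explicit mappings that cannot be expressed by arithmetic.
TABLE: Dict[int, int] = {0x0041: 0x0041, 0x00E9: 0xA8A6}


def linear_range(code: int) -> Optional[int]:
    """Hypothetical algorithmic rule: one-to-one within a fixed range."""
    if 0x1000 <= code <= 0x1FFF:
        return code + 0x8000
    return None


def convert(
    code: int,
    table: Dict[int, int],
    fallback: Optional[Callable[[int], Optional[int]]] = None,
) -> Optional[int]:
    """Consult the lookup table first; call the fallback only on a miss."""
    hit = table.get(code)
    if hit is not None:
        return hit
    if fallback is not None:
        return fallback(code)
    return None
```

With this shape, the table only needs entries for the irregular code points, and the regular ranges cost a few comparisons instead of table space.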
* Update copyright for 2015 | Bruce Momjian | 2015-01-06 | 1 | -1/+1
  Backpatch certain files through 9.0
* Update copyright for 2014 | Bruce Momjian | 2014-01-07 | 1 | -1/+1
  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013 | Bruce Momjian | 2013-01-01 | 1 | -1/+1
  Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Run newly-configured perltidy script on Perl files. | Bruce Momjian | 2012-07-04 | 1 | -30/+46
  Run on HEAD and 9.2.
* Update copyright notices for year 2012. | Bruce Momjian | 2012-01-01 | 1 | -1/+1
* Stamp copyrights for year 2011. | Bruce Momjian | 2011-01-01 | 1 | -1/+1
* Remove useless whitespace at end of lines | Peter Eisentraut | 2010-11-23 | 1 | -1/+1
* Remove cvs keywords from all files. | Magnus Hagander | 2010-09-20 | 1 | -1/+1
* Update copyright for the year 2010. | Bruce Momjian | 2010-01-02 | 1 | -2/+2
* Update copyright for 2009. | Bruce Momjian | 2009-01-01 | 1 | -2/+2
* Update copyrights in source tree to 2008. | Bruce Momjian | 2008-01-01 | 1 | -2/+2
* Update CVS HEAD for 2007 copyright. Back branches are typically not | Bruce Momjian | 2007-01-05 | 1 | -2/+2
  back-stamped for this.
* Update copyright for 2006. Update scripts. | Bruce Momjian | 2006-03-05 | 1 | -2/+2
* Rename canonical encodings, per Peter: | Bruce Momjian | 2005-03-07 | 1 | -2/+2
  UNICODE => UTF8
  ALT => WIN866
  WIN => WIN1251
  TCVN => WIN1258
  The old codes continue to work.
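One way to picture "the old codes continue to work" is an alias table consulted before the canonical names. The sketch below is a Python illustration only; the dictionary, the function name, and the case-insensitive matching are assumptions, not PostgreSQL's actual encoding-name lookup.

```python
# Old names from the commit above, mapped to their new canonical names.
OLD_TO_CANONICAL = {
    "UNICODE": "UTF8",
    "ALT": "WIN866",
    "WIN": "WIN1251",
    "TCVN": "WIN1258",
}


def canonical_encoding_name(name: str) -> str:
    """Return the canonical name; old aliases resolve to the new names.

    Case-insensitive matching is an assumption made for this sketch.
    """
    upper = name.upper()
    return OLD_TO_CANONICAL.get(upper, upper)
```

Names that are already canonical, or unknown, pass through unchanged apart from upper-casing.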
* Some more missed copyright notices. Many of these look like they | Tom Lane | 2005-01-01 | 1 | -2/+2
  should have been caught by the src/tools/copyright script ... why weren't they?
* Update copyright to 2004. | Bruce Momjian | 2004-08-29 | 1 | -2/+2
* make sure the $Id tags are converted to $PostgreSQL as well ... | PostgreSQL Daemon | 2003-11-29 | 1 | -1/+1
* Fix some copyright notices that weren't updated. Improve copyright tool | Tom Lane | 2003-08-04 | 1 | -2/+2
  so it won't miss 'em again.
* Correction for mathematical properties in Unicode conversion maps. | Tatsuo Ishii | 2001-04-16 | 1 | -3/+3
  Patches contributed by Eiji Tokuya (e-tokuya@sankyo-unyu.co.jp)
* Add support for code conversion between Unicode and other encodings. | Tatsuo Ishii | 2000-10-30 | 1 | -0/+112
  Supported encodings are: EUC_JP, EUC_CN, EUC_KR, EUC_TW, Shift JIS, Big5, ISO8859-[1-5].
  TODO: testings! and documentations...