path: root/Objects/unicodeobject.c
Commit message | Author | Age | Files | Lines
* Removing UTF-16 aware Unicode comparison code. This kind of compare function (together with other locale aware ones) should go into a new collation support module. See python-dev for a discussion of this removal. Note: This patch should also be applied to the 1.6 branch.
  Author: Marc-André Lemburg | Date: 2000-08-08 | Files: 1 | Lines: -0/+33
* This patch finalizes the move from UTF-8 to a default encoding in the Python Unicode implementation. The internal buffer used for implementing the buffer protocol is renamed to defenc to make this change visible. It now holds the default encoded version of the Unicode object and is calculated on demand (NULL otherwise). Since the default encoding defaults to ASCII, this will mean that Unicode objects which hold non-ASCII characters will no longer work on C APIs using the "s" or "t" parser markers. C APIs must now explicitly provide Unicode support via the "u", "U" or "es"/"es#" parser markers in order to work with non-ASCII Unicode strings. (Note: this patch will also have to be applied to the 1.6 branch of the CVS tree.)
  Author: Marc-André Lemburg | Date: 2000-08-03 | Files: 1 | Lines: -40/+40
* Changing the CNRI copyright notice according to CNRI's instructions. This is a notice without a date, which apparently is not a claim to copyright but only advice to the reader. IANAL. :-)
  Author: Guido van Rossum | Date: 2000-08-03 | Files: 1 | Lines: -1/+1
* merge Include/my*.h into Include/pyport.h
  marked my*.h as obsolete
  Author: Peter Schneider-Kamp | Date: 2000-07-31 | Files: 1 | Lines: -1/+0
* Miscellaneous ANSIfications. I'm assuming here 'main' should take (int, char**) and return an int even on PC platforms. If not, please fix PC/utils/makesrc.c ;-P
  Author: Thomas Wouters | Date: 2000-07-22 | Files: 1 | Lines: -20/+4
* Fixed problems with UTF error reporting macros and some formatting bugs.
  Author: Marc-André Lemburg | Date: 2000-07-17 | Files: 1 | Lines: -45/+64
* gcc is being stupid with if/else constructs
  clean out some other warnings
  Author: Greg Stein | Date: 2000-07-17 | Files: 1 | Lines: -6/+14
* stop messing around with goto and just write the macro correctly.
  Author: Greg Stein | Date: 2000-07-16 | Files: 1 | Lines: -7/+6
* - change \x to mean "byte" also in unicode literals (patch #100912)
  Author: Fredrik Lundh | Date: 2000-07-16 | Files: 1 | Lines: -3/+5
* Fix fatal compiler (MSVC6) error:
  unicodeobject.c(735) : error C2143: syntax error : missing ';' before '}'
  Author: Tim Peters | Date: 2000-07-16 | Files: 1 | Lines: -0/+1
* Fix to a bug found by Florian Weimer: The UTF-8 decoder is still buggy (i.e. it doesn't pass Markus Kuhn's stress test), mainly due to the following construct:

      #define UTF8_ERROR(details) do { \
          if (utf8_decoding_error(&s, &p, errors, details)) \
              goto onError; \
          continue; \
      } while (0)

  (The "continue" statement is supposed to exit from the outer loop, but of course, it doesn't. Indeed, this is a marvelous example of the dangers of the C programming language and especially of the C preprocessor.)
  Author: Marc-André Lemburg | Date: 2000-07-16 | Files: 1 | Lines: -1/+2
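The pitfall quoted in this commit can be reproduced in isolation. The sketch below is not the decoder code: `BUGGY_SKIP`, `count_processed_buggy` and `count_processed_fixed` are hypothetical names chosen for illustration. It shows that a `continue` placed inside a `do { ... } while (0)` wrapper only leaves the wrapper, so the statement after the macro still runs.

```c
#include <stddef.h>

/* Hypothetical macro shaped like the UTF8_ERROR construct quoted above:
   the `continue` binds to the do/while(0) wrapper, not to the caller's loop. */
#define BUGGY_SKIP() do { continue; } while (0)

/* Counts how many items reach the "processing" step. Negative items are
   meant to be skipped, but BUGGY_SKIP() only leaves the do/while(0). */
int count_processed_buggy(const int *items, size_t n)
{
    int processed = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i] < 0)
            BUGGY_SKIP();   /* falls through to the next statement */
        processed++;
    }
    return processed;
}

/* Same loop with the skip written directly, so `continue` targets the for loop. */
int count_processed_fixed(const int *items, size_t n)
{
    int processed = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i] < 0)
            continue;
        processed++;
    }
    return processed;
}
```

With an input that contains negative items, the buggy variant counts every item while the fixed variant counts only the non-negative ones.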
* Spelling fixes supplied by Rob W. W. Hooft. All these are fixes in either comments, docstrings or error messages. I fixed two minor things in test_winreg.py ("didn't" -> "Didn't" and "Didnt" -> "Didn't"). There is a minor style issue involved: Guido seems to have preferred English grammar (behaviour, honour) in a couple places. This patch changes that to American, which is the more prominent style in the source. I prefer English myself, so if English is preferred, I'd be happy to supply a patch myself ;)
  Author: Thomas Wouters | Date: 2000-07-16 | Files: 1 | Lines: -2/+2
* replace PyXXX_Length calls with PyXXX_Size calls
  Author: Jeremy Hylton | Date: 2000-07-12 | Files: 1 | Lines: -1/+1
* Jeremy Hylton: better error message for unicode coercion failure
  Author: Marc-André Lemburg | Date: 2000-07-11 | Files: 1 | Lines: -2/+4
* - changed hash calculation for unicode strings. the new value is calculated from the character values, in a way that makes sure an 8-bit ASCII string and a unicode string with the same contents get the same hash value. (as a side effect, this also works for ISO Latin 1 strings). for more details, see the python-dev discussion.
  Author: Fredrik Lundh | Date: 2000-07-10 | Files: 1 | Lines: -18/+20
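The property described in this commit (equal hashes for an 8-bit string and a Unicode string with the same contents) follows whenever both hashes are computed from character values rather than from raw memory. The sketch below illustrates that idea with a simple multiplicative hash. It is an assumption for illustration, not the hash used in the interpreter, and `mix`, `hash_bytes` and `hash_wide` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One mixing step applied per character value. The constant is arbitrary. */
static unsigned long mix(unsigned long h, unsigned long ch)
{
    return (h * 1000003UL) ^ ch;
}

/* Hash of an 8-bit string. Bytes are read as unsigned so that Latin-1
   values 128..255 are not sign-extended. */
unsigned long hash_bytes(const unsigned char *s, size_t len)
{
    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = mix(h, s[i]);
    return h ^ (unsigned long)len;
}

/* Hash of a 16-bit string, computed from the same character values. */
unsigned long hash_wide(const uint16_t *s, size_t len)
{
    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = mix(h, s[i]);
    return h ^ (unsigned long)len;
}
```

Because both functions feed identical values into the same mixing step, strings with the same contents hash alike regardless of storage width, which also covers the Latin-1 range as long as bytes are treated as unsigned.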
* New surrogate support in the UTF-8 codec. By Bill Tutt.
  Author: Marc-André Lemburg | Date: 2000-07-07 | Files: 1 | Lines: -29/+80
* Added new API PyUnicode_FromEncodedObject() which supports decoding objects including instance objects. The old API PyUnicode_FromObject() is still available as shortcut.
  Author: Marc-André Lemburg | Date: 2000-07-07 | Files: 1 | Lines: -6/+49
* Fix to bug #393 (UTF16 codec didn't like empty strings) and corrected some usage of 'unsigned long' where Py_UNICODE should have been used.
  Author: Marc-André Lemburg | Date: 2000-07-07 | Files: 1 | Lines: -7/+6
* Two more places where long should be used instead of int. Especially true after revision 2.36 was checked in...
  Author: Sjoerd Mullender | Date: 2000-07-07 | Files: 1 | Lines: -2/+2
* Fixed some code that used 'short' to use 'long' instead.
  Author: Marc-André Lemburg | Date: 2000-07-06 | Files: 1 | Lines: -3/+3
* Fixed a couple of places where 'int' was used where 'long' should have been used.
  Author: Marc-André Lemburg | Date: 2000-07-06 | Files: 1 | Lines: -7/+7
* Added new .isalpha() and .isalnum() methods which provide interfaces to the new alphabetic lookup APIs in unicodectype.c.
  Author: Marc-André Lemburg | Date: 2000-07-05 | Files: 1 | Lines: -0/+66
* Bill Tutt: Make unicode_compare a true UTF-16 compare function (includes support for surrogates).
  Author: Marc-André Lemburg | Date: 2000-07-04 | Files: 1 | Lines: -6/+29
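A surrogate-aware comparison differs from a plain code-unit comparison because surrogate units (0xD800 to 0xDFFF) encode code points above U+FFFF, yet sort below the BMP range 0xE000 to 0xFFFF when compared numerically. The sketch below uses a common fix-up technique to get code point order for well-formed UTF-16. It is an illustration under that assumption, not the patch referenced in this commit, and `fixup` and `utf16_compare` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Remaps a code unit so that surrogates sort above all other BMP units:
   0xE000..0xFFFF moves down to 0xD800..0xF7FF, and
   0xD800..0xDFFF moves up to 0xF800..0xFFFF. */
static unsigned int fixup(unsigned int c)
{
    if (c >= 0xD800) {
        if (c >= 0xE000)
            c -= 0x800;
        else
            c += 0x2000;
    }
    return c;
}

/* Returns -1, 0 or 1, ordering well-formed UTF-16 strings by code point. */
int utf16_compare(const uint16_t *a, size_t alen,
                  const uint16_t *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    for (size_t i = 0; i < n; i++) {
        unsigned int ca = fixup(a[i]);
        unsigned int cb = fixup(b[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}
```

For example, U+FFFD (one unit, 0xFFFD) compares less than U+10000 (the pair 0xD800 0xDC00) with this function, whereas a raw unit comparison would order them the other way round.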
* Marc-Andre Lemburg <mal@lemburg.com>: A previous patch by Jack Jansen was accidentally reverted.
  Author: Marc-André Lemburg | Date: 2000-06-30 | Files: 1 | Lines: -1/+1
* Marc-Andre Lemburg <mal@lemburg.com>: New buffer overflow checks for formatting strings. By Trent Mick.
  Author: Marc-André Lemburg | Date: 2000-06-30 | Files: 1 | Lines: -23/+61
* Jack Jansen: Use include "" instead of <>; and staticforward declarations
  Author: Guido van Rossum | Date: 2000-06-29 | Files: 1 | Lines: -1/+1
* Marc-Andre Lemburg <mal@lemburg.com>: Patch to the standard unicode-escape codec which dynamically loads the Unicode name to ordinal mapping from the module ucnhash. By Bill Tutt.
  Author: Marc-André Lemburg | Date: 2000-06-28 | Files: 1 | Lines: -0/+121
* Marc-Andre Lemburg <mal@lemburg.com>: Better error message for "1 in unicodestring". Submitted by Andrew Kuchling.
  Author: Marc-André Lemburg | Date: 2000-06-28 | Files: 1 | Lines: -1/+4
* Marc-Andre Lemburg <mal@lemburg.com>: Fixed a bug in PyUnicode_Count() which would have caused a core dump in case of substring coercion failure. Synchronized .count() with the string method of the same name to return len(s)+1 for s.count('').
  Author: Marc-André Lemburg | Date: 2000-06-18 | Files: 1 | Lines: -6/+4
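The `len(s)+1` convention for an empty substring can be read as "the empty string matches at every position, including the end". The sketch below shows a non-overlapping substring count with that special case. It is a minimal illustration on byte strings, not the PyUnicode_Count() implementation, and `count_substring` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Counts non-overlapping occurrences of sub in s. An empty sub matches
   before each character and once at the end, giving slen + 1. */
size_t count_substring(const char *s, size_t slen,
                       const char *sub, size_t sublen)
{
    if (sublen == 0)
        return slen + 1;

    size_t count = 0;
    size_t i = 0;
    while (i + sublen <= slen) {
        if (memcmp(s + i, sub, sublen) == 0) {
            count++;
            i += sublen;    /* skip past the match: no overlaps */
        } else {
            i++;
        }
    }
    return count;
}
```

Handling the empty case up front also keeps the main loop from spinning in place, since a zero-length match would never advance `i`.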
* Vladimir MARANGOZOV <Vladimir.Marangozov@inrialpes.fr>: This patch fixes an optimisation mystery in _PyUnicodeNew causing segfaults on AIX when the interpreter is compiled with -O.
  Author: Marc-André Lemburg | Date: 2000-06-17 | Files: 1 | Lines: -3/+4
* Marc-Andre Lemburg <mal@lemburg.com>: Added code so that .isXXX() testing returns 0 for empty strings.
  Author: Marc-André Lemburg | Date: 2000-06-14 | Files: 1 | Lines: -0/+28
* Marc-Andre Lemburg <mal@lemburg.com>: Fixed a typo and removed a debug printf(). Thanks to Finn Bock for finding these.
  Author: Marc-André Lemburg | Date: 2000-06-10 | Files: 1 | Lines: -2/+1
* Patch from Michael Hudson: improve unclear error message
  Author: Andrew M. Kuchling | Date: 2000-06-09 | Files: 1 | Lines: -1/+1
* Marc-Andre Lemburg <mal@lemburg.com>: Fixed %c formatting to check for one character arguments. Thanks to Finn Bock for finding this bug. Added a fix for bug PR#348 which originated from not resetting the globals correctly in _PyUnicode_Fini().
  Author: Marc-André Lemburg | Date: 2000-06-08 | Files: 1 | Lines: -8/+29
* Marc-Andre Lemburg <mal@lemburg.com>: Change the default encoding to 'ascii' (it was previously defined as UTF-8). Note: The implementation still uses UTF-8 to implement the buffer protocol, so C APIs will still see UTF-8. This is on purpose: rather than fixing the Unicode implementation, the C APIs should be made Unicode aware.
  Author: Marc-André Lemburg | Date: 2000-06-07 | Files: 1 | Lines: -1/+1
* Minimal change so I can add the rest of MAL's checkin message: M.-A. Lemburg <mal@lemburg.com>: Fixed a core dump in PyUnicode_Format().
  Author: Fred Drake | Date: 2000-05-09 | Files: 1 | Lines: -1/+1
* M.-A. Lemburg <mal@lemburg.com>: Added support for user settable default encodings. The current implementation uses a per-process global which defines the value of the encoding parameter in case it is set to NULL (meaning: use the default encoding).
  Author: Fred Drake | Date: 2000-05-09 | Files: 1 | Lines: -20/+71
* Trent Mick: Fix the string methods that implement slice-like semantics with optional args (count, find, endswith, etc.) to properly handle indices outside [INT_MIN, INT_MAX]. Previously the "i" formatter for PyArg_ParseTuple was used to get the indices. These could overflow. This patch changes the string methods to use the "O&" formatter with the slice_index() function from ceval.c which is used to do the same job for Python code slices (e.g. 'abcabcabc'[0:1000000000L]).
  Author: Guido van Rossum | Date: 2000-05-09 | Files: 1 | Lines: -7/+14
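The overflow described in this commit is avoided by saturating an out-of-range index at the limits of `int` instead of converting it blindly. The sketch below shows only that clamping step on a plain C integer. It is an assumption about the general idea, not the slice_index() function from ceval.c, and `clamp_index` is a hypothetical name.

```c
#include <limits.h>

/* Saturates a wide index to the range of int, so that an oversized slice
   bound behaves like "up to the end" rather than wrapping around. */
int clamp_index(long long value)
{
    if (value > INT_MAX)
        return INT_MAX;
    if (value < INT_MIN)
        return INT_MIN;
    return (int)value;
}
```

A clamped bound can then be passed through the usual slice adjustment (negative offsets, truncation to the string length) without further special cases.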
* Mark Hammond should get his act into gear (his words :-). Zero length strings _are_ valid!
  Author: Guido van Rossum | Date: 2000-05-04 | Files: 1 | Lines: -2/+7
* Fix warning detected by VC++ on assignment of Py_UNICODE to char.
  Author: Guido van Rossum | Date: 2000-05-03 | Files: 1 | Lines: -1/+1
* Vladimir Marangozov's long-awaited malloc restructuring. For more comments, read the patches@python.org archives. For documentation read the comments in mymalloc.h and objimpl.h. (This is not exactly what Vladimir posted to the patches list; I've made a few changes, and Vladimir sent me a fix in private email for a problem that only occurs in debug mode. I'm also holding back on his change to main.c, which seems unnecessary to me.)
  Author: Guido van Rossum | Date: 2000-05-03 | Files: 1 | Lines: -9/+8
* Mark Hammond withdraws his fix -- the size includes the trailing 0 so a size of 0 *is* illegal.
  Author: Guido van Rossum | Date: 2000-05-03 | Files: 1 | Lines: -7/+2
* Mark Hammond: Fixes the MBCS codec to work correctly with zero length strings.
  Author: Guido van Rossum | Date: 2000-05-03 | Files: 1 | Lines: -2/+7
* Marc-Andre Lemburg: Fixed \OOO interpretation for Unicode objects. \777 now correctly produces the Unicode character with ordinal 511.
  Author: Guido van Rossum | Date: 2000-05-01 | Files: 1 | Lines: -4/+4
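Three octal digits can encode values up to 0777 = 511, which does not fit in 8 bits, so an escape parser for Unicode strings has to accumulate into a type wider than `char`. The sketch below illustrates that arithmetic. The log does not say what the earlier, incorrect behaviour was, so the truncation shown in the note is only one possible failure mode, and `parse_octal_escape` is a hypothetical name.

```c
/* Parses up to three octal digits starting at s (the text just after the
   backslash) and returns the full ordinal, without truncating to 8 bits. */
unsigned int parse_octal_escape(const char *s)
{
    unsigned int value = 0;
    for (int i = 0; i < 3 && s[i] >= '0' && s[i] <= '7'; i++)
        value = value * 8 + (unsigned int)(s[i] - '0');
    return value;
}
```

For "777" the accumulated value is 7*64 + 7*8 + 7 = 511. Storing that result in an `unsigned char` would keep only the low 8 bits (255).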
* Marc-Andre Lemburg: Fixed a reference leak in the allocator. Renamed utf8_string to _PyUnicode_AsUTF8String() and made it external for use by other parts of the interpreter.
  Author: Guido van Rossum | Date: 2000-04-27 | Files: 1 | Lines: -14/+16
* Marc-Andre Lemburg: The maxsplit functionality in .splitlines() was replaced by the keepends functionality which allows keeping the line end markers together with the string.
  Author: Guido van Rossum | Date: 2000-04-11 | Files: 1 | Lines: -13/+13
* Marc-Andre Lemburg:
    * New exported API PyUnicode_Resize()
    * The experimental Keep-Alive optimization was turned back on after some tweaks to the implementation. It should now work without causing core dumps... this has yet to be tested though (switching it off is easy: see the unicodeobject.c file for details).
    * Fixed a memory leak in the Unicode freelist cleanup code.
    * Added tests to correctly process the return code from _PyUnicode_Resize().
    * Fixed a bug in the 'ignore' error handling routines of some builtin codecs. Added test cases for these to test_unicode.py.
  Author: Guido van Rossum | Date: 2000-04-10 | Files: 1 | Lines: -22/+79
* Skip Montanaro: add string precisions to calls to PyErr_Format to prevent possible buffer overruns.
  Author: Guido van Rossum | Date: 2000-04-10 | Files: 1 | Lines: -22/+22
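A precision on `%s` caps how many characters of the argument are copied, which is what makes formatting into a fixed-size buffer safe. The sketch below demonstrates the technique with standard `sprintf` rather than PyErr_Format. The buffer size, message text and the name `format_message` are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MSG_BUF_SIZE 128

/* Formats a message into a fixed buffer. The prefix is 14 characters and
   "%.100s" copies at most 100 characters of name, so the result needs at
   most 14 + 100 + 1 = 115 bytes and always fits in MSG_BUF_SIZE. */
size_t format_message(char out[MSG_BUF_SIZE], const char *name)
{
    sprintf(out, "unknown name: %.100s", name);
    return strlen(out);
}
```

Without the precision, a sufficiently long `name` would write past the end of `out`. With it, the output is truncated instead.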
* Conrad Huang points out that "if (0 < ch < 256)", while legal C, doesn't mean what the Python programmer thought...
  Author: Guido van Rossum | Date: 2000-04-06 | Files: 1 | Lines: -1/+1
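In C the expression `0 < ch < 256` parses as `(0 < ch) < 256`. The inner comparison yields 0 or 1, and either value is less than 256, so the whole test is always true. The sketch below contrasts the chained form with the explicit one. The function names are hypothetical, and many compilers emit a warning for the chained version.

```c
/* Chained form: evaluated as (0 < ch) < 256, which is always 1. */
int in_range_chained(int ch)
{
    return 0 < ch < 256;
}

/* Explicit form: both bounds are actually checked. */
int in_range_explicit(int ch)
{
    return 0 < ch && ch < 256;
}
```

Python evaluates `0 < ch < 256` as a true range check, which is why the C spelling is easy to write by mistake.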
* Fredrik Lundh: eliminate a MSVC compiler warning.
  Author: Guido van Rossum | Date: 2000-04-05 | Files: 1 | Lines: -1/+1