path: root/ext/intl/breakiterator/breakiterator_class.cpp
Commit message (Author, Age, Files, Lines)
* Generate ext/intl class entries from stubs (Máté Kocsis, 2021-02-09, 1 file, -16/+6)
|   Closes GH-6670
* Clean up BreakIterator create_object handler (Nikita Popov, 2020-08-25, 1 file, -3/+3)
|   Use standard zend_object_alloc() function and fix the object_init_properties() call (which works out okay because there are no properties).
* Introduce InternalIterator (Nikita Popov, 2020-06-24, 1 file, -2/+1)
|   Userland classes that implement Traversable must do so either through Iterator or IteratorAggregate. The same requirement does not exist for internal classes: They can implement the internal get_iterator mechanism, without exposing either the Iterator or IteratorAggregate APIs. This makes them usable in get_iterator(), but incompatible with any Iterator based APIs.
|
|   A lot of internal classes do this, because exposing the userland APIs is simply a lot of work. This patch alleviates this issue by providing a generic InternalIterator class, which acts as an adapter between get_iterator and Iterator, and can be easily used by many internal classes.
|
|   At the same time, we extend the requirement that Traversable implies Iterator or IteratorAggregate to internal classes as well.
|
|   Closes GH-5216.
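The adapter idea in the InternalIterator commit can be illustrated with a minimal, self-contained C++ sketch. This is not the Zend Engine implementation: `InternalFuncs`, `IteratorAdapter`, `VectorCursor` and `sum_all` are hypothetical names invented here, standing in for an internal get_iterator-style callback table and a generic wrapper that exposes Iterator-like methods on top of it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an internal get_iterator-style mechanism:
// a table of low-level callbacks operating on opaque state.
struct InternalFuncs {
    bool (*valid)(const void* state);
    int (*current)(const void* state);
    void (*move_forward)(void* state);
    void (*rewind)(void* state);
};

// Generic adapter exposing an Iterator-like API (rewind/valid/current/key/next)
// on top of any InternalFuncs table, so each class need not write its own.
class IteratorAdapter {
public:
    IteratorAdapter(const InternalFuncs& funcs, void* state)
        : funcs_(funcs), state_(state) {
        assert(state != nullptr);
    }
    void rewind() { funcs_.rewind(state_); index_ = 0; }
    bool valid() const { return funcs_.valid(state_); }
    int current() const { return funcs_.current(state_); }
    std::size_t key() const { return index_; }  // sequential key kept by the adapter
    void next() { funcs_.move_forward(state_); ++index_; }

private:
    InternalFuncs funcs_;
    void* state_;
    std::size_t index_ = 0;
};

// Example "internal class": a cursor over a vector of ints.
struct VectorCursor {
    std::vector<int> items;
    std::size_t pos = 0;
};

// Callback table for VectorCursor (captureless lambdas convert to function pointers).
inline InternalFuncs vector_cursor_funcs() {
    return InternalFuncs{
        [](const void* s) {
            const auto* c = static_cast<const VectorCursor*>(s);
            return c->pos < c->items.size();
        },
        [](const void* s) {
            const auto* c = static_cast<const VectorCursor*>(s);
            return c->items[c->pos];
        },
        [](void* s) { ++static_cast<VectorCursor*>(s)->pos; },
        [](void* s) { static_cast<VectorCursor*>(s)->pos = 0; },
    };
}

// Code written against the Iterator-like API works with any adapted class.
inline int sum_all(IteratorAdapter& it) {
    int total = 0;
    for (it.rewind(); it.valid(); it.next()) total += it.current();
    return total;
}
```

The point of the design is that the per-class work shrinks to the callback table, while the adapter supplies the user-facing iteration methods once.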
* Generate method entries for ext/intl (Máté Kocsis, 2020-04-14, 1 file, -57/+3)
|   Closes GH-5370
* Add a ZEND_UNCOMPARABLE value (Nikita Popov, 2020-03-31, 1 file, -3/+0)
|   To explicitly indicate that objects are uncomparable. For now this has no functional difference from the usual 1 return value, but makes intent clearer.
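A small sketch of what a named "uncomparable" result buys in readability, and why ordering operators derived from it are not meaningful. The names `kUncomparable`, `Handle`, `compare_handles` and `greater_than` are hypothetical and exist only for this illustration; the only fact taken from the commit message is that the value currently behaves like the usual return value of 1.

```cpp
#include <cassert>

// Assumption for illustration: a named constant equal to 1, mirroring the
// commit's statement that it is functionally the same as returning 1.
constexpr int kUncomparable = 1;
static_assert(kUncomparable != 0, "uncomparable must never read as 'equal'");

// Hypothetical object type that only supports an equality notion.
struct Handle {
    int id;
};

// Returns 0 when both handles are the same, otherwise kUncomparable.
// The name documents intent: the result is not an ordering.
inline int compare_handles(const Handle& a, const Handle& b) {
    return a.id == b.id ? 0 : kUncomparable;
}

// A generic caller that derives '>' from the comparison result.
inline bool greater_than(const Handle& a, const Handle& b) {
    return compare_handles(a, b) > 0;
}
```

With two distinct handles, `greater_than(a, b)` and `greater_than(b, a)` are both true, which shows why such objects should only be tested for equality.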
* Get rid of method mapping of BreakIterator classes (Máté Kocsis, 2020-02-25, 1 file, -29/+29)
|
* Add stubs for Intl BreakIterator (Máté Kocsis, 2020-02-25, 1 file, -64/+29)
|   Closes GH-5207
* Comparison cleanup: (Dmitry Stogov, 2019-10-07, 1 file, -1/+6)
|   - introduce zend_compare() that returns -1,0,1 directly (without intermediate zval)
|   - remove compare_objects() object handler, and keep only compare() handler
* Remove mention of PHP major version in Copyright headers (Gabriel Caruso, 2019-09-25, 1 file, -2/+0)
|   Closes GH-4732.
* Refactor zend_object_handlers API to pass zend_object* and zend_string* instead of zval(s). (Dmitry Stogov, 2019-02-04, 1 file, -5/+5)
|
* Require ICU ≥ 50.1 (Christoph M. Becker, 2018-09-15, 1 file, -2/+0)
|   Given that ICU is a set of lively developed libraries, that ICU 50.1 has been released on 2012-11-05, and PHP 7.4 is scheduled to be released seven years after it, we consider it appropriate to ditch these legacy versions.
|
|   Particularly, that would be a reasonable groundwork to implement part two of the “Deprecate and remove INTL_IDNA_VARIANT_2003” RFC[1], namely to default idn_to_ascii()'s and idn_to_utf8()'s $variant parameter to INTL_IDNA_VARIANT_UTS46, which is not defined in ICU < 4.6.
|
|   See also the related discussion on internals@[2].
|
|   [1] <https://wiki.php.net/rfc/deprecate-and-remove-intl_idna_variant_2003>
|   [2] <http://news.php.net/php.internals/101626>
* Merge branch 'PHP-7.2' (Christoph M. Becker, 2018-06-30, 1 file, -1/+1)
|\
| |   * PHP-7.2: Fix #76556: get_debug_info handler for BreakIterator shows wrong type
| * Fix #76556: get_debug_info handler for BreakIterator shows wrong type (Christoph M. Becker, 2018-06-30, 1 file, -1/+1)
| |   We use the retrieved type for the "type" element instead of the text. This has been confused during the PHP 7 upgrade[1].
| |
| |   [1] http://git.php.net/?p=php-src.git;a=commit;h=1d793348067e5769144c0f7efd86428a4137baec
* | Export standard object handlers, to avoid indirect access (Dmitry Stogov, 2018-05-31, 1 file, -1/+1)
| |
* | Simplify namespace access (Anatol Belski, 2018-04-01, 1 file, -1/+1)
| |   The icu namespace is an alias which resolves to the real namespace.
* | Utilize the recommended way to handle the icu namespace (Anatol Belski, 2018-03-31, 1 file, -0/+1)
| |
* | Refactored array creation API. array_init() and array_init_size() are converted into macros calling zend_new_array(). They are not functions anymore and don't return any values. (Dmitry Stogov, 2017-09-20, 1 file, -2/+1)
|/
* Remove useless dtor handlers in intl (Nikita Popov, 2016-07-16, 1 file, -8/+0)
|   These are only indirections to the default handler
* Use ZSTR_ API to access zend_string elements (this is just renaming without semantic changes). (Dmitry Stogov, 2015-06-30, 1 file, -1/+1)
|
* first shot remove TSRMLS_* things (Anatol Belski, 2014-12-13, 1 file, -31/+31)
|
* s/PHP 5/PHP 7/ (Johannes Schlüter, 2014-09-19, 1 file, -1/+1)
|
* master renames phase 1 (Anatol Belski, 2014-08-25, 1 file, -26/+26)
|
* basic macro replacements, all at once (Anatol Belski, 2014-08-19, 1 file, -25/+25)
|
* Fixed temporarily unexpected object re-init (Xinchen Hui, 2014-06-29, 1 file, -2/+4)
|
* Fixed get_debug_info (Xinchen Hui, 2014-06-28, 1 file, -10/+15)
|
* Refactoring ext/intl (only compilable now, far from finished :<) (Xinchen Hui, 2014-06-28, 1 file, -1/+1)
|
* Refactoring ext/intl (incomplete) (Xinchen Hui, 2014-06-28, 1 file, -50/+31)
|
* Cleanup (1-st round) (Dmitry Stogov, 2014-04-15, 1 file, -1/+1)
|
* Use better data structures (incomplete) (Dmitry Stogov, 2014-02-10, 1 file, -2/+2)
|
* intl: remove extra quotes from arginfo params (Gustavo André dos Santos Lopes, 2013-07-21, 1 file, -8/+8)
|
* Fix arginfo of BreakIterator::getLocale (Gustavo Lopes, 2013-01-29, 1 file, -1/+1)
|
* BreakIterator: fix compat with old ICU versions (Gustavo André dos Santos Lopes, 2012-06-25, 1 file, -1/+3)
|
* BreakIterator::getPartsIterator: new optional arg (Gustavo André dos Santos Lopes, 2012-06-22, 1 file, -1/+5)
|   Can take one of:
|   * IntlPartsIterator::KEY_SEQUENTIAL (keys are 0, 1, ...)
|   * IntlPartsIterator::KEY_LEFT (keys are left boundaries)
|   * IntlPartsIterator::KEY_RIGHT (keys are right boundaries)
|
|   The default is IntlPartsIterator::KEY_SEQUENTIAL (the previous behavior).
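The three key modes can be sketched in plain C++ as follows. This is only an illustration of the semantics described in the commit message, not the extension's code: `KeyType` and `parts_with_keys` are hypothetical names, and the boundaries are assumed to be precomputed byte offsets that start at 0 and end at the text length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enum mirroring KEY_SEQUENTIAL / KEY_LEFT / KEY_RIGHT.
enum class KeyType { Sequential, Left, Right };

// Splits `text` at the given sorted byte offsets and pairs each part with a
// key chosen according to `key_type`.
inline std::vector<std::pair<int, std::string>> parts_with_keys(
        const std::string& text,
        const std::vector<int>& boundaries,
        KeyType key_type) {
    std::vector<std::pair<int, std::string>> out;
    for (std::size_t i = 0; i + 1 < boundaries.size(); ++i) {
        const int left = boundaries[i];
        const int right = boundaries[i + 1];
        assert(0 <= left && left <= right &&
               right <= static_cast<int>(text.size()));
        int key = static_cast<int>(i);                 // Sequential: 0, 1, ...
        if (key_type == KeyType::Left) key = left;     // left boundary of the part
        else if (key_type == KeyType::Right) key = right;  // right boundary
        out.emplace_back(key, text.substr(static_cast<std::size_t>(left),
                                          static_cast<std::size_t>(right - left)));
    }
    return out;
}
```

For the text "ab cd" with boundaries 0, 2, 3, 5, the parts are "ab", " " and "cd" in every mode; only the keys differ (0, 1, 2 versus 0, 2, 3 versus 2, 3, 5).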
* Added IntlCodePointBreakIterator. (Gustavo André dos Santos Lopes, 2012-06-22, 1 file, -1/+23)
|   Objects of this class can be instantiated with IntlBreakIterator::createCodePointInstance()
|
|   The method does not take a locale, as it would not make sense in this context.
|
|   This class has one additional method:
|
|   long IntlCodePointBreakIterator::getLastCodePoint()
|
|   which returns either -1 or the last code point we moved over, if any (and discounting any movement before the last call to IntlBreakIterator::first() or IntlBreakIterator::last()).
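A rough, self-contained sketch of a code point cursor with a "last code point" accessor is given below. It is an assumption-laden illustration, not the extension's implementation: `CodePointCursor` and its methods are hypothetical, the UTF-8 decoder is hand-written rather than ICU's UText, and malformed input is simplified to "one byte becomes U+FFFD", which may differ from ICU in how many bytes a malformed sequence consumes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical cursor over UTF-8 text that moves one code point at a time.
class CodePointCursor {
public:
    static constexpr int32_t kDone = -1;

    explicit CodePointCursor(std::string text) : text_(std::move(text)) {}

    // Moves to the start of the text and forgets the last code point.
    int32_t first() { pos_ = 0; last_ = -1; return 0; }

    // Advances over one code point; returns the new byte offset, or kDone
    // (and resets the last code point to -1) when already at the end.
    int32_t next() {
        if (pos_ >= text_.size()) { last_ = -1; return kDone; }
        std::size_t len = 0;
        last_ = decode(text_, pos_, len);
        assert(len >= 1);
        pos_ += len;
        return static_cast<int32_t>(pos_);
    }

    // -1 if nothing has been moved over since first(), else that code point.
    int32_t last_code_point() const { return last_; }

private:
    // Decodes one code point at s[pos]; `len` receives the bytes consumed.
    // Simplification: any malformed sequence yields U+FFFD and consumes 1 byte.
    static int32_t decode(const std::string& s, std::size_t pos, std::size_t& len) {
        const auto byte = [&s](std::size_t i) {
            return static_cast<unsigned char>(s[i]);
        };
        const unsigned char b0 = byte(pos);
        len = 1;
        if (b0 < 0x80) return b0;
        std::size_t need = 0;
        int32_t cp = 0;
        if (b0 >= 0xC2 && b0 <= 0xDF) { need = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { need = 2; cp = b0 & 0x0F; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { need = 3; cp = b0 & 0x07; }
        else return 0xFFFD;                       // stray continuation or invalid lead
        if (pos + need >= s.size()) return 0xFFFD;  // truncated sequence
        for (std::size_t i = 1; i <= need; ++i) {
            const unsigned char b = byte(pos + i);
            if ((b & 0xC0) != 0x80) return 0xFFFD;
            cp = (cp << 6) | (b & 0x3F);
        }
        static const int32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[need] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return 0xFFFD;                        // overlong, out of range, or surrogate
        }
        len = need + 1;
        return cp;
    }

    std::string text_;
    std::size_t pos_ = 0;
    int32_t last_ = -1;
};
```

The returned offsets are byte offsets into the UTF-8 text, so a two-byte character advances the position by two while still counting as a single step.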
* Add Intl prefix to BreakIterator/RuleBasedBI (Gustavo André dos Santos Lopes, 2012-06-10, 1 file, -4/+4)
|
* Remove trailing space (Gustavo André dos Santos Lopes, 2012-06-10, 1 file, -8/+8)
|
* BreakIter: Removed getAvailableLocales/getHashCode (Gustavo André dos Santos Lopes, 2012-06-10, 1 file, -2/+0)
|
* BreakIterator: add rules status constants (Gustavo André dos Santos Lopes, 2012-06-04, 1 file, -0/+29)
|
* BreakIterator and RuleBasedBreakIterator added (Gustavo André dos Santos Lopes, 2012-06-04, 1 file, -0/+342)
  This commit adds wrappers for the classes BreakIterator and RuleBasedBreakIterator. The C++ ICU classes are described here:
  <http://icu-project.org/apiref/icu4c/classBreakIterator.html>
  <http://icu-project.org/apiref/icu4c/classRuleBasedBreakIterator.html>

  Additionally, a tutorial is available at:
  <http://userguide.icu-project.org/boundaryanalysis>

  This implementation wraps UTF-8 text in a UText. The text is iterated without any copying or conversion to UTF-16. There is also no validation that the input is actually UTF-8; where there are malformed sequences, the UText will simply yield U+FFFD.

  The class BreakIterator cannot be instantiated directly (has a private constructor). It provides the interface exposed by the ICU abstract class with the same name. The PHP class is not abstract because we may use it to wrap native subclasses of BreakIterator that we don't know how to wrap. This class includes methods to move the iterator position to the beginning (first()), to the end (last()), forward (next()), backwards (previous()), to the boundary preceding a certain position (preceding()) and following a certain position (following()) and to obtain the current position (current()). next() can also be used to advance or recede an arbitrary number of positions.

  BreakIterator also exposes other native methods: getAvailableLocales(), getLocale() and factory methods to build several predefined types of BreakIterators: createWordInstance() for word boundaries, createCharacterInstance() for locale dependent notions of "characters", createSentenceInstance() for sentences, createLineInstance() and createTitleInstance() -- for title casing breaks. These factories currently return RuleBasedBreakIterators where the names of the rule sets are found in the ICU data, observing the passed locale (although the locale is taken into consideration, there are very few exceptions to the root rules).

  The clone and compare_object PHP object handlers are also implemented, though the comparison does not yield meaningful results when used with >, <, >= and <=.

  Note that BreakIterator is an iterator only in the sense of the first 'Iterator' in 'IteratorIterator', i.e., it does not implement the Iterator interface. The reason is that there is no sensible implementation for Iterator::key(). Using it for an ordinal of the current boundary is not feasible because we are allowed to move to any boundary at any time. If we were to determine the current ordinal when last() is called we'd have to traverse the whole input text to find out how many breaks there were before. Therefore, BreakIterator implements only Traversable. It can be wrapped in an IteratorIterator, but the usual warnings apply.

  Finally, I added a convenience method to BreakIterator: getPartsIterator(). This provides an IntlIterator, backed by the BreakIterator PHP object (i.e. moving the pointer or changing the text in BreakIterator affects the iterator and also moving the iterator affects the backing BreakIterator), which allows traversing the text between each boundary. This iterator uses the original text to retrieve the text between two positions, not the code points returned by the wrapping UText. Therefore, if the text includes invalid code unit sequences, these invalid sequences will be in the output of this iterator, not U+FFFD code points.

  The class RuleBasedBreakIterator exposes a constructor that allows building an iterator from arbitrary compiled or non-compiled rules. The form of these rules is described in the tutorial linked above. The rest of the methods allow retrieving the rules -- getRules() and getCompiledRules() --, a hash code of the rule set (hashCode()) and the rules statuses (getRuleStatus() and getRuleStatusVec()).

  Because the RuleBasedBreakIterator constructor may return parse errors, I reuse the UParseError to text function that was in the transliterator files. Therefore, I move that function to intl_error.c.

  common_enum.cpp was also changed, mainly to expose previously static functions. This avoided code duplication when implementing the BreakIterator iterator and the IntlIterator returned by BreakIterator::getPartsIterator().
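The movement methods named in this commit message (first(), last(), next(), previous(), following(), preceding(), current()) can be illustrated with a minimal C++ sketch over a precomputed list of boundaries. This is not ICU or the PHP wrapper: `BoundaryCursor` is a hypothetical class, real break iterators compute boundaries lazily from rules, and the handling of the position after a failed move is simplified here (the cursor simply stays where it was). The value -1 for "done" matches ICU's BreakIterator::DONE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cursor over sorted, precomputed boundary offsets.
class BoundaryCursor {
public:
    static constexpr int32_t kDone = -1;

    explicit BoundaryCursor(std::vector<int32_t> boundaries)
        : b_(std::move(boundaries)) {
        assert(!b_.empty() && std::is_sorted(b_.begin(), b_.end()));
    }

    int32_t first() { idx_ = 0; return b_[idx_]; }
    int32_t last() { idx_ = b_.size() - 1; return b_[idx_]; }
    int32_t current() const { return b_[idx_]; }

    // Moves n boundaries forward (n > 0) or backward (n < 0).
    int32_t next(int32_t n = 1) {
        const long long target = static_cast<long long>(idx_) + n;
        if (target < 0 || target >= static_cast<long long>(b_.size())) return kDone;
        idx_ = static_cast<std::size_t>(target);
        return b_[idx_];
    }

    int32_t previous() { return next(-1); }

    // First boundary strictly after `offset`.
    int32_t following(int32_t offset) {
        const auto it = std::upper_bound(b_.begin(), b_.end(), offset);
        if (it == b_.end()) return kDone;
        idx_ = static_cast<std::size_t>(it - b_.begin());
        return *it;
    }

    // Last boundary strictly before `offset`.
    int32_t preceding(int32_t offset) {
        auto it = std::lower_bound(b_.begin(), b_.end(), offset);
        if (it == b_.begin()) return kDone;
        --it;
        idx_ = static_cast<std::size_t>(it - b_.begin());
        return *it;
    }

private:
    std::vector<int32_t> b_;
    std::size_t idx_ = 0;
};
```

Because this sketch stores every boundary up front, an ordinal is trivially available; a real break iterator does not, which is the reason the commit message gives for not offering a meaningful Iterator::key().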