Python 2 Unicode strings can be used as inputs to char * or std::string types

Requires SWIG_PYTHON_2_UNICODE to be defined when compiling generated code.
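The input rule can be modelled in a few lines of Python. This is a hypothetical sketch, not SWIG's generated code. It runs under Python 3, where `bytes` stands in for a Python 2 `str` and `str` stands in for a Python 2 `unicode` object. The function name `as_char_ptr` and its `python_2_unicode` parameter are invented for illustration. The UTF-8 encoding step follows the description in this commit.

```python
def as_char_ptr(obj, python_2_unicode=True):
    """Hypothetical model of the bytes a wrapped char * parameter receives.

    bytes plays the role of a Python 2 str; str plays the role of a
    Python 2 unicode object. python_2_unicode mimics whether
    SWIG_PYTHON_2_UNICODE was defined when compiling the wrapper.
    """
    if isinstance(obj, bytes):
        # A plain (byte) string is passed through unchanged.
        return obj
    if python_2_unicode and isinstance(obj, str):
        # With the macro defined, a Unicode string is encoded as UTF-8.
        return obj.encode("utf-8")
    # Without the macro (or for any other type) the conversion fails.
    raise TypeError("argument 1 of type 'char *'")


# Non-ASCII characters become multi-byte UTF-8 sequences.
print(as_char_ptr(u"hell\u00f66"))  # prints b'hell\xc3\xb66'
```

A returned char * is still converted back to a plain Python 2 string, so u"hi" passed through charstring comes back as 'hi', not u'hi'.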
author: William S Fulton <wsf@fultondesigns.co.uk> 2015-12-19 03:52:33 +0000
committer: William S Fulton <wsf@fultondesigns.co.uk> 2015-12-19 03:55:26 +0000
commit: 01611702ec04fa70445fd2c7d37b9b312d3f7561
tree: 94118fe116503d25354f9603e873c4aae743c8bc
parent: 291186cfaf39497a42f6ed6395ddaeb2b466ed04
5 files changed, 88 insertions, 0 deletions
diff --git a/CHANGES.current b/CHANGES.current
index a0e6dfa2b..050ff54cc 100644
--- a/CHANGES.current
+++ b/CHANGES.current
@@ -5,6 +5,10 @@ See the RELEASENOTES file for a summary of changes in each release.
 Version 3.0.8 (in progress)
 ===========================
 
+2015-12-19: wsfulton
+            [Python] Python 2 Unicode UTF-8 strings can be used as inputs to char * or
+            std::string types if the generated C/C++ code has SWIG_PYTHON_2_UNICODE defined.
+
 2015-12-17: wsfulton
             Issues #286, #128
             Remove ccache-swig.1 man page - please use the CCache.html docs instead.
diff --git a/Doc/Manual/Contents.html b/Doc/Manual/Contents.html
index 21ba6eaad..6d2cdaa76 100644
--- a/Doc/Manual/Contents.html
+++ b/Doc/Manual/Contents.html
@@ -1598,6 +1598,7 @@
 <li><a href="Python.html#Python_nn75">Buffer interface</a>
 <li><a href="Python.html#Python_nn76">Abstract base classes</a>
 <li><a href="Python.html#Python_nn77">Byte string output conversion</a>
+<li><a href="Python.html#Python_2_unicode">Python 2 Unicode</a>
 </ul>
 </ul>
 </div>
diff --git a/Doc/Manual/Python.html b/Doc/Manual/Python.html
index 962ee6843..c5219b693 100644
--- a/Doc/Manual/Python.html
+++ b/Doc/Manual/Python.html
@@ -122,6 +122,7 @@
 <li><a href="#Python_nn75">Buffer interface</a>
 <li><a href="#Python_nn76">Abstract base classes</a>
 <li><a href="#Python_nn77">Byte string output conversion</a>
+<li><a href="#Python_2_unicode">Python 2 Unicode</a>
 </ul>
 </ul>
 </div>
@@ -6163,6 +6164,71 @@ For more details about the <tt>surrogateescape</tt> error handler, please see
 <a href="https://www.python.org/dev/peps/pep-0383/">PEP 383</a>.
 </p>
 
+<H3><a name="Python_2_unicode"></a>36.12.5 Python 2 Unicode</H3>
+
+
+<p>
+A Python 3 string is a Unicode string, so by default a Python 3 string containing Unicode
+characters is accepted when passed to C/C++ and is converted to a C/C++ string
+(<tt>char *</tt> or <tt>std::string</tt> types).
+A Python 2 string is not a Unicode string by default, and if a Python 2 Unicode string is
+passed to C/C++ it fails to convert to a C/C++ string
+(<tt>char *</tt> or <tt>std::string</tt> types).
+The Python 2 behavior can be made more like Python 3 by defining
+<tt>SWIG_PYTHON_2_UNICODE</tt> when compiling the generated C/C++ code.
+By default, when the following is wrapped:
+</p>
+
+<div class="code"><pre>
+%module unicode_strings
+char *charstring(char *s) {
+  return s;
+}
+</pre></div>
+
+<p>
+An error will occur when using Unicode strings in Python 2:
+</p>
+
+<div class="targetlang"><pre>
+&gt;&gt;&gt; from unicode_strings import *
+&gt;&gt;&gt; charstring("hi")
+'hi'
+&gt;&gt;&gt; charstring(u"hi")
+Traceback (most recent call last):
+  File "&lt;stdin&gt;", line 1, in ?
+TypeError: in method 'charstring', argument 1 of type 'char *'
+</pre></div>
+
+<p>
+When the <tt>SWIG_PYTHON_2_UNICODE</tt> macro is added to the generated code:
+</p>
+
+<div class="code"><pre>
+%module unicode_strings
+%begin %{
+#define SWIG_PYTHON_2_UNICODE
+%}
+
+char *charstring(char *s) {
+  return s;
+}
+</pre></div>
+
+<p>
+Unicode strings are then accepted and encoded as UTF-8 before being passed to C/C++,
+but note that the returned string is a normal Python 2 string, not a Unicode string:
+</p>
+
+<div class="targetlang"><pre>
+&gt;&gt;&gt; from unicode_strings import *
+&gt;&gt;&gt; charstring("hi")
+'hi'
+&gt;&gt;&gt; charstring(u"hi")
+'hi'
+&gt;&gt;&gt;
+</pre></div>
+
 </body>
 </html>
 
diff --git a/Examples/test-suite/python/unicode_strings_runme.py b/Examples/test-suite/python/unicode_strings_runme.py
index e1fc7adec..3ce98bcdb 100644
--- a/Examples/test-suite/python/unicode_strings_runme.py
+++ b/Examples/test-suite/python/unicode_strings_runme.py
@@ -12,3 +12,12 @@ if sys.version_info[0:2] >= (3, 1):
         raise ValueError('Test comparison mismatch')
     if unicode_strings.non_utf8_std_string() != test_string:
         raise ValueError('Test comparison mismatch')
+
+# Testing SWIG_PYTHON_2_UNICODE flag which allows unicode strings to be passed to C
+if sys.version_info[0:2] < (3, 0):
+    assert unicode_strings.charstring("hello1") == "hello1"
+    assert unicode_strings.charstring(str(u"hello2")) == "hello2"
+    assert unicode_strings.charstring(u"hello3") == "hello3"
+    assert unicode_strings.charstring(unicode("hello4")) == "hello4"
+    unicode_strings.charstring(u"hell\xb05")
+    unicode_strings.charstring(u"hell\u00f66")
diff --git a/Examples/test-suite/unicode_strings.i b/Examples/test-suite/unicode_strings.i
index 56063c8a4..9be3748e6 100644
--- a/Examples/test-suite/unicode_strings.i
+++ b/Examples/test-suite/unicode_strings.i
@@ -2,6 +2,10 @@
 
 %include <std_string.i>
 
+%begin %{
+#define SWIG_PYTHON_2_UNICODE
+%}
+
 %inline %{
 
 const char* non_utf8_c_str(void) {
@@ -12,4 +16,8 @@ std::string non_utf8_std_string(void) {
         return std::string("h\xe9llo w\xc3\xb6rld");
 }
 
+char *charstring(char *s) {
+  return s;
+}
+
 %}