diff options
author | Nobody <nobody@swig.org> | 2004-12-14 21:43:06 +0000 |
---|---|---|
committer | Nobody <nobody@swig.org> | 2004-12-14 21:43:06 +0000 |
commit | be88310c7f302e32660f5af737f277cc878a7098 (patch) | |
tree | 6e2a7a025e0ed74f9fc9d5f042fd460855a49ac5 /swigweb/article_cpp.ht | |
parent | 1594f6de8a996d2a810e8434b4ffd33b7f1d7dbc (diff) | |
download | swig-be88310c7f302e32660f5af737f277cc878a7098.tar.gz |
This commit was manufactured by cvs2svn to create tag 'rel-1-3-24'.v1.3.24rel-1-3-24
git-svn-id: https://swig.svn.sourceforge.net/svnroot/swig/tags/rel-1-3-24@6879 626c5289-ae23-0410-ae9c-e8d60b6d4f22
Diffstat (limited to 'swigweb/article_cpp.ht')
-rw-r--r-- | swigweb/article_cpp.ht | 275 |
1 files changed, 0 insertions, 275 deletions
diff --git a/swigweb/article_cpp.ht b/swigweb/article_cpp.ht deleted file mode 100644 index 11ff34b59..000000000 --- a/swigweb/article_cpp.ht +++ /dev/null @@ -1,275 +0,0 @@ -Thoughts on the Insanity C++ Parsing - -<h2>Thoughts on the Insanity of C++ Parsing</h2> - -<center> -<em> -"Parsing C++ is simply too complex to do correctly." -- Anonymous -</em> -</center> -<p> -Author: David Beazley (beazley@cs.uchicago.edu) - -<p> -August 12, 2002 - -<p> -A central goal of the SWIG project is to generate extension modules by -parsing the contents of C++ header files. It's not too hard to come up -with reasons why this might be useful---after all, if you've got -several hundred class definitions, do you really want to go off and -write a bunch of hand-crafted wrappers? No, of course not---you're -busy and like everyone else, you've got better things to do with -your time. - -<p> -Okay, so there are many reasons why parsing C++ would be nice. -However, parsing C++ is also a nightmare. In fact, C++ would -probably the last language that any normal person would choose to -serve as an interface specification language. It's hard to parse, -hard to analyze, and it involves all sorts -of nasty little problems related to scoping, typenames, templates, -access, and so forth. Because of this, most of the tools that claim -to "parse" C++ don't. Instead, they parse a subset of the language -that happens to match the C++ programming style used by the tool's -creator (believe me, I know---this is how SWIG started). Not -surprisingly, these tools tend to break down when presented with code -that starts to challenge the capabilities of the C++ compiler. -Needless to say, critics see this as opportunity to make bold claims -such as "writing a C++ parser is folly" or "this whole approach is too -hard to ever work correctly." - -<p> -Well, one does have to give the critics a little credit---writing a -C++ parser certainly <em>is</em> hard and writing a parser that -actually works correctly is even harder. However, these tasks are -certainly not "impossible." After all, there would be no working C++ -compiler if such claims were true! Therefore, the question of whether -or not a wrapper generator can parse C++ is clearly the wrong question -to ask. Instead, the real question is whether or not a wrapper -generation tool that parses C++ can actually do anything useful. - -<h3>The problem with using C++ as an interface definition language</h3> - -If you cut through all of the low-level details of parsing, the primary -problem of using C++ as an module specification language is that of -ambiguity. Consider a declaration like this: - -<blockquote> -<pre> -void foo(double *x, int n); -</pre> -</blockquote> - -If you look at this declaration, you can ask yourself the question, -what is "x"? Is it a single input value? Is it an output value -(modified by the function)? Is it an array? Is "n" somehow related? -Perhaps the real problem in this example is that of expressing the -programmer's intent. Yes, the function clearly accepts a pointer to -some object and an integer, but the declaration does not contain -enough additional information to determine the purpose of these -parameters--information that could be useful in generating a suitable -set of a wrappers. - -<p> -IDL compilers associated with popular component frameworks (e.g., -CORBA, COM, etc.) get around this problem by requiring interfaces to -be precisely specified--input and output values are clearly indicated -as such. Thus, one might adopt a similar approach and extend C++ -syntax with some special modifiers or qualifiers. For example: - -<blockquote> -<pre> -void foo(%output double *x, int n); -</pre> -</blockquote> - -The problem with this approach is that it breaks from C++ syntax and -it requires the user to annotate their input files (a task that C++ -wrapper generators are supposed to eliminate). Meanwhile, critics sit -back and say "Ha! I told you C++ parsing would never work." - -<p> -Another problem with using C++ as an input language is that interface -building often involves more than just blindly wrapping declarations. For instance, -users might want to rename declarations, specify exception handling procedures, -add customized code, and so forth. This suggests that a -wrapper generator really needs to do -more than just parse C++---it must give users the freedom to customize -various aspects of the wrapper generation process. Again, things aren't -looking too good for C++. - -<h3>The SWIG approach: pattern matching</h3> - -SWIG takes a different approach to the C++ wrapping problem. -Instead of trying to modify C++ with all sorts of little modifiers and -add-ons, wrapping is largely controlled by a pattern matching mechanism that is -built into the underlying C++ type system. - -<p> -One part of the pattern matcher is programmed to look for specific sequences of -datatypes and argument names. These patterns, known as typemaps, are -responsible for all aspects of data conversion. They work by simply attaching -bits of C conversion code to specific datatypes and argument names in the -input file. For example, a typemap might be used like this: - -<blockquote> -<pre> -%typemap(in) <b>double *items</b> { - // Get an array from the input - ... -} -... -void foo(<b>double *items</b>, int n); -</pre> -</blockquote> - -With this approach, type and argument names are used as -a basis for specifying customized wrapping behavior. For example, if a program -always used an argument of <tt>double *items</tt> to refer to an -array, SWIG can latch onto that and use it to provide customized -processing. It is even possible to write pattern matching rules for -sequences of arguments. For example, you could write the following: - -<blockquote> -<pre> -%typemap(in) (<b>double *items, int n</b>) { - // Get an array of items. Set n to number of items - ... -} -... -void foo(<b>double *items, int n</b>); -</pre> -</blockquote> - -The precise details of typemaps are not so important (in fact, most of -this pattern matching is hidden from SWIG users). What is important -is that pattern matching allows customized data handling to be -specified without breaking C++ syntax--instead, a user merely has to -define a few patterns that get applied across the declarations that -appear in C++ header files. In some sense, you might view this -approach as providing customization through naming conventions rather than -having to annotate arguments with extra qualifiers. - -<p> -The other pattern matching mechanism used by SWIG is a declaration annotator -that is used to attach properties to specific declarations. A simple example of declaration -annotation might be renaming. For example: - -<blockquote> -<pre> -%rename(cprint) print; // Rename all occurrences of 'print' to 'cprint' -</pre> -</blockquote> - -A more advanced form of declaration matching would be exception handling. -For example: - -<blockquote> -<pre> -%exception Foo::getitem(int) { - try { - $action - } catch (std::out_of_range& e) { - SWIG_exception(SWIG_IndexError,const_cast<char*>(e.what())); - } -} - -... -template<class T> class Foo { -public: - ... - T &getitem(int index); // Exception handling code attached - ... -}; -</pre> -</blockquote> - -Like typemaps, declaration matching does not break from C++ syntax. -Instead, a user merely specifies special processing rules in advance. -These rules are then attached to any matching C++ -declaration that appears later in the input. This means that raw C++ -header files can often be parsed and customized with few, if any, -modifications. - -<h3>The SWIG difference</h3> - -Pattern based approaches to wrapper code generation are not unique to SWIG. -However, most prior efforts have based their pattern matching engines on simple -regular-expression matching. The key difference between SWIG and these systems -is that SWIG's customization features are fully integrated into the -underlying C++ type system. This means that SWIG is able to deal with very -complicated types of C/C++ code---especially code that makes heavy use of -<tt>typedef</tt>, namespaces, aliases, class hierarchies, and more. To -illustrate, consider some code like this: - -<blockquote> -<pre> -// A simple SWIG typemap -%typemap(in) int { - $1 = PyInt_AsLong($input); -} - -... -// Some raw C++ code (included later) -namespace X { - typedef int Integer; - - class _FooImpl { - public: - typedef Integer value_type; - }; - typedef _FooImpl Foo; -} - -namespace Y = X; -using Y::Foo; - -class Bar : public Foo { -}; - -void spam(Bar::value_type x); -</pre> -</blockquote> - -If you trace your way through this example, you will find that the -<tt>Bar::value_type</tt> argument to function <tt>spam()</tt> is -really an integer. What's more, if you take a close look at the SWIG -generated wrappers, you will find that the typemap pattern defined for -<tt>int</tt> is applied to it--in other words, SWIG does exactly the right thing despite -our efforts to make the code confusing. - -<p> -Similarly, declaration annotation is integrated into the type system -and can be used to define properties that span inheritance hierarchies -and more (in fact, there are many similarities between the operation of -SWIG and tools developed for Aspect Oriented Programming). - -<h3>What does this mean?</h3> - -Pattern-based approaches allow wrapper generation tools to parse C++ -declarations and to provide a wide variety of high-level customization -features. Although this approach is quite different than that found -in a typical IDL, the use of patterns makes it possible to work from -existing header files without having to make many (if any) changes to -those files. Moreover, when the underlying pattern matching mechanism -is integrated with the C++ type system, it is possible to build -reliable wrappers to real software---even if that software is filled -with namespaces, templates, classes, <tt>typedef</tt> declarations, -pointers, and other bits of nastiness. - -<h3>The bottom line</h3> - -Not only is it possible to generate extension modules by parsing C++, -it is possible to do so with real software and with a high degree of -reliability. Don't believe me? Download SWIG-1.3.14 and try it for -yourself. - - - - - - - - - |