WARNING! This document is historical but is kept out of the historical
directory because it describes the basic architecture of the compiler,
which mostly still applies as of writing. Refer to the code itself for
up-to-date information, such as which AST nodes are in use.

OMG INTERFACE DEFINITION LANGUAGE COMPILER FRONT END PROTOCOLS
==============================================================

INTRODUCTION
------------

Welcome to the publicly available source release of SunSoft's
implementation of the compiler front end (CFE) for OMG Interface
Definition Language! This document explains how to use the release to
create a fully functional OMG Interface Definition Language to target
language compiler for your selected target system configuration.

The section OVERVIEW explains this document's structure.

CONTEXT
-------

The implementation has three parts:

1. A main program driving the compilation process
2. A parser and attendant utilities for converting the IDL input into
   an internal form
3. One or more back ends (BEs) which take as input the internal form
   representing the IDL input, and which produce output in a target
   language and target format

The release contains components 1 and 2, and a demonstration
implementation of component 3. To use this release, you

- write a back end which takes the internal representation of the
  parsed input and translates it to the target language and format.
  You may replace or modify the demonstration back end provided.
- link the back end with the provided main program and parser sources
  to produce a complete compiler.

OVERVIEW
--------

This document does not explain IDL nor does it introduce IDL features.
For this information, refer to the OMG CORBA specification, available
by anonymous FTP from omg.org.

This document does not explain C++, except to demonstrate how it is
used to construct the CFE. The ARM by Stroustrup and Ellis provides a
thorough explanation of C++.

This document consists of two independent parts.
The first part describes all CFE supported protocols and the required
application programmer's interface entry points that a conformant BE
must provide. The second part steps through the process of
constructing a working BE.

The first part describes:

- The compilation process
- The Abstract Syntax Tree (AST) internal representation of parsed IDL
  input
- How access to member data fields is managed
- How the AST is generated from the IDL input (Generator protocol)
- How definition scopes are nested and how name lookup works
- The narrowing mechanism
- How definition scopes are managed and how nodes are added to scopes
- How BEs get control during the AST construction process (Add
  protocol)
- The inheritance scheme used by the AST and how it affects BEs
- How errors are handled and reported
- How the CFE is initialized
- How the command line arguments are parsed
- What global variables and functions are provided
- What API is required to be supported by a BE in order to link with
  the CFE
- What files must be included in each BE file

The second part describes:

- The API to be supplied by each BE
- How to subclass from the AST to add BE specific functionality
- How to subclass from the Generator protocol to create BE specific
  extended AST nodes
- How to write constructors for the derived BE classes
- How to use the Add protocol to store BE specific information
- How to maintain BE specific information which applies to the entire
  AST generated from the IDL input
- How to use data members in your BE
- How to build a complete compiler

PART I. FEATURES OF THE CFE
-=========================-

THE COMPILATION PROCESS
-----------------------

The OMG IDL compiler operates as follows:

- Parses command line arguments. If an option is directed at a BE, an
  appropriate operation provided by the BE is invoked to process the
  option.
- Performs global initialization.
- Forks a copy of the compiler for each file specified as input.
- Preprocesses the IDL input using an ANSI-compatible preprocessor.
- Parses the file using the CFE parser, and constructs an AST
  describing the IDL input.
- Prints the AST for verification, if requested.
- Invokes the BE to process the AST and produce the output
  characteristic of that BE.

ABSTRACT SYNTAX TREE
--------------------

The AST (Abstract Syntax Tree) is the primary mechanism for
communication between a BE and the CFE. It consists of a tree of
instances of classes defined in the CFE or refinements of those classes
as defined in a BE.

The class hierarchy of the AST closely resembles the structure of the
IDL syntax. Most AST classes have direct equivalents in IDL constructs.
The UTL_Scope class defines common functionality for definition scope
management and name lookup. This is explained in a following section.
UTL_Scope is defined in include/utl_scope.hh and implemented in
util/utl_scope.cc.

The AST provides the following classes:

AST_Decl
        Base of the AST class hierarchy. Each class in the AST inherits
        from AST_Decl. Defined in include/ast_decl.hh and implemented
        in ast/ast_decl.cc.

AST_Type
        Common base class for all classes which represent IDL type
        constructs. Defined in include/ast_type.hh and implemented in
        ast/ast_type.cc. Inherits from AST_Decl.

AST_ConcreteType
        Common base class for all classes which represent IDL types
        other than interfaces. Defined in the file
        include/ast_concrete_type.hh and implemented in
        ast/ast_concrete_type.cc. Inherits from AST_Type.

AST_PredefinedType
        Instances of this class represent all predefined types such as
        long, char and so forth. Defined in the file
        include/ast_predefined_type.hh and implemented in
        ast/ast_predefined_type.cc. Inherits from AST_ConcreteType.

AST_Module
        Represents the IDL module construct. Defined in the file
        include/ast_module.hh and implemented in ast/ast_module.cc.
        Inherits from AST_Decl and UTL_Scope.

AST_Root
        Represents the root of the abstract syntax tree being
        constructed. Is a subclass of AST_Module.
        Can be subclassed in BEs to store information associated with
        the entire AST. Defined in the file include/ast_root.hh and
        implemented in ast/ast_root.cc. Inherits from AST_Module.

AST_Interface
        Represents the IDL interface construct. Defined in
        include/ast_interface.hh and implemented in the file
        ast/ast_interface.cc. Inherits from AST_Type and UTL_Scope.

AST_InterfaceFwd
        Represents a forward declaration of an IDL interface. Defined
        in include/ast_interface_fwd.hh and implemented in
        ast/ast_interface_fwd.cc. Inherits from AST_Decl.

AST_Attribute
        Represents an IDL attribute construct. Defined in
        include/ast_attribute.hh and implemented in the file
        ast/ast_attribute.cc. Inherits from AST_Decl.

AST_Exception
        Represents an IDL exception construct. Defined in
        include/ast_exception.hh and implemented in the file
        ast/ast_exception.cc. Inherits from AST_Decl.

AST_Structure
        Represents an IDL struct construct. Defined in the file
        include/ast_structure.hh and implemented in the file
        ast/ast_structure.cc. Inherits from AST_ConcreteType and
        UTL_Scope.

AST_Field
        Represents a field in an IDL struct or exception construct.
        Defined in include/ast_field.hh and implemented in
        ast/ast_field.cc. Inherits from AST_Decl.

AST_Operation
        Represents an IDL operation construct. Defined in the file
        include/ast_operation.hh and implemented in
        ast/ast_operation.cc. Inherits from AST_Decl and UTL_Scope.

AST_Argument
        Represents an argument to an IDL operation construct. Defined
        in include/ast_argument.hh and implemented in
        ast/ast_argument.cc. Inherits from AST_Field.

AST_Union
        Represents an IDL union construct. Defined in
        include/ast_union.hh and implemented in ast/ast_union.cc.
        Inherits from AST_ConcreteType and from UTL_Scope.

AST_UnionBranch
        Represents an individual branch in an IDL union construct.
        Defined in include/ast_union_branch.hh and implemented in
        ast/ast_union_branch.cc. Inherits from AST_Field.

AST_UnionLabel
        Represents the label of an individual branch in an IDL union
        construct.
        Defined in include/ast_union_label.hh and implemented in
        ast/ast_union_label.cc.

AST_Constant
        Represents an IDL constant construct. Defined in
        include/ast_constant.hh and implemented in the file
        ast/ast_constant.cc. Inherits from AST_Decl.

AST_Enum
        Represents an IDL enum construct. Defined in the file
        include/ast_enum.hh and implemented in ast/ast_enum.cc.
        Inherits from AST_ConcreteType and UTL_Scope.

AST_EnumVal
        Represents an enumerator in an IDL enum construct. Defined in
        include/ast_enum_val.hh and implemented in ast/ast_enum_val.cc.
        Inherits from AST_Constant.

AST_Sequence
        Represents an IDL sequence construct. Defined in
        include/ast_sequence.hh and implemented in ast/ast_sequence.cc.
        Inherits from AST_Decl.

AST_String
        Represents an IDL string construct. Defined in the file
        include/ast_string.hh and implemented in ast/ast_string.cc.
        Inherits from AST_Decl.

AST_Array
        Represents an array modifier to the type of an IDL field or
        typedef declaration. Defined in the file include/ast_array.hh
        and implemented in ast/ast_array.cc. Inherits from AST_Decl.

AST_Typedef
        Represents an IDL typedef construct. Defined in the file
        include/ast_typedef.hh and implemented in ast/ast_typedef.cc.
        Inherits from AST_Decl.

AST_Expression
        Represents an IDL expression. Defined in the file
        include/ast_expression.hh and implemented in
        ast/ast_expression.cc.

AST_Root
        A subclass of AST_Module, an instance of this class is used to
        represent the distinguished root node of the AST. Defined in
        include/ast_root.hh and implemented in ast/ast_root.cc.
        Inherits from AST_Module.

USING INSTANCE DATA
-------------------

The AST classes define member data fields in addition to defining
operations on instances. These member data fields are all private, to
allow only the instance in which they are stored direct access. Other
objects (including other instances of the same class) can obtain access
to the member data fields of an instance through accessor functions.
These accessor functions allow retrieval of the data, and in some cases
update functions are also provided to store new values.

There are several reasons why this approach is taken. First, it hides
the actual implementation of the member data fields from outside the
class. For example, a Thermometer class would not expose whether its
temperature reading is stored in Fahrenheit or Celsius units, and it
could offer an accessor for either unit.

Second, protecting access to member data in this manner restricts the
ability to update it to the instance itself, save where update
functions are explicitly provided. This makes for more reliable
implementations, since the manipulation of the data is isolated in the
class implementation itself.

Third, wrapping a function call around access to member data allows
such access and update operations to be protected in a multithreaded
environment. While the CFE itself is not multithreaded and the access
operations as currently defined do no special work to protect against
multiple conflicting access operations, this may be changed in a future
version. Moving the CFE to a multithreaded environment without
protecting access to member data in this manner would be extremely
difficult.

The protocol defined in the CFE is that member data fields are all
private and have names which start with the prefix "pd_" (denoting
Private Data). The access functions have names which are the same as
the name of the field sans the prefix. For example, AST_Decl has a
field pd_defined_in and an access function defined_in().

The update functions have names starting with "set_" followed by the
name of the corresponding access function. Thus, AST_Decl defines a
function set_in_main_file(boolean) which sets the pd_in_main_file data
member's value to the boolean provided.

GENERATION OF THE AST
---------------------

The CFE generates the abstract syntax tree after parsing IDL input.
The nodes of the AST are defined by classes introduced in the previous
section, or by subclasses thereof as defined by each BE.

In writing the CFE, we were faced with the following problem: how to
generate the AST containing nodes of the derived classes as defined in
each BE without knowledge of the types and conventions of these BE
classes.

One alternative was to define a naming scheme which predetermines the
names of each subclass a BE can define. The AST would then be generated
by calling an appropriate constructor on the BE derived class. This
scheme suffers from some shortcomings:

- It breaks the modularity of the compiler and imports knowledge about
  types defined in a BE into the CFE, where this information does not
  belong.
- It restricts a compiler to having only one BE loaded at a time
  because the names of these classes can be in use in only one BE at a
  time.
- It requires a BE to provide derived classes for all AST classes, even
  for those classes where the BE adds no functionality.

The mechanism we chose is different. We define the AST_Generator class
which has an operation for each constructor defined on each AST class.
The operation takes arguments appropriate to the constructor, invokes
it and returns the created AST node, using the type known to the CFE.
All such operations on the generator are declared virtual. The names of
all operations start with "create_" and contain the name of the
construct. Thus, an operation which invokes a constructor of an
AST_Module is named create_module. AST_Generator is defined in
include/ast_generator.hh and implemented in ast/ast_generator.cc.

If a BE derives from any AST class, it must also derive from the
AST_Generator class and redefine the relevant operations to invoke
constructors of the BE provided class instead of the AST provided
class.
For example, if BE_Module is a subclass of AST_Module in a BE, the BE
would also define BE_Generator and redefine create_module to call the
constructor of BE_Module instead of that provided by AST_Module.

During initialization, the CFE causes an instance of the BE derived
generator to be created and saved. This is explained in the section on
REQUIRED ENTRY POINTS SUPPLIED BY A BE. During parsing, actions in the
Yacc grammar invoke operations on the saved instance to create new
nodes for the AST as it is being built. These operations invoke
constructors for BE derived classes or for AST provided classes if they
were not overridden.

DEFINITION SCOPES
-----------------

IDL is a nested scoped language. The scoping rules are defined by the
CORBA spec and closely follow those of C++.

Scope management is implemented in two classes provided in the
utilities library, UTL_Scope and UTL_Stack. UTL_Scope manages
associations between names and AST nodes, and UTL_Stack manages scope
nesting and entry and exit from definition scopes as the parse is
proceeding. UTL_Scope is defined in include/utl_scope.hh and
implemented in util/utl_scope.cc. UTL_Stack is defined in
include/utl_stack.hh and implemented in util/utl_stack.cc.

During initialization, the CFE creates an instance of UTL_Stack and
saves it. During parsing, as definition scopes are entered and exited,
AST nodes are pushed onto, or popped from, the stack represented by the
saved instance. Nodes on the stack are stored as instances of
UTL_Scope. Section THE NARROWING MECHANISM explains how to obtain the
real type of a node retrieved from the stack.

All definition scopes are linked in a tree rooted in the distinguished
AST root node. This linkage is implemented by UTL_Scope and AST_Decl.
The linkage is a permanent record of the scope nesting while the stack
is a dynamic record which at each instant represents the current state
of the parse.

The nesting information is used to do name lookup.
IDL uses scoped names which are concatenations of definition scope
names ending with individual construct names. For example, in

        interface a {
                struct b {
                        long c;
                };
                const long k = 23;
                struct s {
                        long ar[k];
                };
        };

the name a::b::c represents the long field in the struct b inside the
interface a. Lookup is performed by searching down the linkage chain
for the first component of the name, then, when found, recursively
resolving the remaining components in the scope defined by the first
component. Lookup is relative to the scope of use; in the above
example, k could also have been referred to as a::k within the
struct s.

Nodes are stored in a definition scope as instances of AST_Decl. Thus,
name lookup returns instances of AST_Decl. The next section, THE
NARROWING MECHANISM, explains how to obtain the real type of a node
retrieved from a definition scope.

THE NARROWING MECHANISM
-----------------------

Here we give only a cursory explanation of how narrowing works. We
concentrate on defining the problem and showing how to use our
narrowing mechanism. The narrowing mechanism is defined in
include/idl_narrow.hh.

As explained above, nodes are stored on the scope stack as instances of
UTL_Scope, and inside definition scopes as instances of AST_Decl. Also,
nodes are linked in a nesting tree as instances of AST_Decl. Given a
node retrieved from the stack or a definition scope, one is faced with
the task of obtaining its real class. C++ does not currently provide an
implicit mechanism for narrowing to a derived class, so the CFE defines
its own mechanism. This mechanism requires some work on your part as BE
implementor and requires some explicit code to be written when it is to
be used.

The class AST_Decl defines an enum whose members encode specific AST
node classes. AST_Decl provides an accessor function, node_type(),
which retrieves a member of the enum representing the AST type of the
node.
Thus, if an instance of AST_Decl really is an instance of AST_Module,
the node_type() accessor returns AST_Decl::NT_module.

The class UTL_Scope also provides an accessor function,
scope_node_type(), which returns a member of the enum encoding the
actual type of the node. Thus, given a UTL_Scope instance which is
really an instance of AST_Operation, scope_node_type() would return
AST_Decl::NT_op.

Perusing the header files for classes provided by the AST, you will
note the use of some macros defined in include/idl_narrow.hh. These
macros define the explicit narrowing mechanism:

DEF_NARROW_METHODSx(<class>, <base 1>, ..., <base x>)
        For x equal to 0, 1, 2 or 3, defines a narrowing method for the
        specified class which has 0, 1, 2 or 3 immediate base classes
        from which it inherits. For example, ast_module.hh which
        defines AST_Module contains the following line:

                DEF_NARROW_METHODS2(AST_Module, AST_Decl, UTL_Scope)

        This is because AST_Module inherits directly from AST_Decl and
        UTL_Scope.

DEF_NARROW_FROM_DECL(<class>)
        Appears in class definitions for classes which are derived from
        AST_Decl and which can be stored in a definition scope. This
        macro declares a static operation narrow_from_decl(AST_Decl *)
        on the class in which it appears. The operation returns the
        provided instance as an instance of <class> if it can be
        narrowed, or NULL.

DEF_NARROW_FROM_SCOPE(<class>)
        Appears in class definitions of classes which are derived from
        UTL_Scope and which can be stored on the scope stack. This
        macro declares a static operation
        narrow_from_scope(UTL_Scope *) on the class in which it
        appears. The operation returns the provided instance as an
        instance of <class> if it can be narrowed, or NULL.

Now look in the files implementing these classes. You will note
occurrences of the following macros:

IMPL_NARROW_METHODSx(<class>, <base 1>, ..., <base x>)
        For x equal to 0, 1, 2 or 3, implements a narrowing method for
        the specified class which has 0, 1, 2 or 3 immediate base
        classes from which it inherits.
        For example, ast_module.cc which implements AST_Module contains
        the following line:

                IMPL_NARROW_METHODS2(AST_Module, AST_Decl, UTL_Scope)

IMPL_NARROW_FROM_DECL(<class>)
        Implements a method to narrow from an instance of AST_Decl to
        an instance of <class> as defined above.

IMPL_NARROW_FROM_SCOPE(<class>)
        Implements a method to narrow from an instance of UTL_Scope to
        an instance of <class> as defined above.

To put it all together: In the file ast_module.hh, you will find:

        // Narrowing
        DEF_NARROW_METHODS2(AST_Module, AST_Decl, UTL_Scope);
        DEF_NARROW_FROM_DECL(AST_Module);
        DEF_NARROW_FROM_SCOPE(AST_Module);

In the file ast_module.cc, you will see:

        /*
         * Narrowing methods
         */
        IMPL_NARROW_METHODS2(AST_Module, AST_Decl, UTL_Scope)
        IMPL_NARROW_FROM_DECL(AST_Module)
        IMPL_NARROW_FROM_SCOPE(AST_Module)

The CFE uses narrowing internally to obtain the correct type of nodes
in the AST. The CFE contains many code fragments such as the following:

        AST_Decl *d = get_an_AST_Decl_from_somewhere();
        AST_Module *m;
        ...
        if (d->node_type() == AST_Decl::NT_module) {
                m = AST_Module::narrow_from_decl(d);
                if (m == NULL) {        // Narrow failed
                        ...
                } else {                // Success, do normal processing
                        ...
                }
        }
        ...

Similar code implements narrowing instances of UTL_Scope to their
actual types.

In your BE classes which derive from UTL_Scope you must include a line
defining how to narrow from a scope, so:

        DEF_NARROW_FROM_SCOPE(<your class>)

and similarly, using DEF_NARROW_FROM_DECL, for your BE classes which
derive from AST_Decl.

The narrowing mechanism is defined only for narrowing from AST_Decl and
UTL_Scope.
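
The idea behind the narrowing operations can be illustrated without the
macros. The following is a minimal sketch, not the CFE implementation:
the classes Decl, Module and Union are hypothetical stand-ins for
AST_Decl, AST_Module and AST_Union, they use plain single inheritance
(the real AST uses virtual inheritance, which the macros have to
account for), and describe() is an invented helper. It shows only the
contract described above: a static narrow_from_decl() that checks the
node type tag and returns either the narrowed instance or NULL.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for AST_Decl: each node carries a tag that
// names its real class.
class Decl {
public:
    enum NodeType { NT_module, NT_union };
    explicit Decl(NodeType t) : pd_node_type(t) {}
    virtual ~Decl() {}
    NodeType node_type() const { return pd_node_type; }
private:
    NodeType pd_node_type;
};

// Hypothetical stand-in for AST_Module.
class Module : public Decl {
public:
    Module() : Decl(NT_module) {}
    // Plays the role of the operation declared by DEF_NARROW_FROM_DECL:
    // returns the node as a Module, or NULL if it is not one.
    static Module *narrow_from_decl(Decl *d) {
        if (d == NULL || d->node_type() != NT_module)
            return NULL;
        return static_cast<Module *>(d);
    }
};

// Hypothetical stand-in for AST_Union.
class Union : public Decl {
public:
    Union() : Decl(NT_union) {}
    static Union *narrow_from_decl(Decl *d) {
        if (d == NULL || d->node_type() != NT_union)
            return NULL;
        return static_cast<Union *>(d);
    }
};

// Invented helper: try each narrowing in turn and report which one
// succeeded.
std::string describe(Decl *d) {
    if (Module::narrow_from_decl(d) != NULL)
        return "module";
    if (Union::narrow_from_decl(d) != NULL)
        return "union";
    return "unknown";
}
```

In this sketch a failed narrow is an ordinary result rather than an
error, which is why callers test the returned pointer against NULL
before using it.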
If your BE class inherits directly from one or more classes which
themselves are derived from AST_Decl and/or UTL_Scope, you must include
a line

        DEF_NARROW_METHODSx(<your class>, <base 1>, ..., <base x>)

To make this concrete, here is what you'd write in a definition of
BE_Union which inherits from AST_Union:

        DEF_NARROW_METHODS1(BE_Union, AST_Union);
        DEF_NARROW_FROM_DECL(BE_Union);
        DEF_NARROW_FROM_SCOPE(BE_Union);

and in the implementation file of BE_Union:

        /*
         * Narrowing methods:
         */
        IMPL_NARROW_METHODS1(BE_Union, AST_Union)
        IMPL_NARROW_FROM_DECL(BE_Union)
        IMPL_NARROW_FROM_SCOPE(BE_Union)

Then, in BE code which expects to see an instance of your derived
BE_Union class, you will write:

        AST_Decl *d = get_an_AST_Decl_from_somewhere();
        BE_Union *u;
        ...
        if (d->node_type() == AST_Decl::NT_union) {
                u = BE_Union::narrow_from_decl(d);
                if (u == NULL) {        // Narrow failed
                        ...
                } else {                // Success, do normal processing
                        ...
                }
        }
        ...

SCOPE MANAGEMENT
----------------

Instances of classes which are derived from UTL_Scope implement
definition scopes. A definition scope can contain any kind of AST node
as long as it is derived from AST_Decl. However, specific kinds of
definition scopes such as interfaces and unions can contain only a
restricted subset of all AST node types.

UTL_Scope provides operations to add instances of each AST provided
class to a definition scope. The names of these operations are
constructed by prepending the string "add_" to the name of the IDL
construct. So, to add an interface to a definition scope, invoke the
operation add_interface. The operations are all defined virtual and are
intended to be overridden in classes derived from UTL_Scope.

If the node was successfully added to the definition scope, the node is
returned as the result. Otherwise the node is not added to the
definition scope and NULL is returned.

All add operation implementations in UTL_Scope return NULL.
Thus, only the operations which implement legal additions to a specific
kind of definition scope must be overridden in the implementation of
that definition scope. For example, in AST_Module the add_interface
operation is overridden to add the provided instance of AST_Interface
to the scope and to return the provided instance if the addition was
successful. Operations which were not overridden return NULL to
indicate that the addition is illegal in this context. For example, in
AST_Operation the definition of add_interface is not overridden since
it is illegal to store an interface inside an operation definition
scope.

The add operations are invoked in the actions in the Yacc grammar. The
following fragment is a representative example of code using the add
operations:

        AST_Constant *d = construct_a_new_constant();
        ...
        if (current_scope->add_constant(d) == NULL) {   // Failed
                ...
        } else {                                        // Succeeded
                ...
        }

BE INTERACTION DURING THE PARSING PROCESS
-----------------------------------------

The add operations can be overridden in BE derived classes to let the
BE perform additional house-keeping work during the process of
constructing the AST. For example, a BE could keep separate lists of
interfaces as they are being added to a module.

If you override an add operation in your BE, you must invoke the
overridden operation in the superclass of your derived class to allow
the CFE to perform its own house-keeping tasks. A good rule is to
invoke the operation on the superclass before you do your own
processing; then, if the superclass operation returns NULL, this
indicates that the addition failed and your own code should immediately
return NULL. An example explains this:

        AST_Interface *
        BE_Module::add_interface(AST_Interface *i)
        {
                if (AST_Module::add_interface(i) == NULL)
                        return NULL;    // Failed, bail out!
                ...                     // Do your own work here
                return i;               // Return success indication
        }

We strongly advise you to define only add operations that override add
operations provided by the AST classes. Add operations which do not
override equivalent operations in the AST in effect extend the
semantics of the language accepted by the compiler. For example, the
CFE does not have an add_interface operation on AST_Operation. If you
were to define one in your BE_Operation class, the resulting compiler
would allow an interface to be stored in an operation definition scope.
The current CORBA specification does not allow this.

AST INHERITANCE SCHEME
----------------------

The AST classes all use public virtual inheritance to construct the
inheritance tree. This ensures that a class may appear several times in
the inheritance tree through different paths and the derived class's
instances will have only one copy of the inherited class's data.

The use of public virtual inheritance has several important effects on
how a BE is constructed. We explain those effects below.

First, you must define a default constructor for your BE class, since
your class may be used as a virtual base class of some other class. In
this case the compiler may want to call a default constructor for your
class. It is a good idea to have a default constructor anyway, even if
you do not plan to subclass your BE class, since for most C++ compilers
this causes the code to be smaller. Your default constructor should
initialize all constant data members. Additionally, it may initialize
any non-constant data member whose value must be set before the first
time the instance is used.

Second, the constructor of your BE derived class must explicitly call
all constructors of virtual base classes which perform useful work. For
example, if a class in the AST from which your BE class inherits has an
initializer for a data member, you must call that constructor. This
rule is discussed in detail in the C++ ARM. An example may help here.
Suppose you define a class BE_Attribute which inherits from
AST_Attribute. Its constructor should be as follows:

        BE_Attribute::BE_Attribute(boolean ro,
                                   AST_Type *ft,
                                   UTL_ScopedName *n,
                                   UTL_StrList *p)
                : AST_Attribute(ro, ft, n, p),
                  AST_Field(ft, n, p),
                  AST_Decl(AST_Decl::NT_attr, n, p)
        {
        }

The calls to the constructors of AST_Attribute, AST_Field and AST_Decl
are needed because these constructors do useful initializations on
their classes.

Note that there is some redundancy in the data passed to these
constructors. We chose to preserve this redundancy since it should be
possible to create BEs which subclass only some of the classes supplied
by the AST. This means that the constructors on each class provided by
the AST should take arguments which are sufficient to construct the
instance if the AST class is the most derived one.

The code supplied with this release contains a demonstration BE which
subclasses all the AST provided classes. The constructors for each
class provided by the BE are found in the file be/be_classes.cc.

INITIALIZATION
--------------

The following steps take place at initialization:

- The global data instance is created, stored in idl_global and filled
  with default values (in driver/drv_init.cc).
- The command line arguments are parsed (in driver/drv_args.cc).
- For each IDL input file, a copy of the compiler process is forked (in
  driver/drv_fork.cc).
- The IDL input is preprocessed (in driver/drv_preproc.cc).
- FE initialization stage 1 is done: the scopes stack is created and
  stored in the global data variable idl_global->scopes() field (in
  fe/fe_init.cc).
- BE_init is called to create the generator instance and the returned
  instance is stored in the global data variable idl_global->gen()
  field.
- FE initialization stage 2 is done: the global scope is created,
  pushed on the scopes stack and populated with predefined types (in
  fe/fe_init.cc).
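
The ordering of the last three steps matters to a BE author: BE_init
runs after the scopes stack exists but before the global scope has been
pushed onto it. The sketch below models only that ordering. All names
in it (GlobalData, Generator, fe_init_stage1, be_init, fe_init_stage2,
initialize) are hypothetical stand-ins invented for illustration and
are not the CFE's own types or functions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for AST_Generator.
struct Generator {};

// Hypothetical stand-in for the global data instance.
struct GlobalData {
    std::vector<std::string> scopes;    // models the scopes stack
    Generator *gen;                     // models the saved generator
    std::size_t depth_seen_by_be_init;  // stack depth observed in be_init
    GlobalData() : gen(NULL), depth_seen_by_be_init(0) {}
    ~GlobalData() { delete gen; }
};

// Stage 1: the scopes stack is created, and is still empty.
void fe_init_stage1(GlobalData &g) {
    g.scopes.clear();
}

// Models BE_init: creates the generator. It records the stack depth to
// show that no scope, not even the global one, is available yet.
Generator *be_init(GlobalData &g) {
    g.depth_seen_by_be_init = g.scopes.size();
    return new Generator();
}

// Stage 2: the global scope is created and pushed on the stack.
void fe_init_stage2(GlobalData &g) {
    g.scopes.push_back("global scope");
}

// The three steps in the order given in the list above.
void initialize(GlobalData &g) {
    fe_init_stage1(g);
    g.gen = be_init(g);
    fe_init_stage2(g);
}
```

A practical consequence, under the assumptions of this sketch, is that
any BE setup which needs to look at the global scope has to be deferred
until after initialization rather than done inside BE_init.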
GLOBAL STATE AND ENTRY POINTS
-----------------------------

The CFE has one global variable named idl_global, which stores an
instance of a class IDL_GlobalData as explained below:

The CFE defines a class IDL_GlobalData which defines the global
information used in a specific run of the compiler. IDL_GlobalData is
defined in include/idl_global.hh and implemented in the file
util/utl_global.cc.

Initialization creates an instance of this class and stores it in the
value of the global variable idl_global. Thus, the individual pieces of
information stored in the instance are accessible everywhere.

ERROR HANDLING
--------------

All error handling is defined by a class provided by the CFE,
UTL_Error. This class is defined in include/utl_error.hh and
implemented in the file util/utl_error.cc. The class provides several
methods for reporting specific errors as well as generic error
reporting methods taking zero to three arguments.

The CFE instantiates the class and stores the instance as part of the
global state, accessible as idl_global->err(). Thus, to cause an error
report, you would write code similar to the following:

        if (error condition found)
                idl_global->err()->specific_error_message(arg1, ..);

or

        if (error condition found)
                idl_global->err()->generic_error_message(flag, arg1, ..);

The flag argument is one of the predefined error conditions found in
the enum at the head of the UTL_Error class definition. The arguments
to the specific error message routine are defined by the signature of
that routine. The arguments to a generic error message routine are
always instances of AST_Decl.

The running count of errors is accessible as idl_global->err_count().
If the value returned by this operation is non-zero after the IDL input
has been parsed, the BE is not invoked.

HANDLING OF COMMAND LINE ARGUMENTS
----------------------------------

Defined command line arguments are specified in the document CLI, in
this directory.
The CFE calls the required BE API entry point BE_prep_arg to process
arguments passed within a -Wb flag.

REQUIRED ENTRY POINTS SUPPLIED BY A BE
--------------------------------------

The following API entry points must be supplied by a BE in order to
successfully link with the CFE:

extern "C" AST_Generator *BE_init();

        Creates an instance of the generator object and returns it.
        Note that the global scope is not yet set up and the scopes
        stack is empty when this routine is called.

extern "C" void BE_produce();

        Called by the compiler main program after the IDL input has
        been successfully parsed and processed. The job of this routine
        is to carry out the specific function of the BE. The AST is
        accessible as the value of idl_global->root().

extern "C" void BE_prep_arg(char *, idl_bool);

        Called to process an argument passed in with a -Wb flag. The
        boolean will always be FALSE.

extern "C" void BE_abort();

        Called when the CFE decides to abort the compilation. Can be
        used in a BE to clean up after itself, e.g. remove temporary
        files or directories it created while the parse was in
        progress.

extern "C" void BE_version();

        Called when a -V argument is processed. This should produce a
        message for the user identifying the BE that is loaded and its
        version information.

PART II. WRITING A BACK END
-=========================-

REQUIRED API THAT EACH BE MUST SUPPORT
--------------------------------------

Below are the API entry points that each BE must supply in order to use
the CFE framework. This is a repeat of the BE API section:

extern "C" AST_Generator *BE_init();

        Creates an instance of the generator object and returns it.
        Note that the global scope is not yet set up and the scopes
        stack is empty when this routine is called.

extern "C" void BE_produce();

        Called by the compiler main program after the IDL input has
        been successfully parsed and processed. The job of this routine
        is to carry out the specific function of the BE. The AST is
        accessible as the value of idl_global->root().
extern "C" void BE_prep_arg(char *, boolean);

    Called to process an argument passed in with a -Wb flag. The boolean
    will always be FALSE.

extern "C" void BE_abort();

    Called when the CFE decides to abort the compilation. Can be used in
    a BE to clean up after itself, e.g. remove temporary files or
    directories it created while the parse was in progress.

extern "C" void BE_version();

    Called when a -V argument is processed. This should produce a
    message for the user identifying the BE that is loaded and its
    version information.

WHAT FILES TO INCLUDE
---------------------

To use the CFE, each implementation file of your BE must include the
following two header files:

    #include
    #include

Following this, you can include any header files needed by your BE.

HOW TO SUBCLASS THE AST
-----------------------

Your BE may subclass from any of the classes provided by the AST. Your
class should use public virtual inheritance to ensure that only one copy
of the class's data members is present in each instance. Read the
section on HOW TO WRITE CONSTRUCTORS to learn about additional
considerations that you must take into account when writing constructors
for your BE classes.

HOW TO SUBCLASS THE GENERATOR TO CREATE BE ENHANCED AST NODES
-------------------------------------------------------------

Your BE subclasses from classes provided by the AST. To ensure that
instances of these classes are constructed when the AST is built, you
must also subclass AST_Generator and return an instance of your subclass
from the call to BE_init.

The AST_Generator class provides operations to create instances of all
classes defined in the AST. For example, the operation to create an
AST_Attribute node is as follows:

    AST_Attribute *
    AST_Generator::create_attribute(...)
    {
        return new AST_Attribute(...);
    }

In your BE_Generator subclass of AST_Generator, you will override
methods for creation of nodes of all AST classes which you have
subclassed.
Thus, if your BE has a class BE_Attribute which is a subclass of
AST_Attribute, your BE_Generator class definition has to override the
create_attribute method to ensure that instances of BE_Attribute are
created.

The definition of each overridden operation should call the constructor
of the derived class and return the new node as an instance of the
inherited class. Thus, the implementation of create_attribute is as
follows:

    AST_Attribute *
    BE_Generator::create_attribute(...)
    {
        return new BE_Attribute(...);
    }

The Yacc grammar actions call create_xxx operations on the generator
instance accessible as idl_global->gen(). By storing an instance of your
derived generator class BE_Generator there, you ensure that instances of
the BE classes you defined will be created.

HOW TO WRITE CONSTRUCTORS FOR BE CLASSES
----------------------------------------

As mentioned above, the AST uses public virtual inheritance to derive
the AST class hierarchy. This has two important effects on how you write
a BE, specifically how you write constructors for derived BE classes.

First, you must define a default constructor for your BE class, since
your class may be used as a virtual base class of some other class. In
that case the compiler may want to call a default constructor for your
class. It is a good idea to have a default constructor anyway, even if
you do not plan to subclass your BE class, since for most C++ compilers
this causes the code to be smaller. Your default constructor should
initialize all constant data members. Additionally, it may initialize
any non-constant data member whose value must be set before the first
time the instance is used.

Second, the constructor for your BE class must explicitly call all
constructors of virtual base classes which do some useful work.
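The second point follows from a general C++ rule: a virtual base class
is initialized by the most derived class only, and any initializer for
it written in an intermediate class is ignored. The sketch below is a
standalone illustration that does not use the CFE headers. The classes
Decl, Field, BadAttribute and GoodAttribute are hypothetical stand-ins
playing the roles of AST_Decl, AST_Field and two BE classes.

```cpp
#include <string>

// Hypothetical stand-in for a virtual base such as AST_Decl.
class Decl {
public:
    Decl() : pd_name("<default>") {}
    explicit Decl(const std::string &n) : pd_name(n) {}
    virtual ~Decl() {}
    const std::string &name() const { return pd_name; }
private:
    std::string pd_name;
};

// Hypothetical intermediate class, in the role of AST_Field.
class Field : public virtual Decl {
public:
    Field() {}
    explicit Field(const std::string &n) : Decl(n) {}
};

// Does not name the virtual base Decl. The default constructor Decl()
// runs, and the Decl(n) initializer written in Field is ignored.
class BadAttribute : public virtual Field {
public:
    explicit BadAttribute(const std::string &n) : Field(n) {}
};

// Names the virtual base explicitly, so Decl(n) runs.
class GoodAttribute : public virtual Field {
public:
    explicit GoodAttribute(const std::string &n) : Decl(n), Field(n) {}
};
```

Constructing BadAttribute("x") leaves name() at its default value,
while GoodAttribute("x") stores "x". A BE class that omits the call to
a virtual base constructor loses the arguments in the same silent way.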
For example, if a class in the AST from which your BE class inherits,
directly or indirectly, has an initializer for a data member, your BE
class's constructor must call the AST class's constructor. This is
discussed extensively in the C++ ARM.

Below is a list showing how to write constructors for subclasses of each
class provided by the AST. For each AST class we show a definition of a
constructor for a derived class which calls all necessary constructors
on AST classes:

AST_Argument:
    BE_Argument::BE_Argument(AST_Argument::Direction d, AST_Type *ft,
                             UTL_ScopedName *n, UTL_StrList *p)
      : AST_Argument(d, ft, n, p),
        AST_Field(AST_Decl::NT_argument, ft, n, p),
        AST_Decl(AST_Decl::NT_argument, n, p)
    {
    }

AST_Array:
    BE_Array::BE_Array(UTL_ScopedName *n, unsigned long nd,
                       UTL_ExprList *ds)
      : AST_Array(n, nd, ds),
        AST_Decl(AST_Decl::NT_array, n, NULL)
    {
    }

AST_Attribute:
    BE_Attribute::BE_Attribute(boolean ro, AST_Type *ft,
                               UTL_ScopedName *n, UTL_StrList *p)
      : AST_Attribute(ro, ft, n, p),
        AST_Field(AST_Decl::NT_attr, ft, n, p),
        AST_Decl(AST_Decl::NT_attr, n, p)
    {
    }

AST_ConcreteType:
    BE_ConcreteType::BE_ConcreteType(AST_Decl::NodeType nt,
                                     UTL_ScopedName *n, UTL_StrList *p)
      : AST_Decl(nt, n, p)
    {
    }

AST_Constant:
    BE_Constant::BE_Constant(AST_Expression::ExprType t,
                             AST_Expression *v,
                             UTL_ScopedName *n, UTL_StrList *p)
      : AST_Constant(t, v, n, p),
        AST_Decl(AST_Decl::NT_const, n, p)
    {
    }

AST_Decl:
    BE_Decl::BE_Decl(AST_Decl::NodeType nt, UTL_ScopedName *n,
                     UTL_StrList *p)
      : AST_Decl(nt, n, p)
    {
    }

AST_Enum:
    BE_Enum::BE_Enum(UTL_ScopedName *n, UTL_StrList *p)
      : AST_Enum(n, p),
        AST_Decl(AST_Decl::NT_enum, n, p),
        UTL_Scope(AST_Decl::NT_enum)
    {
    }

AST_EnumVal:
    BE_EnumVal::BE_EnumVal(unsigned long v, UTL_ScopedName *n,
                           UTL_StrList *p)
      : AST_EnumVal(v, n, p),
        AST_Constant(AST_Expression::EV_ulong, AST_Decl::NT_enum_val,
                     new AST_Expression(v), n, p),
        AST_Decl(AST_Decl::NT_enum_val, n, p)
    {
    }

AST_Exception:
    BE_Exception::BE_Exception(UTL_ScopedName *n, UTL_StrList *p) :
        AST_Decl(AST_Decl::NT_except, n, p),
        AST_Structure(AST_Decl::NT_except, n, p),
        UTL_Scope(AST_Decl::NT_except)
    {
    }

AST_Field:
    BE_Field::BE_Field(AST_Type *ft, UTL_ScopedName *n, UTL_StrList *p)
      : AST_Field(ft, n, p),
        AST_Decl(AST_Decl::NT_field, n, p)
    {
    }

AST_Interface:
    BE_Interface::BE_Interface(UTL_ScopedName *n, AST_Interface **ih,
                               long nih, UTL_StrList *p)
      : AST_Interface(n, ih, nih, p),
        AST_Decl(AST_Decl::NT_interface, n, p),
        UTL_Scope(AST_Decl::NT_interface)
    {
    }

AST_InterfaceFwd:
    BE_InterfaceFwd::BE_InterfaceFwd(UTL_ScopedName *n, UTL_StrList *p)
      : AST_InterfaceFwd(n, p),
        AST_Decl(AST_Decl::NT_interface_fwd, n, p)
    {
    }

AST_Module:
    BE_Module::BE_Module(UTL_ScopedName *n, UTL_StrList *p)
      : AST_Decl(AST_Decl::NT_module, n, p),
        UTL_Scope(AST_Decl::NT_module)
    {
    }

AST_Operation:
    BE_Operation::BE_Operation(AST_Type *rt, AST_Operation::Flags fl,
                               UTL_ScopedName *n, UTL_StrList *p)
      : AST_Operation(rt, fl, n, p),
        AST_Decl(AST_Decl::NT_op, n, p),
        UTL_Scope(AST_Decl::NT_op)
    {
    }

AST_PredefinedType:
    BE_PredefinedType::BE_PredefinedType(
                            AST_PredefinedType::PredefinedType *pt,
                            UTL_ScopedName *n, UTL_StrList *p)
      : AST_PredefinedType(pt, n, p),
        AST_Decl(AST_Decl::NT_pre_defined, n, p)
    {
    }

AST_Root:
    BE_Root::BE_Root(UTL_ScopedName *n, UTL_StrList *p)
      : AST_Module(n, p),
        AST_Decl(AST_Decl::NT_module, n, p),
        UTL_Scope(AST_Decl::NT_module)
    {
    }

AST_Sequence:
    BE_Sequence::BE_Sequence(AST_Expression *ms, AST_Type *bt)
      : AST_Sequence(ms, bt),
        AST_Decl(AST_Decl::NT_sequence,
                 new UTL_ScopedName(new String("sequence"), NULL),
                 NULL)
    {
    }

AST_String:
    BE_String::BE_String(AST_Expression *ms)
      : AST_String(ms),
        AST_Decl(AST_Decl::NT_string,
                 new UTL_ScopedName(new String("string"), NULL),
                 NULL)
    {
    }

AST_Structure:
    BE_Structure::BE_Structure(UTL_ScopedName *n, UTL_StrList *p)
      : AST_Decl(AST_Decl::NT_struct, n, p),
        UTL_Scope(AST_Decl::NT_struct)
    {
    }

AST_Type:
    BE_Type::BE_Type(AST_Decl::NodeType nt, UTL_ScopedName *n,
                     UTL_StrList *p)
      : AST_Decl(nt, n, p)
    {
    }

AST_Typedef:
    BE_Typedef::BE_Typedef(AST_Type *bt, UTL_ScopedName *n,
                           UTL_StrList *p)
      : AST_Typedef(bt, n, p),
        AST_Decl(AST_Decl::NT_typedef, n, p)
    {
    }

AST_Union:
    BE_Union::BE_Union(AST_ConcreteType *dt, UTL_ScopedName *n,
                       UTL_StrList *p)
      : AST_Union(dt, n, p),
        AST_Structure(AST_Decl::NT_union, n, p),
        AST_Decl(AST_Decl::NT_union, n, p),
        UTL_Scope(AST_Decl::NT_union)
    {
    }

AST_UnionBranch:
    BE_UnionBranch::BE_UnionBranch(AST_UnionLabel *fl, AST_Type *ft,
                                   UTL_ScopedName *n, UTL_StrList *p)
      : AST_UnionBranch(fl, ft, n, p),
        AST_Field(ft, n, p),
        AST_Decl(AST_Decl::NT_union_branch, n, p)
    {
    }

AST_UnionLabel:
    BE_UnionLabel::BE_UnionLabel(AST_UnionLabel::UnionLabel lk,
                                 AST_Expression *lv)
      : AST_UnionLabel(lk, lv)
    {
    }

HOW TO USE THE ADD PROTOCOL
---------------------------

As explained in the section SCOPE MANAGEMENT, the CFE manages scopes by
calling type-specific functions to add new nodes to the scope to be
augmented. These functions can be overridden in your BE classes to do
work specific to your BE class. For example, in a BE_Module class, you
might override add_interface to do additional work.

The protocol defined by the "add_" functions is that they return NULL to
indicate failure. They return the node that was added (and which was
given as an argument) if the operation succeeded. Your functions in your
BE class should follow the same protocol.

The "add_" functions defined in the BE must call the overridden function
in the base class defined in the CFE in order for the CFE scope
management mechanism to work. Otherwise, the CFE does not get an
opportunity to augment its scopes with the new node to be added. It is
good practice to call the overridden "add_" function as the first action
in your BE function, because the success or failure of the CFE operation
indicates whether your function should complete its task or abort early.

Here is an example. Suppose you have defined a class BE_Module which
inherits from AST_Module.
You may wish to override the add_interface function as follows:

    class BE_Module : public virtual AST_Module
    {
        ....
        /*
         * ADD protocol
         */
        virtual AST_Interface *add_interface(AST_Interface *);
        ...
    };

The implementation of this function would look something like the
following:

    AST_Interface *
    BE_Module::add_interface(AST_Interface *new_in)
    {
        /*
         * Check that the CFE operation succeeds. If it returns
         * NULL, stop any further work
         */
        if (AST_Module::add_interface(new_in) == NULL)
            return NULL;
        /*
         * OK, non-NULL, this means the BE can do its own work here
         */
        ...
        /*
         * Finally, don't forget to return the argument to indicate
         * success
         */
        return new_in;
    }

HOW TO MAINTAIN BE SPECIFIC INFORMATION
---------------------------------------

The CFE provides a special class AST_Root, a subclass of AST_Module. An
instance of the AST_Root class is used as the distinguished root of the
abstract syntax tree built during a parse.

Your BE can subclass BE_Root from AST_Root and override the create_root
operation in your BE_Generator class derived from AST_Generator. This
will cause the CFE to create an instance of your BE_Root class as the
root of the tree being constructed.

You can use the instance of the BE_Root class as a convenient place to
store information specific to an individual tree. For example, you could
add operations on the BE_Root class to count how many nodes of each
class are created.

HOW TO USE MEMBER DATA
----------------------

As explained above, the AST classes provide access and update functions
for manipulating data members. Your BE classes must use these functions
when they require access to data members defined in the AST classes,
since the data members themselves are private.

It is good practice to follow the same scheme in your BE classes. Make
all data members private. Prepend the names of all such fields with
"pd_". Define access functions with names equal to the name of the field
without the prefix.
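As a minimal sketch of this naming scheme, the class below keeps one
private field behind an access function and an update function. The
class BE_NodeCounter and its field are hypothetical and are not part of
the CFE; a real BE class would also derive from an AST class.

```cpp
// Hypothetical BE helper class, for illustration only.
class BE_NodeCounter {
public:
    BE_NodeCounter() : pd_node_count(0) {}

    // Access function: the field name without the "pd_" prefix.
    unsigned long node_count() const { return pd_node_count; }

    // Update function: the access function name prefixed with "set_".
    void set_node_count(unsigned long n) { pd_node_count = n; }

private:
    // Private data member, named with the "pd_" prefix.
    unsigned long pd_node_count;
};
```

Callers read the value with node_count() and change it with
set_node_count(), so the representation of the field can change without
affecting code outside the class.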
Define update functions according to need by prepending the name of the
access function with the prefix "set_". Using these techniques will
allow your BE to enjoy the same benefits that the CFE gains from them.
Your BE will be easier to move to a multithreaded environment and its
data members will be better protected and hidden.

HOW TO BUILD A COMPLETE COMPILER
--------------------------------

We now have all the information needed to write a BE and to link it with
the CFE to produce a complete IDL compiler.

The following assumes that your BE will be stored in the "be" directory
under the "release" directory. See the document ROADMAP for an
explanation of the directory structure of the source release. If you
decide to use a different directory to store your BE, you may have to
modify the CPP_FLAGS in "idl_make_vars" in the top-level directory to
allow your BE to find the include files it needs. You will also need to
modify several targets in the Makefile in the top-level directory to
correctly compile your BE into a library and to correctly link it in
with the CFE to produce a complete compiler.

You can get started quickly on writing your BE by modifying the sources
found in the "demo_be" directory. The Makefile supports all the targets
that are needed to build a complete system and the maintenance target
"clean" which assists in keeping the files and directories tidy. The
files provided in the "demo_be" directory also provide all the API entry
points that are mandated by this document.

To build a complete compiler, invoke "make" or "make all" in the
top-level directory. On the first invocation this compiles your BE and
all the CFE sources. On subsequent invocations it recompiles only the
modified files. You will rarely, if ever, modify the CFE sources, so the
overhead of compiling the CFE is incurred only the first time.

To build just your BE, you can invoke "make all" or "make" in the
"demo_be" directory.
You can also, from the top-level directory, invoke
"make demo_be/libbe.a".

HOW TO OBTAIN ASSISTANCE
------------------------

First, read all the documents provided. If you have unanswered
questions, mail them to

    idl-cfe@sun.com

Sun does not promise to support the IDL CFE source release in any
manner. However, we will attempt to answer questions and correct
problems as time allows.

NOTE:

SunOS, SunSoft, Sun, Solaris, Sun Microsystems or the Sun logo are
trademarks or registered trademarks of Sun Microsystems, Inc.

COPYRIGHT NOTICE
----------------

Copyright 1992, 1993, 1994 Sun Microsystems, Inc. Printed in the United
States of America. All Rights Reserved.

This product is protected by copyright and distributed under the
following license restricting its use.

The Interface Definition Language Compiler Front End (CFE) is made
available for your use provided that you include this license and
copyright notice on all media and documentation and the software program
in which this product is incorporated in whole or part. You may copy and
extend functionality (but may not remove functionality) of the Interface
Definition Language CFE without charge, but you are not authorized to
license or distribute it to anyone else except as part of a product or
program developed by you or with the express written consent of Sun
Microsystems, Inc. ("Sun").

The names of Sun Microsystems, Inc. and any of its subsidiaries or
affiliates may not be used in advertising or publicity pertaining to
distribution of Interface Definition Language CFE as permitted herein.

This license is effective until terminated by Sun for failure to comply
with this license. Upon termination, you shall destroy or return all
code and documentation for the Interface Definition Language CFE.
INTERFACE DEFINITION LANGUAGE CFE IS PROVIDED AS IS WITH NO WARRANTIES
OF ANY KIND INCLUDING THE WARRANTIES OF DESIGN, MERCHANTIBILITY AND
FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR ARISING FROM A
COURSE OF DEALING, USAGE OR TRADE PRACTICE.

INTERFACE DEFINITION LANGUAGE CFE IS PROVIDED WITH NO SUPPORT AND
WITHOUT ANY OBLIGATION ON THE PART OF Sun OR ANY OF ITS SUBSIDIARIES OR
AFFILIATES TO ASSIST IN ITS USE, CORRECTION, MODIFICATION OR
ENHANCEMENT.

SUN OR ANY OF ITS SUBSIDIARIES OR AFFILIATES SHALL HAVE NO LIABILITY
WITH RESPECT TO THE INFRINGEMENT OF COPYRIGHTS, TRADE SECRETS OR ANY
PATENTS BY INTERFACE DEFINITION LANGUAGE CFE OR ANY PART THEREOF.

IN NO EVENT WILL SUN OR ANY OF ITS SUBSIDIARIES OR AFFILIATES BE LIABLE
FOR ANY LOST REVENUE OR PROFITS OR OTHER SPECIAL, INDIRECT AND
CONSEQUENTIAL DAMAGES, EVEN IF SUN HAS BEEN ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.

Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause at DFARS 252.227-7013 and
FAR 52.227-19.

Sun, Sun Microsystems and the Sun logo are trademarks or registered
trademarks of Sun Microsystems, Inc.

SunSoft, Inc.
2550 Garcia Avenue
Mountain View, California 94043