Extending SQL

In the sections that follow, we will discuss how you can extend the PostgreSQL SQL query language by adding:

functions
aggregates
data types
operators
operator classes for indexes
packages of related objects

How Extensibility Works

PostgreSQL is extensible because its operation is catalog-driven. If you are familiar with standard relational database systems, you know that they store information about databases, tables, columns, etc., in what are commonly known as system catalogs. (Some systems call this the data dictionary.) The catalogs appear to the user as tables like any other, but the DBMS stores its internal bookkeeping in them. One key difference between PostgreSQL and standard relational database systems is that PostgreSQL stores much more information in its catalogs: not only information about tables and columns, but also information about data types, functions, access methods, and so on. These tables can be modified by the user, and since PostgreSQL bases its operation on these tables, PostgreSQL can be extended by users. By comparison, conventional database systems can only be extended by changing hardcoded procedures in the source code or by loading modules specially written by the DBMS vendor.

Moreover, the PostgreSQL server can incorporate user-written code into itself through dynamic loading. That is, the user can specify an object code file (e.g., a shared library) that implements a new type or function, and PostgreSQL will load it as required. Code written in SQL is even simpler to add to the server. This ability to modify its operation on the fly makes PostgreSQL uniquely suited for rapid prototyping of new applications and storage structures.

The PostgreSQL Type System

PostgreSQL data types are divided into base types, composite types, domains, and pseudo-types.

Base Types

Base types are those, like int4, that are implemented below the level of the SQL language (typically in a low-level language such as C). They generally correspond to what are often known as abstract data types. PostgreSQL can only operate on such types through functions provided by the user and only understands the behavior of such types to the extent that the user describes them. Base types are further subdivided into scalar and array types. For each scalar type, a corresponding array type is automatically created that can hold variable-size arrays of that scalar type.

Composite Types

Composite types, or row types, are created whenever the user creates a table. It is also possible to use CREATE TYPE to define a stand-alone composite type with no associated table. A composite type is simply a list of types with associated field names. A value of a composite type is a row or record of field values. The user can access the component fields from SQL queries. Refer to the documentation on composite types for more information.

Domains

A domain is based on a particular base type and for many purposes is interchangeable with its base type. However, a domain can have constraints that restrict its valid values to a subset of what the underlying base type would allow. Domains can be created using the SQL command CREATE DOMAIN. Their creation and use is not discussed in this chapter.
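As a brief, hedged illustration of the difference between a stand-alone composite type and a domain (the names complex_pair and us_postal_code are invented for this sketch, not objects defined elsewhere in this chapter):

-- a stand-alone composite type: just a list of named fields, no table required
CREATE TYPE complex_pair AS (r double precision, i double precision);

-- a domain over text whose values are restricted by a CHECK constraint
CREATE DOMAIN us_postal_code AS text
    CHECK (VALUE ~ '^\d{5}$' OR VALUE ~ '^\d{5}-\d{4}$');

-- both can then be used wherever an ordinary data type is expected
CREATE TABLE sample_addresses (location complex_pair, zip us_postal_code);

Values that violate the domain's CHECK constraint are rejected, while the composite type simply groups its two fields.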
Pseudo-Types

There are a few pseudo-types for special purposes. Pseudo-types cannot appear as columns of tables or attributes of composite types, but they can be used to declare the argument and result types of functions. This provides a mechanism within the type system to identify special classes of functions. The complete list of existing pseudo-types can be found in the data type documentation.

Polymorphic Types

Five pseudo-types of special interest are anyelement, anyarray, anynonarray, anyenum, and anyrange, which are collectively called polymorphic types. Any function declared using these types is said to be a polymorphic function. A polymorphic function can operate on many different data types, with the specific data type(s) being determined by the data types actually passed to it in a particular call.

Polymorphic arguments and results are tied to each other and are resolved to a specific data type when a query calling a polymorphic function is parsed. Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as anyarray can have any array data type, but similarly they must all be the same type. And similarly, positions declared as anyrange must all be the same range type. Furthermore, if there are positions declared anyarray and others declared anyelement, the actual array type in the anyarray positions must be an array whose elements are the same type appearing in the anyelement positions. Similarly, if there are positions declared anyrange and others declared anyelement, the actual range type in the anyrange positions must be a range whose subtype is the same type appearing in the anyelement positions. anynonarray is treated exactly the same as anyelement, but adds the additional constraint that the actual type must not be an array type. anyenum is treated exactly the same as anyelement, but adds the additional constraint that the actual type must be an enum type.

Thus, when more than one argument position is declared with a polymorphic type, the net effect is that only certain combinations of actual argument types are allowed. For example, a function declared as equal(anyelement, anyelement) will take any two input values, so long as they are of the same data type.

When the return value of a function is declared as a polymorphic type, there must be at least one argument position that is also polymorphic, and the actual data type supplied as the argument determines the actual result type for that call. For example, if there were not already an array subscripting mechanism, one could define a function that implements subscripting as subscript(anyarray, integer) returns anyelement. This declaration constrains the actual first argument to be an array type, and allows the parser to infer the correct result type from the actual first argument's type. Another example is that a function declared as f(anyarray) returns anyenum will only accept arrays of enum types.

Note that anynonarray and anyenum do not represent separate type variables; they are the same type as anyelement, just with an additional constraint. For example, declaring a function as f(anyelement, anyenum) is equivalent to declaring it as f(anyenum, anyenum): both actual arguments have to be the same enum type.

A variadic function (one taking a variable number of arguments) can be polymorphic: this is accomplished by declaring its last parameter as VARIADIC anyarray. For purposes of argument matching and determining the actual result type, such a function behaves the same as if you had written the appropriate number of anynonarray parameters.
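As a minimal sketch of how polymorphic argument resolution works (the function name dup is an illustrative assumption, not something defined in this chapter), an SQL-language function can be declared with anyelement input and anyarray output; the concrete types are inferred separately for each call:

-- returns a two-element array of whatever type it is handed
CREATE FUNCTION dup(anyelement) RETURNS anyarray
    AS 'SELECT ARRAY[$1, $1]' LANGUAGE SQL;

SELECT dup(22);          -- resolved as dup(integer) returning integer[]
SELECT dup('hi'::text);  -- resolved as dup(text) returning text[]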
Packaging Related Objects into an Extension

A useful extension to PostgreSQL typically includes multiple SQL objects; for example, a new data type will require new functions, new operators, and probably new index operator classes. It is helpful to collect all these objects into a single package to simplify database management. PostgreSQL calls such a package an extension. To define an extension, you need at least a script file that contains the SQL commands to create the extension's objects, and a control file that specifies a few basic properties of the extension itself. If the extension includes C code, there will typically also be a shared library file into which the C code has been built. Once you have these files, a simple CREATE EXTENSION command loads the objects into your database.

The main advantage of using an extension, rather than just running the SQL script to load a bunch of loose objects into your database, is that PostgreSQL will then understand that the objects of the extension go together. You can drop all the objects with a single command (no need to maintain a separate uninstall script). Even more useful, pg_dump knows that it should not dump the individual member objects of the extension; it will just include a CREATE EXTENSION command in dumps instead. This vastly simplifies migration to a new version of the extension that might contain more or different objects than the old version. Note however that you must have the extension's control, script, and other files available when loading such a dump into a new database.

PostgreSQL will not let you drop an individual object contained in an extension, except by dropping the whole extension. Also, while you can change the definition of an extension member object (for example, via CREATE OR REPLACE FUNCTION for a function), bear in mind that the modified definition will not be dumped by pg_dump. Such a change is usually only sensible if you concurrently make the same change in the extension's script file. (But there are special provisions for tables containing configuration data; see below.)

The extension mechanism also has provisions for packaging modification scripts that adjust the definitions of the SQL objects contained in an extension. For example, if version 1.1 of an extension adds one function and changes the body of another function compared to 1.0, the extension author can provide an update script that makes just those two changes. The ALTER EXTENSION UPDATE command can then be used to apply these changes and to track which version of the extension is actually installed in a given database.

The kinds of SQL objects that can be members of an extension are shown in the description of ALTER EXTENSION. Notably, objects that are database-cluster-wide, such as databases, roles, and tablespaces, cannot be extension members, since an extension is only known within one database. (Although an extension script is not prohibited from creating such objects, if it does so they will not be tracked as part of the extension.) Also note that while a table can be a member of an extension, its subsidiary objects such as indexes are not directly considered members of the extension.
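To make the workflow concrete, here is a hedged sketch of the commands involved; my_ext is a placeholder extension name, and version 1.1 is assumed to have a packaged update script:

-- install the extension's default version into the current database
CREATE EXTENSION my_ext;

-- apply the author's update script(s) to move to a newer version
ALTER EXTENSION my_ext UPDATE TO '1.1';

-- remove the extension together with all of its member objects
DROP EXTENSION my_ext;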
Another important point is that schemas can belong to extensions, but not vice versa: an extension as such has an unqualified name and does not exist within any schema. The extension's member objects, however, will belong to schemas whenever appropriate for their object types. It may or may not be appropriate for an extension to own the schema(s) its member objects are within.

Extension Files

The CREATE EXTENSION command relies on a control file for each extension, which must be named the same as the extension with a suffix of .control, and must be placed in the installation's SHAREDIR/extension directory. There must also be at least one SQL script file, which follows the naming pattern extension--version.sql (for example, foo--1.0.sql for version 1.0 of extension foo). By default, the script file(s) are also placed in the SHAREDIR/extension directory, but the control file can specify a different directory for the script file(s).

The file format for an extension control file is the same as for the postgresql.conf file, namely a list of parameter_name = value assignments, one per line. Blank lines and comments introduced by # are allowed. Be sure to quote any value that is not a single word or number. A control file can set the following parameters (a sample control file appears after the list):

directory (string): The directory containing the extension's SQL script file(s). Unless an absolute path is given, the name is relative to the installation's SHAREDIR directory. The default behavior is equivalent to specifying directory = 'extension'.

default_version (string): The default version of the extension (the one that will be installed if no version is specified in CREATE EXTENSION). Although this can be omitted, that will result in CREATE EXTENSION failing if no VERSION option appears, so you generally don't want to do that.

comment (string): A comment (any string) about the extension. Alternatively, the comment can be set by means of the COMMENT command in the script file.

encoding (string): The character set encoding used by the script file(s). This should be specified if the script files contain any non-ASCII characters. Otherwise the files will be assumed to be in the database encoding.

module_pathname (string): The value of this parameter will be substituted for each occurrence of MODULE_PATHNAME in the script file(s). If it is not set, no substitution is made. Typically, this is set to $libdir/shared_library_name and then MODULE_PATHNAME is used in CREATE FUNCTION commands for C-language functions, so that the script files do not need to hard-wire the name of the shared library.

requires (string): A list of names of extensions that this extension depends on, for example requires = 'foo, bar'. Those extensions must be installed before this one can be installed.

superuser (boolean): If this parameter is true (which is the default), only superusers can create the extension or update it to a new version. If it is set to false, just the privileges required to execute the commands in the installation or update script are required.

relocatable (boolean): An extension is relocatable if it is possible to move its contained objects into a different schema after initial creation of the extension. The default is false, i.e., the extension is not relocatable. See below for more information.

schema (string): This parameter can only be set for non-relocatable extensions. It forces the extension to be loaded into exactly the named schema and not any other. See below for more information.
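For illustration, a control file exercising several of these parameters might look like the following; the extension name my_ext, its library name, and the schema name are assumptions for this sketch:

# my_ext.control
comment = 'example extension illustrating control file parameters'
default_version = '1.2'
module_pathname = '$libdir/my_ext'
requires = 'plpgsql'
relocatable = false
schema = 'my_ext'
superuser = false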
In addition to the primary control file extension.control, an extension can have secondary control files named in the style extension--version.control. If supplied, these must be located in the script file directory. Secondary control files follow the same format as the primary control file. Any parameters set in a secondary control file override the primary control file when installing or updating to that version of the extension. However, the parameters directory and default_version cannot be set in a secondary control file.

An extension's SQL script files can contain any SQL commands, except for transaction control commands (BEGIN, COMMIT, etc.) and commands that cannot be executed inside a transaction block (such as VACUUM). This is because the script files are implicitly executed within a transaction block.

An extension's SQL script files can also contain lines beginning with \echo, which will be ignored (treated as comments) by the extension mechanism. This provision is commonly used to throw an error if the script file is fed to psql rather than being loaded via CREATE EXTENSION (see the example script below). Without that, users might accidentally load the extension's contents as loose objects rather than as an extension, a state of affairs that's a bit tedious to recover from.

While the script files can contain any characters allowed by the specified encoding, control files should contain only plain ASCII, because there is no way for PostgreSQL to know what encoding a control file is in. In practice this is only an issue if you want to use non-ASCII characters in the extension's comment. Recommended practice in that case is to not use the control file comment parameter, but instead use COMMENT ON EXTENSION within a script file to set the comment.

Extension Relocatability

Users often wish to load the objects contained in an extension into a different schema than the extension's author had in mind. There are three supported levels of relocatability:

A fully relocatable extension can be moved into another schema at any time, even after it has been loaded into a database. This is done with the ALTER EXTENSION SET SCHEMA command, which automatically renames all the member objects into the new schema. Normally, this is only possible if the extension contains no internal assumptions about what schema any of its objects are in. Also, the extension's objects must all be in one schema to begin with (ignoring objects that do not belong to any schema, such as procedural languages). Mark a fully relocatable extension by setting relocatable = true in its control file.

An extension might be relocatable during installation but not afterwards. This is typically the case if the extension's script file needs to reference the target schema explicitly, for example in setting search_path properties for SQL functions. For such an extension, set relocatable = false in its control file, and use @extschema@ to refer to the target schema in the script file (see the sketch after this list). All occurrences of this string will be replaced by the actual target schema's name before the script is executed. The user can set the target schema using the SCHEMA option of CREATE EXTENSION.

If the extension does not support relocation at all, set relocatable = false in its control file, and also set schema to the name of the intended target schema. This will prevent use of the SCHEMA option of CREATE EXTENSION, unless it specifies the same schema named in the control file. This choice is typically necessary if the extension contains internal assumptions about schema names that can't be replaced by uses of @extschema@. The @extschema@ substitution mechanism is available in this case too, although it is of limited use since the schema name is determined by the control file.
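As a hedged sketch of the second level of relocatability (the extension name my_ext and the function my_ext_version are assumptions), a script file might guard against being run directly in psql and qualify its objects with the installation-time schema:

-- abort if this file is fed to psql instead of being loaded via CREATE EXTENSION
\echo Use "CREATE EXTENSION my_ext" to load this file. \quit

CREATE FUNCTION @extschema@.my_ext_version() RETURNS text
    LANGUAGE SQL AS 'SELECT text ''1.0''';

-- pin the function's search_path to the target schema chosen at install time
ALTER FUNCTION @extschema@.my_ext_version() SET search_path = @extschema@;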
In all cases, the script file will be executed with search_path initially set to point to the target schema; that is, CREATE EXTENSION does the equivalent of this:

SET LOCAL search_path TO @extschema@;

This allows the objects created by the script file to go into the target schema. The script file can change search_path if it wishes, but that is generally undesirable. search_path is restored to its previous setting upon completion of CREATE EXTENSION.

The target schema is determined by the schema parameter in the control file if that is given, otherwise by the SCHEMA option of CREATE EXTENSION if that is given, otherwise the current default object creation schema (the first one in the caller's search_path). When the control file schema parameter is used, the target schema will be created if it doesn't already exist, but in the other two cases it must already exist.

If any prerequisite extensions are listed in requires in the control file, their target schemas are appended to the initial setting of search_path. This allows their objects to be visible to the new extension's script file.

Although a non-relocatable extension can contain objects spread across multiple schemas, it is usually desirable to place all the objects meant for external use into a single schema, which is considered the extension's target schema. Such an arrangement works conveniently with the default setting of search_path during creation of dependent extensions.

Extension Configuration Tables

Some extensions include configuration tables, which contain data that might be added or changed by the user after installation of the extension. Ordinarily, if a table is part of an extension, neither the table's definition nor its content will be dumped by pg_dump. But that behavior is undesirable for a configuration table; any data changes made by the user need to be included in dumps, or the extension will behave differently after a dump and reload.

To solve this problem, an extension's script file can mark a table it has created as a configuration table, which will cause pg_dump to include the table's contents (not its definition) in dumps. To do that, call the function pg_extension_config_dump(regclass, text) after creating the table, for example

CREATE TABLE my_config (key text, value text);

SELECT pg_catalog.pg_extension_config_dump('my_config', '');

Any number of tables can be marked this way.

When the second argument of pg_extension_config_dump is an empty string, the entire contents of the table are dumped by pg_dump. This is usually only correct if the table is initially empty as created by the extension script. If there is a mixture of initial data and user-provided data in the table, the second argument of pg_extension_config_dump provides a WHERE condition that selects the data to be dumped. For example, you might do

CREATE TABLE my_config (key text, value text, standard_entry boolean);

SELECT pg_catalog.pg_extension_config_dump('my_config', 'WHERE NOT standard_entry');

and then make sure that standard_entry is true only in the rows created by the extension's script.
More complicated situations, such as initially-provided rows that might be modified by users, can be handled by creating triggers on the configuration table to ensure that modified rows are marked correctly.

You can alter the filter condition associated with a configuration table by calling pg_extension_config_dump again. (This would typically be useful in an extension update script.) The only way to mark a table as no longer a configuration table is to dissociate it from the extension with ALTER EXTENSION ... DROP TABLE.

Extension Updates

One advantage of the extension mechanism is that it provides convenient ways to manage updates to the SQL commands that define an extension's objects. This is done by associating a version name or number with each released version of the extension's installation script. In addition, if you want users to be able to update their databases dynamically from one version to the next, you should provide update scripts that make the necessary changes to go from one version to the next. Update scripts have names following the pattern extension--oldversion--newversion.sql (for example, foo--1.0--1.1.sql contains the commands to modify version 1.0 of extension foo into version 1.1).

Given that a suitable update script is available, the command ALTER EXTENSION UPDATE will update an installed extension to the specified new version. The update script is run in the same environment that CREATE EXTENSION provides for installation scripts: in particular, search_path is set up in the same way, and any new objects created by the script are automatically added to the extension. If an extension has secondary control files, the control parameters that are used for an update script are those associated with the script's target (new) version.

The update mechanism can be used to solve an important special case: converting a loose collection of objects into an extension. Before the extension mechanism was added to PostgreSQL (in 9.1), many people wrote extension modules that simply created assorted unpackaged objects. Given an existing database containing such objects, how can we convert the objects into a properly packaged extension? Dropping them and then doing a plain CREATE EXTENSION is one way, but it's not desirable if the objects have dependencies (for example, if there are table columns of a data type created by the extension). The way to fix this situation is to create an empty extension, then use ALTER EXTENSION ADD to attach each pre-existing object to the extension, then finally create any new objects that are in the current extension version but were not in the unpackaged release. CREATE EXTENSION supports this case with its FROM old_version option, which causes it to not run the normal installation script for the target version, but instead the update script named extension--old_version--target_version.sql. The choice of the dummy version name to use as old_version is up to the extension author, though unpackaged is a common convention. If you have multiple prior versions you need to be able to update into extension style, use multiple dummy version names to identify them.

ALTER EXTENSION is able to execute sequences of update script files to achieve a requested update. For example, if only foo--1.0--1.1.sql and foo--1.1--2.0.sql are available, ALTER EXTENSION will apply them in sequence if an update to version 2.0 is requested when 1.0 is currently installed.
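As a hedged sketch of the unpackaged conversion (foo is the extension name used above; the member objects foo_hash and foo_type are invented for illustration), the author ships an update script whose body absorbs the pre-existing objects, and the user invokes it through CREATE EXTENSION:

-- contents of foo--unpackaged--1.0.sql
ALTER EXTENSION foo ADD FUNCTION foo_hash(text);
ALTER EXTENSION foo ADD TYPE foo_type;

-- run by the user in a database that already contains the loose objects
CREATE EXTENSION foo FROM unpackaged;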
PostgreSQL doesn't assume anything about the properties of version names: for example, it does not know whether 1.1 follows 1.0. It just matches up the available version names and follows the path that requires applying the fewest update scripts. (A version name can actually be any string that doesn't contain -- or leading or trailing -.)

Sometimes it is useful to provide downgrade scripts, for example foo--1.1--1.0.sql to allow reverting the changes associated with version 1.1. If you do that, be careful of the possibility that a downgrade script might unexpectedly get applied because it yields a shorter path. The risky case is where there is a fast path update script that jumps ahead several versions as well as a downgrade script to the fast path's start point. It might take fewer steps to apply the downgrade and then the fast path than to move ahead one version at a time. If the downgrade script drops any irreplaceable objects, this will yield undesirable results.

To check for unexpected update paths, use this command:

SELECT * FROM pg_extension_update_paths('extension_name');

This shows each pair of distinct known version names for the specified extension, together with the update path sequence that would be taken to get from the source version to the target version, or NULL if there is no available update path. The path is shown in textual form with -- separators. You can use regexp_split_to_array(path, '--') if you prefer an array format.

Extension Example

Here is a complete example of an SQL-only extension, a two-element composite type that can store any type of value in its slots, which are named k and v. Non-text values are automatically coerced to text for storage. The script file pair--1.0.sql looks like this:

-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION pair" to load this file. \quit

CREATE TYPE pair AS ( k text, v text );

CREATE OR REPLACE FUNCTION pair(anyelement, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = anyelement, PROCEDURE = pair);
CREATE OPERATOR ~> (LEFTARG = anyelement, RIGHTARG = text, PROCEDURE = pair);
CREATE OPERATOR ~> (LEFTARG = anyelement, RIGHTARG = anyelement, PROCEDURE = pair);
CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, PROCEDURE = pair);

The control file pair.control looks like this:

# pair extension
comment = 'A key/value pair data type'
default_version = '1.0'
relocatable = true

While you hardly need a makefile to install these two files into the correct directory, you could use a Makefile containing this:

EXTENSION = pair
DATA = pair--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

This makefile relies on PGXS, which is described in the next section. The command make install will install the control and script files into the correct directory as reported by pg_config.

Once the files are installed, use the CREATE EXTENSION command to load the objects into any particular database.
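Once pair is installed in a database, it can be used through the constructor function or the ~> operator; in this sketch the arguments are cast to text explicitly, so that the text, text variants of the overloaded function and operator are selected unambiguously:

CREATE EXTENSION pair;

SELECT pair('title'::text, 'PostgreSQL'::text);   -- (title,PostgreSQL)
SELECT 'author'::text ~> 'Stonebraker'::text;     -- (author,Stonebraker)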
Extension Building Infrastructure

If you are thinking about distributing your PostgreSQL extension modules, setting up a portable build system for them can be fairly difficult. Therefore the PostgreSQL installation provides a build infrastructure for extensions, called PGXS, so that simple extension modules can be built simply against an already installed server. PGXS is mainly intended for extensions that include C code, although it can be used for pure-SQL extensions too. Note that PGXS is not intended to be a universal build system framework that can be used to build any software interfacing to PostgreSQL; it simply automates common build rules for simple server extension modules. For more complicated packages, you might need to write your own build system.

To use the PGXS infrastructure for your extension, you must write a simple makefile. In the makefile, you need to set some variables and include the global PGXS makefile. Here is an example that builds an extension module named isbn_issn, consisting of a shared library containing some C code, an extension control file, an SQL script, and a documentation text file:

MODULES = isbn_issn
EXTENSION = isbn_issn
DATA = isbn_issn--1.0.sql
DOCS = README.isbn_issn

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

The last three lines should always be the same. Earlier in the file, you assign variables or add custom make rules.

Set one of these three variables to specify what is built:

MODULES: list of shared-library objects to be built from source files with the same stem (do not include library suffixes in this list)
MODULE_big: a shared library to build from multiple source files (list object files in OBJS)
PROGRAM: an executable program to build (list object files in OBJS)

The following variables can also be set:

EXTENSION: extension name(s); for each name you must provide an extension.control file, which will be installed into prefix/share/extension
MODULEDIR: subdirectory of prefix/share into which DATA and DOCS files should be installed (if not set, the default is extension if EXTENSION is set, or contrib if not)
DATA: random files to install into prefix/share/$MODULEDIR
DATA_built: random files to install into prefix/share/$MODULEDIR, which need to be built first
DATA_TSEARCH: random files to install under prefix/share/tsearch_data
DOCS: random files to install under prefix/doc/$MODULEDIR
SCRIPTS: script files (not binaries) to install into prefix/bin
SCRIPTS_built: script files (not binaries) to install into prefix/bin, which need to be built first
REGRESS: list of regression test cases (without suffix), see below
REGRESS_OPTS: additional switches to pass to pg_regress
EXTRA_CLEAN: extra files to remove in make clean
PG_CPPFLAGS: will be added to CPPFLAGS
PG_LIBS: will be added to PROGRAM link line
SHLIB_LINK: will be added to MODULE_big link line
PG_CONFIG: path to pg_config program for the PostgreSQL installation to build against (typically just pg_config to use the first one in your PATH)

Put this makefile as Makefile in the directory which holds your extension. Then you can do make to compile, and then make install to install your module. By default, the extension is compiled and installed for the PostgreSQL installation that corresponds to the first pg_config program found in your PATH. You can use a different installation by setting PG_CONFIG to point to its pg_config program, either within the makefile or on the make command line.

You can also run make in a directory outside the source tree of your extension, if you want to keep the build directory separate. This procedure is also called a VPATH build. Here's how:

mkdir build_dir
cd build_dir
make -f /path/to/extension/source/tree/Makefile
make -f /path/to/extension/source/tree/Makefile install

Alternatively, you can set up a directory for a VPATH build in a similar way to how it is done for the core code. One way to do this is using the core script config/prep_buildtree. Once this has been done you can build by setting the make variable USE_VPATH like this:

make USE_VPATH=/path/to/extension/source/tree
make USE_VPATH=/path/to/extension/source/tree install

This procedure can work with a greater variety of directory layouts.
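For example, to build and install against a specific PostgreSQL installation rather than the first pg_config found in your PATH (the installation path shown here is only an assumption):

make PG_CONFIG=/usr/local/pgsql-16/bin/pg_config
make PG_CONFIG=/usr/local/pgsql-16/bin/pg_config install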
The scripts listed in the REGRESS variable are used for regression testing of your module, which can be invoked by make installcheck after doing make install. For this to work you must have a running PostgreSQL server. The script files listed in REGRESS must appear in a subdirectory named sql/ in your extension's directory. These files must have extension .sql, which must not be included in the REGRESS list in the makefile. For each test there should also be a file containing the expected output in a subdirectory named expected/, with the same stem and extension .out. make installcheck executes each test script with psql, and compares the resulting output to the matching expected file. Any differences will be written to the file regression.diffs in diff -c format. Note that trying to run a test that is missing its expected file will be reported as trouble, so make sure you have all expected files. The easiest way to create the expected files is to create empty files, then do a test run (which will of course report differences). Inspect the actual result files found in the results/ directory, then copy them to expected/ if they match what you expect from the test.
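As a sketch of the layout described above (the test name pairtest is an assumption), a module's regression setup might look like this:

# in the Makefile
REGRESS = pairtest

# directory layout relative to the extension's directory:
#   sql/pairtest.sql        -- the commands executed with psql
#   expected/pairtest.out   -- the expected output to compare against

# run against an installed, running server
make install
make installcheck           # any differences are written to regression.diffs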