author     Monty Taylor <mordred@inaugust.com>  2013-07-11 16:24:50 -0400
committer  Monty Taylor <mordred@inaugust.com>  2013-07-11 16:54:46 -0400
commit     a71799ea2cbfa1354c786991a85b610753f5474c (patch)
tree       11381f15bdd75736cef5cd50978fe9af9560cf7c /doc/source/historical
parent     85fe71617f301e2163a97108aaa70de5bfb6b4a5 (diff)
download   sqalchemy-migrate-a71799ea2cbfa1354c786991a85b610753f5474c.tar.gz
Updated to OpenStack Build stuff.
Diffstat (limited to 'doc/source/historical')
-rw-r--r--  doc/source/historical/ProjectDesignDecisionsAutomation.trac     26
-rw-r--r--  doc/source/historical/ProjectDesignDecisionsScriptFormat.trac  147
-rw-r--r--  doc/source/historical/ProjectDesignDecisionsVersioning.trac     56
-rw-r--r--  doc/source/historical/ProjectDetailedDesign.trac                29
-rw-r--r--  doc/source/historical/ProjectGoals.trac                         50
-rw-r--r--  doc/source/historical/ProjectProposal.txt                       73
-rw-r--r--  doc/source/historical/RepositoryFormat.trac                     56
-rw-r--r--  doc/source/historical/RepositoryFormat2.trac                    28
8 files changed, 465 insertions, 0 deletions
diff --git a/doc/source/historical/ProjectDesignDecisionsAutomation.trac b/doc/source/historical/ProjectDesignDecisionsAutomation.trac
new file mode 100644
index 0000000..2f1c8b7
--- /dev/null
+++ b/doc/source/historical/ProjectDesignDecisionsAutomation.trac
@@ -0,0 +1,26 @@
+There are many migrations that don't require a lot of thought - for example, if we add a column to a table definition, we probably want to have an "ALTER TABLE...ADD COLUMN" statement show up in our migration.
+
+The difficulty lies in the automation of changes where the requirements aren't obvious. What happens when you add a unique constraint to a column whose data is not already unique? What happens when we split an existing table in two? Completely automating database migrations is not possible.
+
+That said - we shouldn't have to hunt down and handwrite the ALTER TABLE statements for every new column; this is often just tedious. Many other common migration tasks require little serious thought; such tasks are ripe for automation. Any automation attempted, however, should not interfere with our ability to write scripts by hand if we so choose; our tool should ''not'' be centered around automation.
+
+
+Automatically generating the code for this sort of task seems like a good solution:
+ * It does not obstruct us from writing changes by hand; if we don't like the autogenerated code, delete it or don't generate it to begin with
+ * We can easily add other migration tasks to the autogenerated code
+ * We can see right away if the code is what we're expecting, or if it's wrong
+ * If the generated code is wrong, it is easily modified; we can use parts of the generated code, rather than being required to use either 100% or 0%
+ * Maintenance, usually a problem with auto-generated code, is not an issue here: old database migration scripts are not subject to maintenance; the correct fix is usually a new migration script.
+
+
+Implementation is a problem: finding the 'diff' of two databases to determine what columns to add is not trivial. Fortunately, there exist tools that claim to do this for us: [http://sqlfairy.sourceforge.net/ SQL::Translator] and [http://xml2ddl.berlios.de/ XML to DDL] both claim to have this capability.
+
+...
+
+All that said, this is ''not'' something I'm going to attempt during the Summer of Code.
+ * I'd have to rely tremendously on a tool I'm not at all familiar with
+ * It creates a risk of the project itself relying too much on the automation, a Bad Thing
+ * The project has a deadline and I have plenty else to do already
+ * Lots of people with more experience than me say this would take more time than it's worth
+
+It's something that might be considered for future work if this project is successful, though. \ No newline at end of file
diff --git a/doc/source/historical/ProjectDesignDecisionsScriptFormat.trac b/doc/source/historical/ProjectDesignDecisionsScriptFormat.trac
new file mode 100644
index 0000000..b23c2d4
--- /dev/null
+++ b/doc/source/historical/ProjectDesignDecisionsScriptFormat.trac
@@ -0,0 +1,147 @@
+Important to our system is the API used for making database changes.
+
+=== Raw SQL; .sql script ===
+Require users to write raw SQL. Migration scripts are .sql scripts (with database version information in a header comment).
+
++ Familiar interface for experienced DBAs.
+
++ No new API to learn[[br]]
+SQL is used elsewhere; many people know SQL already. Those who are still learning SQL will gain expertise not in the API of a specific tool, but in a language which will help them elsewhere. (On the other hand, those who are familiar with Python with no desire to learn SQL might find a Python API more intuitive.)
+
+- Difficult to extend when necessary[[br]]
+.sql scripts mean that we can't write new functions specific to our migration system when necessary. (We can't always assume that the DBMS supports functions/procedures.)
+
+- Lose the power of Python[[br]]
+Some things are possible in Python that aren't in SQL - for example, suppose we want to use some functions from our application in a migration script. (The user might also simply prefer Python.)
+
+- Loss of database independence.[[br]]
+There isn't much we can do to specify different actions for a particular DBMS besides copying the .sql file, which is obviously bad form.
+
+=== Raw SQL; Python script ===
+Require users to write raw SQL. Migration scripts are Python scripts whose API does little beyond specifying which DBMS(es) a particular statement should apply to.
+
+For example,
+{{{
+run("CREATE TABLE test[...]") # runs for all databases
+run("ALTER TABLE test ADD COLUMN varchar2[...]",oracle) # runs for Oracle only
+run("ALTER TABLE test ADD COLUMN varchar[...]",postgres|mysql) # runs for Postgres or MySQL only
+}}}
+
+We could also allow parts of a single statement to apply to a specific DBMS:
+{{{
+run("ALTER TABLE test ADD COLUMN "+sql("varchar",postgres|mysql)+sql("varchar2",oracle))
+}}}
+or, the same thing:
+{{{
+run("ALTER TABLE test ADD COLUMN "+sql("varchar",postgres|mysql,"varchar2",oracle))
+}}}
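A minimal sketch of how such dialect flags might be implemented, assuming bitmask engine constants; every name here is hypothetical, not an existing API:

```python
# Hypothetical sketch of the dialect-flag run() API shown above; the
# engine constants and the run() signature are assumptions.
POSTGRES, MYSQL, ORACLE = 1, 2, 4
ALL = POSTGRES | MYSQL | ORACLE

executed = []  # stand-in for a real database connection

def run(statement, engines=ALL, current=POSTGRES):
    """Execute the statement only if the current DBMS is in the flag set."""
    if engines & current:
        executed.append(statement)

run("CREATE TABLE test (id integer)")                      # all databases
run("ALTER TABLE test ADD COLUMN v varchar2(10)", ORACLE)  # skipped: we are on Postgres
run("ALTER TABLE test ADD COLUMN v varchar(10)", POSTGRES | MYSQL)
```

Bitwise OR on integer flags gives the `postgres|mysql` syntax above almost for free.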
+
++ Allows the user to write migration scripts for multiple DBMSes.
+
+- The user must manage the conflicts between different databases themselves. [[br]]
+The user can write scripts that deal with conflicting dialects, but the scripts aren't really database-independent: the burden of resolving those conflicts falls entirely on the user; our system doesn't help them.
+
++ Minimal new API to learn. [[br]]
+There is a new API to learn, but it is extremely small, depending mostly on SQL DDL. This has the advantages of "no new API" in our first solution.
+
+- More verbose than .sql scripts.
+
+=== Raw SQL; automatic translation between each dialect ===
+Same as the above suggestion, but allow the user to specify a 'default' dialect of SQL that we'll interpret and whose quirks we'll deal with.
+That is, write everything in SQL and try to automatically resolve the conflicts of different DBMSes.
+
+For example, take the following script:
+{{{
+engine=postgres
+
+run("""
+CREATE TABLE test (
+ id serial
+)
+""")
+}}}
+Running this on a Postgres database, surprisingly enough, would generate exactly what we typed:
+{{{
+CREATE TABLE test (
+ id serial
+)
+}}}
+
+Running it on a MySQL database, however, would generate something like
+{{{
+CREATE TABLE test (
+ id integer auto_increment
+)
+}}}
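The kind of rewriting involved can be illustrated with a toy translator; a real tool such as SQL::Translator parses a full SQL grammar rather than substituting text, and the mapping table below is purely illustrative:

```python
# Toy illustration of dialect translation: rewrite Postgres's "serial"
# type for MySQL. A real translator would parse the SQL, not regex it.
import re

TYPE_MAP = {"mysql": {"serial": "integer auto_increment"}}

def translate(sql, target):
    """Rewrite dialect-specific type keywords for the target DBMS."""
    for src, dst in TYPE_MAP.get(target, {}).items():
        sql = re.sub(r"\b%s\b" % src, dst, sql)
    return sql

ddl = "CREATE TABLE test (\n    id serial\n)"
print(translate(ddl, "mysql"))     # id integer auto_increment
print(translate(ddl, "postgres"))  # unchanged
```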
+
++ Database-independence issues of the above SQL solutions are resolved.[[br]]
+Ideally, this solution would be as database-independent as a Python API for database changes (discussed next), but with all the advantages of writing SQL (no new API).
+
+- Difficult implementation[[br]]
+Obviously, this is not easy to implement - there is a great deal of parsing logic and a great many things that need to be accounted for. In addition, this is a complex operation; any implementation will likely have errors somewhere.
+
+It seems tools for this already exist; an effective tool would trivialize this implementation. I experimented a bit with [http://sqlfairy.sourceforge.net/ SQL::Translator] and [http://xml2ddl.berlios.de/ XML to DDL]; however, I had difficulties with both.
+
+- Database-specific features ensure that this cannot possibly be "complete". [[br]]
+For example, Postgres has an 'interval' type to represent times and (AFAIK) MySQL does not.
+
+=== Database-independent Python API ===
+Create a Python API through which we may manage database changes. Scripts would be based on the existing SQLAlchemy API when possible.
+
+Scripts would look something like
+{{{
+# Create a table
+test_table = Table('test'
+    ,Column('id',Integer,notNull=True)
+)
+test_table.create()
+# Add a column to an existing table
+test_table.add_column('id',Integer,notNull=True)
+# Or, use a column object instead of its parameters
+test_table.add_column(Column('id',Integer,notNull=True))
+# Or, don't use a table object at all
+add_column('test','id',Integer,notNull=True)
+}}}
+This would use engines, similar to SQLAlchemy's, to deal with database-independence issues.
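A rough sketch of what engine dispatch might look like underneath add_column; the API and the per-DBMS type spellings are illustrative assumptions, loosely modeled on SQLAlchemy:

```python
# Sketch: one abstract column type, rendered differently per engine.
# The class names, method names, and type spellings are hypothetical.
class Integer:
    def compile(self, engine):
        # e.g. suppose one DBMS spells the type differently
        return {"oracle": "number"}.get(engine, "integer")

def add_column(table, name, coltype, engine, not_null=False):
    """Build the ALTER statement for the given engine."""
    ddl = "ALTER TABLE %s ADD COLUMN %s %s" % (table, name, coltype.compile(engine))
    if not_null:
        ddl += " NOT NULL"
    return ddl

print(add_column("test", "id", Integer(), "postgres", not_null=True))
# ALTER TABLE test ADD COLUMN id integer NOT NULL
```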
+
+We would, of course, allow users to write raw SQL if they wish. This would be done in the manner outlined in the second solution above; this allows us to write our entire script in SQL and ignore the Python API if we wish, or write parts of our solution in SQL to deal with specific databases.
+
++ Deals with database-independence thoroughly and with minimal user effort.[[br]]
+SQLAlchemy-style engines would be used for this; issues of different DBMS syntax are resolved with minimal user effort. (Database-specific features would still need handwritten SQL.)
+
++ Familiar interface for SQLAlchemy users.[[br]]
+In addition, we can often cut-and-paste column definitions from SQLAlchemy tables, easing one particular task.
+
+- Requires that the user learn a new API. [[br]]
+SQL already exists; people know it. SQL newbies might be more comfortable with a Python interface, but folks who already know SQL must learn a whole new API. (On the other hand, the user ''can'' write things in SQL if they wish, learning only the most minimal of APIs, if they are willing to resolve issues of database-independence themselves.)
+
+- More difficult to implement than pure SQL solutions. [[br]]
+SQL already exists/has been tested. A new Python API does not/has not, and much of the work seems to consist of little more than reinventing the wheel.
+
+- Script behavior might change under different versions of the project.[[br]]
+...where .sql scripts behave the same regardless of the project's version.
+
+=== Generate .sql scripts from a Python API ===
+Attempts to take the best of the first and last solutions. An API similar to the previous solution would be used, but rather than immediately being applied to the database, .sql scripts are generated for each type of database we're interested in. These .sql scripts are what's actually applied to the database.
+
+This would essentially allow users to skip the Python script step entirely if they wished, and write migration scripts in SQL instead, as in solution 1.
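The generation step itself is easy to sketch: run the migration once per target engine and capture the SQL each run emits. The file-naming scheme and every name below are hypothetical:

```python
# Sketch: capture one .sql file per dialect from a single migration
# script. Engine list and naming convention are assumptions.
ENGINES = ["postgres", "mysql", "sqlite"]

def generate_sql(script_name, emit):
    """emit(engine) -> the SQL string the script produces for that dialect."""
    files = {}
    for engine in ENGINES:
        files["%s.%s.up.sql" % (script_name, engine)] = emit(engine)
    return files

files = generate_sql("001_add_email", lambda e: "-- SQL for %s" % e)
print(sorted(files))  # one generated file name per engine
```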
+
++ Database-independence is an option, when needed.
+
++ A familiar interface/an interface that can interact with other tools is an option, when needed.
+
++ Easy to inspect the SQL generated by a script, to ensure it's what we're expecting.
+
++ Migration scripts won't change behavior across different versions of the project. [[br]]
+Once a Python script is translated to a .sql script, its behavior is consistent across different versions of the project, unlike a pure Python solution.
+
+- Multiple ways to do a single task: not Pythonic.[[br]]
+I never really liked that word - "Pythonic" - but it does apply here. Having multiple ways to do a single task has the potential to cause confusion, especially in a large project where many people may do the same task in different ways. We also have to support both ways of doing things.
+
+----
+
+'''Conclusion''': The last solution, generating .sql scripts from a Python API, seems to be best.
+
+The first solution (.sql scripts) suffers from a lack of database-independence, but is familiar to experienced database developers, useful with other tools, and shows exactly what will be done to the database. The Python API solution has no trouble with database-independence, but suffers from other problems that the .sql solution doesn't. The last solution resolves both reasonably well. Multiple ways to do a single task might be called "not Pythonic", but IMO, the trade-off is worth this cost.
+
+Automatic translation between different dialects of SQL might have potential for use in a solution, but existing tools for this aren't reliable enough, as far as I can tell. \ No newline at end of file
diff --git a/doc/source/historical/ProjectDesignDecisionsVersioning.trac b/doc/source/historical/ProjectDesignDecisionsVersioning.trac
new file mode 100644
index 0000000..f06ba18
--- /dev/null
+++ b/doc/source/historical/ProjectDesignDecisionsVersioning.trac
@@ -0,0 +1,56 @@
+An important aspect of this project is database versioning. For migration scripts to be most useful, we need to know what version the database is: that is, has a particular migration script already been run?
+
+An option not discussed below is "no versioning"; that is, simply apply any script we're given, and rely on the user to ensure it's valid. This is entirely too error-prone to seriously consider, and takes a lot of the usefulness out of the proposed tool.
+
+
+=== Database-wide version numbers ===
+A single integer version number would specify the version of each database. This is stored in the database in a table, let's call it "schema"; each migration script is associated with a certain database version number.
+
++ Simple implementation[[br]]
+Of the 3 solutions presented here, this one is by far the simplest.
+
++ Past success[[br]]
+Used in [http://www.rubyonrails.org/ Ruby on Rails' migrations].
+
+~ Can detect corrupt schemas, but requires some extra work and a ''complete'' set of migrations.[[br]]
+If we have a set of database migration scripts that build the database from the ground up, we can apply them in sequence to a 'dummy' database, dump a diff of the real and dummy schemas, and expect a valid schema to match the dummy schema.
+
+- Requires changes to the database schema.[[br]]
+Not a tremendous change - a single table with a single column and a single row - but a change nonetheless.
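The bookkeeping for a database-wide version number is small enough to sketch in full; SQLite stands in for the real DBMS here, and the table name simply follows the "schema" suggestion above:

```python
# Minimal sketch of database-wide versioning: a one-row, one-column
# table holding the schema version. Table/column names are placeholders.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE schema (version integer)")
db.execute("INSERT INTO schema VALUES (0)")

def get_version(db):
    return db.execute("SELECT version FROM schema").fetchone()[0]

def set_version(db, v):
    db.execute("UPDATE schema SET version = ?", (v,))

set_version(db, get_version(db) + 1)  # a migration script has been applied
print(get_version(db))  # 1
```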
+
+=== Table/object-specific version numbers ===
+Each database "object" - usually tables, though we might also deal with other database objects, such as stored procedures or Postgres' sequences - would have a version associated with it, initially 1. These versions are stored in a table, let's call it "schema". This table has two columns: the name of the database object and its current version number.
+
++ Allows us to write migration scripts for a subset of the database.[[br]]
+If we have multiple people working on a very large database, we may want to write migration scripts for a section of the database without stepping on another person's work. This allows unrelated changes to be written and applied independently.
+
+- Requires changes to the database schema.[[br]]
+Similar to the database-wide version number; the contents of the new table are more complex, but still shouldn't conflict with anything.
+
+- More difficult to implement than a database-wide version number.
+
+- Determining the version of database-specific objects (ie. stored procedures, functions) is difficult.
+
+- Ultimately gains nothing over the previous solution.[[br]]
+The intent here was to allow multiple people to write scripts for a single database, but if database-wide version numbers aren't assigned until a script is placed in the repository, the previous solution already allows this.
+
+=== Version determined via introspection ===
+Each script has a schema associated with it, rather than a version number. The database schema is loaded, analyzed, and compared to the schema expected by the script.
+
++ No modifications to the database are necessary for this versioning system.[[br]]
+The primary advantage here is that no changes to the database are required.
+
+- Most difficult solution to implement, by far.[[br]]
+Comparing the state of every schema object in the database is much more complex than simply comparing a version number, especially since we need to do it in a database-independent way (ie. we can't just diff the dump of each schema). SQLAlchemy's reflection would certainly be very helpful, but this remains the most complex solution.
+
++ "Automatically" detects corrupt schemas.[[br]]
+A corrupt schema won't match any migration script.
+
+- Difficult to deal with corrupt schemas.[[br]]
+When version numbers are stored in the database, you have some idea of where an error occurred. Without this, we have no idea what version the database was in before corruption.
+
+- Potential ambiguity: what if two database migration scripts expect the same schema?
+
+----
+
+'''Conclusion''': database-wide version numbers are the best way to go. \ No newline at end of file
diff --git a/doc/source/historical/ProjectDetailedDesign.trac b/doc/source/historical/ProjectDetailedDesign.trac
new file mode 100644
index 0000000..e295251
--- /dev/null
+++ b/doc/source/historical/ProjectDetailedDesign.trac
@@ -0,0 +1,29 @@
+This is very much a draft/brainstorm right now. It should be made prettier and thought about in more detail later, but it at least gives some idea of the direction we're headed right now.
+----
+ * Two distinct tools; should not be coupled (can work independently):
+ * Versioning tool
+ * Command line tool; let's call it "samigrate"
+ * Organizes old migration scripts into repositories
+ * Runs groups of migration scripts on a database, updating it to a specified version/latest version
+ * Helps run various tests
+ * usage
+ * "samigrate create PATH": Create project migration-script repository
+ * We shouldn't have to enter the path for every other command. Use a hidden file
+ * (This means we can't move the repository after it's created. Oh well)
+ * "samigrate add SCRIPT [VERSION]": Add script to this project's repository; latest version
+ * If a .sql script: how to determine engine, operation (up/down)? Options:
+ * specify at the command line: "samigrate add SCRIPT UP_OR_DOWN ENGINE"
+ * naming convention: SCRIPT is named something like NAME.postgres.up.sql
+ * "samigrate upgrade CONNECTION_STRING [VERSION] [SCRIPT...]": connect to the specified database and upgrade (or downgrade) it to the specified version (default latest)
+ * If SCRIPT... specified: act like these scripts are in the repository (useful for testing?)
+ * "samigrate dump CONNECTION_STRING [VERSION] [SCRIPT...]": like update, but sends all sql to stdout instead of the db
+ * (Later: some more commands, to be used for script testing tools)
+ * Alchemy API extensions for altering schema
+ * Operations here are DB-independent
+ * Each database modification is a script that may use this API
+ * Can handwrite SQL for all databases or a single database
+ * upgrade()/downgrade() functions: need only one file for both operations
+ * .sql scripts require either two files (*.up.sql; *.down.sql) or forgoing downgrade support
+ * usage
+ * "python NAME.py ENGINE up": upgrade sql > stdout
+ * "python NAME.py ENGINE down": downgrade sql > stdout \ No newline at end of file
diff --git a/doc/source/historical/ProjectGoals.trac b/doc/source/historical/ProjectGoals.trac
new file mode 100644
index 0000000..5879196
--- /dev/null
+++ b/doc/source/historical/ProjectGoals.trac
@@ -0,0 +1,50 @@
+== Goals ==
+
+=== DBMS-independent schema changes ===
+Many projects need to run on more than one DBMS. Similar changes need to be applied to both types of databases upon a schema change. The usual solution to database changes - .sql scripts with ALTER statements - runs into problems since different DBMSes have different dialects of SQL; we end up having to create a different script for each DBMS. This project will simplify this by providing an API, similar to the table definition API that already exists in SQLAlchemy, to alter a table independent of the DBMS being used, where possible.
+
+This project will support all DBMSes currently supported by SQLAlchemy: SQLite, Postgres, MySQL, Oracle, and MS SQL. Adding support for others should be no more difficult than it is in SQLAlchemy.
+
+Many are already used to writing .sql scripts for database changes, aren't interested in learning a new API, and have projects where DBMS-independence isn't an issue. Writing SQL statements as part of a (Python) change script must be an option, of course. Writing change scripts as .sql scripts, eliminating Python scripts from the picture entirely, would be nice too, although this is a lower-priority goal.
+
+=== Database versioning and change script organization ===
+Once we've accumulated a set of change scripts, it's important to know which ones have been applied/need to be applied to a particular database: suppose we need to upgrade a database that's extremely out-of-date; figuring out the scripts to run by hand is tedious. Applying changes in the wrong order, or applying changes when they shouldn't be applied, is bad; attempting to manage all of this by hand inevitably leads to an accident. This project will be able to detect the version of a particular database and apply the scripts required to bring it up to the latest version, or up to any specified version number (given all change scripts required to reach that version number).
+
+Sometimes we need to be able to revert a schema to an older version. There's no automatic way to do this without rebuilding the database from scratch, so our project will allow one to write scripts to downgrade the database as well as upgrade it. If such scripts have been written, we should be able to apply them in the correct order, just like upgrading.
+
+Large projects inevitably accumulate a large number of database change scripts; it's important that we have a place to keep them. Once a script has been written, this project will deal with organizing it among existing change scripts, and the user will never have to look at it again.
+
+=== Change testing ===
+It's important to test one's database changes before applying them to a production database (unless you happen to like disasters). Much testing is up to the user and can't be automated, but there are a few places where we can help ensure at least a minimal level of schema integrity. A few examples are below; we could add more later.
+
+Given an obsolete schema, a database change script, and an up-to-date schema known to be correct, this project will be able to ensure that applying the
+change script to the obsolete schema will result in an up-to-date schema - all without actually changing the obsolete database. Folks who have SQLAlchemy create their database using table.create() might find this useful; this is also useful for ensuring database downgrade scripts are correct.
+
+Given a schema of a known version and a complete set of change scripts up to that version, this project will be able to detect if the schema matches its version. If a schema has gone through changes not present in migration scripts, this test will fail; if applying all scripts in sequence up to the specified version creates an identical schema, this test will succeed. Identifying that a schema is corrupt is sufficient; it would be nice if we could give a clue as to what's wrong, but this is lower priority. (Implementation: we'll probably show a diff of two schema dumps; this should be enough to tell the user what's gone wrong.)
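The dummy-database check described above reduces to comparing two schema dumps; everything below is stubbed for illustration, with a schema modeled as a plain set of object names:

```python
# Sketch of the corruption check: build a fresh "dummy" schema by
# applying every change script, then diff it against the real schema.
def build_dummy(scripts):
    schema = set()
    for script in scripts:
        script(schema)  # each script mutates the dummy schema
    return schema

def check(real_schema, scripts):
    dummy = build_dummy(scripts)
    missing, extra = dummy - real_schema, real_schema - dummy
    return (not (missing or extra)), missing, extra

scripts = [lambda s: s.add("users"), lambda s: s.add("users.email")]
ok, missing, extra = check({"users"}, scripts)
print(ok, missing)  # the diff tells the user what is out of date
```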
+
+== Non-Goals ==
+ie. things we will '''not''' try to do (at least, during the Summer of Code)
+
+=== Automatic generation of schema changes ===
+For example, one might define a table:
+{{{
+CREATE TABLE person (
+ id integer,
+ name varchar(80)
+);
+}}}
+Later, we might add additional columns to the definition:
+{{{
+CREATE TABLE person (
+ id integer,
+ name varchar(80),
+ profile text
+);
+}}}
+It might be nice if a tool could look at both table definitions and spit out a change script; something like
+{{{
+ALTER TABLE person ADD COLUMN profile text;
+}}}
+This is a difficult problem for a number of reasons. I have no intention of tackling this problem as part of the Summer of Code. This project aims to give you a better way to write that ALTER statement and make sure it's applied correctly, not to write it for you.
+
+(Using an [http://sqlfairy.sourceforge.net/ existing] [http://xml2ddl.berlios.de/ tool] to add this sort of thing later might be worth looking into, but it will not be done during the Summer of Code. Among other reasons, methinks it's best to start with a system that isn't dependent on this sort of automation.) \ No newline at end of file
diff --git a/doc/source/historical/ProjectProposal.txt b/doc/source/historical/ProjectProposal.txt
new file mode 100644
index 0000000..e270800
--- /dev/null
+++ b/doc/source/historical/ProjectProposal.txt
@@ -0,0 +1,73 @@
+Evan Rosson
+
+Project
+---
+SQLAlchemy Schema Migration
+
+
+Synopsis
+---
+SQLAlchemy is an excellent object-relational database mapper for Python projects. Currently, it does a fine job of creating a database from scratch, but provides no tool to assist the user in modifying an existing database. This project aims to provide such a tool.
+
+
+Benefits
+---
+ Application requirements change; a database schema must be able to change with them. It's possible to write SQL scripts that make the proper modifications without any special tools, but this setup quickly becomes difficult to manage - when we need to apply multiple updates to a database, organize old migration scripts, or have a single application support more than one DBMS, a tool to support database changes becomes necessary. This tool will aid in organizing migration scripts, applying multiple updates or removing updates to revert to an old version, and creating DBMS-independent migration scripts.
+
+ Writing one's schema migration scripts by hand often results in problems when dealing with multiple obsolete database instances - we must figure out what scripts are necessary to bring the database up-to-date. Database versioning tools are helpful for this task; this project will track the version of a particular database to determine what scripts are necessary to update an old schema.
+
+
+Description
+---
+ The migration system used by Ruby on Rails has had much success, and for good reason - the system is easy to understand, generally database-independent, as powerful as the application itself, and capable of dealing nicely with a schema with multiple instances of different versions. A migration system similar to that of Rails is a fine place to begin this project.
+
+ Each instance of the schema will have a version associated with it; this version is tracked using a single table with a single row and a single integer column. A set of changes to the database schema will increment the schema's version number; each migration script will be associated with a schema version.
+
+ A migration script will be written by the user, and consist of two functions:
+- upgrade(): brings an old database up-to-date, from version n-1 to version n
+- downgrade(): reverts an up-to-date database to the previous schema; an 'undo' for upgrade()
+
+ When applying multiple updates to an old schema instance, migration scripts are applied in sequence: when updating a schema to version n from version n-2, two migration scripts are run; n-2 => n-1 => n.
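The sequencing described above can be sketched as a simple loop; script storage and execution are stubbed, and all names are hypothetical:

```python
# Sketch of the version walk: apply one migration script per step,
# from the current schema version up to the target version.
def upgrade(current, target, scripts):
    """scripts maps version n to a callable migrating n-1 -> n."""
    applied = []
    for version in range(current + 1, target + 1):
        scripts[version]()  # run the migration for this step
        applied.append(version)
    return applied

log = []
scripts = {v: (lambda v=v: log.append("upgrade to %d" % v)) for v in (1, 2, 3)}
result = upgrade(0, 3, scripts)
print(result)  # [1, 2, 3]
```

The same loop, iterated in reverse over downgrade() functions, would handle reverting a schema.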
+
+ A command-line tool will create empty migration scripts (empty upgrade()/downgrade() functions), display the SQL that will be generated by a migration script for a particular DBMS, and apply migration scripts to a specified database.
+
+ This project will implement the command-line tool that manages the above functionality. This project will also extend SQLAlchemy with the functions necessary to construct DBMS-independent migration scripts: in particular, column creation/deletion/alteration and the ability to rename existing tables/indexes/columns will be implemented. We'll also need a way to write raw SQL for a specific DBMS/set of DBMSes for situations where our abstraction doesn't fit a script's requirements. The creation/deletion of existing tables and indexes are operations already provided by SQLAlchemy.
+
+
+ On DBMS support - I intend to support MySQL, Postgres, SQLite, Oracle, and MS-SQL by the end of the project. (Update: I previously omitted support for Oracle and MS-SQL because I don't have access to the full version of each; I wasn't aware Oracle Lite and MS-SQL Express were available for free.) The system will be abstracted in such a way that adding support for other databases will not be any more difficult than adding support for them in SQLAlchemy.
+
+
+Schedule
+---
+This project will be my primary activity this summer. Unfortunately, I am in school when things begin, until June 9, but I can still begin the project during that period. I have no other commitments this summer - I can easily make up any lost time.
+I will be spending my spare time this summer further developing my online game (discussed below), but this has no deadline and will not interfere with the project proposed here.
+
+
+I'll begin by familiarizing myself with the internals of SQLAlchemy and creating a detailed plan for the project. This plan will be reviewed by the current SQLAlchemy developers and other potential users, and will be modified based on their feedback. This will be completed no later than May 30, one week after SoC begins.
+
+Development will follow, in this order:
+- The database versioning system. This will manage the creation and application of (initially empty) migration scripts. Complete by June 16.
+ - Access the database; read/update the schema's version number
+ - Apply a single (empty) script to the database
+ - Apply a set of (empty) scripts to upgrade/downgrade the database to a specified version; examine all migration scripts and apply all to update the database to the latest version available
+- An API for table/column alterations, to make the above system useful. Complete by August 11.
+ - Implement an empty API - does nothing at this point, but written in such a way that syntax for each supported DBMS may be added as a module. Completed June 26-30, the mid-project review deadline.
+ - Implement/test the above API for a single DBMS (probably Postgres, as I'm familiar with it). Users should be able to test the 'complete' application with this DBMS.
+ - Implement the database modification API for other supported databases
+
+All development will have unit tests written where appropriate. Unit testing the SQL generated for each DBMS will be particularly important.
+
+The project will finish with various wrap-up activities, documentation, and some final tests, to be completed by the project deadline.
+
+
+About me
+---
+I am a 3rd year BS Computer Science student; Cal Poly, San Luis Obispo, California, USA; currently applying for a Master's degree in CS from the same school. I've taken several classes dealing with databases, though much of what I know on the subject is self-taught. Outside of class, I've developed a browser-based online game, Zeal, at http://zealgame.com ; it has been running for well over a year and gone through many changes. It has taught me firsthand the importance of using appropriate tools and designing one's application well early on (largely through the pain that follows when you don't); I've learned a great many other things from the experience as well.
+
+One recurring problem I've had with this project is dealing with changes to the database schema. I've thought much about how I'd like to see this solved, but hadn't done much to implement it.
+
+I'm now working on another project that will be making use of SQLAlchemy: it fits many of my project's requirements, but lacks a migration tool that will be much needed. This presents an opportunity for me to make my first contribution to open source - I've long been interested in open source software and use it regularly, but haven't contributed to any until now. I'm particularly interested in the application of this tool with the TurboGears framework, as this project was inspired by a suggestion on the TurboGears mailing list and I'm working on a project using TurboGears - but there is no reason to couple an SQLAlchemy enhancement with TurboGears; this project may be used by anyone who uses SQLAlchemy.
+
+
+Further information:
+http://evan.zealgame.com/soc
diff --git a/doc/source/historical/RepositoryFormat.trac b/doc/source/historical/RepositoryFormat.trac
new file mode 100644
index 0000000..57f6bf6
--- /dev/null
+++ b/doc/source/historical/RepositoryFormat.trac
@@ -0,0 +1,56 @@
+This plan has several problems and has been modified; new plan is discussed in wiki:RepositoryFormat2
+
+----
+
+One problem with [http://www.rubyonrails.org/ Ruby on Rails'] (very good) schema migration system is the behavior of scripts that depend on outside sources, i.e. the application. If those sources change, there's no guarantee that such scripts will behave as they did before, and you'll get strange results.
+
+For example, suppose one defines a SQLAlchemy table:
+{{{
+users = Table('users', metadata,
+ Column('user_id', Integer, primary_key = True),
+ Column('user_name', String(16), nullable = False),
+ Column('password', String(20), nullable = False)
+)
+}}}
+and creates it in a change script:
+{{{
+from project import table
+
+def upgrade():
+ table.users.create()
+}}}
+
+Suppose we later add a column to this table. We write an appropriate change script:
+{{{
+from project import table
+
+def upgrade():
+ # This syntax isn't set in stone yet
+ table.users.add_column('email_address', String(60), key='email')
+}}}
+...and change our application's table definition:
+{{{
+users = Table('users', metadata,
+ Column('user_id', Integer, primary_key = True),
+ Column('user_name', String(16), nullable = False),
+ Column('password', String(20), nullable = False),
+ Column('email_address', String(60), key='email') #new column
+)
+}}}
+
+Modifying the table definition changes how our first script behaves - it will now create the table with the new column. This might work if we only apply change scripts to a few databases that are always kept up to date (or very close to it), but we'll run into errors eventually if our migration scripts' behavior isn't consistent.
+
+----
+
+One solution is to generate .sql files from a Python change script at the time it's added to a repository. The sql generated by the script for each database is set in stone at this point; changes to outside files won't affect it.
+
+This limits what change scripts are capable of - we can't write dynamic SQL; i.e., we can't do something like this:
+{{{
+for row in db.execute("select id from table1"):
+ db.execute("insert into table2 (table1_id, value) values (:id,42)",**row)
+}}}
+But SQL is usually powerful enough that the above is rarely necessary in a migration script:
+{{{
+db.execute("insert into table2 select id,42 from table1")
+}}}
+This is a reasonable solution. The limitations aren't serious (everything possible in a traditional .sql script is still possible), and change scripts are much less prone to error.
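+
+As a rough illustration of how the SQL might be captured at commit time (this is a sketch, not the final API - the mock-engine setup and file naming here are assumptions), SQLAlchemy's mock-execution strategy can compile statements without executing them:
+{{{
+# Hypothetical sketch: record the SQL a change script would emit,
+# rather than executing it, and freeze it into the repository.
+from sqlalchemy import create_engine
+
+captured = []
+
+def dump(sql, *args, **kwargs):
+    captured.append(str(sql.compile(dialect=engine.dialect)))
+
+# A mock engine compiles each statement and hands it to `dump`
+# instead of sending it to a real database.
+engine = create_engine('postgres://', strategy='mock', executor=dump)
+
+upgrade()  # the change script's upgrade(), bound to `engine`
+
+f = open('version_1.upgrade.postgres.sql', 'w')
+f.write(';\n'.join(captured))
+f.close()
+}}}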
diff --git a/doc/source/historical/RepositoryFormat2.trac b/doc/source/historical/RepositoryFormat2.trac
new file mode 100644
index 0000000..496478c
--- /dev/null
+++ b/doc/source/historical/RepositoryFormat2.trac
@@ -0,0 +1,28 @@
+My original plan for Migrate's RepositoryFormat had several problems:
+
+ * Bind parameters: We needed to bind parameters into statements to get something suitable for an .sql file. For some types of parameters, there's no clean way to do this without writing an entire parser - too great a cost for this project. There's a reason why SQLAlchemy's logs display the statement and its parameters separately: the binding is done at a lower level than we have access to.
+ * Failure: Discussed in #17, the old format had no easy way to find the Python statements associated with an SQL error. This makes it difficult to debug scripts.
+
+A new format will be used to solve these problems instead.
+As in our previous solution, where one .sql file was created per version/operation/DBMS (version_1.upgrade.postgres.sql, for example), one file will be created per version/operation/DBMS here.
+These files will contain the following information:
+
+ * The dialect used to perform the logging. Particularly,
+ * The paramstyle expected by the dbapi
+ * The DBMS this log applies to
+ * Information on each logged SQL statement, each of which contains:
+ * The text of the statement
+ * Parameters to be bound to the statement
+ * A Python stack trace at the point the statement was logged - this allows us to tell what Python statements are associated with an SQL statement when there's an error
+
+These files will be created by pickling a Python object with the above information.
+
+Such files may be executed by loading the log and having SQLAlchemy execute the recorded statements as it would have originally.
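+
+A rough sketch of what such a log might contain (the field names and file layout here are illustrative assumptions, not the final format):
+{{{
+import pickle, traceback
+
+log = {
+    'dialect': 'postgres',     # the DBMS this log applies to
+    'paramstyle': 'pyformat',  # the paramstyle expected by the dbapi
+    'statements': [],
+}
+
+def log_statement(sql, params):
+    # Store the statement, its bind parameters, and the Python stack
+    # at the point of logging, so errors can be traced back later.
+    log['statements'].append((sql, params, traceback.format_stack()))
+
+log_statement("INSERT INTO table2 (table1_id, value) VALUES (%(id)s, 42)",
+              {'id': 1})
+
+pickle.dump(log, open('version_1.upgrade.postgres.log', 'wb'))
+}}}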
+
+Good:
+ * Since the statements and bind parameters are stored separately and executed as SQLAlchemy would normally execute them, one problem discussed above is eliminated.
+ * Storing the stack trace at the point each statement was logged allows us to identify what Python statements are responsible for an SQL error. This makes it much easier for users to debug their scripts.
+
+Bad:
+ * It's less trivial to commit .sql scripts to our repository, since they're no longer used internally. This isn't a huge loss, and .sql commits can still be implemented later if need be.
+ * There's some danger of script behavior changing if changes are made to the dbapi the script is associated with. The primary place where problems would occur is during parameter binding, but the chance of this changing significantly isn't large. The danger of changes in behavior due to changes in the user's application is not affected. \ No newline at end of file