-rw-r--r-- | doc/development/migration_style_guide.md | 133
-rw-r--r-- | doc/development/what_requires_downtime.md | 237
-rw-r--r-- | doc/update/README.md | 17
-rw-r--r-- | lib/gitlab/database/migration_helpers.rb | 242
-rw-r--r-- | lib/gitlab/database/multi_threaded_migration.rb | 52
-rw-r--r-- | spec/lib/gitlab/database/migration_helpers_spec.rb | 388
-rw-r--r-- | spec/lib/gitlab/database/multi_threaded_migration_spec.rb | 41
-rw-r--r-- | spec/models/concerns/ignorable_column_spec.rb | 38
8 files changed, 1071 insertions, 77 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md index 587922d0136..3e8b709c18f 100644 --- a/doc/development/migration_style_guide.md +++ b/doc/development/migration_style_guide.md @@ -4,28 +4,53 @@ When writing migrations for GitLab, you have to take into account that these will be ran by hundreds of thousands of organizations of all sizes, some with many years of data in their database. -In addition, having to take a server offline for a an upgrade small or big is -a big burden for most organizations. For this reason it is important that your -migrations are written carefully, can be applied online and adhere to the style guide below. +In addition, having to take a server offline for a a upgrade small or big is a +big burden for most organizations. For this reason it is important that your +migrations are written carefully, can be applied online and adhere to the style +guide below. -Migrations should not require GitLab installations to be taken offline unless -_absolutely_ necessary - see the ["What Requires Downtime?"](what_requires_downtime.md) -page. If a migration requires downtime, this should be clearly mentioned during -the review process, as well as being documented in the monthly release post. For -more information, see the "Downtime Tagging" section below. +Migrations are **not** allowed to require GitLab installations to be taken +offline unless _absolutely necessary_. Downtime assumptions should be based on +the behaviour of a migration when performed using PostgreSQL, as various +operations in MySQL may require downtime without there being alternatives. + +When downtime is necessary the migration has to be approved by: + +1. The VP of Engineering +1. A Backend Lead +1. A Database Specialist + +An up-to-date list of people holding these titles can be found at +<https://about.gitlab.com/team/>. 
+ +The document ["What Requires Downtime?"](what_requires_downtime.md) specifies +various database operations, whether they require downtime and how to +work around that whenever possible. When writing your migrations, also consider that databases might have stale data -or inconsistencies and guard for that. Try to make as little assumptions as possible -about the state of the database. +or inconsistencies and guard for that. Try to make as few assumptions as +possible about the state of the database. + +Please don't depend on GitLab-specific code since it can change in future +versions. If needed copy-paste GitLab code into the migration to make it forward +compatible. + +## Commit Guidelines -Please don't depend on GitLab specific code since it can change in future versions. -If needed copy-paste GitLab code into the migration to make it forward compatible. +Each migration **must** be added in its own commit with a descriptive commit +message. If a commit adds a migration it _should only_ include the migration and +any corresponding changes to `db/schema.rb`. This makes it easy to revert a +database migration without accidentally reverting other changes. ## Downtime Tagging Every migration must specify if it requires downtime or not, and if it should -require downtime it must also specify a reason for this. To do so, add the -following two constants to the migration class' body: +require downtime it must also specify a reason for this. This is required even +if 99% of the migrations won't require downtime as this makes it easier to find +the migrations that _do_ require downtime. + +To tag a migration, add the following two constants to the migration class' +body: * `DOWNTIME`: a boolean that when set to `true` indicates the migration requires downtime. @@ -50,12 +75,53 @@ from a migration class. ## Reversibility -Your migration should be reversible. This is very important, as it should +Your migration **must be** reversible. 
This is very important, as it should be possible to downgrade in case of a vulnerability or bugs. In your migration, add a comment describing how the reversibility of the migration was tested. +## Multi Threading + +Sometimes a migration might need to use multiple Ruby threads to speed up a +migration. For this to work your migration needs to include the module +`Gitlab::Database::MultiThreadedMigration`: + +```ruby +class MyMigration < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + include Gitlab::Database::MultiThreadedMigration +end +``` + +You can then use the method `with_multiple_threads` to perform work in separate +threads. For example: + +```ruby +class MyMigration < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + include Gitlab::Database::MultiThreadedMigration + + def up + with_multiple_threads(4) do + disable_statement_timeout + + # ... + end + end +end +``` + +Here the call to `disable_statement_timeout` will use the connection local to +the `with_multiple_threads` block, instead of re-using the global connection +pool. This ensures each thread has its own connection object, and won't time +out when trying to obtain one. + +**NOTE:** PostgreSQL has a maximum amount of connections that it allows. This +limit can vary from installation to installation. As a result it's recommended +you do not use more than 32 threads in a single migration. Usually 4-8 threads +should be more than enough. + ## Removing indices When removing an index make sure to use the method `remove_concurrent_index` instead @@ -78,7 +144,10 @@ end ## Adding indices -If you need to add an unique index please keep in mind there is possibility of existing duplicates. If it is possible write a separate migration for handling this situation. It can be just removing or removing with overwriting all references to these duplicates depend on situation. 
+If you need to add a unique index please keep in mind there is the possibility +of existing duplicates being present in the database. This means that should +always _first_ add a migration that removes any duplicates, before adding the +unique index. When adding an index make sure to use the method `add_concurrent_index` instead of the regular `add_index` method. The `add_concurrent_index` method @@ -90,17 +159,22 @@ so: ```ruby class MyMigration < ActiveRecord::Migration include Gitlab::Database::MigrationHelpers + disable_ddl_transaction! - def change + def up + add_concurrent_index :table, :column + end + def down + remove_index :table, :column if index_exists?(:table, :column) end end ``` ## Adding Columns With Default Values -When adding columns with default values you should use the method +When adding columns with default values you must use the method `add_column_with_default`. This method ensures the table is updated without requiring downtime. This method is not reversible so you must manually define the `up` and `down` methods in your migration class. @@ -123,6 +197,9 @@ class MyMigration < ActiveRecord::Migration end ``` +Keep in mind that this operation can easily take 10-15 minutes to complete on +larger installations (e.g. GitLab.com). As a result you should only add default +values if absolutely necessary. ## Integer column type @@ -147,13 +224,15 @@ add_column(:projects, :foo, :integer, default: 10, limit: 8) ## Testing -Make sure that your migration works with MySQL and PostgreSQL with data. An empty database does not guarantee that your migration is correct. +Make sure that your migration works with MySQL and PostgreSQL with data. An +empty database does not guarantee that your migration is correct. Make sure your migration can be reversed. ## Data migration -Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of using plain SQL you need to quote all input manually with `quote_string` helper. 
+Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of +using plain SQL you need to quote all input manually with `quote_string` helper. Example with Arel: @@ -177,3 +256,17 @@ select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(i execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})") end ``` + +If you need more complex logic you can define and use models local to a +migration. For example: + +```ruby +class MyMigration < ActiveRecord::Migration + class Project < ActiveRecord::Base + self.table_name = 'projects' + end +end +``` + +When doing so be sure to explicitly set the model's table name so it's not +derived from the class name or namespace. diff --git a/doc/development/what_requires_downtime.md b/doc/development/what_requires_downtime.md index bbcd26477f3..8da6ad684f5 100644 --- a/doc/development/what_requires_downtime.md +++ b/doc/development/what_requires_downtime.md @@ -2,7 +2,8 @@ When working with a database certain operations can be performed without taking GitLab offline, others do require a downtime period. This guide describes -various operations and their impact. +various operations, their impact, and how to perform them without requiring +downtime. ## Adding Columns @@ -41,50 +42,156 @@ information on how to use this method. ## Dropping Columns -On PostgreSQL you can safely remove an existing column without the need for -downtime. When you drop a column in PostgreSQL it's not immediately removed, -instead it is simply disabled. The data is removed on the next vacuum run. +Removing columns is tricky because running GitLab processes may still be using +the columns. To work around this you will need two separate merge requests and +releases: one to ignore and then remove the column, and one to remove the ignore +rule. -On MySQL this operation requires downtime. 
+### Step 1: Ignoring The Column -While database wise dropping a column may be fine on PostgreSQL this operation -still requires downtime because the application code may still be using the -column that was removed. For example, consider the following migration: +The first step is to ignore the column in the application code. This is +necessary because Rails caches the columns and re-uses this cache in various +places. This can be done by including the `IgnorableColumn` module into the +model, followed by defining the columns to ignore. For example, to ignore +`updated_at` in the User model you'd use the following: ```ruby -class MyMigration < ActiveRecord::Migration - def change - remove_column :projects, :dummy - end +class User < ActiveRecord::Base + include IgnorableColumn + + ignore_column :updated_at end ``` -Now imagine that the GitLab instance is running and actively uses the `dummy` -column. If we were to run the migration this would result in the GitLab instance -producing errors whenever it tries to use the `dummy` column. +Once added you should create a _post-deployment_ migration that removes the +column. Both these changes should be submitted in the same merge request. -As a result of the above downtime _is_ required when removing a column, even -when using PostgreSQL. +### Step 2: Removing The Ignore Rule + +Once the changes from step 1 have been released & deployed you can set up a +separate merge request that removes the ignore rule. This merge request can +simply remove the `ignore_column` line, and the `include IgnorableColumn` line +if no other `ignore_column` calls remain. ## Renaming Columns -Renaming columns requires downtime as running GitLab instances will continue -using the old column name until a new version is deployed. This can result -in the instance producing errors, which in turn can impact the user experience. 
+Renaming columns the normal way requires downtime as an application may continue +using the old column name during/after a database migration. To rename a column +without requiring downtime we need two migrations: a regular migration, and a +post-deployment migration. Both these migration can go in the same release. -## Changing Column Constraints +### Step 1: Add The Regular Migration + +First we need to create the regular migration. This migration should use +`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the +renaming. For example + +```ruby +# A regular migration in db/migrate +class RenameUsersUpdatedAtToUpdatedAtTimestamp < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + + disable_ddl_transaction! + + def up + rename_column_concurrently :users, :updated_at, :updated_at_timestamp + end + + def down + cleanup_concurrent_column_rename :users, :updated_at_timestamp, :updated_at + end +end +``` + +This will take care of renaming the column, ensuring data stays in sync, copying +over indexes and foreign keys, etc. + +**NOTE:** if a column contains 1 or more indexes that do not contain the name of +the original column, the above procedure will fail. In this case you will first +need to rename these indexes. -Generally changing column constraints requires checking all rows in the table to -see if they meet the new constraint, unless a constraint is _removed_. For -example, changing a column that previously allowed NULL values to not allow NULL -values requires the database to verify all existing rows. +### Step 2: Add A Post-Deployment Migration -The specific behaviour varies a bit between databases but in general the safest -approach is to assume changing constraints requires downtime. +The renaming procedure requires some cleaning up in a post-deployment migration. 
+We can perform this cleanup using +`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`: + +```ruby +# A post-deployment migration in db/post_migrate +class CleanupUsersUpdatedAtRename < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + + disable_ddl_transaction! + + def up + cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp + end + + def down + rename_column_concurrently :users, :updated_at_timestamp, :updated_at + end +end +``` + +## Changing Column Constraints + +Adding or removing a NOT NULL clause (or another constraint) can typically be +done without requiring downtime. However, this does require that any application +changes are deployed _first_. Thus, changing the constraints of a column should +happen in a post-deployment migration. ## Changing Column Types -This operation requires downtime. +Changing the type of a column can be done using +`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This +method works similarly to `rename_column_concurrently`. For example, let's say +we want to change the type of `users.username` from `string` to `text`. + +### Step 1: Create A Regular Migration + +A regular migration is used to create a new column with a temporary name along +with setting up some triggers to keep data in sync. Such a migration would look +as follows: + +```ruby +# A regular migration in db/migrate +class ChangeUsersUsernameStringToText < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + + disable_ddl_transaction! 
+ + def up + change_column_type_concurrently :users, :username, :text + end + + def down + cleanup_concurrent_column_type_change :users, :username + end +end +``` + +### Step 2: Create A Post Deployment Migration + +Next we need to clean up our changes using a post-deployment migration: + +```ruby +# A post-deployment migration in db/post_migrate +class ChangeUsersUsernameStringToTextCleanup < ActiveRecord::Migration + include Gitlab::Database::MigrationHelpers + + disable_ddl_transaction! + + def up + cleanup_concurrent_column_type_change :users + end + + def down + change_column_type_concurrently :users, :username, :string + end +end +``` + +And that's it, we're done! ## Adding Indexes @@ -101,12 +208,19 @@ Migrations can take advantage of this by using the method ```ruby class MyMigration < ActiveRecord::Migration - def change + def up add_concurrent_index :projects, :column_name end + + def down + remove_index(:projects, :column_name) if index_exists?(:projects, :column_name) + end end ``` +Note that `add_concurrent_index` can not be reversed automatically, thus you +need to manually define `up` and `down`. + When running this on PostgreSQL the `CONCURRENTLY` option mentioned above is used. On MySQL this method produces a regular `CREATE INDEX` query. @@ -125,43 +239,54 @@ This operation is safe as there's no code using the table just yet. ## Dropping Tables -This operation requires downtime as application code may still be using the -table. +Dropping tables can be done safely using a post-deployment migration, but only +if the application no longer uses the table. ## Adding Foreign Keys -Adding foreign keys acquires an exclusive lock on both the source and target -tables in PostgreSQL. This requires downtime as otherwise the entire application -grinds to a halt for the duration of the operation. +Adding foreign keys usually works in 3 steps: + +1. Start a transaction +1. Run `ALTER TABLE` to add the constraint(s) +1. 
Check all existing data -On MySQL this operation also requires downtime _unless_ foreign key checks are -disabled. Because this means checks aren't enforced this is not ideal, as such -one should assume MySQL also requires downtime. +Because `ALTER TABLE` typically acquires an exclusive lock until the end of a +transaction this means this approach would require downtime. + +GitLab allows you to work around this by using +`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method +ensures that when PostgreSQL is used no downtime is needed. ## Removing Foreign Keys -This operation should not require downtime on both PostgreSQL and MySQL. +This operation does not require downtime. -## Updating Data +## Data Migrations -Updating data should generally be safe. The exception to this is data that's -being migrated from one version to another while the application still produces -data in the old version. +Data migrations can be tricky. The usual approach to migrate data is to take a 3 +step approach: -For example, imagine the application writes the string `'dog'` to a column but -it really is meant to write `'cat'` instead. One might think that the following -migration is all that is needed to solve this problem: +1. Migrate the initial batch of data +1. Deploy the application code +1. Migrate any remaining data -```ruby -class MyMigration < ActiveRecord::Migration - def up - execute("UPDATE some_table SET column = 'cat' WHERE column = 'dog';") - end -end -``` +Usually this works, but not always. For example, if a field's format is to be +changed from JSON to something else we have a bit of a problem. If we were to +change existing data before deploying application code we'll most likely run +into errors. On the other hand, if we were to migrate after deploying the +application code we could run into the same problems. + +If you merely need to correct some invalid data, then a post-deployment +migration is usually enough. 
If you need to change the format of data (e.g. from +JSON to something else) it's typically best to add a new column for the new data +format, and have the application use that. In such a case the procedure would +be: -Unfortunately this is not enough. Because the application is still running and -using the old value this may result in the table still containing rows where -`column` is set to `dog`, even after the migration finished. +1. Add a new column in the new format +1. Copy over existing data to this new column +1. Deploy the application code +1. In a post-deployment migration, copy over any remaining data -In these cases downtime _is_ required, even for rarely updated tables. +In general there is no one-size-fits-all solution, therefore it's best to +discuss these kind of migrations in a merge request to make sure they are +implemented in the best way possible. diff --git a/doc/update/README.md b/doc/update/README.md index 837b31abb97..7921d03d611 100644 --- a/doc/update/README.md +++ b/doc/update/README.md @@ -48,6 +48,23 @@ GitLab provides official Docker images for both Community and Enterprise editions. They are based on the Omnibus package and instructions on how to update them are in [a separate document][omnidocker]. +## Upgrading without downtime + +Starting with GitLab 9.1.0 it's possible to upgrade to a newer version of GitLab +without having to take your GitLab instance offline. However, for this to work +there are the following requirements: + +1. You can only upgrade 1 release at a time. For example, if 9.1.15 is the last + release of 9.1 then you can safely upgrade from that version to 9.2.0. + However, if you are running 9.1.14 you first need to upgrade to 9.1.15. +2. You have to use [post-deployment + migrations](../development/post_deployment_migrations.md). +3. You are using PostgreSQL. If you are using MySQL you will still need downtime + when upgrading. 
+ +This applies to major, minor, and patch releases unless stated otherwise in a +release post. + ## Upgrading between editions GitLab comes in two flavors: [Community Edition][ce] which is MIT licensed, diff --git a/lib/gitlab/database/migration_helpers.rb b/lib/gitlab/database/migration_helpers.rb index 21f1b32d467..6dabbe0264c 100644 --- a/lib/gitlab/database/migration_helpers.rb +++ b/lib/gitlab/database/migration_helpers.rb @@ -89,7 +89,8 @@ module Gitlab ADD CONSTRAINT #{key_name} FOREIGN KEY (#{column}) REFERENCES #{target} (id) - ON DELETE #{on_delete} NOT VALID; + #{on_delete ? "ON DELETE #{on_delete}" : ''} + NOT VALID; EOF # Validate the existing constraint. This can potentially take a very @@ -258,6 +259,245 @@ module Gitlab raise error end end + + # Renames a column without requiring downtime. + # + # Concurrent renames work by using database triggers to ensure both the + # old and new column are in sync. However, this method will _not_ remove + # the triggers or the old column automatically; this needs to be done + # manually in a post-deployment migration. This can be done using the + # method `cleanup_concurrent_column_rename`. + # + # table - The name of the database table containing the column. + # old - The old column name. + # new - The new column name. + # type - The type of the new column. If no type is given the old column's + # type is used. + def rename_column_concurrently(table, old, new, type: nil) + if transaction_open? + raise 'rename_column_concurrently can not be run inside a transaction' + end + + trigger_name = rename_trigger_name(table, old, new) + quoted_table = quote_table_name(table) + quoted_old = quote_column_name(old) + quoted_new = quote_column_name(new) + + if Database.postgresql? 
+ install_rename_triggers_for_postgresql(trigger_name, quoted_table, + quoted_old, quoted_new) + else + install_rename_triggers_for_mysql(trigger_name, quoted_table, + quoted_old, quoted_new) + end + + old_col = column_for(table, old) + new_type = type || old_col.type + + add_column(table, new, new_type, + limit: old_col.limit, + default: old_col.default, + null: old_col.null, + precision: old_col.precision, + scale: old_col.scale) + + update_column_in_batches(table, new, Arel::Table.new(table)[old]) + + copy_indexes(table, old, new) + copy_foreign_keys(table, old, new) + end + + # Changes the type of a column concurrently. + # + # table - The table containing the column. + # column - The name of the column to change. + # new_type - The new column type. + def change_column_type_concurrently(table, column, new_type) + temp_column = "#{column}_for_type_change" + + rename_column_concurrently(table, column, temp_column, type: new_type) + end + + # Performs cleanup of a concurrent type change. + # + # table - The table containing the column. + # column - The name of the column to change. + # new_type - The new column type. + def cleanup_concurrent_column_type_change(table, column) + temp_column = "#{column}_for_type_change" + + transaction do + # This has to be performed in a transaction as otherwise we might have + # inconsistent data. + cleanup_concurrent_column_rename(table, column, temp_column) + rename_column(table, temp_column, column) + end + end + + # Cleans up a concurrent column name. + # + # This method takes care of removing previously installed triggers as well + # as removing the old column. + # + # table - The name of the database table. + # old - The name of the old column. + # new - The name of the new column. + def cleanup_concurrent_column_rename(table, old, new) + trigger_name = rename_trigger_name(table, old, new) + + if Database.postgresql? 
+ remove_rename_triggers_for_postgresql(table, trigger_name) + else + remove_rename_triggers_for_mysql(trigger_name) + end + + remove_column(table, old) + end + + # Performs a concurrent column rename when using PostgreSQL. + def install_rename_triggers_for_postgresql(trigger, table, old, new) + execute <<-EOF.strip_heredoc + CREATE OR REPLACE FUNCTION #{trigger}() + RETURNS trigger AS + $BODY$ + BEGIN + NEW.#{new} := NEW.#{old}; + RETURN NEW; + END; + $BODY$ + LANGUAGE 'plpgsql' + VOLATILE + EOF + + execute <<-EOF.strip_heredoc + CREATE TRIGGER #{trigger} + BEFORE INSERT OR UPDATE + ON #{table} + FOR EACH ROW + EXECUTE PROCEDURE #{trigger}() + EOF + end + + # Installs the triggers necessary to perform a concurrent column rename on + # MySQL. + def install_rename_triggers_for_mysql(trigger, table, old, new) + execute <<-EOF.strip_heredoc + CREATE TRIGGER #{trigger}_insert + BEFORE INSERT + ON #{table} + FOR EACH ROW + SET NEW.#{new} = NEW.#{old} + EOF + + execute <<-EOF.strip_heredoc + CREATE TRIGGER #{trigger}_update + BEFORE UPDATE + ON #{table} + FOR EACH ROW + SET NEW.#{new} = NEW.#{old} + EOF + end + + # Removes the triggers used for renaming a PostgreSQL column concurrently. + def remove_rename_triggers_for_postgresql(table, trigger) + execute("DROP TRIGGER #{trigger} ON #{table}") + execute("DROP FUNCTION #{trigger}()") + end + + # Removes the triggers used for renaming a MySQL column concurrently. + def remove_rename_triggers_for_mysql(trigger) + execute("DROP TRIGGER #{trigger}_insert") + execute("DROP TRIGGER #{trigger}_update") + end + + # Returns the (base) name to use for triggers when renaming columns. 
+ def rename_trigger_name(table, old, new) + 'trigger_' + Digest::SHA256.hexdigest("#{table}_#{old}_#{new}").first(12) + end + + # Returns an Array containing the indexes for the given column + def indexes_for(table, column) + column = column.to_s + + indexes(table).select { |index| index.columns.include?(column) } + end + + # Returns an Array containing the foreign keys for the given column. + def foreign_keys_for(table, column) + column = column.to_s + + foreign_keys(table).select { |fk| fk.column == column } + end + + # Copies all indexes for the old column to a new column. + # + # table - The table containing the columns and indexes. + # old - The old column. + # new - The new column. + def copy_indexes(table, old, new) + old = old.to_s + new = new.to_s + + indexes_for(table, old).each do |index| + new_columns = index.columns.map do |column| + column == old ? new : column + end + + # This is necessary as we can't properly rename indexes such as + # "ci_taggings_idx". + unless index.name.include?(old) + raise "The index #{index.name} can not be copied as it does not "\ + "mention the old column. You have to rename this index manually first." + end + + name = index.name.gsub(old, new) + + options = { + unique: index.unique, + name: name, + length: index.lengths, + order: index.orders + } + + # These options are not supported by MySQL, so we only add them if + # they were previously set. + options[:using] = index.using if index.using + options[:where] = index.where if index.where + + unless index.opclasses.blank? + opclasses = index.opclasses.dup + + # Copy the operator classes for the old column (if any) to the new + # column. + opclasses[new] = opclasses.delete(old) if opclasses[old] + + options[:opclasses] = opclasses + end + + add_concurrent_index(table, new_columns, options) + end + end + + # Copies all foreign keys for the old column to the new column. + # + # table - The table containing the columns and indexes. + # old - The old column. 
+ # new - The new column. + def copy_foreign_keys(table, old, new) + foreign_keys_for(table, old).each do |fk| + add_concurrent_foreign_key(fk.from_table, + fk.to_table, + column: new, + on_delete: fk.on_delete) + end + end + + # Returns the column for the given table and column name. + def column_for(table, name) + name = name.to_s + + columns(table).find { |column| column.name == name } + end end end end diff --git a/lib/gitlab/database/multi_threaded_migration.rb b/lib/gitlab/database/multi_threaded_migration.rb new file mode 100644 index 00000000000..7ae5a4c17c8 --- /dev/null +++ b/lib/gitlab/database/multi_threaded_migration.rb @@ -0,0 +1,52 @@ +module Gitlab + module Database + module MultiThreadedMigration + MULTI_THREAD_AR_CONNECTION = :thread_local_ar_connection + + # This overwrites the default connection method so that every thread can + # use a thread-local connection, while still supporting all of Rails' + # migration methods. + def connection + Thread.current[MULTI_THREAD_AR_CONNECTION] || + ActiveRecord::Base.connection + end + + # Starts a thread-pool for N threads, along with N threads each using a + # single connection. The provided block is yielded from inside each + # thread. + # + # Example: + # + # with_multiple_threads(4) do + # execute('SELECT ...') + # end + # + # thread_count - The number of threads to start. + # + # join - When set to true this method will join the threads, blocking the + # caller until all threads have finished running. + # + # Returns an Array containing the started threads. 
+ def with_multiple_threads(thread_count, join: true) + pool = Gitlab::Database.create_connection_pool(thread_count) + + threads = Array.new(thread_count) do + Thread.new do + pool.with_connection do |connection| + begin + Thread.current[MULTI_THREAD_AR_CONNECTION] = connection + yield + ensure + Thread.current[MULTI_THREAD_AR_CONNECTION] = nil + end + end + end + end + + threads.each(&:join) if join + + threads + end + end + end +end diff --git a/spec/lib/gitlab/database/migration_helpers_spec.rb b/spec/lib/gitlab/database/migration_helpers_spec.rb index bc43bcf0714..a044b871730 100644 --- a/spec/lib/gitlab/database/migration_helpers_spec.rb +++ b/spec/lib/gitlab/database/migration_helpers_spec.rb @@ -338,4 +338,392 @@ describe Gitlab::Database::MigrationHelpers, lib: true do end end end + + describe '#rename_column_concurrently' do + context 'in a transaction' do + it 'raises RuntimeError' do + allow(model).to receive(:transaction_open?).and_return(true) + + expect { model.rename_column_concurrently(:users, :old, :new) }. + to raise_error(RuntimeError) + end + end + + context 'outside a transaction' do + let(:old_column) do + double(:column, + type: :integer, + limit: 8, + default: 0, + null: false, + precision: 5, + scale: 1) + end + + let(:trigger_name) { model.rename_trigger_name(:users, :old, :new) } + + before do + allow(model).to receive(:transaction_open?).and_return(false) + allow(model).to receive(:column_for).and_return(old_column) + + # Since MySQL and PostgreSQL use different quoting styles we'll just + # stub the methods used for this to make testing easier. + allow(model).to receive(:quote_column_name) { |name| name.to_s } + allow(model).to receive(:quote_table_name) { |name| name.to_s } + end + + context 'using MySQL' do + it 'renames a column concurrently' do + allow(Gitlab::Database).to receive(:postgresql?).and_return(false) + + expect(model).to receive(:install_rename_triggers_for_mysql). 
+            with(trigger_name, 'users', 'old', 'new')
+
+          expect(model).to receive(:add_column).
+            with(:users, :new, :integer,
+                 limit: old_column.limit,
+                 default: old_column.default,
+                 null: old_column.null,
+                 precision: old_column.precision,
+                 scale: old_column.scale)
+
+          expect(model).to receive(:update_column_in_batches)
+
+          expect(model).to receive(:copy_indexes).with(:users, :old, :new)
+          expect(model).to receive(:copy_foreign_keys).with(:users, :old, :new)
+
+          model.rename_column_concurrently(:users, :old, :new)
+        end
+      end
+
+      context 'using PostgreSQL' do
+        it 'renames a column concurrently' do
+          allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
+
+          expect(model).to receive(:install_rename_triggers_for_postgresql).
+            with(trigger_name, 'users', 'old', 'new')
+
+          expect(model).to receive(:add_column).
+            with(:users, :new, :integer,
+                 limit: old_column.limit,
+                 default: old_column.default,
+                 null: old_column.null,
+                 precision: old_column.precision,
+                 scale: old_column.scale)
+
+          expect(model).to receive(:update_column_in_batches)
+
+          expect(model).to receive(:copy_indexes).with(:users, :old, :new)
+          expect(model).to receive(:copy_foreign_keys).with(:users, :old, :new)
+
+          model.rename_column_concurrently(:users, :old, :new)
+        end
+      end
+    end
+  end
+
+  describe '#cleanup_concurrent_column_rename' do
+    it 'cleans up the renaming procedure for PostgreSQL' do
+      allow(Gitlab::Database).to receive(:postgresql?).and_return(true)
+
+      expect(model).to receive(:remove_rename_triggers_for_postgresql).
+        with(:users, /trigger_.{12}/)
+
+      expect(model).to receive(:remove_column).with(:users, :old)
+
+      model.cleanup_concurrent_column_rename(:users, :old, :new)
+    end
+
+    it 'cleans up the renaming procedure for MySQL' do
+      allow(Gitlab::Database).to receive(:postgresql?).and_return(false)
+
+      expect(model).to receive(:remove_rename_triggers_for_mysql).
+        with(/trigger_.{12}/)
+
+      expect(model).to receive(:remove_column).with(:users, :old)
+
+      model.cleanup_concurrent_column_rename(:users, :old, :new)
+    end
+  end
+
+  describe '#change_column_type_concurrently' do
+    it 'changes the column type' do
+      expect(model).to receive(:rename_column_concurrently).
+        with('users', 'username', 'username_for_type_change', type: :text)
+
+      model.change_column_type_concurrently('users', 'username', :text)
+    end
+  end
+
+  describe '#cleanup_concurrent_column_type_change' do
+    it 'cleans up the type changing procedure' do
+      expect(model).to receive(:cleanup_concurrent_column_rename).
+        with('users', 'username', 'username_for_type_change')
+
+      expect(model).to receive(:rename_column).
+        with('users', 'username_for_type_change', 'username')
+
+      model.cleanup_concurrent_column_type_change('users', 'username')
+    end
+  end
+
+  describe '#install_rename_triggers_for_postgresql' do
+    it 'installs the triggers for PostgreSQL' do
+      expect(model).to receive(:execute).
+        with(/CREATE OR REPLACE FUNCTION foo()/m)
+
+      expect(model).to receive(:execute).
+        with(/CREATE TRIGGER foo/m)
+
+      model.install_rename_triggers_for_postgresql('foo', :users, :old, :new)
+    end
+  end
+
+  describe '#install_rename_triggers_for_mysql' do
+    it 'installs the triggers for MySQL' do
+      expect(model).to receive(:execute).
+        with(/CREATE TRIGGER foo_insert.+ON users/m)
+
+      expect(model).to receive(:execute).
+        with(/CREATE TRIGGER foo_update.+ON users/m)
+
+      model.install_rename_triggers_for_mysql('foo', :users, :old, :new)
+    end
+  end
+
+  describe '#remove_rename_triggers_for_postgresql' do
+    it 'removes the function and trigger' do
+      expect(model).to receive(:execute).with('DROP TRIGGER foo ON bar')
+      expect(model).to receive(:execute).with('DROP FUNCTION foo()')
+
+      model.remove_rename_triggers_for_postgresql('bar', 'foo')
+    end
+  end
+
+  describe '#remove_rename_triggers_for_mysql' do
+    it 'removes the triggers' do
+      expect(model).to receive(:execute).with('DROP TRIGGER foo_insert')
+      expect(model).to receive(:execute).with('DROP TRIGGER foo_update')
+
+      model.remove_rename_triggers_for_mysql('foo')
+    end
+  end
+
+  describe '#rename_trigger_name' do
+    it 'returns a String' do
+      expect(model.rename_trigger_name(:users, :foo, :bar)).
+        to match(/trigger_.{12}/)
+    end
+  end
+
+  describe '#indexes_for' do
+    it 'returns the indexes for a column' do
+      idx1 = double(:idx, columns: %w(project_id))
+      idx2 = double(:idx, columns: %w(user_id))
+
+      allow(model).to receive(:indexes).with('table').and_return([idx1, idx2])
+
+      expect(model.indexes_for('table', :user_id)).to eq([idx2])
+    end
+  end
+
+  describe '#foreign_keys_for' do
+    it 'returns the foreign keys for a column' do
+      fk1 = double(:fk, column: 'project_id')
+      fk2 = double(:fk, column: 'user_id')
+
+      allow(model).to receive(:foreign_keys).with('table').and_return([fk1, fk2])
+
+      expect(model.foreign_keys_for('table', :user_id)).to eq([fk2])
+    end
+  end
+
+  describe '#copy_indexes' do
+    context 'using a regular index using a single column' do
+      it 'copies the index' do
+        index = double(:index,
+                       columns: %w(project_id),
+                       name: 'index_on_issues_project_id',
+                       using: nil,
+                       where: nil,
+                       opclasses: {},
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect(model).to receive(:add_concurrent_index).
+          with(:issues,
+               %w(gl_project_id),
+               unique: false,
+               name: 'index_on_issues_gl_project_id',
+               length: [],
+               order: [])
+
+        model.copy_indexes(:issues, :project_id, :gl_project_id)
+      end
+    end
+
+    context 'using a regular index with multiple columns' do
+      it 'copies the index' do
+        index = double(:index,
+                       columns: %w(project_id foobar),
+                       name: 'index_on_issues_project_id_foobar',
+                       using: nil,
+                       where: nil,
+                       opclasses: {},
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect(model).to receive(:add_concurrent_index).
+          with(:issues,
+               %w(gl_project_id foobar),
+               unique: false,
+               name: 'index_on_issues_gl_project_id_foobar',
+               length: [],
+               order: [])
+
+        model.copy_indexes(:issues, :project_id, :gl_project_id)
+      end
+    end
+
+    context 'using an index with a WHERE clause' do
+      it 'copies the index' do
+        index = double(:index,
+                       columns: %w(project_id),
+                       name: 'index_on_issues_project_id',
+                       using: nil,
+                       where: 'foo',
+                       opclasses: {},
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect(model).to receive(:add_concurrent_index).
+          with(:issues,
+               %w(gl_project_id),
+               unique: false,
+               name: 'index_on_issues_gl_project_id',
+               length: [],
+               order: [],
+               where: 'foo')
+
+        model.copy_indexes(:issues, :project_id, :gl_project_id)
+      end
+    end
+
+    context 'using an index with a USING clause' do
+      it 'copies the index' do
+        index = double(:index,
+                       columns: %w(project_id),
+                       name: 'index_on_issues_project_id',
+                       where: nil,
+                       using: 'foo',
+                       opclasses: {},
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect(model).to receive(:add_concurrent_index).
+          with(:issues,
+               %w(gl_project_id),
+               unique: false,
+               name: 'index_on_issues_gl_project_id',
+               length: [],
+               order: [],
+               using: 'foo')
+
+        model.copy_indexes(:issues, :project_id, :gl_project_id)
+      end
+    end
+
+    context 'using an index with custom operator classes' do
+      it 'copies the index' do
+        index = double(:index,
+                       columns: %w(project_id),
+                       name: 'index_on_issues_project_id',
+                       using: nil,
+                       where: nil,
+                       opclasses: { 'project_id' => 'bar' },
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect(model).to receive(:add_concurrent_index).
+          with(:issues,
+               %w(gl_project_id),
+               unique: false,
+               name: 'index_on_issues_gl_project_id',
+               length: [],
+               order: [],
+               opclasses: { 'gl_project_id' => 'bar' })
+
+        model.copy_indexes(:issues, :project_id, :gl_project_id)
+      end
+    end
+
+    describe 'using an index of which the name does not contain the source column' do
+      it 'raises RuntimeError' do
+        index = double(:index,
+                       columns: %w(project_id),
+                       name: 'index_foobar_index',
+                       using: nil,
+                       where: nil,
+                       opclasses: {},
+                       unique: false,
+                       lengths: [],
+                       orders: [])
+
+        allow(model).to receive(:indexes_for).with(:issues, 'project_id').
+          and_return([index])
+
+        expect { model.copy_indexes(:issues, :project_id, :gl_project_id) }.
+          to raise_error(RuntimeError)
+      end
+    end
+  end
+
+  describe '#copy_foreign_keys' do
+    it 'copies foreign keys from one column to another' do
+      fk = double(:fk,
+                  from_table: 'issues',
+                  to_table: 'projects',
+                  on_delete: :cascade)
+
+      allow(model).to receive(:foreign_keys_for).with(:issues, :project_id).
+        and_return([fk])
+
+      expect(model).to receive(:add_concurrent_foreign_key).
+        with('issues', 'projects', column: :gl_project_id, on_delete: :cascade)
+
+      model.copy_foreign_keys(:issues, :project_id, :gl_project_id)
+    end
+  end
+
+  describe '#column_for' do
+    it 'returns a column object for an existing column' do
+      column = model.column_for(:users, :id)
+
+      expect(column.name).to eq('id')
+    end
+
+    it 'returns nil when a column does not exist' do
+      expect(model.column_for(:users, :kittens)).to be_nil
+    end
+  end
 end
diff --git a/spec/lib/gitlab/database/multi_threaded_migration_spec.rb b/spec/lib/gitlab/database/multi_threaded_migration_spec.rb
new file mode 100644
index 00000000000..6c45f13bb5a
--- /dev/null
+++ b/spec/lib/gitlab/database/multi_threaded_migration_spec.rb
@@ -0,0 +1,41 @@
+require 'spec_helper'
+
+describe Gitlab::Database::MultiThreadedMigration do
+  let(:migration) do
+    Class.new { include Gitlab::Database::MultiThreadedMigration }.new
+  end
+
+  describe '#connection' do
+    after do
+      Thread.current[described_class::MULTI_THREAD_AR_CONNECTION] = nil
+    end
+
+    it 'returns the thread-local connection if present' do
+      Thread.current[described_class::MULTI_THREAD_AR_CONNECTION] = 10
+
+      expect(migration.connection).to eq(10)
+    end
+
+    it 'returns the global connection if no thread-local connection was set' do
+      expect(migration.connection).to eq(ActiveRecord::Base.connection)
+    end
+  end
+
+  describe '#with_multiple_threads' do
+    it 'starts multiple threads and yields the supplied block in every thread' do
+      output = Queue.new
+
+      migration.with_multiple_threads(2) do
+        output << migration.connection.execute('SELECT 1')
+      end
+
+      expect(output.size).to eq(2)
+    end
+
+    it 'joins the threads when the join parameter is set' do
+      expect_any_instance_of(Thread).to receive(:join).and_call_original
+
+      migration.with_multiple_threads(1) { }
+    end
+  end
+end
diff --git a/spec/models/concerns/ignorable_column_spec.rb b/spec/models/concerns/ignorable_column_spec.rb
new file mode 100644
index 00000000000..dba9fe43327
--- /dev/null
+++ b/spec/models/concerns/ignorable_column_spec.rb
@@ -0,0 +1,38 @@
+require 'spec_helper'
+
+describe IgnorableColumn do
+  let :base_class do
+    Class.new do
+      def self.columns
+        # This method does not have access to "double"
+        [Struct.new(:name).new('id'), Struct.new(:name).new('title')]
+      end
+    end
+  end
+
+  let :model do
+    Class.new(base_class) do
+      include IgnorableColumn
+    end
+  end
+
+  describe '.columns' do
+    it 'returns the columns, excluding the ignored ones' do
+      model.ignore_column(:title)
+
+      expect(model.columns.map(&:name)).to eq(%w(id))
+    end
+  end
+
+  describe '.ignored_columns' do
+    it 'returns a Set' do
+      expect(model.ignored_columns).to be_an_instance_of(Set)
+    end
+
+    it 'returns the names of the ignored columns' do
+      model.ignore_column(:title)
+
+      expect(model.ignored_columns).to eq(Set.new(%w(title)))
+    end
+  end
+end
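
The `IgnorableColumn` spec above describes the concern's behaviour, but the concern itself is not part of this section of the diff. As a rough sketch only, and not the implementation from this commit, a plain-Ruby module with the same observable behaviour could look like the following. The names `IgnorableColumnSketch`, `Column`, `BaseRecord` and `Article` are hypothetical stand-ins chosen so the example runs without Rails.

```ruby
require 'set'

# Hypothetical stand-in for the concern under test; the name and the body are
# assumptions derived from the spec, not the code added by this commit.
module IgnorableColumnSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Returns the parent class' columns minus the ignored ones.
    def columns
      super.reject { |column| ignored_columns.include?(column.name) }
    end

    # Returns a Set of ignored column names (as Strings).
    def ignored_columns
      @ignored_columns ||= Set.new
    end

    # Marks one or more columns as ignored.
    def ignore_column(*names)
      ignored_columns.merge(names.map(&:to_s))
    end
  end
end

# Hypothetical column and model classes standing in for ActiveRecord.
Column = Struct.new(:name)

class BaseRecord
  def self.columns
    [Column.new('id'), Column.new('title')]
  end
end

class Article < BaseRecord
  include IgnorableColumnSketch

  ignore_column :title
end

p Article.columns.map(&:name)
p Article.ignored_columns
```

The sketch relies on `columns` being defined on a parent class, as in the spec's `base_class`, so that `super` inside the extended module resolves to it.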