diff options
author | Sean McGivern <sean@gitlab.com> | 2017-11-17 16:02:10 +0000 |
---|---|---|
committer | Sean McGivern <sean@gitlab.com> | 2017-11-17 16:02:10 +0000 |
commit | 634031222c309c3445909f3b95349d608b7bdf93 (patch) | |
tree | 72630b4a53ecbb91b3f70b7007dc38bbaab968cc | |
parent | a4aeda59a80a4fdcd60a08146c3a936641ec5002 (diff) | |
download | gitlab-ce-restrict-update-column-in-batches-for-large-tables.tar.gz |
Add computed update docs for update_column_in_batchesrestrict-update-column-in-batches-for-large-tables
-rw-r--r-- | doc/development/migration_style_guide.md | 38 | ||||
-rw-r--r-- | lib/gitlab/database/migration_helpers.rb | 9 |
2 files changed, 46 insertions, 1 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md index 9b8ab5da74e..a235dd74909 100644 --- a/doc/development/migration_style_guide.md +++ b/doc/development/migration_style_guide.md @@ -198,7 +198,43 @@ end Keep in mind that this operation can easily take 10-15 minutes to complete on larger installations (e.g. GitLab.com). As a result you should only add default -values if absolutely necessary. +values if absolutely necessary. There is a RuboCop cop that will fail if this +method is used on some tables that are very large on GitLab.com, which would +cause other issues. + +## Updating an existing column + +To update an existing column to a particular value, you can use +`update_column_in_batches` (`add_column_with_default` uses this internally to +fill in the default value). This will split the updates into batches, so we +don't update too many rows at in a single statement. + +This updates the column `foo` in the `projects` table to 10, where `some_column` +is `'hello'`: + +```ruby +update_column_in_batches(:projects, :foo, 10) do |table, query| + query.where(table[:some_column].eq('hello')) +end +``` + +To perform a computed update, the value can be wrapped in `Arel.sql`, so Arel +treats it as an SQL literal. The below example is the same as the one above, but +the value is set to the product of the `bar` and `baz` columns: + +```ruby +update_value = Arel.sql('bar * baz') + +update_column_in_batches(:projects, :foo, update_value) do |table, query| + query.where(table[:some_column].eq('hello')) +end +``` + +Like `add_column_with_default`, there is a RuboCop cop to detect usage of this +on large tables. In the case of `update_column_in_batches`, it may be acceptable +to run on a large table, as long as it is only updating a small subset of the +rows in the table, but do not ignore that without validating on the GitLab.com +staging environment - or asking someone else to do so for you - beforehand. ## Integer column type diff --git a/lib/gitlab/database/migration_helpers.rb b/lib/gitlab/database/migration_helpers.rb index 2c35da8f1aa..c276c3566b4 100644 --- a/lib/gitlab/database/migration_helpers.rb +++ b/lib/gitlab/database/migration_helpers.rb @@ -220,6 +220,15 @@ module Gitlab # column - The name of the column to update. # value - The value for the column. # + # The `value` argument is typically a literal. To perform a computed + # update, an Arel literal can be used instead: + # + # update_value = Arel.sql('bar * baz') + # + # update_column_in_batches(:projects, :foo, update_value) do |table, query| + # query.where(table[:some_column].eq('hello')) + # end + # # Rubocop's Metrics/AbcSize metric is disabled for this method as Rubocop # determines this method to be too complex while there's no way to make it # less "complex" without introducing extra methods (which actually will |