author | Yorick Peterse <yorickpeterse@gmail.com> | 2015-10-28 14:43:27 +0100 |
---|---|---|
committer | Yorick Peterse <yorickpeterse@gmail.com> | 2015-10-30 12:00:58 +0100 |
commit | 49c081b9f38e99bbc11e7132d87773749b5b39d5 (patch) | |
tree | 038096eebf861a2e3631667ef0c137a221f1759b /app/models/user.rb | |
parent | 0ea38dc519b86d2bd2e14f1df1baf0fffc043af6 (diff) | |
download | gitlab-ce-49c081b9f38e99bbc11e7132d87773749b5b39d5.tar.gz |
Improve performance of User.find_by_any_email
This query used to rely on a JOIN, effectively producing the following
SQL:

    SELECT users.*
    FROM users
    LEFT OUTER JOIN emails ON emails.user_id = users.id
    WHERE (users.email = X OR emails.email = X)
    LIMIT 1;

The use of a JOIN means having to scan over all emails and users, join
them together and then filter out the rows that don't match the criteria
(though the database may already apply this filter while joining).
In the new setup this query instead uses a sub-query, producing the
following SQL:

    SELECT *
    FROM users
    WHERE id IN (SELECT user_id FROM emails WHERE email = X)
    OR email = X
    LIMIT 1;

This query has the benefit that it:
1. Doesn't have to JOIN any rows
2. Only has to operate on a relatively small set of rows from the
"emails" table.
Since most users will only have a handful of emails associated with
their account (certainly not hundreds or even thousands), the set
returned by the sub-query is small enough that it should not become
problematic.
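
The difference in shape between the two queries can be sketched in plain Ruby. This is only an in-memory illustration, not how the database executes either query: UserRow, EmailRow, USERS, EMAILS, find_by_join and find_by_subquery are hypothetical names, and the data is made up.

```ruby
# Hypothetical in-memory stand-ins for the "users" and "emails" tables.
UserRow  = Struct.new(:id, :email)
EmailRow = Struct.new(:user_id, :email)

USERS = [
  UserRow.new(1, 'alice@example.com'),
  UserRow.new(2, 'bob@example.com'),
  UserRow.new(3, 'carol@example.com')
].freeze

EMAILS = [
  EmailRow.new(1, 'alice@work.example.com'),
  EmailRow.new(2, 'bob@home.example.com'),
  EmailRow.new(2, 'bob@work.example.com')
].freeze

# Old shape: pair every user with its emails first, then filter the pairs.
def find_by_join(address)
  joined = USERS.flat_map do |user|
    matches = EMAILS.select { |e| e.user_id == user.id }
    # A LEFT OUTER JOIN keeps users that have no secondary emails.
    matches.empty? ? [[user, nil]] : matches.map { |e| [user, e] }
  end

  row = joined.find { |user, e| user.email == address || (e && e.email == address) }
  row && row.first
end

# New shape: resolve the sub-query to a small list of ids, then filter users.
def find_by_subquery(address)
  ids = EMAILS.select { |e| e.email == address }.map(&:user_id)
  USERS.find { |user| ids.include?(user.id) || user.email == address }
end
```

Both helpers return the same row for a given address. The second one never builds the combined user/email pairs, which is the property the list above relies on.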
Performance of the old versus new version can be measured using the
following benchmark:

    # Save this in ./bench.rb
    require 'benchmark/ips'

    email = 'yorick@gitlab.com'

    def User.find_by_any_email_old(email)
      user_table = arel_table
      email_table = Email.arel_table

      query = user_table.
        project(user_table[Arel.star]).
        join(email_table, Arel::Nodes::OuterJoin).
        on(user_table[:id].eq(email_table[:user_id])).
        where(user_table[:email].eq(email).or(email_table[:email].eq(email)))

      find_by_sql(query.to_sql).first
    end

    Benchmark.ips do |bench|
      bench.report 'original' do
        User.find_by_any_email_old(email)
      end

      bench.report 'optimized' do
        User.find_by_any_email(email)
      end

      bench.compare!
    end

Running this locally using "bundle exec rails r bench.rb" produces the
following output:

    Calculating -------------------------------------
                original     1.000  i/100ms
               optimized    93.000  i/100ms
    -------------------------------------------------
                original     11.103  (± 0.0%) i/s -     56.000
               optimized    948.713  (± 5.3%) i/s -      4.743k

    Comparison:
               optimized:      948.7 i/s
                original:       11.1 i/s - 85.45x slower

In other words, the new setup is 85x faster compared to the old setup,
at least when running this benchmark locally.
For GitLab.com these improvements result in User.find_by_any_email
taking only ~170 ms to run, instead of around 800 ms. While this is
"only" an improvement of about 4.5 times (instead of 85x) it's still
significantly better than before.
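
As a quick check of the ratios quoted above, the arithmetic can be reproduced as follows. This is a minimal sketch: the helper names are hypothetical, and the GitLab.com timings are the rounded figures from the text ("~170 ms" and "around 800 ms"), so the second ratio is only approximate.

```ruby
# Speedup from iterations per second: higher is better, so new / old.
def speedup_from_ips(old_ips, new_ips)
  (new_ips / old_ips).round(2)
end

# Speedup from latency: lower is better, so old / new.
def speedup_from_latency(old_ms, new_ms)
  (old_ms / new_ms).round(2)
end

# Figures copied from the local benchmark output.
LOCAL_SPEEDUP = speedup_from_ips(11.103, 948.713)

# Rounded GitLab.com timings quoted in the text.
PRODUCTION_SPEEDUP = speedup_from_latency(800.0, 170.0)

puts "local:      #{LOCAL_SPEEDUP}x"
puts "GitLab.com: #{PRODUCTION_SPEEDUP}x"
```

With these rounded inputs the GitLab.com ratio comes out a little above 4.5; since both timings are approximate, the exact factor should not be read too closely.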
Fixes #3242
Diffstat (limited to 'app/models/user.rb')
-rw-r--r-- | app/models/user.rb | 18 |
1 files changed, 3 insertions, 15 deletions
diff --git a/app/models/user.rb b/app/models/user.rb
index c72beacbf0f..924cb543fab 100644
--- a/app/models/user.rb
+++ b/app/models/user.rb
@@ -235,21 +235,9 @@ class User < ActiveRecord::Base
 
     # Find a User by their primary email or any associated secondary email
     def find_by_any_email(email)
-      user_table = arel_table
-      email_table = Email.arel_table
-
-      # Use ARel to build a query:
-      query = user_table.
-        # SELECT "users".* FROM "users"
-        project(user_table[Arel.star]).
-        # LEFT OUTER JOIN "emails"
-        join(email_table, Arel::Nodes::OuterJoin).
-        # ON "users"."id" = "emails"."user_id"
-        on(user_table[:id].eq(email_table[:user_id])).
-        # WHERE ("user"."email" = '<email>' OR "emails"."email" = '<email>')
-        where(user_table[:email].eq(email).or(email_table[:email].eq(email)))
-
-      find_by_sql(query.to_sql).first
+      User.reorder(nil).
+        where('id IN (SELECT user_id FROM emails WHERE email = :email) OR email = :email', email: email).
+        take
     end
 
     def filter(filter_name)