<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/gitlab/gitlab-ce.git/spec/workers, branch frozen_string_lib_2</title>
<subtitle>gitlab.com: gitlab-org/gitlab-ce.git
</subtitle>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/'/>
<entry>
<title>Merge branch 'dm-process-commit-worker-n+1' into 'master'</title>
<updated>2019-08-16T20:26:38+00:00</updated>
<author>
<name>Stan Hu</name>
<email>stanhu@gmail.com</email>
</author>
<published>2019-08-16T20:26:38+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=971e040c00a14d7fbdfed5f45d6978a2c6b4f4f5'/>
<id>971e040c00a14d7fbdfed5f45d6978a2c6b4f4f5</id>
<content type='text'>
Look up upstream commits once before queuing ProcessCommitWorkers

Closes #65464

See merge request gitlab-org/gitlab-ce!31871</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Look up upstream commits once before queuing ProcessCommitWorkers

Closes #65464

See merge request gitlab-org/gitlab-ce!31871</pre>
</div>
</content>
</entry>
<entry>
<title>Expire project caches once per push instead of once per ref</title>
<updated>2019-08-16T19:53:56+00:00</updated>
<author>
<name>Stan Hu</name>
<email>stanhu@gmail.com</email>
</author>
<published>2019-08-16T19:53:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=f14647fdae4a07c3c59665576b70f847ab866c58'/>
<id>f14647fdae4a07c3c59665576b70f847ab866c58</id>
<content type='text'>
Previously `ProjectCacheWorker` would be scheduled once per ref, which
would generate unnecessary I/O and load on Sidekiq, especially if many
tags or branches were pushed at once. `ProjectCacheWorker` would expire
three items:

1. Repository size: This only needs to be updated once per push.
2. Commit count: This only needs to be updated if the default branch
   is updated.
3. Project method caches: This only needs to be updated if the default
   branch changes, but only if certain files change (e.g. README,
   CHANGELOG, etc.).

Because the third item requires looking at the actual changes in the
commit deltas, we schedule one `ProjectCacheWorker` to handle the first
two cases, and schedule a separate `ProjectCacheWorker` for the third
case if it is needed. As a result, this brings down the number of
`ProjectCacheWorker` jobs from N to 2.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/52046
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously `ProjectCacheWorker` would be scheduled once per ref, which
would generate unnecessary I/O and load on Sidekiq, especially if many
tags or branches were pushed at once. `ProjectCacheWorker` would expire
three items:

1. Repository size: This only needs to be updated once per push.
2. Commit count: This only needs to be updated if the default branch
   is updated.
3. Project method caches: This only needs to be updated if the default
   branch changes, but only if certain files change (e.g. README,
   CHANGELOG, etc.).

Because the third item requires looking at the actual changes in the
commit deltas, we schedule one `ProjectCacheWorker` to handle the first
two cases, and schedule a separate `ProjectCacheWorker` for the third
case if it is needed. As a result, this brings down the number of
`ProjectCacheWorker` jobs from N to 2.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/52046
</pre>
</div>
</content>
</entry>
<entry>
<title>Look up upstream commits once before queuing ProcessCommitWorkers</title>
<updated>2019-08-16T18:31:48+00:00</updated>
<author>
<name>Douwe Maan</name>
<email>douwe@selenight.nl</email>
</author>
<published>2019-08-16T18:26:31+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=97c2564ffac057f1830d008269f90afa9e12f815'/>
<id>97c2564ffac057f1830d008269f90afa9e12f815</id>
<content type='text'>
Instead of checking if a commit already exists in the upstream project
in its ProcessCommitWorker and bailing out if it does, we check the
existence of all commits in bulk in Git::BranchHooksService, so that we
can skip scheduling ProcessCommitWorker jobs for those commits
that already exist upstream entirely.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of checking if a commit already exists in the upstream project
in its ProcessCommitWorker and bailing out if it does, we check the
existence of all commits in bulk in Git::BranchHooksService, so that we
can skip scheduling ProcessCommitWorker jobs for those commits
that already exist upstream entirely.
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'sh-optimize-commit-deltas-post-receive' into 'master'</title>
<updated>2019-08-14T14:38:28+00:00</updated>
<author>
<name>Nick Thomas</name>
<email>nick@gitlab.com</email>
</author>
<published>2019-08-14T14:38:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=3cd40c4a807fef5f99c60853ab81f8729405e315'/>
<id>3cd40c4a807fef5f99c60853ab81f8729405e315</id>
<content type='text'>
Reduce Gitaly calls in PostReceive

Closes #65878

See merge request gitlab-org/gitlab-ce!31741</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reduce Gitaly calls in PostReceive

Closes #65878

See merge request gitlab-org/gitlab-ce!31741</pre>
</div>
</content>
</entry>
<entry>
<title>Add usage pings for source code pushes</title>
<updated>2019-08-13T22:33:16+00:00</updated>
<author>
<name>Igor</name>
<email>idrozdov@gitlab.com</email>
</author>
<published>2019-08-13T22:33:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=04598039a53c452df9b1a30404f2abbe46f8e229'/>
<id>04598039a53c452df9b1a30404f2abbe46f8e229</id>
<content type='text'>
Source Code Usage Ping for Create SMAU
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Source Code Usage Ping for Create SMAU
</pre>
</div>
</content>
</entry>
<entry>
<title>Rework retry strategy for remote mirrors</title>
<updated>2019-08-13T20:52:01+00:00</updated>
<author>
<name>Bob Van Landuyt</name>
<email>bob@gitlab.com</email>
</author>
<published>2019-08-13T20:52:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=452bc36d603ed89d3fa5e3409338dd905230bd2f'/>
<id>452bc36d603ed89d3fa5e3409338dd905230bd2f</id>
<content type='text'>
**Prevention of running 2 simultaneous updates**

Instead of using `RemoteMirror#update_status` and raise an error if
it's already running to prevent the same mirror being updated at the
same time we now use `Gitlab::ExclusiveLease` for that.

When we fail to obtain a lease in 3 tries, 30 seconds apart, we bail
and reschedule. We'll reschedule faster for the protected branches.

If the mirror already ran since it was scheduled, the job will be
skipped.

**Error handling: Remote side**

When an update fails because of a `Gitlab::Git::CommandError`, we
won't track this error in sentry, this could be on the remote side:
for example when branches have diverged.

In this case, we'll try 3 times scheduled 1 or 5 minutes apart.

In between, the mirror is marked as "to_retry", the error would be
visible to the user when they visit the settings page.

After 3 tries we'll mark the mirror as failed and notify the user.

We won't track this error in sentry, as it's not likely we can help
it.

The next event that would trigger a new refresh.

**Error handling: our side**

If an unexpected error occurs, we mark the mirror as failed, but we'd
still retry the job based on the regular sidekiq retries with
backoff. Same as we used to

The error would be reported in sentry, since its likely we need to do
something about it.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
**Prevention of running 2 simultaneous updates**

Instead of using `RemoteMirror#update_status` and raise an error if
it's already running to prevent the same mirror being updated at the
same time we now use `Gitlab::ExclusiveLease` for that.

When we fail to obtain a lease in 3 tries, 30 seconds apart, we bail
and reschedule. We'll reschedule faster for the protected branches.

If the mirror already ran since it was scheduled, the job will be
skipped.

**Error handling: Remote side**

When an update fails because of a `Gitlab::Git::CommandError`, we
won't track this error in sentry, this could be on the remote side:
for example when branches have diverged.

In this case, we'll try 3 times scheduled 1 or 5 minutes apart.

In between, the mirror is marked as "to_retry", the error would be
visible to the user when they visit the settings page.

After 3 tries we'll mark the mirror as failed and notify the user.

We won't track this error in sentry, as it's not likely we can help
it.

The next event that would trigger a new refresh.

**Error handling: our side**

If an unexpected error occurs, we mark the mirror as failed, but we'd
still retry the job based on the regular sidekiq retries with
backoff. Same as we used to

The error would be reported in sentry, since its likely we need to do
something about it.
</pre>
</div>
</content>
</entry>
<entry>
<title>Only expire tag cache once per push</title>
<updated>2019-08-13T16:04:30+00:00</updated>
<author>
<name>Stan Hu</name>
<email>stanhu@gmail.com</email>
</author>
<published>2019-08-13T16:04:30+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=e658f9603c99ca6a8ef4c0292b2cdab2916fb09e'/>
<id>e658f9603c99ca6a8ef4c0292b2cdab2916fb09e</id>
<content type='text'>
Previously each tag in a push would invoke the Gitaly `FindAllTags` RPC
since the tag cache would be invalidated with every tag.

We can eliminate those extraneous calls by expiring the tag cache once
in `PostReceive` and taking advantage of the cached tags.

Relates to https://gitlab.com/gitlab-org/gitlab-ce/issues/65795
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously each tag in a push would invoke the Gitaly `FindAllTags` RPC
since the tag cache would be invalidated with every tag.

We can eliminate those extraneous calls by expiring the tag cache once
in `PostReceive` and taking advantage of the cached tags.

Relates to https://gitlab.com/gitlab-org/gitlab-ce/issues/65795
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch '65803-invalidate-branches-cache-on-refresh' into 'master'</title>
<updated>2019-08-13T15:41:25+00:00</updated>
<author>
<name>Bob Van Landuyt</name>
<email>bob@gitlab.com</email>
</author>
<published>2019-08-13T15:41:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=3702ab7317533896c7455357dd6643181666f22b'/>
<id>3702ab7317533896c7455357dd6643181666f22b</id>
<content type='text'>
Only expire branch cache once per push

See merge request gitlab-org/gitlab-ce!31653</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only expire branch cache once per push

See merge request gitlab-org/gitlab-ce!31653</pre>
</div>
</content>
</entry>
<entry>
<title>Remove unused `BuildProcessWorker`</title>
<updated>2019-08-13T08:22:13+00:00</updated>
<author>
<name>Kamil Trzciński</name>
<email>ayufan@ayufan.eu</email>
</author>
<published>2019-08-13T08:22:13+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=1438119df4a8407ce0a62ae65a34ee51c3b710ca'/>
<id>1438119df4a8407ce0a62ae65a34ee51c3b710ca</id>
<content type='text'>
We migrated all logic to `PipelineProcessWorker`
and this worker become redundant.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We migrated all logic to `PipelineProcessWorker`
and this worker become redundant.
</pre>
</div>
</content>
</entry>
<entry>
<title>Reduce Gitaly calls in PostReceive</title>
<updated>2019-08-13T05:28:49+00:00</updated>
<author>
<name>Stan Hu</name>
<email>stanhu@gmail.com</email>
</author>
<published>2019-08-12T22:31:58+00:00</published>
<link rel='alternate' type='text/html' href='http://git.baserock.org/cgit/delta/gitlab/gitlab-ce.git/commit/?id=4e2bb4e5e7df1273a4d2fdd370b6c17a27c394d8'/>
<id>4e2bb4e5e7df1273a4d2fdd370b6c17a27c394d8</id>
<content type='text'>
This commit reduces I/O load and memory utilization during PostReceive
for the common case when no project hooks or services are set up.

We saw a Gitaly N+1 issue in `CommitDelta` when many tags or branches
are pushed. We can reduce this overhead in the common case because we
observe that most new projects do not have any Web hooks or services,
especially when they are first created. Previously, `BaseHooksService`
unconditionally iterated through the last 20 commits of each ref to
build the `push_data` structure. The `push_data` structured was used in
numerous places:

1. Building the push payload in `EventCreateService`
2. Creating a CI pipeline
3. Executing project Web or system hooks
4. Executing project services
5. As the return value of `BaseHooksService#execute`
6. `BranchHooksService#invalidated_file_types`

We only need to generate the full `push_data` for items 3, 4, and 6.

Item 1: `EventCreateService` only needs the last commit and doesn't
actually need the commit deltas.

Item 2: In addition, `Ci::CreatePipelineService` only needed a subset of
the parameters.

Item 5: The return value of `BaseHooksService#execute` also wasn't being
used anywhere.

Item 6: This is only used when pushing to the default branch, so if
many tags are pushed we can save significant I/O here.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/65878

Fic
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit reduces I/O load and memory utilization during PostReceive
for the common case when no project hooks or services are set up.

We saw a Gitaly N+1 issue in `CommitDelta` when many tags or branches
are pushed. We can reduce this overhead in the common case because we
observe that most new projects do not have any Web hooks or services,
especially when they are first created. Previously, `BaseHooksService`
unconditionally iterated through the last 20 commits of each ref to
build the `push_data` structure. The `push_data` structured was used in
numerous places:

1. Building the push payload in `EventCreateService`
2. Creating a CI pipeline
3. Executing project Web or system hooks
4. Executing project services
5. As the return value of `BaseHooksService#execute`
6. `BranchHooksService#invalidated_file_types`

We only need to generate the full `push_data` for items 3, 4, and 6.

Item 1: `EventCreateService` only needs the last commit and doesn't
actually need the commit deltas.

Item 2: In addition, `Ci::CreatePipelineService` only needed a subset of
the parameters.

Item 5: The return value of `BaseHooksService#execute` also wasn't being
used anywhere.

Item 6: This is only used when pushing to the default branch, so if
many tags are pushed we can save significant I/O here.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/65878

Fic
</pre>
</div>
</content>
</entry>
</feed>
