Diffstat (limited to 'doc/administration')
-rw-r--r-- doc/administration/geo/disaster_recovery/background_verification.md | 3
-rw-r--r-- doc/administration/geo/disaster_recovery/index.md | 3
-rw-r--r-- doc/administration/geo/replication/faq.md | 3
-rw-r--r-- doc/administration/geo/replication/geo_validation_tests.md | 3
-rw-r--r-- doc/administration/geo/replication/troubleshooting.md | 59
-rw-r--r-- doc/administration/geo/replication/usage.md | 4
-rw-r--r-- doc/administration/geo/replication/version_specific_updates.md | 8
-rw-r--r-- doc/administration/maintenance_mode/index.md | 3
-rw-r--r-- doc/administration/reference_architectures/index.md | 29
9 files changed, 75 insertions, 40 deletions
diff --git a/doc/administration/geo/disaster_recovery/background_verification.md b/doc/administration/geo/disaster_recovery/background_verification.md
index caa806c92c8..d7db48bb6cf 100644
--- a/doc/administration/geo/disaster_recovery/background_verification.md
+++ b/doc/administration/geo/disaster_recovery/background_verification.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Automatic background verification **(PREMIUM SELF)**
@@ -89,8 +88,6 @@ in sync.
## Repository re-verification
-> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/8550) in GitLab 11.6.
-
Due to bugs or transient infrastructure failures, it is possible for Git
repositories to change unexpectedly without being marked for verification.
Geo constantly reverifies the repositories to ensure the integrity of the
diff --git a/doc/administration/geo/disaster_recovery/index.md b/doc/administration/geo/disaster_recovery/index.md
index bf28eb76ffd..7e7ace4ad01 100644
--- a/doc/administration/geo/disaster_recovery/index.md
+++ b/doc/administration/geo/disaster_recovery/index.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Disaster Recovery (Geo) **(PREMIUM SELF)**
@@ -503,7 +502,7 @@ secondary domain, like changing Git remotes and API URLs.
This command uses the changed `external_url` configuration defined
in `/etc/gitlab/gitlab.rb`.
-1. For GitLab 11.11 through 12.7 only, you may need to update the **primary**
+1. For GitLab 12.0 through 12.7, you may need to update the **primary**
node's name in the database. This bug has been fixed in GitLab 12.8.
To determine if you need to do this, search for the
diff --git a/doc/administration/geo/replication/faq.md b/doc/administration/geo/replication/faq.md
index e613a9b5670..12b3b382bf7 100644
--- a/doc/administration/geo/replication/faq.md
+++ b/doc/administration/geo/replication/faq.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Geo Frequently Asked Questions **(PREMIUM SELF)**
@@ -54,7 +53,7 @@ For more details, see the [supported Geo data types](datatypes.md).
## Can I `git push` to a **secondary** site?
-Yes! Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in GitLab 11.3.
+Pushing directly to a **secondary** site (for both HTTP and SSH, including Git LFS) is supported.
## How long does it take to have a commit replicated to a **secondary** site?
diff --git a/doc/administration/geo/replication/geo_validation_tests.md b/doc/administration/geo/replication/geo_validation_tests.md
index a4c2f156216..ce1bd8a9d3c 100644
--- a/doc/administration/geo/replication/geo_validation_tests.md
+++ b/doc/administration/geo/replication/geo_validation_tests.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Geo validation tests **(PREMIUM SELF)**
@@ -175,7 +174,7 @@ The following are PostgreSQL upgrade validation tests we performed.
[Test and validate PostgreSQL 10.0 upgrade for Geo](https://gitlab.com/gitlab-org/gitlab/-/issues/12092):
- Description: With the 12.0 release, GitLab required an upgrade to PostgreSQL 10.0. We tested
- various upgrade scenarios from GitLab 11.11.5 through to GitLab 12.1.8.
+ various upgrade scenarios up to GitLab 12.1.8.
- Outcome: Multiple issues were found when upgrading and addressed in follow-up issues.
- Follow up issues:
- [`gitlab-ctl` reconfigure fails on Redis node in multi-node Geo setup](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/4706).
diff --git a/doc/administration/geo/replication/troubleshooting.md b/doc/administration/geo/replication/troubleshooting.md
index 673d8388af1..958aa94d8f3 100644
--- a/doc/administration/geo/replication/troubleshooting.md
+++ b/doc/administration/geo/replication/troubleshooting.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Troubleshooting Geo **(PREMIUM SELF)**
@@ -288,9 +287,8 @@ errors (indicated by `Database replication working? ... no` in the
### Message: `ERROR: replication slots can only be used if max_replication_slots > 0`?
This means that the `max_replication_slots` PostgreSQL variable needs to
-be set on the **primary** database. In GitLab 9.4, we have made this setting
-default to 1. You may need to increase this value if you have more
-**secondary** nodes.
+be set on the **primary** database. This setting defaults to 1. You may need to
+increase this value if you have more **secondary** nodes.
Be sure to restart PostgreSQL for this to take effect. See the
[PostgreSQL replication setup](../setup/database.md#postgresql-replication) guide for more details.
@@ -676,6 +674,59 @@ promotion.
[previous snippet](#design-repository-failures-on-mirrored-projects-and-project-imports) to
determine the actual replication status of Design repositories.
+### Sync failure message: "Verification failed with: Error during verification: File is not checksummable"
+
+Until GitLab 14.6, certain data types which were missing on the Geo primary site were marked as "synced" on Geo secondary sites. This was because from the perspective of Geo secondary sites, the state matched the primary site and nothing more could be done on secondary sites.
+
+Secondaries would regularly try to sync these files again via the "verification" feature:
+
+- Verification fails since the file doesn't exist.
+- The file is marked "sync failed".
+- Sync is retried.
+- The file is marked "sync succeeded".
+- The file is marked "needs verification".
+- Repeat until the file is available again on the primary site.
+
+This can be confusing to troubleshoot, since the registry entries are moved through a logical loop by various background jobs. Also, `last_sync_failure` and `verification_failure` are empty after "sync succeeded" but before verification is retried.
+
+If sync failure counts repeatedly rise while success counts fall, and then the reverse, files are probably missing on the primary site. You can confirm this by searching `geo.log` on secondary sites for `File is not checksummable` errors that affect the same files over and over.
+
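As an illustration only, and not part of the committed text, the `geo.log` search described above could be sketched in Python as follows. The sketch assumes the log is written as one JSON object per line; the `model_record_id` key used to group entries is a hypothetical field name and should be replaced with whatever identifies a file in your log entries.

```python
import json
from collections import Counter

NEEDLE = "File is not checksummable"


def repeated_checksum_failures(log_lines, id_field="model_record_id", min_count=2):
    """Count 'File is not checksummable' entries per record and keep the repeats.

    `id_field` is a hypothetical JSON key (an assumption, not a documented
    field); adjust it to match your geo.log entries.
    """
    counts = Counter()
    for line in log_lines:
        if NEEDLE not in line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        counts[entry.get(id_field, "unknown")] += 1
    # Records that fail again and again suggest a file missing on the primary.
    return {record: n for record, n in counts.items() if n >= min_count}
```

To run it against a real log, pass an open file handle as `log_lines`. The log location depends on the installation, so no path is assumed here.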
+After confirming this is the problem, the files on the primary site need to be fixed. Some possible causes:
+
+- An NFS share became unmounted.
+- A disk died or became corrupted.
+- Someone unintentionally deleted a file or directory.
+- Bugs in the GitLab application:
+ - A file was moved when it shouldn't have been moved.
+ - A file wasn't moved when it should have been moved.
+ - A wrong path was generated in the code.
+- A non-atomic backup was restored.
+- Services, servers, or network infrastructure were interrupted or restarted during use.
+
+The appropriate action sometimes depends on the cause. For example, you can remount an NFS share. Often, the root cause is not apparent or not worth discovering. If you have regular backups, it may be expedient to look through them and restore the files from there.
+
+In some cases, a file may be determined to be of low value, and so it may be worth deleting the record.
+
+Geo itself is an excellent mitigation for files missing on the primary. If a file disappears on the primary but it was already synced to the secondary, you can copy the file back from the secondary. In cases like this, the `File is not checksummable` error does not occur on Geo secondary sites, and only the primary logs this error.
+
+This problem is more likely to show up in Geo secondary sites which were set up long after the original GitLab site. In this case, Geo is only surfacing an existing problem.
+
+This behavior affects only the following data types through GitLab 14.6:
+
+| Data type | From version |
+| ------------------------ | ------------ |
+| Package Registry | 13.10 |
+| Pipeline Artifacts | 13.11 |
+| Terraform State Versions | 13.12 |
+| Infrastructure Registry | 14.0 |
+| External MR diffs | 14.6 |
+| LFS Objects | 14.6 |
+| Pages Deployments | 14.6 |
+| Uploads | 14.6 |
+| CI Job Artifacts | 14.6 |
+
+[Since GitLab 14.7, files which are missing on the primary site are now treated as sync failures](https://gitlab.com/gitlab-org/gitlab/-/issues/348745) in order to make Geo visibly surface data loss risks. The sync/verification loop is therefore short-circuited. `last_sync_failure` is now set to `The file is missing on the Geo primary site`.
+
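As an illustration only, and not part of the committed text, the difference between the pre-14.7 loop and the 14.7 short-circuit for a file that is missing on the primary can be modeled with a small Python sketch. The state strings paraphrase the steps listed in this section; the function name and its parameters are hypothetical and do not correspond to Geo's internal code.

```python
MISSING_MESSAGE = "The file is missing on the Geo primary site"


def missing_file_states(short_circuit, rounds=2):
    """Return the registry states a secondary passes through for a file
    that is missing on the primary.

    short_circuit=False models the loop described for GitLab 14.6 and earlier.
    short_circuit=True models GitLab 14.7 and later, where the file is
    treated as a sync failure. Hypothetical model, not Geo's code.
    """
    states = []
    for _ in range(rounds):
        if short_circuit:
            # The failure stays visible instead of flipping back to "synced".
            states.append(f"sync failed: {MISSING_MESSAGE}")
        else:
            states += [
                "verification failed",
                "sync failed",
                "sync succeeded",
                "needs verification",
            ]
    return states


for state in missing_file_states(short_circuit=False, rounds=1):
    print(state)
```

In the looping model the registry spends part of each round marked as succeeded, which is why the failure and success counters appear to trade places. In the short-circuit model every round reports the same failure.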
## Fixing errors during a failover or when promoting a secondary to a primary node
The following are possible errors that might be encountered during failover or
diff --git a/doc/administration/geo/replication/usage.md b/doc/administration/geo/replication/usage.md
index f3c8f6ac759..b1183e56cd0 100644
--- a/doc/administration/geo/replication/usage.md
+++ b/doc/administration/geo/replication/usage.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
<!-- Please update EE::GitLab::GeoGitAccess::GEO_SERVER_DOCS_URL if this file is moved) -->
@@ -11,7 +10,8 @@ type: howto
After you set up the [database replication and configure the Geo nodes](../index.md#setup-instructions), use your closest GitLab site as you would do with the primary one.
-You can push directly to a **secondary** site (for both HTTP, SSH including Git LFS), and the request will be proxied to the primary site instead ([introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in GitLab 11.3).
+You can push directly to a **secondary** site (for both HTTP and SSH, including
+Git LFS), and the request is proxied to the primary site instead.
Example of the output you will see when pushing to a **secondary** site:
diff --git a/doc/administration/geo/replication/version_specific_updates.md b/doc/administration/geo/replication/version_specific_updates.md
index 883e335ff94..d3a132a6666 100644
--- a/doc/administration/geo/replication/version_specific_updates.md
+++ b/doc/administration/geo/replication/version_specific_updates.md
@@ -2,7 +2,6 @@
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
-type: howto
---
# Version-specific update instructions **(PREMIUM SELF)**
@@ -378,10 +377,3 @@ WARNING:
This version is affected by a [bug that results in new LFS objects not being
replicated to Geo secondary nodes](https://gitlab.com/gitlab-org/gitlab/-/issues/32696).
The issue is fixed in GitLab 12.1. Be sure to upgrade to GitLab 12.1 or later.
-
-## Updating to GitLab 11.11
-
-WARNING:
-This version is affected by a [bug that results in new LFS objects not being
-replicated to Geo secondary nodes](https://gitlab.com/gitlab-org/gitlab/-/issues/32696).
-The issue is fixed in GitLab 12.1. Be sure to upgrade to GitLab 12.1 or later.
diff --git a/doc/administration/maintenance_mode/index.md b/doc/administration/maintenance_mode/index.md
index 2d17062e955..50c0f0ecc63 100644
--- a/doc/administration/maintenance_mode/index.md
+++ b/doc/administration/maintenance_mode/index.md
@@ -193,7 +193,8 @@ Replication and verification continues to work but proxied Git pushes to primary
### Secure features
-Features that depend on creating issues or creating or approving Merge Requests, do not work.
+Features that depend on creating issues, or on creating or approving merge requests,
+do not work.
Exporting a vulnerability list from a Vulnerability Report page does not work.
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 6bf35ba6e22..81f07f304bc 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -208,19 +208,16 @@ Note the following about the testing process:
- We aim to have a "test smart" approach where architectures tested have a good range that can also apply to others. Testing focuses on 10k Omnibus on GCP as the testing has shown this is a good bellwether for the other architectures and cloud providers as well as Cloud Native Hybrids.
- Testing is done publicly and all results are shared.
-Τhe following table details the testing done against the reference architectures along with the frequency and results.
-
-| Reference Architecture | Tests Run<sup>1</sup> |
-|------------------------|----------------------------------------------------------------------------------------------------------------------|
-| 1k | [Omnibus - Daily (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/1k)<sup>2</sup> |
-| 2k | [Omnibus - Daily (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/2k)<sup>2</sup> |
-| 3k | [Omnibus - Weekly (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/3k)<sup>2</sup> |
-| 5k | [Omnibus - Weekly (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/5k)<sup>2</sup> |
-| 10k | [Omnibus - Daily (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/10k)<sup>2</sup><br/>[Omnibus - Ad-Hoc (GCP, AWS, Azure)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k)<br/><br/>[Cloud Native Hybrid - Ad-Hoc (GCP, AWS)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k-Cloud-Native-Hybrid) |
-| 25k | [Omnibus - Weekly (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/25k)<sup>2</sup><br/>[Omnibus - Ad-Hoc (Azure)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/25k) |
-| 50k | [Omnibus - Weekly (GCP)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/50k)<sup>2</sup><br/>[Omnibus - Ad-Hoc (AWS)](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/50k) |
-
-Note that:
-
-1. The list above is non exhaustive. Additional testing is continuously evaluated and iterated on, and the table is updated regularly.
-1. The Omnibus reference architectures are VM-based only and testing has shown that they perform similarly on equivalently specced hardware regardless of Cloud Provider or if run on premises.
+The following table details the testing done against the reference architectures along with the frequency and results. Note that this list is not exhaustive. Additional testing is continuously evaluated and iterated on, and the table is updated accordingly.
+
+| Reference<br/>Architecture<br/>Size | Bare-Metal | GCP | AWS | Azure |
+|-----------------------------|------------|-----|-----|-------|
+| 1k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/1k)<sup>1</sup> | - | - |
+| 2k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/2k)<sup>1</sup> | - | - |
+| 3k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/3k)<sup>1</sup> | - | - |
+| 5k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/5k)<sup>1</sup> | - | - |
+| 10k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Daily](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/10k)<sup>1</sup> <br/> [Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k) <br/> [Cloud Native Hybrid - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k-Cloud-Native-Hybrid) | [Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k) <br/> [Cloud Native Hybrid - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k-Cloud-Native-Hybrid) | [Standard - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k) |
+| 25k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/25k)<sup>1</sup> | - | [Standard - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/25k) |
+| 50k | <i>Refer to GCP<sup>1</sup></i> | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/50k)<sup>1</sup> | [Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/50k) | - |
+
+1. The Standard Reference Architectures are designed to be platform agnostic, with everything being run on VMs via [Omnibus GitLab](https://docs.gitlab.com/omnibus/). While testing occurs primarily on GCP, ad-hoc testing has shown that they perform similarly on equivalently specced hardware on other Cloud Providers or if run on premises (bare-metal).