summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorDouwe Maan <douwe@gitlab.com>2017-09-28 17:26:16 +0000
committerDouwe Maan <douwe@gitlab.com>2017-09-28 17:26:16 +0000
commitae03a52f0927f5f0881e3269faba90028e6d808b (patch)
tree56e4500d4e3ae5e67b16367d06898da389c35ba2 /doc
parent0ab2ff72a4968ebf9f6e1eb41b46edbe3b5486ef (diff)
parentf4de14d71f425dc14ee5837d96f4e9f42c7cc239 (diff)
downloadgitlab-ce-ae03a52f0927f5f0881e3269faba90028e6d808b.tar.gz
Merge branch 'hashed-storage-migration-path' into 'master'
Hashed storage migration path Closes gitlab-ee#3118 See merge request gitlab-org/gitlab-ce!14067
Diffstat (limited to 'doc')
-rw-r--r--doc/administration/raketasks/storage.md107
-rw-r--r--doc/administration/repository_storage_types.md69
2 files changed, 176 insertions, 0 deletions
diff --git a/doc/administration/raketasks/storage.md b/doc/administration/raketasks/storage.md
new file mode 100644
index 00000000000..bac8fa4bd9d
--- /dev/null
+++ b/doc/administration/raketasks/storage.md
@@ -0,0 +1,107 @@
+# Repository Storage Rake Tasks
+
+This is a collection of rake tasks you can use to help you list and migrate
+existing projects from Legacy storage to the new Hashed storage type.
+
+You can read more about the storage types [here][storage-types].
+
+## List projects on Legacy storage
+
+To have a simple summary of projects using **Legacy** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:legacy_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:legacy_projects
+
+```
+
+------
+
+To list projects using **Legacy** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:list_legacy_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:list_legacy_projects
+
+```
+
+## List projects on Hashed storage
+
+To have a simple summary of projects using **Hashed** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:hashed_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:hashed_projects
+
+```
+
+------
+
+To list projects using **Hashed** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:list_hashed_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:list_hashed_projects
+
+```
+
+## Migrate existing projects to Hashed storage
+
+Before migrating your existing projects, you should
+[enable hashed storage][storage-migration] for the new projects as well.
+
+This task will schedule all your existing projects to be migrated to the
+**Hashed** storage type:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:migrate_to_hashed
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:migrate_to_hashed
+
+```
+
+You can monitor the progress in the _Admin > Monitoring > Background jobs_ screen.
+There is a specific Queue you can watch to see how long it will take to finish: **project_migrate_hashed_storage**
+
+After it reaches zero, you can confirm every project has been migrated by running the commands above.
+If you find it necessary, you can run this migration script again to schedule missing projects.
+
+Any error or warning will be logged in the sidekiq log file.
+
+
+[storage-types]: ../repository_storage_types.md
+[storage-migration]: ../repository_storage_types.md#how-to-migrate-to-hashed-storage
diff --git a/doc/administration/repository_storage_types.md b/doc/administration/repository_storage_types.md
new file mode 100644
index 00000000000..fa882bbe28a
--- /dev/null
+++ b/doc/administration/repository_storage_types.md
@@ -0,0 +1,69 @@
+# Repository Storage Types
+
+> [Introduced][ce-28283] in GitLab 10.0.
+
+## Legacy Storage
+
+Legacy Storage is the storage behavior prior to version 10.0. For historical reasons, GitLab replicated the same
+mapping structure from the projects URLs:
+
+ * Project's repository: `#{namespace}/#{project_name}.git`
+ * Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+
+This structure made simple to migrate from existing solutions to GitLab and easy for Administrators to find where the
+repository is stored.
+
+On the other hand this has some drawbacks:
+
+Storage location will concentrate huge amount of top-level namespaces. The impact can be reduced by the introduction of [multiple storage paths][storage-paths].
+
+Because Backups are a snapshot of the same URL mapping, if you try to recover a very old backup, you need to verify
+if any project has taken the place of an old removed project sharing the same URL. This means that `mygroup/myproject`
+from your backup may not be the same original project that is today in the same URL.
+
+Any change in the URL will need to be reflected on disk (when groups / users or projects are renamed). This can add a lot
+of load in big installations, and can be even worst if they are using any type of network based filesystem.
+
+Last, for GitLab Geo, this storage type means we have to synchronize the disk state, replicate renames in the correct
+order or we may end-up with wrong repository or missing data temporarily.
+
+## Hashed Storage
+
+Hashed Storage is the new storage behavior we are rolling out with 10.0. It's not enabled by default yet, but we
+encourage everyone to try-it and take the time to fix any script you may have that depends on the old behavior.
+
+Instead of coupling project URL and the folder structure where the repository will be stored on disk, we are coupling
+a hash, based on the project's ID.
+
+This makes the folder structure immutable, and therefore eliminates any requirement to synchronize state from URLs to
+disk structure. This means that renaming a group, user or project will cost only the database transaction, and will take
+effect immediately.
+
+The hash also helps to spread the repositories more evenly on the disk, so the top-level directory will contain less
+folders than the total amount of top-level namespaces.
+
+Hash format is based on hexadecimal representation of SHA256: `SHA256(project.id)`.
+Top-level folder uses first 2 characters, followed by another folder with the next 2 characters. They are both stored in
+a special folder `@hashed`, to co-exist with existing Legacy projects:
+
+```ruby
+# Project's repository:
+"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
+
+# Wiki's repository:
+"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
+```
+
+This new format also makes possible to restore backups with confidence, as when restoring a repository from the backup,
+you will never mistakenly restore a repository in the wrong project (considering the backup is made after the migration).
+
+### How to migrate to Hashed Storage
+
+In GitLab, go to **Admin > Settings**, find the **Repository Storage** section and select
+"_Create new projects using hashed storage paths_".
+
+To migrate your existing projects to the new storage type, check the specific [rake tasks].
+
+[ce-28283]: https://gitlab.com/gitlab-org/gitlab-ce/issues/28283
+[rake tasks]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
+[storage-paths]: repository_storage_types.md