author | Douwe Maan <douwe@gitlab.com> | 2017-09-28 17:26:16 +0000 |
---|---|---|
committer | Douwe Maan <douwe@gitlab.com> | 2017-09-28 17:26:16 +0000 |
commit | ae03a52f0927f5f0881e3269faba90028e6d808b (patch) | |
tree | 56e4500d4e3ae5e67b16367d06898da389c35ba2 /doc | |
parent | 0ab2ff72a4968ebf9f6e1eb41b46edbe3b5486ef (diff) | |
parent | f4de14d71f425dc14ee5837d96f4e9f42c7cc239 (diff) | |
Merge branch 'hashed-storage-migration-path' into 'master'
Hashed storage migration path
Closes gitlab-ee#3118
See merge request gitlab-org/gitlab-ce!14067
Diffstat (limited to 'doc')
-rw-r--r-- | doc/administration/raketasks/storage.md | 107 |
-rw-r--r-- | doc/administration/repository_storage_types.md | 69 |
2 files changed, 176 insertions, 0 deletions
diff --git a/doc/administration/raketasks/storage.md b/doc/administration/raketasks/storage.md
new file mode 100644
index 00000000000..bac8fa4bd9d
--- /dev/null
+++ b/doc/administration/raketasks/storage.md
@@ -0,0 +1,107 @@
+# Repository Storage Rake Tasks
+
+This is a collection of rake tasks you can use to help you list and migrate
+existing projects from Legacy storage to the new Hashed storage type.
+
+You can read more about the storage types [here][storage-types].
+
+## List projects on Legacy storage
+
+To have a simple summary of projects using **Legacy** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:legacy_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:legacy_projects
+
+```
+
+------
+
+To list projects using **Legacy** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:list_legacy_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:list_legacy_projects
+
+```
+
+## List projects on Hashed storage
+
+To have a simple summary of projects using **Hashed** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:hashed_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:hashed_projects
+
+```
+
+------
+
+To list projects using **Hashed** storage:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:list_hashed_projects
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:list_hashed_projects
+
+```
+
+## Migrate existing projects to Hashed storage
+
+Before migrating your existing projects, you should
+[enable hashed storage][storage-migration] for the new projects as well.
+
+This task will schedule all your existing projects to be migrated to the
+**Hashed** storage type:
+
+**Omnibus Installation**
+
+```bash
+gitlab-rake gitlab:storage:migrate_to_hashed
+```
+
+**Source Installation**
+
+```bash
+rake gitlab:storage:migrate_to_hashed
+
+```
+
+You can monitor the progress in the _Admin > Monitoring > Background jobs_ screen.
+There is a specific queue you can watch to see how long it will take to finish: **project_migrate_hashed_storage**
+
+After it reaches zero, you can confirm every project has been migrated by running the commands above.
+If you find it necessary, you can run this migration script again to schedule missing projects.
+
+Any error or warning will be logged in the Sidekiq log file.
+
+
+[storage-types]: ../repository_storage_types.md
+[storage-migration]: ../repository_storage_types.md#how-to-migrate-to-hashed-storage
diff --git a/doc/administration/repository_storage_types.md b/doc/administration/repository_storage_types.md
new file mode 100644
index 00000000000..fa882bbe28a
--- /dev/null
+++ b/doc/administration/repository_storage_types.md
@@ -0,0 +1,69 @@
+# Repository Storage Types
+
+> [Introduced][ce-28283] in GitLab 10.0.
+
+## Legacy Storage
+
+Legacy Storage is the storage behavior prior to version 10.0. For historical reasons, GitLab replicated the same
+mapping structure from the project URLs:
+
+ * Project's repository: `#{namespace}/#{project_name}.git`
+ * Project's wiki: `#{namespace}/#{project_name}.wiki.git`
+
+This structure made it simple to migrate from existing solutions to GitLab and easy for Administrators to find where the
+repository is stored.
+
+On the other hand, this has some drawbacks:
+
+The storage location will concentrate a huge number of top-level namespaces. The impact can be reduced by the introduction of [multiple storage paths][storage-paths].
+
+Because Backups are a snapshot of the same URL mapping, if you try to recover a very old backup, you need to verify
+whether any project has taken the place of an old removed project sharing the same URL. This means that `mygroup/myproject`
+from your backup may not be the same original project that is today in the same URL.
+
+Any change in the URL will need to be reflected on disk (when groups / users or projects are renamed). This can add a lot
+of load in big installations, and can be even worse if they are using any type of network-based filesystem.
+
+Last, for GitLab Geo, this storage type means we have to synchronize the disk state and replicate renames in the correct
+order, or we may end up with the wrong repository or missing data temporarily.
+
+## Hashed Storage
+
+Hashed Storage is the new storage behavior we are rolling out with 10.0. It's not enabled by default yet, but we
+encourage everyone to try it and take the time to fix any script you may have that depends on the old behavior.
+
+Instead of coupling project URL and the folder structure where the repository will be stored on disk, we are coupling
+a hash, based on the project's ID.
+
+This makes the folder structure immutable, and therefore eliminates any requirement to synchronize state from URLs to
+disk structure. This means that renaming a group, user or project will cost only the database transaction, and will take
+effect immediately.
+
+The hash also helps to spread the repositories more evenly on the disk, so the top-level directory will contain fewer
+folders than the total number of top-level namespaces.
+
+The hash format is based on the hexadecimal representation of SHA256: `SHA256(project.id)`.
+The top-level folder uses the first 2 characters, followed by another folder with the next 2 characters. They are both stored in
+a special folder `@hashed`, to co-exist with existing Legacy projects:
+
+```ruby
+# Project's repository:
+"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
+
+# Wiki's repository:
+"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
+```
+
+This new format also makes it possible to restore backups with confidence, as when restoring a repository from the backup,
+you will never mistakenly restore a repository in the wrong project (considering the backup is made after the migration).
+
+### How to migrate to Hashed Storage
+
+In GitLab, go to **Admin > Settings**, find the **Repository Storage** section and select
+"_Create new projects using hashed storage paths_".
+
+To migrate your existing projects to the new storage type, check the specific [rake tasks].
+
+[ce-28283]: https://gitlab.com/gitlab-org/gitlab-ce/issues/28283
+[rake tasks]: raketasks/storage.md#migrate-existing-projects-to-hashed-storage
+[storage-paths]: repository_storage_types.md
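
Note on the hash format documented in `repository_storage_types.md`: the path scheme can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not the code from the merge request. The helper names `hashed_disk_path` and `hashed_repository_paths` are hypothetical, and the sketch assumes the SHA256 input is the project ID converted to its decimal string form, which the documentation does not spell out.

```ruby
require 'digest'

# Hypothetical helper (not GitLab's implementation): builds the base disk
# path for a project under the documented "@hashed" layout.
# Assumption: the hash input is the project ID as a decimal string.
def hashed_disk_path(project_id)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  # First 2 hex characters, then the next 2, then the full hash.
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}"
end

# Hypothetical helper: repository and wiki paths share the same base path
# and differ only in their suffix.
def hashed_repository_paths(project_id)
  base = hashed_disk_path(project_id)
  { repository: "#{base}.git", wiki: "#{base}.wiki.git" }
end

paths = hashed_repository_paths(1)
puts paths[:repository]
puts paths[:wiki]
```

Because the input is only the project ID, renaming the namespace or project leaves both paths unchanged, which is the property the documentation relies on.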