Diffstat (limited to 'doc/topics/data_seeder.md')
-rw-r--r-- | doc/topics/data_seeder.md | 331 |
1 files changed, 331 insertions, 0 deletions
diff --git a/doc/topics/data_seeder.md b/doc/topics/data_seeder.md
new file mode 100644
index 00000000000..19c0e05d8ed
--- /dev/null
+++ b/doc/topics/data_seeder.md
@@ -0,0 +1,331 @@
+---
+stage: Manage
+group: Foundations
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+description: Data Seeder test data harness created by the Test Data Working Group https://about.gitlab.com/company/team/structure/working-groups/demo-test-data/
+---
+
+# GitLab Data Seeder
+
+GitLab Data Seeder (GDS) is a test data seeding harness that can seed test data into a user or group namespace.
+
+The Data Seeder uses FactoryBot in the backend, which makes maintenance easy. When a Model is changed,
+the corresponding FactoryBot factory is already updated to account for the change.
+
+## Docker Setup
+
+See [Data Seeder Docker Demo](https://gitlab.com/-/snippets/2390362)
+
+## GDK Setup
+
+```shell
+$ gdk start db
+ok: run: services/postgresql: (pid n) 0s, normally down
+ok: run: services/redis: (pid n) 74s, normally down
+$ bundle install
+Bundle complete!
+$ bundle exec rake db:migrate
+main: migrated
+ci: migrated
+```
+
+### Run
+
+The `ee:gitlab:seed:data_seeder` Rake task takes two arguments: `:name` and `:namespace_id`.
+
+```shell
+$ bundle exec rake "ee:gitlab:seed:data_seeder[data_seeder,1]"
+Seeding Data for Administrator
+```
+
+#### `:name`
+
+Where `:name` is the seed file name. This can be a `.rb`, `.yml`, or `.json` file relative to `ee/db/seeds/data_seeder`, or an absolute path to a seed file.
+
+#### `:namespace_id`
+
+Where `:namespace_id` is the ID of the User or Group Namespace.
+
+## Develop
+
+The Data Seeder uses FactoryBot definitions from `spec/factories`, which:
+
+1. Save time on development
+1. Are easy to read
+1. Are easy to maintain
+1. Do not rely on an API that may change in the future
+1. Are always up-to-date
+1. Execute at the lowest level possible (`ActiveRecord`) to create data as quickly as possible
+
+> From the [FactoryBot README](https://github.com/thoughtbot/factory_bot#readme_): `factory_bot` is a fixtures replacement with a straightforward definition syntax, support for multiple build
+> strategies (saved instances, unsaved instances, attribute hashes, and stubbed objects), and support for multiple factories for the same class, including factory
+> inheritance
+
+Factories reside in `spec/factories/*` and are fixtures for Rails models found in `app/models/*`. For example, for a model named `app/models/issue.rb`, the factory will
+be named `spec/factories/issues.rb`. For a model named `app/models/project.rb`, the factory will be named `spec/factories/projects.rb`.
+
+There are currently three parsers that the GitLab Data Seeder supports: Ruby, YAML, and JSON.
+
+### Ruby
+
+All Ruby Seeds must define a `DataSeeder` class with a `#seed` instance method. You may structure your Ruby class as you wish. All FactoryBot [methods](https://www.rubydoc.info/gems/factory_bot/FactoryBot/Syntax/Methods) (`create`, `build`, `create_list`) will be included in the class automatically and may be called.
+
+The `DataSeeder` class will have the following instance variables defined upon seeding:
+
+- `@seed_file` - The `File` object.
+- `@owner` - The owner of the seed data.
+- `@name` - The name of the seed. This will be the seed file name without the extension.
+- `@group` - The root group that all seeded data will be created under.
+
+```ruby
+# frozen_string_literal: true
+
+class DataSeeder
+  def seed
+    my_group = create(:group, name: 'My Group', path: 'my-group-path', parent: @group)
+    my_project = create(:project, :public, name: 'My Project', namespace: my_group, creator: @owner)
+  end
+end
+```
+
+### YAML
+
+The YAML Parser is a DSL that supports Factory definitions and allows you to seed data using a human-readable format.
+
+```yaml
+name: My Seeder
+groups:
+  - _id: my_group
+    name: My Group
+    path: my-group-path
+
+projects:
+  - _id: my_project
+    name: My Project
+    namespace_id: <%= groups.my_group.id %>
+    creator_id: <%= @owner.id %>
+    traits:
+      - public
+```
+
+### JSON
+
+The JSON Parser allows you to house seed files in JSON format.
+
+```json
+{
+  "name": "My Seeder",
+  "groups": [
+    { "_id": "my_group", "name": "My Group", "path": "my-group-path" }
+  ],
+  "projects": [
+    {
+      "_id": "my_project",
+      "name": "My Project",
+      "namespace_id": "<%= groups.my_group.id %>",
+      "creator_id": "<%= @owner.id %>",
+      "traits": ["public"]
+    }
+  ]
+}
+```
+
+### Taxonomy of a Factory
+
+Factories consist of three main parts - the **Name** of the factory, the **Traits** and the **Attributes**.
+
+Given: `create(:iteration, :with_title, :current, title: 'My Iteration')`
+
+|||
+|:-|:-|
+| **:iteration** | This is the **Name** of the factory. The file name will be the plural form of this **Name** and reside under either `spec/factories/iterations.rb` or `ee/spec/factories/iterations.rb`. |
+| **:with_title** | This is a **Trait** of the factory. [See how it's defined](https://gitlab.com/gitlab-org/gitlab/-/blob/9c2a1f98483921dd006d70fdaed316e21fc5652f/ee/spec/factories/iterations.rb#L21-23). |
+| **:current** | This is a **Trait** of the factory. [See how it's defined](https://gitlab.com/gitlab-org/gitlab/-/blob/9c2a1f98483921dd006d70fdaed316e21fc5652f/ee/spec/factories/iterations.rb#L29-31). |
+| **title: 'My Iteration'** | This is an **Attribute** of the factory that will be passed to the Model for creation. |
+
+### Examples
+
+In these examples, you will see an instance variable `@owner`. This is the `root` user (`User.first`).
+
+#### Create a Group
+
+```ruby
+my_group = create(:group, name: 'My Group', path: 'my-group-path')
+```
+
+#### Create a Project
+
+```ruby
+# create a Project belonging to a Group
+my_project = create(:project, :public, name: 'My Project', namespace: my_group, creator: @owner)
+```
+
+#### Create an Issue
+
+```ruby
+# create an Issue belonging to a Project
+my_issue = create(:issue, title: 'My Issue', project: my_project, weight: 2)
+```
+
+#### Create an Iteration
+
+```ruby
+# create an Iteration under a Group
+my_iteration = create(:iteration, :with_title, :current, title: 'My Iteration', group: my_group)
+```
+
+### Frequently encountered issues
+
+#### ActiveRecord::RecordInvalid: Validation failed: Email has already been taken, Username has already been taken
+
+This happens because, by default, our factories are written to backfill any data that is missing. For instance, when a project
+is created, the project must have somebody that created it. If the owner is not specified, the factory attempts to create it.
+
+**How to fix**
+
+Check the respective Factory to find out what key is required. This is usually `:author` or `:owner`.
+
+```ruby
+# This throws ActiveRecord::RecordInvalid
+create(:project, name: 'Throws Error', namespace: create(:group, name: 'Some Group'))
+
+# Specify the user where @owner is a [User] record
+create(:project, name: 'No longer throws error', owner: @owner, namespace: create(:group, name: 'Some Group'))
+create(:epic, group: create(:group), author: @owner)
+```
+
+#### `parsing id "my id" as "my_id"`
+
+See [specifying variables](#specify-a-variable)
+
+#### `id is invalid`
+
+Given that non-Ruby parsers parse IDs as Ruby Objects, the [naming conventions](https://docs.ruby-lang.org/en/2.0.0/syntax/methods_rdoc.html#label-Method+Names) of Ruby must be followed when specifying an ID.
+
+Examples of invalid IDs:
+
+- IDs that start with a number
+- IDs that have special characters (-, !, $, @, `, =, <, >, ;, :)
+
+#### ActiveRecord::AssociationTypeMismatch: Model expected, got ... which is an instance of String
+
+This is currently a limitation of the seeder.
+
+See the issue for [allowing parsing of raw Ruby objects](https://gitlab.com/gitlab-org/gitlab/-/issues/403079).
+
+## YAML Factories
+
+### Generator to generate _n_ amount of records
+
+### [Group Labels](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/factories/labels.rb)
+
+```yaml
+group_labels:
+  # Group Label with Name and a Color
+  - name: Group Label 1
+    group_id: <%= @group.id %>
+    color: "#FF0000"
+```
+
+### [Group Milestones](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/factories/milestones.rb)
+
+```yaml
+group_milestones:
+  # Past Milestone
+  - name: Past Milestone
+    group_id: <%= @group.id %>
+    group:
+    start_date: <%= 1.month.ago %>
+    due_date: <%= 1.day.ago %>
+
+  # Ongoing Milestone
+  - name: Ongoing Milestone
+    group_id: <%= @group.id %>
+    group:
+    start_date: <%= 1.day.ago %>
+    due_date: <%= 1.month.from_now %>
+
+  # Future Milestone
+  - name: Future Milestone
+    group_id: <%= @group.id %>
+    group:
+    start_date: <%= 1.month.from_now %>
+    due_date: <%= 2.months.from_now %>
+```
+
+#### Quirks
+
+- You _must_ specify `group:` and have it be empty. This is because the Milestones factory will manipulate the factory in an `after(:build)`. If this is not present, the Milestone will not be associated properly with the Group.
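The `<%= ... %>` tags in the YAML factories above are ERB syntax. As a rough illustration of how such a seed could be evaluated, the sketch below renders the ERB tags first and parses the resulting YAML second. This two-step order is an assumption, not the seeder's confirmed implementation, and `SeedContext` and the `Group` struct are hypothetical stand-ins for the seeder's own objects.

```ruby
require 'erb'
require 'yaml'

# Hypothetical stand-in for the root group record; only `id` is needed here.
Group = Struct.new(:id)

# Hypothetical context object: exposes @group to the template, the way the
# seed files above reference `@group.id`.
class SeedContext
  def initialize(group)
    @group = group
  end

  # Evaluate the ERB tags with this object's instance variables in scope.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# Single-quoted heredoc so Ruby leaves the template text untouched.
template = <<~'YAML'
  group_labels:
    - name: Group Label 1
      group_id: <%= @group.id %>
      color: "#FF0000"
YAML

# Step 1 (assumed): render ERB. Step 2 (assumed): parse the plain YAML.
rendered = SeedContext.new(Group.new(42)).render(template)
seed = YAML.safe_load(rendered)

puts seed['group_labels'].first['group_id']
```

Under this reading, a tag such as `<%= 1.month.ago %>` would be evaluated as ordinary Ruby inside the Rails environment, where ActiveSupport's duration helpers are available.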
+
+### [Epics](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/spec/factories/epics.rb)
+
+```yaml
+epics:
+  # Simple Epic
+  - title: Simple Epic
+    group_id: <%= @group.id %>
+    author_id: <%= @owner.id %>
+
+  # Epic with detailed Markdown description
+  - title: Detailed Epic
+    group_id: <%= @group.id %>
+    author_id: <%= @owner.id %>
+    description: |
+      # Markdown
+
+      **Description**
+
+  # Epic with dates
+  - title: Epic with dates
+    group_id: <%= @group.id %>
+    author_id: <%= @owner.id %>
+    start_date: <%= 1.day.ago %>
+    due_date: <%= 1.month.from_now %>
+```
+
+## Variables
+
+Each created factory can be assigned an identifier to be used in future seeding.
+
+You can specify an ID for any created factory that you may use later in the seed file.
+
+### Specify a variable
+
+You may pass an `_id` attribute on any factory to refer back to it later in non-Ruby parsers.
+
+Variables are namespaced under the factory definition that they belong to.
+
+```yaml
+---
+group_labels:
+  - _id: my_label #=> group_labels.my_label
+
+projects:
+  - _id: my_project #=> projects.my_project
+```
+
+Variables:
+
+NOTE:
+It is not advised, but you may specify variables with spaces. These variables may be referred back to with underscores.
+
+### Referencing a variable
+
+Given a YAML seed file:
+
+```yaml
+---
+group_labels:
+  - _id: my_group_label #=> group_labels.my_group_label
+    name: My Group Label
+    color: "#FF0000"
+  - _id: my_other_group_label #=> group_labels.my_other_group_label
+    color: <%= group_labels.my_group_label.color %>
+
+projects:
+  - _id: my_project #=> projects.my_project
+    name: My Project
+```
+
+When referring to a variable, the variable refers to the _already seeded_ models. In other words, the model's `id` attribute will
+be populated.
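To make the `_id` rules concrete, here is a minimal sketch of a registry that stores seeded records under `factory.id`, converts spaces to underscores, and rejects IDs that are not valid Ruby-style identifiers. `IdRegistry`, its methods, and the exact validation pattern are hypothetical and chosen for illustration; the real parser may apply different rules.

```ruby
# Hypothetical registry illustrating `_id` handling; not the seeder's API.
class IdRegistry
  # Assumed rule: must look like a Ruby local or method name, so it cannot
  # start with a digit or contain characters such as - ! $ @.
  VALID_ID = /\A[a-z_][a-zA-Z0-9_]*\z/

  def initialize
    # One bucket of records per factory definition, e.g. "group_labels".
    @records = Hash.new { |hash, factory| hash[factory] = {} }
  end

  # Stores a record so it can be referenced later as <factory>.<id>.
  def register(factory, raw_id, record)
    id = normalize(raw_id)
    raise ArgumentError, "id is invalid: #{raw_id.inspect}" unless id.match?(VALID_ID)

    @records[factory.to_s][id] = record
    id
  end

  # Looks up an already registered record; raises KeyError when missing.
  def fetch(factory, id)
    @records.fetch(factory.to_s).fetch(id.to_s)
  end

  private

  # Spaces become underscores, mirroring `parsing id "my id" as "my_id"`.
  def normalize(raw_id)
    raw_id.to_s.strip.tr(' ', '_')
  end
end

registry = IdRegistry.new
registry.register(:group_labels, 'my group label', { color: '#FF0000' })
puts registry.fetch(:group_labels, 'my_group_label')[:color]
```

Because records are registered only once they exist, a lookup like this can only resolve IDs that appear earlier in the seed file, which matches the note that variables refer to already seeded models.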