summaryrefslogtreecommitdiff
path: root/doc/development
diff options
context:
space:
mode:
authorYorick Peterse <yorickpeterse@gmail.com>2017-05-31 13:47:14 +0200
committerYorick Peterse <yorickpeterse@gmail.com>2017-05-31 13:59:00 +0200
commit3d160480c07cd967480716d7cfbe332dfc53507e (patch)
tree5081d3e23c0307e77ceb628c586d6ce26ad763e1 /doc/development
parent228926daee799c95e752a3c284c860e5bc60e528 (diff)
downloadgitlab-ce-3d160480c07cd967480716d7cfbe332dfc53507e.tar.gz
Document not storing serialized data
Diffstat (limited to 'doc/development')
-rw-r--r--doc/development/README.md1
-rw-r--r--doc/development/serializing_data.md84
2 files changed, 85 insertions, 0 deletions
diff --git a/doc/development/README.md b/doc/development/README.md
index 934c6849ff9..5989f755a4d 100644
--- a/doc/development/README.md
+++ b/doc/development/README.md
@@ -49,6 +49,7 @@
- [Adding database indexes](adding_database_indexes.md)
- [Post Deployment Migrations](post_deployment_migrations.md)
- [Foreign Keys & Associations](foreign_keys.md)
+- [Serializing Data](serializing_data.md)
## i18n
diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md
new file mode 100644
index 00000000000..2b56f48bc44
--- /dev/null
+++ b/doc/development/serializing_data.md
@@ -0,0 +1,84 @@
+# Serializing Data
+
+**Summary:** don't store serialized data in the database, use separate columns
+and/or tables instead.
+
+Rails makes it possible to store serialized data in JSON, YAML or other formats.
+Such a field can be defined as follows:
+
+```ruby
+class Issue < ActiveRecord::Model
+ serialize :custom_fields
+end
+```
+
+While it may be tempting to store serialized data in the database there are many
+problems with this. This document will outline these problems and provide an
+alternative.
+
+## Serialized Data Is Less Powerful
+
+When using a relational database you have the ability to query individual
+fields, change the schema, index data and so forth. When you use serialized data
+all of that becomes either very difficult or downright impossible. While
+PostgreSQL does offer the ability to query JSON fields it is mostly meant for
+very specialized use cases, and not for more general use. If you use YAML in
+turn there's no way to query the data at all.
+
+## Waste Of Space
+
+Storing serialized data such as JSON or YAML will end up wasting a lot of space.
+This is because these formats often include additional characters (e.g. double
+quotes or newlines) besides the data that you are storing.
+
+## Difficult To Manage
+
+There comes a time where you will need to add a new field to the serialized
+data, or change an existing one. Using serialized data this becomes difficult
+and very time consuming as the only way of doing so is to re-write all the
+stored values. To do so you would have to:
+
+1. Retrieve the data
+1. Parse it into a Ruby structure
+1. Mutate it
+1. Serialize it back to a String
+1. Store it in the database
+
+On the other hand, if one were to use regular columns adding a column would be
+as easy as this:
+
+```sql
+ALTER TABLE table_name ADD COLUMN column_name type;
+```
+
+Such a query would take very little to no time and would immediately apply to
+all rows, without having to re-write large JSON or YAML structures.
+
+Finally, there comes a time when the JSON or YAML structure is no longer
+sufficient and you need to migrate away from it. When storing only a few rows
+this may not be a problem, but when storing millions of rows such a migration
+can easily take hours or even days to complete.
+
+## Relational Databases Are Not Document Stores
+
+When storing data as JSON or YAML you're essentially using your database as if
+it were a document store (e.g. MongoDB), except you're not using any of the
+powerful features provided by a typical RDBMS _nor_ are you using any of the
+features provided by a typical document store (e.g. the ability to index fields
+of documents with variable fields). In other words, it's a waste.
+
+## Consistent Fields
+
+One argument sometimes made in favour of serialized data is having to store
+widely varying fields and values. Sometimes this is truly the case, and then
+perhaps it might make sense to use serialized data. However, in 99% of the cases
+the fields and types stored tend to be the same for every row. Even if there is
+a slight difference you can still use separate columns and just not set the ones
+you don't need.
+
+## The Solution
+
+The solution is very simple: just use separate columns and/or separate tables.
+This will allow you to use all the features provided by your database, it will
+make it easier to manage and migrate the data, you'll conserve space, you can
+index the data efficiently and so forth.