Question:
When I didn't know much about the issue of database structure and relationship between tables, I used to use JSON or comma-separated data in a certain field of any table in the database.
I actually remember a specific case where I needed to relate multiple images to a particular post. These images I saved a list in JSON
in a certain field called imagens
.
The structure was something like this
-----------------------------------------------
tabela
-----------------------------------------------
id | imagens | usuario_id
-----------------------------------------------
1 | ["image1.png", "image2.png"] | 55
2 | NULL | 56
However, at a certain moment when changes began to appear in this system, I was harshly criticized by a programmer who knew more about databases than I did (and was also teased) for having done so.
He explained to me that the data should look something like this:
--------------------------
tabela
--------------------------
id | usuario_id
--------------------------
1 | 55
2 | 56
----------------------------
imagens
----------------------------
post_id | url_imagem
1 | image1.png
1 | image2.png
Considering these two cases (relationship VS serialization saved in a field), can we say that, structurally speaking, using JSON
is bad practice?
I was criticized for using JSON
data in a table field instead of a relationship, but there is a case where I must/can use a serialization and save it in a database field, without this being considered a bad programming practice ?
Answer:
A normalized relational database assumes that each attribute (value of a column in a table) is atomic , that is, it cannot be broken into smaller pieces from the point of view of the relational model. From the point of view of the relational model , it's worth pointing out: no one would suggest that you store each digit of a number in a separate column, or each character of a string. If data is not used to form relationships between tables, that data can be considered for all atomic purposes.
In your example above this is not necessarily true: you have a list of images, ok, but can this list of images be considered atomic? Each image belongs to a single row of the table, there is no overlapping of images (ie the same image belongs to several different posts ), you will always be interested in the complete list and nothing else (ie it is not possible for you to solve, for example, take only the first image from the list, to make a thumbnail or cover image for example)?
If the answer is a definite "yes", there's no way you'd want parts of this list of images for anything, it's either the whole list or it's nothing, so no, it's not bad practice to store the list this way, whether using JSON, comma-separated strings, Blobs , etc. In the same way that your image is in an external file, and that file is seen by the relational model as an atomic thing (you don't reference "pieces" of that image in your model, just the entire image), the list of images – or even some something more complex and structured – it can be seen as an atomic entity, and therefore saved in a single attribute.
However, most of the time the answer is not a "yes": it's perfectly possible to conceive of use cases where an image is shared by more than one post , or where you want to retrieve only a subset of the post 's images for anything , or even if you want to associate with each image additional metadata (like type, size, creation date, etc). In this case, a normalized representation is usually ideal, and this alternative suggestion presented would correspond to a normal form.
As a rule, I would say, when in doubt, to adopt the normalized form: it is easier to denormalize a table later (if performance problems arise, for example) than to normalize one that started out denormalized. Unless you have good reasons to save a JSON in a table, avoid doing so. There are cases where JSON is the best representation for a dataset, but in that case a non-relational database – commonly referred to as NoSQL – may be a preferable option.