Question:
It is customary to add indexes to improve query performance. However, normally it is not recommended to go out adding indexes in everything that is column, only where the need is identified.
Why this recommendation? What kind of problems can unnecessary indexes cause?
Answer:
Basically it's because each new index requires more time to update the database on any changes that affect these indexes. I've seen people suggest making indexes for everything. With only 6 or 7 fields, for all combinations dozens to hundreds of indices are needed and they would all need to be updated even if you change just one field. The time spent can affect the scalability of the bank.
In some cases the consultation may be impaired as well. The occupied space will harm the cache of the data/indexes that are really important. In addition, accessing the index first and then accessing the data has a cost and it can be higher than accessing the data directly in certain patterns. And not always the system optimizer correctly detects whether it should do direct access.
Likewise, the automatic recommendations of SGDBs systems to create indexes are not always correct.
In addition, each new index is an extra resource to maintain . Violates the YAGNI . It even makes it difficult to upgrade the model of a database in the production system.
I've seen "reputable experts" say that the ideal is to create an index for everything and remove the ones you don't need. I can't imagine how this can make sense. It's the worst case of premature optimization I've ever seen. Every optimization should be created when you've measured it and it's clear that it's needed. Index exists fundamentally to optimize access to data.
The most complete database systems have many resources to obtain the necessary information for the correct decision. This information can help identify unnecessary indexes as well. And as it is a feature to optimize, you need to review it often, what may be good at one time may not be good in the future.
A typical example of an error in index creation is when it is used exclusively for sporadic reporting. The loss of generating a sporadic report without index optimization usually causes less damage than the index will do to the system as a whole.
But one of the worst exaggerations I've ever seen, and it happens a lot, is creating an index for colA+colB
and another for colA
. The first can be used to get the same optimization as the second one provides, making it unnecessary.
But something I always need to emphasize: measuring is the most important thing. I and even professionals much more experienced than me in the subject are always having surprises. What seems to help often hurts more.
To avoid misunderstanding the other answer (which was removed but I think the information is still relevant) by people who are learning about the subject I will put some points:
-
The performance loss caused by the creation of unnecessary indexes occurs in any operation that changes data contained in the index keys. It doesn't matter if it's
INSERT
,UPDATE
or evenDELETE
depending on the implementation. Changes that do not affect the index key do not cause performance issues. On database systems that use some form of MVCC any change ends up affecting at least the primary key index (which may be clustered ) as the data is copied to another location. -
SELECT
performance can also be negatively affected as mentioned above in my answer. It doesn't happen often, mainly because the planner tends to avoid using the index in these cases, but it can . -
Index creation, on the other hand, benefits any data access operation that the database planner identifies as useful. Usually this occurs in
WHERE
comparisons but not in any comparison (this is important). There are cases where the index does not help. Index is not magic, it has an organization that optimize some relationships, not all. It also occurs when usingORDER BY
,JOIN
and when using some aggregator functions. But it is good to be clear that only a measurement can guarantee that there will be gain. Of course, experience can make the database developer identify simple cases, but care must be taken. The good swimmer tends to risk and drown more than the swimmer who is afraid of unfamiliar water. General rules, as always, can do more harm than good. Always have to analyze the specific case with specific information for the case.Examples where the index probably won't help:
WHERE colA = colB / 2
orWHERE colA LIKE '%Silva%'
. -
In any case, the gain occurs in any of the data access operations that have some use of the clauses above. Of course in
UPDATE
andDELETE
onlyWHERE
andORDER BY
is relevant.
It is full of myths in software development and as a consequence we have a lot of software with problems because there is reading illusory information about the subjects. Leaving room for misunderstanding causes more harm than good.