sql – Are unnecessary indexes in the bank a problem?

Question:

It is customary to add indexes to improve query performance. However, it is usually not recommended to go out adding indexes on everything that is a column, only where the need is identified.

Why this recommendation? What kind of problems can unnecessary indexes cause?

Answer:

Basically it is because each new index requires more time to update the database in any change that affects these indexes. I've seen people suggest making indexes for everything. With only 6 or 7 fields, for all combinations tens to hundreds of indexes are needed and all would need to be updated even if you change only one field. The time taken can affect the scalability of the bank.

In some cases, the consultation may also be impaired. The occupied space will hurt the cache of data/indexes that really matter. Furthermore, accessing the index first and then accessing the data has a cost and it can be higher than accessing the data directly in certain patterns. And the system optimizer does not always correctly detect whether it should do direct access.

Likewise, the automatic recommendations of SGDBs systems to create indexes are not always correct.

In addition, each new index is one more resource to maintain . Violates the YAGNI . It even makes it difficult to upgrade the model of a database in the production system.

I've seen "reputable specialists" say that the ideal is to create an index for everything and take out the ones you don't need. I can't imagine how this can make sense. It's the worst case of premature optimization I've ever seen. Every optimization should be created when you've measured it and it's clear that it's needed. Index exists fundamentally to optimize access to data.

The most complete database systems have many features to get the information needed to make the right decision. This information can help identify unnecessary indexes as well. And as it's a resource to optimize, it's necessary to review frequently, what might be good in one moment may not be better in the future.

A typical example of an error in index creation is when it is used exclusively for sporadic reporting. The loss to generate a sporadic report without index optimization tends to do less damage than the index will do to the system as a whole.

But one of the worst exaggerations I've ever seen, and it happens a lot, is to create an index for colA+colB and another one for colA . The first can be used to get the same optimization the second provides, making it unnecessary.

But something I always need to emphasize: measuring is the most important thing. I and even professionals who are much more experienced in the subject are always having surprises. What seems to help often hurts more.

Good reference .


To avoid misunderstanding the other answer (which has been removed but I think the information is still relevant) by people who are learning about the subject I will put a few points:

  • The performance loss caused by the creation of unnecessary indexes is caused by any operation that changes data contained in the index keys. It doesn't matter if it's INSERT , UPDATE and even DELETE depending on the implementation. Changes that do not affect the index key do not cause performance issues. On database systems that use some form of MVCC any change ends up affecting at least the primary key index (which may be clustered ) as the data is copied to another location.

  • SELECT performance can also be negatively affected as mentioned above in my answer. It doesn't happen very often, mainly because the planner tends to avoid using the index in these cases, but it can .

  • Index creation benefits any data access operation that the database planner finds useful. Usually this happens in WHERE comparisons but not in any comparison (this is important). There are cases where the index doesn't help. Index is not magic, it has an organization that optimize some relationships, not all. It also occurs when using ORDER BY , JOIN and when using some aggregator functions. But it is good to be clear that only one measurement can guarantee that there will be gain. Of course, experience can make the database developer identify simple cases, but care must be taken. The good swimmer tends to take risks and drown more than the swimmer who is afraid of unknown water. General rules, as always, can do more harm than good. You always have to analyze the specific case with specific information for the case.

    Examples where the index probably won't help: WHERE colA = colB / 2 or WHERE colA LIKE '%Silva%' .

  • Anyway, the gain occurs in any of the data access operations that have some use of the above clauses. Of course, in UPDATE and DELETE only the WHERE and ORDER BY is relevant.

It's full of myths in software development and as a consequence we have a lot of software with problems because there is the reading of illusory information about the subjects. Leaving room for misunderstanding does more harm than good.

Scroll to Top