What is the INDEX index in MySQL for?

Question:

What is the INDEX index in MySQL for?

ALTER TABLE `tabela`
ADD INDEX `tabela_id_index` (`tabela_id` ASC);

Answer:

An easy way to think about indexes is to think about a book's table of contents, it exists to make it easier to find the contents of the book, so you find content easier than searching page by page, for this it depends on what the indexes are well laid out .


Benefits:

  • They are able to reduce the amount of data that the server would have to examine.
  • They are able to help the server to order (eg ORDER BY) and avoid creating temporary tables.

The use of "are able" is because not all indexes created will do this, because maybe they will never be used.


Types of Indexes:

These are the types of indexes that exist in the most common engines :

  • B-Tree: Most common of all types of indexes, basically except for the Archive Engine all support it, but with internal differences .
    • InnoDB Engine uses uncompressed B+Tree and are referenced by Primary Key .
    • MyISAM Engine uses B-Tree comprehension techniques and are referenced by their physical location.
    • NDB Cluster Engine uses T-Tree.
    • TokuDB uses a variation called Fractal Tree .

  • Hash: Its name explains, it is only useful for differentiation and equality ( IN() , = and <=> ), but it is not used for any other functionality.
    • MEMORY Engine is one of the only ones that explicitly support this type of index.
    • InnoDB Engine has adaptive hash indexes , when InnoDB identifies that some values ​​are accessed frequently it creates a hash index and uses it before the B-Tree, but this is fully automatic and there is no way to turn it off or create it.

  • R-Tree: Rarely used in MySQL, however used in PostgreSQL and PostGIS, it does not require the use of WHERE to benefit.
    • InnoDB Engine is one of the only ones that support this type of index.

  • Full-Text Index: Have a different feature and only used for MATCH AGAINST operations.
    • MyISAM Engine supports this type of index.
    • InnoDB Engine supports this type of index in the latest versions.

In addition, ScaleDB uses Patricia-Tries. InfiniDB and Infobright have their own architecture (?).


As the default is the B-Tree, that's exactly what I'll use as a base for everything here.

Limitations:

  • They are not suitable for readings that do not start from the left, for example SELECT * FROM Tabela WHERE Nome LIKE '%A' , to try to find all names that end with A , however you can use A% to find the one that starts with A .

  • Your query must execute the same order as defined index, creating a KEY(Nome, Sobrenome) must inform in this order. You can WHERE Nome = 'Inkeliz' , but you cannot use WHERE Sobrenome = 'Lima' . You must follow the order, ie WHERE Nome = 'Inkeliz' AND Sobrenome = 'Lima' , for example.

  • If you create a range , for example, WHERE Nome LIKE 'ANA%' AND Sobrenome = 'Lima' , the indexes will be used to find Ana, but it will not be used to find last names, this will still create a temporary table. This indicates that you prefer to put X > 1 , BETWEEN 1 AND 5 and LIKE as the last condition, that is, all equalities must come first (there are even "tricks"/"humbles" with IN() ), I'll detail this later .

Problems:

Isolate columns:

Consider this:

SELECT Nome FROM Usuarios WHERE ID + 1 = 10

See the problem? MySQL is not able to do this sum. It's easy for us to know that ID + 1 = 10 is the same as ID = 9 , but MySQL doesn't do that, as a result it will scan the entire database and add up all the IDs, ignoring any index.

More real example:

SELECT Nome FROM Acessos WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(UltimoAcesso) <= 7

Same problem, you'll run TO_DAYS() on everything and then subtract and then compare it to see if it's less than 7. Indexes are not miraculous in these cases and will be ignoreable.

Create range :

In MySQL using CAN IN() is better than BETWEEN , but beware of a large number of conditions, for example:

SELECT Nome FROM Usuario WHERE Genero IN ('h', 'm') AND Idade IN (18, 19, 20, 21, 22, 23, 24, 25) AND UltimaAcao > DATE_SUB(NOW(), INTERVAL 7 DAY);

This would have 2 * 8 possibilities and then it would theoretically be better than:

SELECT Nome FROM Usuario WHERE Genero IN ('h', 'm') AND Idade BETWEEN 18 AND 25 AND UltimaAcao > DATE_SUB(NOW(), INTERVAL 7 DAY);

This is because using BETWEEN you would generate two ranges, the first by Idade and the second by UltimaAcao , while in IN() is equality. This still raises another question, which of the two has greater relevance?! But, why use Genero IN ('h', 'm') , assuming they are the only options .

The workaround of IN() comes from the need to specify all columns, so if you do: KEY(Genero, Pais, Idade) you will always have to inform the Gender, Country and Age. That's why we use the IN() , that is, if there is only h and m then we make a Genero IN ('h', 'm') , that way we "ignore" the Genero and we can filter by Country, or make a new IN() with all the country lists and we move to the age. *The age at last, again, because it has the potential power to create a range .

If we used:

SELECT Nome FROM Usuario WHERE Idade > 18 AND Genero = 'h' AND Pais = 'BR'

It would obligatorily ignore the Genero and Pais indexes, because the first one, Idade creates a range .

Relevance and order:

Now which one has more relevance, consider this:

SELECT * FROM Usuario WHERE Ativo = 0 AND id =  10;
SELECT * FROM Usuario WHERE id = 10;
SELECT * FROM Usuario WHERE Ativo IN (0, 1) AND id = 10;

The first and last could use KEY(Ativo, id) , but would be incompatible with the second. But we could also do:

SELECT * FROM Usuario WHERE id = 10 AND Ativo = 0;
SELECT * FROM Usuario WHERE id = 10;
SELECT * FROM Usuario WHERE id = 10 AND Ativo IN (0,1);

So we could use the KEY(id, Ativo) , they would all use indexes, in this case it tends to be better, because the Asset, being binary will return more data than the id which tends to be unique, so the id has greater relevance and this has great impact.

Another thing is to know if you won't do this:

SELECT * FROM Usuario WHERE id > 10 AND Ativo = 0;
SELECT * FROM Usuario WHERE Ativo = 0;

Because if these queries are used the KEY(Ativo, id) could be better, but again it depends on the case, so using Ativo IN() even before you just select the id .

Another problem is creating multiple indexes separated by each column, this is terrible and rarely useful.

Separate columns:

Creating separate indexes can be good if you use:

SELECT * FROM tabela WHERE Nome = 'Inkeliz' OR Idade = 18

If you create separate indexes, Nome and Idade MySQL will make a UNION of the two results, using each separate index. However do this:

 SELECT * FROM tabela WHERE Nome = 'Inkeliz' AND Idade = 18

It will not benefit from both of the indexes, it will only choose to use one of them, not both. Most of the time creating split indexes is not good , in short.

ORDER BY and GROUP BY also benefit from indexes, but as long as you don't create a range , because in this case it falls into a filesort , there's no way, except to make a RAID-0 of SSD .


Indexes are complex, there are two books about it, unfortunately they don't cover MySQL 5.6 and MySQL 5.7, and apparently there is no new version to be released , they are:

High Performance MySQL: Optimization, Backups, and Replication , has a version translated by Altas Books (but it's even older and there are several complaints about translation failures), in it that mentions the use of the IN() workaround, it's not my authorship (and gave a good RAM reduction , deleting repeated indexes ).

Relational Database Index Design and the Optimizers , almost a book only only indexes, this is still in line to be read, the book above indicates this too. 🙂

Scroll to Top