mysql – How to optimize pagination from large amounts of data?

Question:

There is a large sample of data (50 million), it is necessary to display the first 10 records from it by the most recent date (after which another 10 records should be loaded). The difficulty is that the data is grouped into 10 sections (each has from 2-3 to 10 million records) and you only need to display the data from this section.

SELECT * FROM posts WHERE type=1 ORDER BY updated DESC LIMIT 10;

For this query, EXPLAIN shows that MySQL first selects several million rows that match the desired type, and then sorts them by date.

The table has a composite index (type, updated), but it doesn't help much. MySQL only uses the first part of it.

Ditto when displaying subsequent pages

SELECT * FROM posts WHERE type=1 AND updated<'offset' ORDER BY updated DESC LIMIT 10;

How can I optimize this query?

Answer:

As a result, I came to a solution at the level of application logic. The main database constantly contains 0.5-1 million of the most recent records, and the rest are transferred to the archive. The archiving script once a day transfers old records to the archive. Initially, users are given only the most recent data from the main database (99% of queries), and if they are not enough, then a search is made in the archive.

PS In addition, if the user started scrolling through the feed, he made a conclusion not 10, but 50 entries. When scrolling, the client issues data from the request 1 time, and places the remaining 4 packages into an array and issues them as needed. When the customer data runs out, a new request is made. I don't output 50 at once, so as not to slow down the browser (there are a lot of graphics). Something similar is found in vk and a number of other large sites.

PPS The final solution at the MySQL level is as follows. Conducted manually testing with samples based on a hardcoded (use index) simple or composite index. The best performance turned out to be the use of a composite index (for example, (type, update)), the use of which is hardcoded in the application code, depending on the specific type of selection (by default, in some cases, MySQL chooses not the most productive index). I was surprised by the discrepancy between the data produced by EXPLAIN and the real performance indicators. For example, a simple index (EXPLAIN shows rows 10) performed hundreds of times slower than a composite index with rows of several million records.

PPPS In general, the problem was in the wrong choice by the MySQL engine of the index on which the search was carried out (in some cases, only a simple index was used, when it was better to use a composite index, and in some cases, the search was carried out immediately on 2 indices with the union of the results). By writing USE INDEX manually (for each specific case), the performance increased many times over.

Scroll to Top