php – What is the most efficient way to select items from a table?

Question:

Having the following table:

| id | name | email | country |
+----+------+-------+---------+

What is the most efficient (fastest) way to select ALL rows from the table, knowing that it may have one row or millions of rows?

  • do a loop and issue one SELECT per row?
  • run N SELECTs, fetching X rows per SELECT?
  • run a single SELECT that returns all the rows of the table?
  • some other way? Which?

Note: if I use a limit of X rows per SELECT, what should X be if I have 10,000 records? And if I have 1 million? Is there an algorithm for choosing it?

I've limited the question to MySQL and InnoDB so as not to make it too broad.

[EDIT]

Note that these rows will be read in PHP in order to export all of the table's fields to an .xml file.

Answer:

General Data Processing Considerations

First of all, it is important to mention that in most cases there is no single way of processing data that is fastest for every volume of data.

When tuning application performance, it is usually necessary to analyze how, when, and in what context the most critical parts of the system are used. Based on this analysis, it is possible to identify solutions ranging from adjustments to the database (denormalization, creation of indexes, changes to data types), through the way data is retrieved (cursors, buffers, sorting), up to the use of local or distributed caches.

Anyway, the solution depends on many variables and there is no general rule.

Retrieving a single row

Small tables hardly need any optimization. You can read them in their entirety and it won't affect the overall performance of the application.

Unless, of course, the table is read thousands of times a second. In that case you can keep a copy of the data in memory.
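
For example, a small, frequently read table could be cached with APCu (assuming that extension is available; the table name, cache key and 60-second TTL below are only illustrative):

// Hypothetical sketch: cache a small, frequently read table with APCu.
// The 'countries' key and the 60-second TTL are arbitrary choices.
function getCountries(mysqli $mysqli): array
{
    $rows = apcu_fetch('countries', $found);
    if ($found) {
        return $rows;                        // served from memory, no query executed
    }

    $res  = $mysqli->query("SELECT id, name FROM country");
    $rows = $res->fetch_all(MYSQLI_ASSOC);   // requires mysqlnd
    apcu_store('countries', $rows, 60);      // keep the copy for 60 seconds
    return $rows;
}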

Now, if you want to retrieve a single row out of thousands, the best solution is to have a good index that exactly matches your WHERE clause.

Consider the following query:

select * from Pessoa where tipo_pessoa = ? and CPF = ?

In this case, it would be ideal to have a composite index on both the tipo_pessoa and CPF columns.
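
As an illustration, such a composite index could be created like this (the index name is arbitrary):

// Illustrative composite index matching the WHERE clause above
$mysqli->query("CREATE INDEX idx_tipo_pessoa_cpf ON Pessoa (tipo_pessoa, CPF)");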

Retrieving all rows from a table

Reading an entire table at once, without knowing the number of records in advance, is a challenge.

The solutions basically fall into two types: full reading and partial reading.

Full Reading

Reading the entire table is the most suitable solution if there is enough memory for it. After all, you won't have to go back and forth to the database to retrieve new values.

However, depending on what's in the table, a volume of 1 million records will likely take up more memory than we want.

At first we might even conclude that we can keep all these records in memory. However, in most cases multiple users will be accessing the system at the same time, which raises the question: how many users do we want to serve?

Suppose the records occupy 100 megabytes of memory. If we have a server with 1 gigabyte free, then, doing some very simple math, our system would serve 10 users well. Beyond that, paging memory to disk would likely degrade the program's performance until it becomes unusable.

Another problem is that loading all the data into memory takes time. The user will feel the difference between a system that reads 1 million records and only then writes them all at once, and one that sends partial data as it goes, even if the total time is slightly longer. This is why many systems end up needing pagination.

Therefore, a complete read of all data from large tables is often inadequate for web systems.

Partial Reading

To avoid the problem with memory and response time, there is then the alternative of partially reading the data.

There is more than one way to do this.

Retrieving blocks with LIMIT

One approach is to perform multiple queries that retrieve different blocks of records using LIMIT.

This means that you must define a block size and run several queries consecutively. An example of the executed queries:

select * from Pessoa where tipo_pessoa = ? and CPF = ? LIMIT 0, 10
select * from Pessoa where tipo_pessoa = ? and CPF = ? LIMIT 10, 10
select * from Pessoa where tipo_pessoa = ? and CPF = ? LIMIT 20, 10
select * from Pessoa where tipo_pessoa = ? and CPF = ? LIMIT 30, 10
....

The problem with this approach is the overhead of each execution: the database has extra work to do for each block it returns.
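
A rough sketch of this block-by-block approach in PHP could look like the following; the block size of 10 matches the queries above, and the table and column names are only illustrative:

// Sketch: read the table in blocks of 10 rows using LIMIT offset, count.
// Ordering by a unique column keeps the blocks stable between queries.
$blockSize = 10;
$offset    = 0;

do {
    $res  = $mysqli->query(
        "SELECT id, name, email, country FROM Pessoa ORDER BY id LIMIT $offset, $blockSize"
    );
    $rows = $res->fetch_all(MYSQLI_ASSOC);

    foreach ($rows as $row) {
        // process each row here (e.g. write it to the .xml file)
    }

    $offset += $blockSize;
} while (count($rows) === $blockSize);   // a short block means we reached the end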

Full query, partial results

Another way to retrieve partial results is to use a query that selects all the records in the table but does not return them all to the application at once.

This prevents the database from having to create multiple result sets and also keeps the application from having to load all the data into memory.

The idea is that there is a kind of cursor in the database that reads the data as we retrieve it.

In the client code (PHP), as we consume the records we discard the variables so that the memory can be freed.

The weakness of this approach is that it keeps a resource open on the server for longer.

Full and Partial Reads in PHP

The PHP manual has a topic on buffered result sets and unbuffered result sets.

Buffered result sets are results that load all returned rows into memory.

Example:

// query() buffers the entire result set in memory (MYSQLI_STORE_RESULT by default).
$res = $mysqli->query("SELECT id FROM test ORDER BY id ASC");
// Because everything is buffered, we can jump to any row; here we walk backwards.
for ($row_no = $res->num_rows - 1; $row_no >= 0; $row_no--) {
    $res->data_seek($row_no);   // move the internal pointer to row $row_no
    $row = $res->fetch_assoc();
    echo " id = " . $row['id'] . "\n";
}

Since everything is in memory, you can move to any position in the result set using the data_seek(n) method.

On the other hand, unbuffered result sets iterate over the results without storing them all in memory, and are the recommended option when there is not enough memory available.

Example:

// real_query() sends the query and use_result() returns an unbuffered result set,
// so rows are streamed from the server as they are fetched.
$mysqli->real_query("SELECT id FROM test ORDER BY id ASC");
$res = $mysqli->use_result();

echo "Result set order...\n";
while ($row = $res->fetch_assoc()) {   // each call pulls the next row from the server
    echo " id = " . $row['id'] . "\n";
}
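
Tying this back to the question's goal of exporting to an .xml file, an unbuffered result can be streamed straight into the file with XMLWriter, so only one row needs to be in memory at a time. A minimal sketch, assuming illustrative file, element and column names:

// Sketch: stream an unbuffered result directly into an XML file.
$mysqli->real_query("SELECT id, name, email, country FROM Pessoa");
$res = $mysqli->use_result();            // unbuffered: rows are fetched on demand

$xml = new XMLWriter();
$xml->openURI('export.xml');             // write directly to the file
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('rows');

while ($row = $res->fetch_assoc()) {
    $xml->startElement('row');
    foreach ($row as $column => $value) {
        $xml->writeElement($column, (string) $value);
    }
    $xml->endElement();                  // </row>
}

$xml->endElement();                      // </rows>
$xml->endDocument();
$xml->flush();
$res->free();                            // release the server-side resource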

Case study

A few days ago, here where I work, an application was suffering from performance problems when reading the result of a query that returned approximately 30,000 rows, with subsequent generation of a text file. The original runtime was approximately 20 minutes.

First, we checked whether the query itself was the problem. We found some issues, but nothing that justified the 20 minutes, since the query alone took approximately 1 minute.

Second, we checked the generation of the file and ruled out the problem being there as well.

Third, we verified that all the data was indeed being loaded into memory, but that was an acceptable load for the server in use.

In the end, we identified that the problem was the time spent reading the query results into Java.

Researching the Oracle Database driver, we saw that by default it prefetches 10 records per query. This means that when traversing the 30,000 records, the driver transfers them 10 at a time.

We changed the prefetch parameter to 100 and performance improved a lot: instead of 3,000 round trips to the database (10 rows at a time), there were only 300 (100 at a time). We ran several tests and arrived at a value of 300 prefetched records for that environment; more or less than that made performance worse. Also, the results were not the same in other environments with different amounts of available memory.

The final time dropped to 2 minutes. With the same code, just by modifying a single parameter, we completely changed the behavior of that functionality.

Unfortunately, I didn't find a parameter similar to the one mentioned above for PHP.

Other tips

In addition to everything mentioned above, some tips might be helpful:

  • Select as few columns as possible in your query to minimize memory usage and transfer time
  • Always sort the data in a way that makes use of an index
  • It is possible to count how many records the table has and choose a different algorithm depending on the situation (see the sketch below)
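
As a sketch of that last tip, one could count the rows first and pick the reading strategy based on a threshold; the 100,000-row cutoff here is arbitrary and should be tuned with real tests:

// Sketch: choose the reading strategy based on the table size.
$count = $mysqli->query("SELECT COUNT(*) AS total FROM Pessoa")
                ->fetch_assoc()['total'];

if ($count <= 100000) {
    // small enough: buffer the whole result set in memory
    $res = $mysqli->query("SELECT id, name, email, country FROM Pessoa");
} else {
    // too large: use an unbuffered result and process row by row
    $mysqli->real_query("SELECT id, name, email, country FROM Pessoa");
    $res = $mysqli->use_result();
}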

Conclusion

There are several mechanisms for reading a lot of data.

However, the only absolute conclusion is that testing will be necessary to determine the best solution. Don't forget to consider the environment where the system will run (CPU, memory, disk, etc.).
