Question:
I'm studying databases and came across page replacement policies such as LRU and MRU. The operating system already does this by default, so why does the DB need to do its own paging?
Answer:
Because the goals are different. The operating system manages generic pages of bytes, while the database manages data pages built for a specific purpose. The DB knows better what is in them: its pages have a specific format, and it is common for page types to differ, each with its own data structure. So it needs more control over how they are handled, and that handling is generally optimized for performance.
The OS tends to track its pages with a linked list that is not very efficient for the database access pattern.
-
Of course, unless the DB bypasses the filesystem (this used to be common, but no longer is), it also relies on OS paging at a lower level. A DB page can span multiple OS pages, or an OS page can contain multiple DB pages; it depends on what is best for each situation.
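To make the size relationship concrete, here is a small sketch of the arithmetic. The function name and the sizes (4096-byte OS pages, 8192-byte or 1024-byte DB pages) are illustrative assumptions, not values taken from any particular DB, and it assumes DB pages are laid out back to back from offset 0 of the file.

```python
def os_pages_for_db_page(db_page_no, db_page_size, os_page_size=4096):
    """Return the range of OS page numbers that one DB page overlaps.

    Assumes DB pages are stored contiguously starting at file offset 0.
    """
    start = db_page_no * db_page_size      # first byte of the DB page
    end = start + db_page_size - 1         # last byte of the DB page
    return range(start // os_page_size, end // os_page_size + 1)


# An 8 KiB DB page spans two 4 KiB OS pages.
print(list(os_pages_for_db_page(3, 8192)))

# With 1 KiB DB pages, four DB pages share a single OS page.
print([list(os_pages_for_db_page(n, 1024)) for n in range(4)])
```

In the first case, touching one DB page touches two OS pages; in the second, four DB pages live or die together as far as the OS is concerned.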
Some databases rely primarily on OS paging. Those that chose to do their own paging needed an extra level of control over their memory. They can use a more suitable replacement algorithm (MRU, LRU, LRU-K, etc.). The database's algorithm knows what should stay in memory and can prioritize what matters most for the workload (indexes, especially primary ones, need to be available with priority). Keep in mind that OS paging is still used underneath to back the DB pages.
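As a rough illustration of that extra control, here is a toy buffer pool with LRU eviction plus pinning, so that a page the DB considers important (say, an index root) is never chosen as a victim. The class, its method names and the `read_page` callback are all hypothetical; real engines are far more elaborate.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: LRU eviction, but pinned pages are never evicted."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page    # hypothetical callback: page_id -> data
        self.frames = OrderedDict()   # page_id -> data, least recently used first
        self.pinned = set()

    def pin(self, page_id):
        self.pinned.add(page_id)

    def unpin(self, page_id):
        self.pinned.discard(page_id)

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        data = self.read_page(page_id)         # cache miss: go to storage
        self.frames[page_id] = data
        return data

    def _evict(self):
        # Walk from least to most recently used and drop the first unpinned page.
        for victim in self.frames:
            if victim not in self.pinned:
                del self.frames[victim]
                return
        raise RuntimeError("all pages are pinned")


pool = BufferPool(2, lambda page_id: f"contents of {page_id}")
pool.pin("index-root")
pool.get("index-root")
pool.get("table-1")
pool.get("table-2")          # pool is full: "table-1" is evicted, not the index
print(list(pool.frames))
```

A generic OS page cache has no equivalent of `pin` that understands "this is an index root", which is the kind of knowledge only the DB has.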
In general, in a DB the on-disk pages match the in-memory pages. Since they can be a different size from the OS pages (to suit the access pattern), it makes perfect sense for the DB to have its own control.
Some databases prefer to leave the page cache entirely to the OS; others use both levels. Using both can make it harder to optimize memory consumption, since the same data may sit in two different places, but it can provide more safety and flexibility.
-
Depending on the DB implementation, it is not possible to work directly on OS pages. It is also not always necessary to manipulate data that the OS may push to disk. Each implementation has its own specifics and handles this in the way that suits it best. The DB needs to decide for itself what should go to "disk" and what should not.
-
With a more specialized algorithm, it is possible to get more performance and scalability, as well as more flexibility and reliability.
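A small simulation can show why the choice of algorithm matters. This is only a sketch under simple assumptions: a cache of 3 pages and a table of 4 pages scanned sequentially three times. The function `count_hits` is made up for this example.

```python
def count_hits(accesses, capacity, policy):
    """Count cache hits for a sequence of page accesses under LRU or MRU."""
    cache = []   # ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.remove(page)
        elif len(cache) >= capacity:
            if policy == "LRU":
                cache.pop(0)    # evict the least recently used page
            else:
                cache.pop()     # MRU: evict the most recently used page
        cache.append(page)      # the accessed page is now the most recent
    return hits


scan = list(range(4)) * 3       # pages 0..3, scanned three times in a row
print("LRU hits:", count_hits(scan, 3, "LRU"))
print("MRU hits:", count_hits(scan, 3, "MRU"))
```

With this looping scan, LRU always evicts exactly the page that will be needed next, so it never gets a hit, while MRU keeps part of the table resident. A DB that knows it is running a sequential scan can pick its policy accordingly; a general-purpose OS cache usually cannot.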
-
With its own system, the DB can abstract away the differences between operating systems. It can also adapt when conditions no longer meet its expectations and needs.