[TIL-6] Why is pagination slow?
Pagination is the process of dividing a large dataset into smaller discrete pages; it is also known as paging.
Pagination is commonly used by web applications and can be seen on Google. When we search for something on Google, the results are split across separate pages; this is the main idea of pagination.
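To make the idea concrete, classic offset pagination just maps a page number to an OFFSET. A minimal sketch, assuming a page size of 10 and reusing the `sample_table` from the experiments below:
-- Page N maps to OFFSET (N - 1) * page_size.
-- Here: page 3 with a page size of 10.
SELECT * FROM `sample_table` ORDER BY id LIMIT 10 OFFSET 20;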
Why is pagination slow?
I ran some experiments on a MySQL database with 1 million rows, over a local network, with no indexes other than the primary key (the mock data can be found here: https://github.com/davidasync/why-pagination-slow).
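For context, the table could be set up along these lines. This is a hypothetical sketch (the real schema and mock data live in the repository above), assuming an auto-increment primary key and a single payload column:
-- Hypothetical schema; the actual one is in the linked repository.
CREATE TABLE `sample_table` (
  id    BIGINT AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL
);
-- Fill it with a million rows via a recursive CTE (MySQL 8+);
-- the recursion depth cap must be raised first.
SET SESSION cte_max_recursion_depth = 1000000;
INSERT INTO `sample_table` (value)
WITH RECURSIVE seq (n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT CONCAT('row-', n) FROM seq;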
/**
Query time: 4 millisecond(s), Number of affected records: 10
-> Limit: 10 row(s) (cost=97693.75 rows=10) (actual time=0.096..0.331 rows=10 loops=1)
-> Table scan on sample_table (cost=97693.75 rows=939663) (actual time=0.062..0.143 rows=10 loops=1)
*/
EXPLAIN ANALYZE SELECT * FROM `sample_table` LIMIT 10 OFFSET 0;
Actually, pagination is not that slow if the page is not that deep. The query above fetches the first page of `sample_table`; it took only 4 ms and scanned just 10 rows.
/**
Query time: 442 millisecond(s), Number of affected records: 10
-> Limit/Offset: 10/499990 row(s) (cost=97693.75 rows=10) (actual time=7435.439..7435.614 rows=10 loops=1)
-> Table scan on sample_table (cost=97693.75 rows=939663) (actual time=0.054..3959.741 rows=500000 loops=1)
*/
EXPLAIN ANALYZE SELECT * FROM `sample_table` LIMIT 10 OFFSET 499990;
At offset 499990, halfway through the table, performance is much worse: the query took 442 ms and scanned 500,000 rows.
/**
Query time: 1.236 second(s), Number of affected records: 10
-> Limit/Offset: 10/999990 row(s) (cost=97693.75 rows=0) (actual time=16690.944..16691.143 rows=10 loops=1)
-> Table scan on sample_table (cost=97693.75 rows=939663) (actual time=0.093..9096.920 rows=1000000 loops=1)
*/
EXPLAIN ANALYZE SELECT * FROM `sample_table` LIMIT 10 OFFSET 999990;
At offset 999990, the last page, the query took 1.236 seconds and scanned a million rows. That means the DB has to scan the entire table just to reach the last rows.
Based on these experiments, fetching only 10 rows on the last page took more than a second, even over a local network with no concurrent queries.
This happens because the DB has to walk through a million rows before it can return the last 10.
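The cost does not go away even if the scan runs over an index instead of the table: OFFSET never jumps, it only reads and discards rows one by one. A sketch (the exact plan and timings will vary per machine):
-- MySQL still walks through all 999,990 preceding entries
-- before returning the final 10 rows.
EXPLAIN ANALYZE SELECT * FROM `sample_table` ORDER BY id LIMIT 10 OFFSET 999990;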
Do other database technologies face this problem too?
- MariaDB https://mariadb.com/kb/en/pagination-optimization/, although MariaDB is quite similar to MySQL, it too suggests approaching pagination in a different way.
- Elasticsearch https://www.elastic.co/guide/en/elasticsearch/reference/7.10/paginate-search-results.html, one of the most popular NoSQL search engines, limits pagination to 10,000 results by default; it gets even worse with sharding.
- Apache Solr https://solr.apache.org/guide/6_6/pagination-of-results.html#performance-problems-with-deep-paging, another NoSQL search engine, has the same "deep paging" problem as Elasticsearch, and it also gets worse with sharding.
- MongoDB https://medium.com/swlh/mongodb-pagination-fast-consistent-ece2a97070f3, also suffers from slow offset-based pagination.
How to solve this problem?
“If someone wishes to fly, will that let him grow wings? I don’t think so. You don’t change yourself. You change how you approach the problem” — Sora, No Game No Life
As far as I know, no one has truly solved this problem; even Google and Amazon haven't. Let's explore how they approach it instead.
The best place to hide a dead body is page 2 of Google search results
If you search for a keyword on Google, it shows how many results were found; for example, the keyword "How pagination works" reports about 25,000,000 results.
But if you try to go to the next page, again and again, the pagination stops at page 40.
For most people (at least for me), if the first page of Google doesn't have what they want, they change the search keyword instead of checking the second page.
Amazon takes the same approach as Google: a search for the keyword "Java" reports about 6,000 results, but the pagination stops at page 7.
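In other words, both of them cap the maximum page depth instead of trying to make deep offsets fast. A minimal sketch of that idea, with an illustrative page size of 10 and a cap of 40 pages (both numbers made up for the example):
-- The application clamps the requested page before building the query,
-- e.g. page = LEAST(requested_page, 40), so the offset can never grow
-- beyond (40 - 1) * 10 = 390 no matter what the client asks for.
SELECT * FROM `sample_table` ORDER BY id LIMIT 10 OFFSET 390;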
/**
Query time: 6 millisecond(s), Number of affected records: 1
-> Limit: 10 row(s) (cost=2.36 rows=10) (actual time=0.113..0.578 rows=10 loops=1)
-> Filter: (sample_table.id > 999990) (cost=2.36 rows=10) (actual time=0.078..0.369 rows=10 loops=1)
-> Index range scan on sample_table using PRIMARY over (999990 < id) (cost=2.36 rows=10) (actual time=0.044..0.157 rows=10 loops=1)
*/
EXPLAIN ANALYZE SELECT * FROM `sample_table` WHERE id > 999990 ORDER BY id LIMIT 10;
Another approach is recommended by some references (https://mariadb.com/kb/en/pagination-optimization/ & https://medium.com/swlh/mongodb-pagination-fast-consistent-ece2a97070f3).
Instead of dividing the data into chunks of pages, they suggest an infinite-scroll style of pagination.
For every query, the client supplies a unique, sortable cursor. The query above uses `id` as that cursor, and it took only 6 ms with a 10-row scan to fetch the last page.
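To make the flow concrete, here is how a client could walk pages with this keyset (seek) approach; a sketch assuming `id` starts at 1 (any unique, sortable column would do):
-- First page: no cursor yet, so start from the beginning.
SELECT * FROM `sample_table` WHERE id > 0 ORDER BY id LIMIT 10;
-- The client remembers the last id of the response (say 10) and
-- sends it back as the cursor for the next page.
SELECT * FROM `sample_table` WHERE id > 10 ORDER BY id LIMIT 10;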
But as we might know, infinite scroll has several drawbacks. One of them is that your page can't have a footer, as Facebook and Twitter show.
References
- https://www.sqlshack.com/pagination-in-sql-server/
- https://mariadb.com/kb/en/pagination-optimization/
- https://stackoverflow.com/questions/34110504/optimize-query-with-offset-on-large-table/34291099
- https://www.elastic.co/guide/en/elasticsearch/reference/7.10/paginate-search-results.html
- https://solr.apache.org/guide/6_6/pagination-of-results.html#performance-problems-with-deep-paging
- https://medium.com/swlh/mongodb-pagination-fast-consistent-ece2a97070f3
- https://www.youtube.com/watch?v=53MNk1hmRTw&ab_channel=ProgrammerZamanNow
- https://dev.mysql.com/blog-archive/mysql-explain-analyze
- https://github.com/davidasync/why-pagination-slow