Thursday, April 26, 2007

Pagination strategies


How do I limit the number of rows displayed in the results page?

Often a database query will yield hundreds or thousands of rows of data. Trying to display all of them in one webpage could be very slow and resource intensive. Worse, it results in a poor user experience for the person who actually has to scroll through all of this data.

Well-designed websites return a small subset of the data and provide links to move to the next or previous set.

The easiest way to accomplish this is by constraining the number of rows returned from the database query in the first place.

* MySQL uses the 'LIMIT' keyword for this.
* PostgreSQL uses the 'LIMIT' and 'OFFSET' keywords for this.
* Oracle uses the 'ROWNUM' pseudocolumn. It doesn't work like LIMIT and OFFSET - ROWNUM is assigned before ORDER BY is applied, so you usually need a nested subquery. You will probably find it strange.
* DB2 uses the 'ROW_NUMBER() OVER()' window function.

A typical query in MySQL would look something like:

SELECT * FROM my_table LIMIT 50, 10

This will yield rows 51 through 60: the first argument is the offset (the number of rows to skip) and the second is the maximum number of rows to return.
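In JDBC code the page number is usually converted into that offset before it is bound into the query. Here is a minimal sketch, assuming a hypothetical my_table with a name column (the table, column, and class names are illustrative, not from any real schema):

```java
import java.sql.*;
import java.util.*;

public class LimitPager {

    // Translate a 1-based page number into the offset for "LIMIT offset, count".
    static int offsetFor(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // Fetch one page of rows; the Connection comes from wherever your app gets them.
    static List<String> fetchPage(Connection con, int page, int pageSize)
            throws SQLException {
        String sql = "SELECT name FROM my_table ORDER BY name LIMIT ?, ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, offsetFor(page, pageSize));
            ps.setInt(2, pageSize);
            List<String> names = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
            return names;
        }
    }
}
```

With a page size of 10, requesting page 6 binds LIMIT 50, 10 - rows 51 through 60, as above.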
Another way to limit the number of records returned is to use the

java.sql.Statement.setMaxRows(int)

Because this is part of the JDBC API, it works with any database and driver. Note, however, that setMaxRows only caps the total number of rows returned; it cannot skip rows, so on its own it can only produce the first page.
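A short sketch of that approach. The capped helper mirrors what setMaxRows does on an in-memory list, so the truncate-from-the-front behaviour is easy to see (names and query are illustrative):

```java
import java.sql.*;
import java.util.List;

public class MaxRowsExample {

    // Cap any query at pageSize rows; this works with every JDBC driver.
    static ResultSet firstPage(Connection con, String sql, int pageSize)
            throws SQLException {
        Statement stmt = con.createStatement();
        stmt.setMaxRows(pageSize);   // driver returns at most pageSize rows
        return stmt.executeQuery(sql);
    }

    // setMaxRows can only truncate from the front; it cannot skip earlier
    // pages. This helper models that behaviour on an in-memory list.
    static <T> List<T> capped(List<T> rows, int maxRows) {
        return rows.subList(0, Math.min(rows.size(), maxRows));
    }
}
```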

It should be noted that with both approaches the set of rows returned is undefined unless you specify an order. If you want particular rows returned, add an "ORDER BY ..." clause so that the order of rows is well defined.
JavaRanch has a forum dedicated to JDBC/SQL issues where you can read more about formulating SQL statements: JDBC forum
Here are some notes from other conversations on pagination:

I. Repeat the query for each new request.
Algorithm
Client requests page 1
Execute the query
Return rows for page 1
Client requests page 2
Execute the query
Skip rows for page 1
Return rows for page 2
Variation
Most RDBMS vendors support returning a specified range of row numbers.
Efficiency varies. May not help much when combined with "order by"
Pro/Con
Pro: Stateless. Nothing is cached between requests.
Pro: Simple. Every request is the same.
Pro: Up to date with any inserts or deletes in the database.
Con: Database and time intensive. Might be repeating large,
expensive queries - if users really do ever page fwd.
Con: Not guaranteed consistent. If other processes are inserting and
deleting rows, paging fwd or bkwd might skip rows. Fwd then Bkwd
might not give the same rows.
Notes: We have used this with mainframe requests where even the nastiest
queries were fast and the server had no options for storing state.
We have used row-number schemes for client requests, and the highest
key on a page for forward-only paging.
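The "highest key on a page" scheme from the notes above can be sketched as follows. The table and column names are made up for illustration; the nextPage helper models in memory what the SQL would return:

```java
import java.sql.*;
import java.util.*;
import java.util.stream.*;

public class KeysetPager {

    // Everything after the last key already shown, in key order, one page
    // at a time. Forward-only, but the query stays cheap if id is indexed.
    static final String NEXT_PAGE_SQL =
        "SELECT id, name FROM my_table WHERE id > ? ORDER BY id LIMIT ?";

    static void bind(PreparedStatement ps, long lastSeenId, int pageSize)
            throws SQLException {
        ps.setLong(1, lastSeenId);
        ps.setInt(2, pageSize);
    }

    // In-memory model of the query above, for illustration:
    // ids must already be sorted ascending.
    static List<Long> nextPage(List<Long> sortedIds, long lastSeenId, int pageSize) {
        return sortedIds.stream()
                        .filter(id -> id > lastSeenId)
                        .limit(pageSize)
                        .collect(Collectors.toList());
    }
}
```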
II. Hold the query results on the server
Algorithm
Client requests page 1
Execute query
Cache complete result set or just PKeys
Return rows for page 1
Client requests page 2
Get rows from cache or get PKeys from cache, rows from database
Return rows for page 2
Pro/Con
Pro: Does NOT repeat the query
Pro: Self consistent - fwd then bkwd will give identical results
Pro: Can share cached results if two users have identical requests
Con: Big cache in memory or someplace close to it
Con: Complexity for cache, time-out
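A sketch of the "cache just the PKeys" variant, assuming the ordered keys from the first query are kept in the user's session (all names are hypothetical):

```java
import java.util.*;

public class CachedKeyPager {

    // Ordered primary keys cached on the first request, e.g. in the HTTP session.
    private final List<Long> cachedKeys;

    CachedKeyPager(List<Long> keysFromFirstQuery) {
        this.cachedKeys = List.copyOf(keysFromFirstQuery);
    }

    // Keys for one 1-based page; the caller then fetches just those rows
    // with something like "SELECT ... FROM my_table WHERE id IN (...)".
    List<Long> keysForPage(int page, int pageSize) {
        int from = (page - 1) * pageSize;
        if (from >= cachedKeys.size()) return List.of();
        int to = Math.min(from + pageSize, cachedKeys.size());
        return cachedKeys.subList(from, to);
    }
}
```

Because the key list is fixed at first-query time, paging forward then backward returns identical rows, at the cost of keeping the list (and eventually expiring it) on the server.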
III. Hold a scrolling cursor on the server
Algorithm
Client requests page 1
Execute query with a cursor
Fetch & return rows for page 1
Client requests page 2
Fetch & return rows for page 2
Pro/Con
Pro: Does not repeat the query
Pro: Very small state - just a connection & cursor
Pro: Self consistent
Con: More open connections. Might hold open one db connection
per user
Con: Extra load on the db - it may have to hold or cache the rows
not yet fetched for as long as the cursor stays open
Con: Complexity for time-out
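JDBC exposes this through scrollable result sets. A minimal sketch using ResultSet.absolute() to jump to the first row of a page (table and column names are illustrative; in practice the Statement would be held open between requests):

```java
import java.sql.*;
import java.util.*;

public class CursorPager {

    // 1-based row number where a page starts, as used by ResultSet.absolute().
    static int firstRowOf(int page, int pageSize) {
        return (page - 1) * pageSize + 1;
    }

    // One open statement/cursor per user; the query runs only once.
    static List<String> fetchPage(Connection con, int page, int pageSize)
            throws SQLException {
        Statement stmt = con.createStatement(
            ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        ResultSet rs = stmt.executeQuery("SELECT name FROM my_table ORDER BY name");
        List<String> names = new ArrayList<>();
        // absolute() returns false when positioned past the last row
        if (rs.absolute(firstRowOf(page, pageSize))) {
            do {
                names.add(rs.getString(1));
            } while (names.size() < pageSize && rs.next());
        }
        return names;
    }
}
```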
IV. Hold results in a temporary table
Algorithm
Client requests page 1
Execute the big query
For each row of results
insert the row into a temp table, keyed on session id
Maybe do this in a stored proc?
Page through the temp table using I, II or III.
Pro/Con
Pro: The big query is only done once
Con: Initial hit time - adds a lot of inserts
Con: Need to clean up that temp table some time
Variation
Select directly into a temp table if your db allows it
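A sketch of the temp-table approach, assuming a hypothetical page_cache table keyed on session id and a database that supports ROW_NUMBER() OVER() (all names are made up for illustration):

```java
import java.sql.*;

public class TempTablePager {

    // Fill the temp table once per session - one round trip if the database
    // supports INSERT ... SELECT; otherwise loop over the big query's rows.
    static void fillTempTable(Connection con, String sessionId)
            throws SQLException {
        String sql =
            "INSERT INTO page_cache (session_id, row_num, pk) "
          + "SELECT ?, ROW_NUMBER() OVER (ORDER BY name), id FROM my_table";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, sessionId);
            ps.executeUpdate();
        }
    }

    // Inclusive row_num range for a 1-based page, used when paging
    // through the temp table with "WHERE row_num BETWEEN ? AND ?".
    static int[] rowRange(int page, int pageSize) {
        int from = (page - 1) * pageSize + 1;
        return new int[] { from, from + pageSize - 1 };
    }
}
```

The temp-table rows for a finished session still need to be deleted, either on logout or by a periodic cleanup job keyed on session age.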
