Thursday, April 26, 2007

Pagination strategies

Pagination

How do I limit the number of rows displayed in the results page?

Often a database query will yield hundreds or thousands of rows of data. Trying to display all of them in one webpage could be very slow and resource intensive. Worse, it results in a poor user experience for the person who actually has to scroll through all of this data.

Well-designed websites will return a small subset of the data and provide links to move to the next or last set.

The easiest way to accomplish this is by constraining the number of rows returned from the database query in the first place.

* MySQL uses the 'LIMIT' keyword for this.
* PostgreSQL uses the 'LIMIT' and 'OFFSET' keywords for this.
* Oracle uses the 'ROWNUM' keyword. It doesn't work like LIMIT and OFFSET. You will probably find it strange.
* DB2 uses the 'ROWNUMBER' 'OVER' combination.

A typical query in MySQL would look something like:

SELECT * FROM my_table LIMIT 10, 50

This will yield rows 51 through 60.
Another way to limit the number of records returned is to use the

java.sql.Statement.setMaxRows(int)

Because this is part of the JDBC API, it works for all servers.

It should be noted that with both approaches it is undefined which rows are returned. If you want particular rows returned, you should add an "order ..." clause so that the order of rows is defined.
Javaranch has a forum dedicated to JDBC/SQL issues where you can go to read more about formulating SQL statements: JDBC forum
Here are some notes from other conversations on pagination:

I. Repeat the query for each new request.
Algorithm
Client requests page 1
Execute the query
Return rows for page 1
Client requests page 2
Execute the query
Skip rows for page 1
Return rows for page 2
Variation
Most RDBMS vendors support returning a specified range of row numbers.
Efficiency varies. May not help much when combined with "order by"
Pro/Con
Pro: Stateless. Nothing is cached between requests.
Pro: Simple. Every request is the same.
Pro: Up to date with any inserts or deletes in the database.
Con: Database and time intensive. Might be repeating large,
expensive queries - if users really do ever page fwd.
Con: Not guaranteed consistent. If other processes are inserting and
deleting rows, paging fwd or bkwd might skip rows. Fwd then Bkwd
might not give the same rows.
Notes We have used this with mainframe requests where even the nastiest
queries were fast and the server had no options for storing state.
We have used row number schemes for client requests, and the highest
key on a page for page forward only action.
II. Hold the query results on the server
Algorithm
Client requests page 1
Execute query
Cache complete result set or just PKeys
Return rows for page 1
Client requests page 2
Get rows from cache or get PKeys from cache, rows from database
Return rows for page 2
Pros/Cons
Pro: Does NOT repeat the query
Pro: Self consistent - fwd then bkwd will give identical results
Pro: Can share cached results if two users have identical requests
Con: Big cache in memory or someplace close to it
Con: Complexity for cache, time-out
III. Hold a scrolling cursor on the server
Algorithm
Client requests page 1
Execute query with a cursor
Fetch & return rows for page 1
Client requests page 2
Fetch & return rows for page 2
Pro/Con
Pro: Does not repeat the query
Pro: Very small state - just a connection & cursor
Pro: Self consistent
Con: More open connections. Might hold open one db connection
per user
Con: What does this do to the db? Doesn't it cache rows not yet
fetched?
Con: Complexity for time-out
IV. Hold results in a temporary table
Algorithm
Client requests page 1
Execute the big query
For each row of results
insert the row into a temp table, keyed on session id
Maybe do this in a stored proc?
Page through the temp table using I, II or III.
Pro/Con
Pro: The big query is only done once
Con: Initial hit time - adds a lot of inserts
Con: Need to clean up that temp table some time
Variation
Select directly into a temp table if your db allows it

Optimistic and pessimistic locking

Pessimistic locking assumes that two or more users will often attempt to update the same record at the same time or is used when lost changes are not acceptable. Ususally a lock column is added to the row. A timestamp works well for reasons I will explain below.

Optimistic locking assumes that two or more users will rarely try to update the same record at the same time. The basic implementation is to store the state of the row in memory and on update, use that sate to check whether it was updated since it was read. If it has changed, the users updates are invalid and cannot be persisted. One way to do this is to set a last modified field (best when implemented by the DBMS.)

The main disadvantage to pessimistic locking is that even if a user never updates a record, having it locked prevents other users from updating it. If a user's system crashes or a record is otherwise left locked indefinitely, you have to provide a way to unlock the record. This feature can be abused by users so you may have to restrict access to it. Using a timestamp for the pessimistic lock helps users determine if the record is really being used by another user e.g. if the record has been locked for a week, it's probably safe to unlock the record. One more thing, you need to make sure users cannot update if their lock was broken.

The main disadvantage to optimistic locking should be obvious. However, it does not have the problem listed above and is much simpler to implement.

Hibernate and Oracle pagination

Hibernate and Oracle pagination gotcha If you're a Hibenate user and are looking to support paging of results (for a web table or whatever), you've probably stumbled across this pagination class on the Hibernate blog : http://blog.hibernate.org/cgi-bin/blosxom.cgi/2004/08/14#pagination . For your convenience, I'll post their helper class here.

public class Page {
private List results;
private int pageSize;
private int page;

public Page(Query query, int page, int pageSize) {

this.page = page;
this.pageSize = pageSize;
results = query.setFirstResult(page * pageSize)
.setMaxResults(pageSize+1)
.list();
}

public boolean isNextPage() {
return results.size() > pageSize;
}

public boolean isPreviousPage() {
return page > 0;
}

public List getList() {
return isNextPage() ?
results.subList(0, pageSize-1) :
results;
}
}


public Page getPosts(int page) {
return new Page(
session.createQuery("from Posts p order by p.date desc")
page,
40
);
}

I don't want to be accused of plagiarism, so if anyone has a problem let me know and I'll gladly remove the above code from my blog.

Moving on, the author states that the basic requirement is to be able to page through data without fetching all the rows. While many applications would like to display the total number of rows in a table displaying paged data, this helper Page class intentionally does not issue a separate count query.

The Page class above looks nice and elegant where the number of rows retrieved is one more than the page size (See the constructor of the Page class). The idea behind retrieving the extra row being that it enables us to determine if we're at the last page of the paged dataset by comparing the resultset size with the user specified page size. So consider a query that would return 19 rows. If the user specifies a page size of 10, then the max results is set to 11 and when the user is on the first page, 11 records are retrieved and the test results.size() > pageSize returns 'true' indicating that we're not yet on the last page. When the user goes to the second page, only 9 rows are returned and the test results.size() > pageSize returns 'false' indicating we're at the last page.

Simple enough, so whats the problem you ask?

Hibernate genarates an SQL that uses inline views and the Oracle pseudo column ROWNUM to limit the number of rows retrieved. The problem using this technique is that first n rows retrieved by Oracle are not necessarily the same as the first n rows of a n+1 fetch. As a result the same records may be displayed on the first and second page.

I have a "Lane" class that has a many-to-one association with a "Location" class and I've mapped this to my Oracle schema using Hibernate.

In the screenshot below, the left window shows the results of the query generated by hibernate when fetching the first 10 rows (page one). Notice the order of the name column (which is a unique column) values. In the middle window you see the results of execution with max results = 11 (ie page_size + 1). Notice that the 11th row is not the last record so the results of returning the first 10 records using setMaxResults(10) and setMaxResults(11) are different. The rightmost window shows the results of the query used to display the second page (using Oracle dialect). You'll see that lane with name "17" also appears on the second page.


I don't believe this is an Oracle bug since it does not guarantee the return of rows in the same order in the absence of a sort criteria on a unique column. In this case there was an order by on a non unique field (state) so Oracle happens to return the rows in a different order probably due to the fact that the number of rows to be returned in each case was different. Had there been a secondary order by clause on a unique key as a tie breaker, (like lane_id), the ordering is the same. So one could always add the primary key field or unique field to the HQL to resolve this. For example :

"from Posts p order by p.date desc, p.id asc"

There's another issue with Hibernates getLimitString() string generated with the Oracle9Dialect. I am using Oracle 9. In the screenshot below, the queries do not fetch the extra record in each page. The left window shows the results of the first page. Note that the first page has a lane with name "9". The middle pane shows the results of the third page using sql generated by the OracleDialect. The results look fine. The right most pane shows the results of the query generated by the Oracle9Dialect. This should be the dialect that I should be using since I'm on Oracle 9, however you'll notice that lane with name "9" appears on page 3 as well.


So be careful when you chose an Oracle dialect! I had created a Hibernate issue HHH-1381 in Jan 2006 but haven't seen any activity. But the fact that is hasn't been rejected gives me some hope. Ofcouse I wouldn't be surprised if that issue mysteriously gets rejected later today. Joking :)

Difference between ClassNotFoundException and NoClassDefFoundError

As Java developers we've all seen java.lang.ClassNotFoundException and java.lang.NoClassDefFoundError at some point and spend hours trying to track issues with Classloader, CLASSPATH and missing classes.

A ClassNotFoundException is thrown when the reported class is not found by the ClassLoader. This typically means that the class is missing from the CLASSPATH. It could also mean that the class in question is trying to be loaded from another class which was loaded in a parent classloader and hence the class from the child classloader is not visible. This is sometimes the case when working in more complex environments like an App Server (WebSphere is infamous for such classloader issues).

People often tend to confuse java.lang.NoClassDefFoundError with java.lang.ClassNotFoundException however there's an important distinction. For example an exception (an error really since java.lang.NoClassDefFoundError is a subclass of java.lang.Error) like

java.lang.NoClassDefFoundError:
org/apache/activemq/ActiveMQConnectionFactory

does not mean that the ActiveMQConnectionFactory class is not in the CLASSPATH. Infact its quite the opposite. It means that the class ActiveMQConnectionFactory was found by the ClassLoader however when trying to load the class, it ran into an error reading the class definition. This typically happens when the class in question has static blocks or members which use a Class that's not found by the ClassLoader. So to find the culprit, view the source of the class in question (ActiveMQConnectionFactory in this case) and look for code using static blocks or static members. If you don't have access the the source, then simply decompile it using JAD.

On examining the code, say you find a line of code like below, make sure that the class SomeClass in in your CLASSPATH.

private static SomeClass foo = new SomeClass();

Tip : To find out which jar a class belongs to, you can use the web site jarFinder. This allows you to specify a class name using wildcards and it searches for the class in its database of jars. jarhoo allows you to do the same thing but its no longer free to use.

If you would like to locate the which jar a class belongs to in a local path, you can use a utility like jarscan. You just specify the class you'd like to locate and the root directory path where you'd like it to start searching for the class in jars and zip files.