Jan 31, 2010

Kill your application with Hibernate subselect!

If you want to kill your application silently the best way is to use Hibernate 'subselect' fetching strategy. Lets discuss the issue in details. Imagine that you have entities 'Post' and 'Comment' in the domain model with corresponding tables in the database:

public class Post {
    private Long id;
    private String title;
    private String body;
    private Set<Comment> comments;
    ...
}

public class Comment {
    private Long id;
    private String text;
    ...
}


Of course you use 'one to many' relationship to map relations between posts and comments (using 'post_id' column from comments table). It doesn't matter what mapping type you prefer (annotations of XML based). All seems good at this moment and you decide to choose 'subselect' fetching strategy for comments collection. You also create a number of unit tests to make sure that you mapping is well done and works fine. Now you want to create method to find last 10 posts for your home page. You decide to use Criteria API for this purpose:

List posts = session.createCriteria(Post.class)
     .addOrder(Order.desc("id"))
     .setMaxResults(10)
     .list();


You again create some unit tests to make sure that method works as expected. But of course you don't review SQL queries in Hibernate logs to check if all is as you expect to be. Hibernate generates following SQL queries for you (example from PostgreSQL database):

select post0_.id as id0_, 
post0_.title as title0_, 
post0_.body as body0_ 
from Post post0_ 
order by
post0_.id desc
limit 10;

select
comment0_.post_id as post5_1_,
comment0_.id as id1_0_,
comment0_.text as text1_0_
from Comment comment0_ 
where comment0_.post_id in (
   select post0_.id 
   from Post post0_
); 


Pay attention to the second query, especially to its subquery part. Limitation is not included in the subquery! What is going on here? It seems that Hibernate is going to load full comments table in memory and then select from them related to top posts selected before. Crazy behavior! Initially, when database is empty or there is low amount of comments, your application will work well. But every time somebody opens your home page with top blogs shown query loads all comments in memory, so you have performance penalty. This penalty is proportional to the number of comments in the database. Once (may be after some monthes of usage in production) you will find that all memory allocated to your application is filled and garbage collector eats 100% of CPU time. Thanks god for profilers, especially for VisualVM. It is hard to believe that such small issue in Hibernate may cause such dramatic effects. There is an opened issue in Hibernate bug tracker, but it has minor priority. We need to keep living with it, so another approach should be applied. The best way to avoid these issues is to use 'batch select' fetching strategy with lazy loading (or without it depending on application needs). Be careful and develop with pleasure!