Oct 19, 2009

Defining batch size for batch fetching in Hibernate

One of the Hibernate performance tuning ways when you need to work with parent/children relationships is to use lazy loading collections with batch fetching. This way allows you to perform far less than n+1 SQL queries to initialize your entities. To enable this mode you just need to specify batch-size attribute in XML mapping for collection or mark it with @BatchSize annotation in Java code. But how to define appropriate batch size? You need to understand how it works internally to do it well.

From examples you may see that if there are 25 objects in the database and batch size is set to 10 then Hibernate will perform 3 SQL queries: 10, 10, 5 items. But it is not so simple in the case of larger batch size. Hibernate internally creates an array of batch sizes using following strategy: if batch size <= 10 (for example 5) then it fills array with numbers from 1 to batch size ([1,2,3,4,5] for 5), else (for example 50) it fills array with numbers from 1 to 10 and integer parts of division batch size by powers of 2 ([1,2,3,4,5,6,7,8,9,10,12,25,50] for 50). So, now lets see how many SQL queries will Hibernate perform in case when batch size is 50 and there are 38 records in the database. The answer is 3: 25, 12, 1. Are you surprised? Hibernate doesn't create too many JDBC prepared statements for batch fetching so it performs querying using array of batch sizes. So, when you know the truth how to define the best batch size for your application? The answer is simple and relies on the mathematical theory: you should use powers of 2 multiplied by 10 (for example 10, 20, 40, 80, etc.) because each positive integer may be represented as a sum of powers of 2 (for example 13 = 8 + 4 + 1). If you select 40 instead of 50 in the previous example you will see benefit for example when number of records is 23: 2 SQL queries (20, 3) instead of 3 SQL queries (12, 10, 1). Of course if you know that a number of records in the database will always be small enough then use smallest batch size from recommended - 10. If you don't know how many records will be in the database then switch on logs for SQL queries and analyze how many of them are performed to define batch size from the recommended formula. I hope this will help you make Hibernate more productive. Develop with pleasure!

No comments: