Oct 19, 2009

Defining batch size for batch fetching in Hibernate

One way to tune Hibernate performance when you work with parent/children relationships is to use lazily loaded collections with batch fetching. This lets you initialize your entities with far fewer than n+1 SQL queries. To enable it, you just need to specify the batch-size attribute in the XML mapping of the collection or mark the collection with the @BatchSize annotation in Java code. But how do you choose an appropriate batch size? To do it well, you need to understand how batch fetching works internally.

The usual examples show that with 25 objects in the database and a batch size of 10, Hibernate performs 3 SQL queries loading 10, 10, and 5 items. With larger batch sizes it is not that simple. Internally Hibernate creates an array of batch sizes using the following strategy: if the batch size is 10 or less (for example 5), the array holds every number from 1 to the batch size ([1,2,3,4,5] for 5); otherwise (for example 50) it holds the numbers from 1 to 10 plus the integer parts of the batch size divided by powers of 2, as long as they stay above 10 ([1,2,3,4,5,6,7,8,9,10,12,25,50] for 50).

So how many SQL queries will Hibernate perform when the batch size is 50 and there are 38 records in the database? The answer is 3: 25, 12, and 1. Surprised? Hibernate avoids creating too many JDBC prepared statements for batch fetching, so it only issues queries whose sizes come from this array, each time taking the largest size that does not exceed the number of records still to be loaded.

Now that you know this, how do you choose the best batch size for your application? The answer is simple and relies on a bit of math: use a power of 2 multiplied by 10 (10, 20, 40, 80, and so on), because every positive integer can be represented as a sum of powers of 2 (for example 13 = 8 + 4 + 1). If you pick 40 instead of 50 in the previous example, you will see the benefit when, for example, the number of records is 23: 2 SQL queries (20, 3) instead of 3 (12, 10, 1). Of course, if you know that the number of records in the database will always be small, use the smallest recommended batch size, 10. If you don't know how many records there will be, turn on SQL logging, count the queries that are actually performed, and pick a batch size from the recommended series accordingly. I hope this helps you make Hibernate more productive. Develop with pleasure!
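The arithmetic above is easy to check with a small simulation. The following is only a sketch of the strategy as described in this post, not Hibernate's actual code: the class name BatchSizeSketch and the methods batchSizes and queryPlan are hypothetical names invented for illustration, and the greedy "largest size that still fits" rule is an assumption that reproduces the numbers quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hibernate: simulates the batch size
// strategy described in the post.
class BatchSizeSketch {

    // Batch sizes from largest to smallest: keep halving while the half is
    // still at least 10, then drop to 10 and count down to 1.
    static List<Integer> batchSizes(int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("batch size must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int size = maxBatchSize;
        while (size >= 1) {
            sizes.add(size);
            if (size <= 10) {
                size = size - 1;
            } else if (size / 2 < 10) {
                size = 10;
            } else {
                size = size / 2;
            }
        }
        return sizes;
    }

    // Assumed greedy rule: each query loads the largest available batch size
    // that does not exceed the number of records still to be loaded.
    static List<Integer> queryPlan(int maxBatchSize, int records) {
        List<Integer> sizes = batchSizes(maxBatchSize);
        List<Integer> plan = new ArrayList<>();
        int remaining = records;
        while (remaining > 0) {
            for (int size : sizes) {
                if (size <= remaining) {
                    plan.add(size);
                    remaining -= size;
                    break;
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(50));    // [50, 25, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println(queryPlan(50, 38)); // [25, 12, 1]
        System.out.println(queryPlan(50, 23)); // [12, 10, 1]
        System.out.println(queryPlan(40, 23)); // [20, 3]
    }
}
```

Running queryPlan against the record counts you expect in production is a quick way to compare candidate batch sizes before checking the real SQL logs.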

Oct 18, 2009

Ideal project video

Recently I watched a video presentation on InfoQ with one team's experience report. The team uses many great engineering and collaboration practices, constantly experiments, and analyzes the results of its work. It's really the only right way to build products quickly, within budget, and with the highest level of quality. They always communicate with customers and gather their feedback to build the product according to the real needs of the market. Try this way and you will succeed!

Oct 14, 2009

XP injection at ITjam

In September Kiev became a little more Agile thanks to the largest Agile conference in Eastern Europe. At Agilee I presented "People factor as failure reason of Agile adoption". The presentation is about people and what the Agile world requires of them, and about how everyone can improve their skills and become an Agile team member.

The next conference I'm going to take part in is ITjam. I will present "Kanban VS Scrum" and "XP injection" with my friend Aleksey Solntsev. From the first presentation you will learn more about Kanban and its principles and see how it compares to Scrum. In the second presentation we are going to show, on a real project, how to inject XP engineering practices such as unit testing, TDD, CI, automated builds, and code review. You will find out how to simplify your daily work and start producing better products. Registration is free, so you are welcome!

Oct 13, 2009

Agile Coaching site launched

A few weeks ago I launched the Agile Coaching web site. It contains many trainings that you can order and attend, presentations from different conferences, and video materials. I'm going to keep filling the site with new useful information. I hope you will find something interesting there. Develop with pleasure!

Manage dependent projects in Maven

If you have been using Maven for a long time, you probably know that it supports both project inheritance and aggregation. The Maven site has a detailed description, with samples, of how both of them work. But sometimes you have truly separate projects where one depends on the other and both are continuously under development. You then need to use the latest build artifacts of the base project without wasting time running all of its tests. There are several ways to solve this. Let's say we have project B, which depends on project A. Both contain many modules and a base pom.xml file that builds their own hierarchy.

The first natural way to solve the problem is to build the projects manually in the right order (A and then B). But if project A has not changed since the last build, you just waste build time. You also have to run several commands instead of one, which may be a problem for your CI server.

Maven has a release plugin that may help you. You just need to set up an internal (company or project level) Maven repository and configure a release policy for your project build. There are many players on the Maven repository market: Archiva, Nexus, Proximity, Artifactory, and others. All of them support local releasing of project artifacts. When setting the release policy for the project, you should decide whether you will support only stable versions or snapshots as well. With snapshots, Maven can generate a build-time version of the artifacts after each release build. Once everything is configured, you can use the release plugin commands to publish project artifacts to the internal repository and make them available to all local repositories. You then need to build project A only when changes are made (for example, on each commit to the VCS). When you build project B, Maven automatically checks for new versions of the project A artifacts and downloads them to the local repository if changes were found.

But sometimes your project contains modules that are not written in Java and are very platform dependent (for example, installing C++ library code on each build). In this case the previous solution can't be applied. You need to change the main pom.xml of project B to add the dependency on project A, but you want it to apply only if the project A sources are available on your machine. Use the following profile for this purpose:

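<profile>
  <id>Build A project</id>
  <!-- Build project A as an extra module of project B -->
  <modules>
    <module>${a.relative.path}</module>
  </modules>
  <!-- Active only when the a.relative.path property is passed -->
  <activation>
    <property>
      <name>a.relative.path</name>
    </property>
  </activation>
</profile>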

This profile is activated only if the system property a.relative.path is passed. Note that the value of this property must be relative to the root of project B (for example ../../a). Maven will then build project A as part of each project B build. But how do we avoid running the project A tests during the project B build? We use aggregation, so project A inherits no settings from the project B pom.xml except system properties. So let's use them. Add the following profile to the project A base pom.xml:

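<profile>
  <id>Disable tests for external build</id>
  <!-- Same effect as passing -Dmaven.test.skip=true -->
  <properties>
    <maven.test.skip>true</maven.test.skip>
  </properties>
  <!-- Active only when the a.test.skip property is passed -->
  <activation>
    <property>
      <name>a.test.skip</name>
    </property>
  </activation>
</profile>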

This profile is activated when the a.test.skip system property is present, and it disables all test execution just as -Dmaven.test.skip does. If you need additional settings for a quick build of project A, set them using system properties as well. Now you can build both projects with one Maven command: mvn -Da.relative.path=../../a -Da.test.skip clean install. If you don't have access to the project A sources, omit both properties: the project A build is skipped and project B relies on the current version of the artifacts from the local repository.
Maven is a very powerful build tool, and even complex scenarios can be implemented with it. Develop with pleasure.