
I am the founder and CEO of Data Geekery GmbH, located in Zurich, Switzerland. With our company, we have been selling database products and services around Java and SQL since 2013. Ever since my Master's studies at EPFL in 2006, I have been fascinated by the interaction of Java and SQL. I gained most of this experience in the Swiss e-banking field, working with various approaches (JDBC, Hibernate, mostly with Oracle). I am happy to share this knowledge at various conferences, JUGs, in-house presentations and on our blog. Lukas is a DZone MVB (not an employee of DZone) and has posted 249 posts at DZone. You can read more from them at their website.

Hibernate, and Querying 50k Records. Too Much?

03.11.2013

An interesting read, showing that Hibernate can quickly reach its limits when querying 50k records, a relatively small number of records for a sophisticated database:

http://koenserneels.blogspot.ch/2013/03/bulk-fetching-with-hibernate.html

Of course, Hibernate can generally handle such situations, but you have to start tuning it and digging into its more advanced features. It makes one wonder whether second-level caching, flushing, evicting, and all that stuff should really be the default behaviour...?

Published at DZone with permission of Lukas Eder, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Barry Smith replied on Mon, 2013/03/11 - 11:39am

Realistically, dealing with large data sets is probably not the best use of Hibernate. Hibernate is great for letting you write business logic with mostly transparent persistence - that's its sweet spot.

Batch processing and ad-hoc querying type operations are best handled closer to the database.

Wal Rus replied on Mon, 2013/03/11 - 12:51pm

I don't know if Hibernate has a sweet spot. There isn't a spot for it in my books. 

Mika Koivisto replied on Tue, 2013/03/12 - 4:30pm

If you are fetching 50k results in a single query you are probably doing something wrong. 
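Rather than fetching 50k results with a single query, a job can read them in fixed-size pages and remember the last key it saw (keyset pagination), so only one page is in memory at a time. The JDK-only sketch below assumes a unique, ascending numeric id and a hypothetical `fetchPage` function that would run something like `SELECT ... WHERE id > ? ORDER BY id LIMIT ?` (the exact paging syntax varies by database). `KeysetPager` is an illustrative name, not an existing library.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

final class KeysetPager {
    private KeysetPager() {}

    /**
     * Reads all rows page by page and hands each page to handler.
     * fetchPage.apply(lastId, pageSize) must return up to pageSize rows with
     * id > lastId, ordered by id ascending (hypothetical query, see lead-in).
     * Returns the number of non-empty pages fetched.
     */
    static <T> int forEachPage(BiFunction<Long, Integer, List<T>> fetchPage,
                               ToLongFunction<T> idOf,
                               int pageSize,
                               Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long lastId = Long.MIN_VALUE; // start before the smallest possible id
        int pages = 0;
        while (true) {
            List<T> page = fetchPage.apply(lastId, pageSize);
            if (page.isEmpty()) {
                break; // no rows left
            }
            pages++;
            handler.accept(page); // process; the page can be garbage-collected afterwards
            lastId = idOf.applyAsLong(page.get(page.size() - 1)); // resume after the last key
            if (page.size() < pageSize) {
                break; // a short page means we reached the end
            }
        }
        return pages;
    }
}
```

Unlike `OFFSET`-based paging, each page query can use the primary key index to seek directly to `id > lastId`, so later pages do not get slower.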

Laabidi Raissi replied on Wed, 2013/03/13 - 1:28pm

I worked on a project where I was extracting data from XML and CSV files and saving it to a database.

There were multiple kinds of files to deal with, and many table relationships to check in every operation.

One of the files had over 400k entries to be saved into the DB. With the default Hibernate settings, we got an OutOfMemoryError at around 12k entries inserted/updated.

Once we used the batch-size config variable (50) and session.flush(), we could process the entire file (400k) in less than 15 minutes.
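The batching fix described above usually boils down to persisting entities in a loop and calling `flush()` and `clear()` on the session every N entities, so the session's first-level cache never holds all 400k objects at once. The sketch below models that loop without a Hibernate dependency: `BatchSession` and `BatchWriter` are hypothetical names, with `BatchSession` standing in for the corresponding Hibernate `Session` methods. The interval is typically kept equal to the `hibernate.jdbc.batch_size` setting (50 in the comment above).

```java
/**
 * Hypothetical stand-in for the Session operations the pattern relies on.
 * In Hibernate these correspond to Session.persist, Session.flush and Session.clear.
 */
interface BatchSession<T> {
    void persist(T entity);
    void flush(); // send pending INSERT/UPDATE statements to the database
    void clear(); // detach all managed entities so they can be garbage-collected
}

final class BatchWriter {
    private BatchWriter() {}

    /** Persists all entities, flushing and clearing every batchSize entities. Returns the flush count. */
    static <T> int persistAll(BatchSession<T> session, Iterable<T> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int count = 0;
        for (T entity : entities) {
            session.persist(entity);
            if (++count % batchSize == 0) {
                session.flush(); // one JDBC batch worth of statements
                session.clear(); // keep memory bounded by batchSize entities
                flushes++;
            }
        }
        if (count % batchSize != 0) { // flush the final, partial batch
            session.flush();
            session.clear();
            flushes++;
        }
        return flushes;
    }
}
```

Accepting an `Iterable` rather than a `List` lets the caller stream entities straight from the file reader instead of building them all up front.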

Lukas Eder replied on Wed, 2013/03/13 - 1:43pm in response to: Laabidi Raissi

400k entries? I'll batch load you that in less than 10 seconds ;-)

Laabidi Raissi replied on Wed, 2013/03/13 - 2:54pm in response to: Lukas Eder

Well, that would be very simple with any ETL or batch framework. But in that project, there were many constraints on which technologies could be used.

Oleksandr Alesinskyy replied on Thu, 2013/03/14 - 7:00am in response to: Laabidi Raissi

With good old plain JDBC, the load time for those records would be well below one minute.

There is no reason to use Hibernate for batch processing. It is not designed for that, while it shines (if used properly) in OLTP processing.
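For comparison, a plain JDBC load typically uses `PreparedStatement.addBatch()` and `executeBatch()` so that many rows travel to the database per round trip, all inside one transaction. This is a minimal sketch assuming a hypothetical `book (id, title)` table and a hypothetical `JdbcBatchLoader` class; actual throughput depends heavily on the driver and the database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class JdbcBatchLoader {
    private JdbcBatchLoader() {}

    /**
     * Inserts rows using JDBC batching, executing one batch per batchSize rows.
     * Each row is {id, title}. Returns the number of executeBatch() calls.
     */
    static int load(Connection con, Iterable<Object[]> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String sql = "INSERT INTO book (id, title) VALUES (?, ?)"; // hypothetical table
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction instead of one per row
        int batches = 0;
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch(); // queue the row client-side
                if (++pending == batchSize) {
                    ps.executeBatch(); // send the queued rows together
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // remaining partial batch
                ps.executeBatch();
                batches++;
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
        return batches;
    }
}
```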

Robert Brown replied on Thu, 2013/03/14 - 9:11am

I agree 100% with Oleksandr: use the right tool for the right job; not all tools are useful in all situations. I've found that anyone who says you should use straight JDBC all the time is probably wrong too. Hand-coding JDBC calls to load and save business objects is probably going to require a ton of boilerplate copy/paste code, where introducing things like caching and auditing becomes an inline "have to remember" situation instead of something that stays out of the way of the program's main ideas.

Writing reports or exports? Use JDBC and dump the data to the output as fast as possible; no business objects required. Doing OLTP with business logic? Use an ORM solution to abstract away the things that don't matter to the business logic (like how and where to retrieve the data).
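A sketch of the "dump it to the output" style of export: values go straight from the `ResultSet` to a writer, with no business objects in between. It assumes tab-separated output with SQL NULLs written as empty fields, does no escaping of tabs or newlines inside values, and `ResultSetDumper` is a hypothetical name. For large results you would usually also call `Statement.setFetchSize(...)` before executing the query; whether and how the driver honors that hint is driver-specific.

```java
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ResultSetDumper {
    private ResultSetDumper() {}

    /**
     * Writes every remaining row of rs to out as one tab-separated line.
     * SQL NULLs become empty fields (an assumption; choose what your consumer expects).
     * Returns the number of rows written.
     */
    static long dump(ResultSet rs, Appendable out) throws SQLException, IOException {
        int columns = rs.getMetaData().getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int c = 1; c <= columns; c++) { // JDBC column indexes are 1-based
                if (c > 1) {
                    out.append('\t');
                }
                String value = rs.getString(c);
                if (value != null) {
                    out.append(value);
                }
            }
            out.append('\n');
            rows++;
        }
        return rows;
    }
}
```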

I've always told my developers, "There's a way we do things, and you should always use the tools that we have decided on, and the techniques we lay out in our policy, UNTIL there's a good reason not to.  And there's nothing wrong with having a good reason not to."

My 2 cents

Bob

Oleksandr Alesinskyy replied on Thu, 2013/03/14 - 10:07am in response to: Robert Brown

One more remark: it is quite possible (and easy) to mix and match both (Hibernate and plain JDBC) in the same application.

Mark Unknown replied on Thu, 2013/03/14 - 10:54pm

Bypassing layers is bad. They are there for a reason. Yes, batch processing with ORMs can be slow. The questions that need to be asked are:

How often does this need to happen?

Is it fast enough?

Can you eliminate the batch load?


Remember, maintenance is the largest part of what we do. Hand-coding the data layer increases it.

Mark Unknown replied on Thu, 2013/03/14 - 11:06pm in response to: Robert Brown

I use Hibernate with reports. It works well, mostly because I can use HQL, which makes queries easier to write and much more readable. YMMV. Remember, reports are just another UI.

Vine Rustia replied on Fri, 2013/03/15 - 7:04am

Hibernate doesn't really work well with the true extensibility required by reporting modules. Hibernate depends on association mappings, and even if you use HQL, you can't associate entities that don't have a Hibernate mapping. That was my company's issue with Hibernate when we had to integrate our system with a reporting tool, so we resorted to another framework that lets you manipulate SQL code in an object-oriented way without depending on any association mappings. Take note: it was much faster than Hibernate running the same queries. Lesson learned: there is no silver bullet for every problem; use the right tool for the right job.

Oleksandr Alesinskyy replied on Fri, 2013/03/15 - 9:09am in response to: Mark Unknown

Mostly, yes, but it is not bypassing layers; it is using different technologies on the same layer (the DAO layer). Hibernate by itself does not constitute a layer in your application.

Vine Rustia replied on Fri, 2013/03/15 - 10:59pm in response to: Oleksandr Alesinskyy

 Exactly!

James Peckham replied on Wed, 2013/04/03 - 10:50am

...and why would you ever query more than, I guess, 100 records with Hibernate? No user would ever view that many on one screen. Batch work should still be done in a SQL language if possible.

Jonck Van Der Kogel replied on Fri, 2013/04/05 - 2:04am

At my company we also faced the challenge of batch processing and the problems you get with Hibernate when the number of objects in your session grows. We wanted to keep the benefits of entities but get rid of (most of) the Hibernate overhead. We found a pretty good solution in the StatelessSession approach. Some code had to be written to support this, and we donated it to the community. Our code can be found here:

https://jira.springsource.org/browse/SPR-2495?focusedCommentId=87070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-87070

Seb Cha replied on Mon, 2013/04/29 - 7:15am

For batch import/export, I used "db2 load" or "pgsql copy". It can't get any faster than that!
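Bulk loaders like PostgreSQL's `COPY` are fast partly because the client only has to produce plain text in the format the server expects. As a sketch, assuming `COPY ... FROM STDIN WITH (FORMAT csv)` with its defaults (comma delimiter, double-quote quoting, NULL written as an unquoted empty field), one row could be formatted like this. `CopyCsv` is a hypothetical name.

```java
import java.util.List;

final class CopyCsv {
    private CopyCsv() {}

    /** Formats one row as a PostgreSQL CSV-mode COPY line (default options assumed). */
    static String line(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String f = fields.get(i);
            if (f == null) {
                continue; // NULL: an unquoted empty field
            }
            boolean mustQuote = f.isEmpty() // empty string must be quoted to differ from NULL
                    || f.equals("\\.") // could be mistaken for the end-of-data marker
                    || f.indexOf(',') >= 0 || f.indexOf('"') >= 0
                    || f.indexOf('\n') >= 0 || f.indexOf('\r') >= 0;
            if (mustQuote) {
                // Quote the field and double any embedded quote characters.
                sb.append('"').append(f.replace("\"", "\"\"")).append('"');
            } else {
                sb.append(f);
            }
        }
        return sb.append('\n').toString();
    }
}
```

With the PostgreSQL JDBC driver, lines like these can be streamed to a `COPY ... FROM STDIN` statement through its `CopyManager` API, so the rows never become individual `INSERT` statements.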
