
Gary Sieling is a software developer interested in dev-ops, database technologies, and machine learning. He has a computer science degree from the Rochester Institute of Technology. He has worked on many products in the legal and regulatory industries, having worked on and supported several data warehousing applications. Gary is a DZone MVB and is not an employee of DZone and has posted 62 posts at DZone. You can read more from them at their website.

Data Warehousing, NoSQL, and the Cloud

11.19.2012

With the advent of NoSQL, cloud computing, and slick new databases, we seem to have forgotten whence we came. I recently went to a conference on the open source search product Solr/Lucene. One of the keynote speakers, the Chief Data Scientist of Hortonworks, discussed what turned him to NoSQL databases: in his case, a failed project to track every click on walmart.com in Oracle.

For all its idiosyncrasies and irritations, Oracle (the database) is an incredibly powerful and versatile product, with power most projects do not fully use. Hortonworks appears to be trying to follow the same path as Oracle (the company): from consulting company, to product, to vast riches. Even though many projects do not tap the full feature set or power of Oracle, it is still preferred in some companies for the supposed safety of an expensive support contract. This is probably true of SQL Server as well, but I’m less experienced in that area. In the same way, I doubt that many who use NoSQL solutions fully realize the power of the tools available, or the right place to use them.

I think it’s worth looking at how we got here. Oracle has traditionally sold single-purpose, high-powered, and very expensive machines, a poor fit for a scrappy web startup. The variety of configuration and installation options is so overwhelming that they sell pre-built HP boxes with Oracle and the OS configured for you.

Through a maze of acquisitions, Oracle likely owns a company that can meet any need, if only you can figure out where to look on their website. When I last talked to their sales reps about a non-database product, their preferred pricing model was revenue sharing, which to me sounds like a terrible proposal, unless you own a company that exists to lose money.

If you pay enough, Oracle will assign someone to fix your problems. When I last worked on a data warehouse, I typically found a database defect every other week, some with patches available and some without. We were running a “small” data warehousing system, recording a couple hundred million records. Other companies in the city were well known to have larger databases, for various purposes. Had I wished for a different rendition of this project, I could well have moved to a payroll company, a grocery chain, or a computer manufacturer.

The challenge of building such a system is not unique to Oracle. Tuning queries on Postgres, in my experience, typically yields a performance improvement of two orders of magnitude, versus one in Oracle. This appears to be because Postgres, while generally a solid, cleanly designed product, lacks numerous micro-optimizations.

Published at DZone with permission of Gary Sieling, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)