
Why Data Warehousing as a Service?


This guest post comes courtesy of our friends at Treasure Data.


First, let me introduce myself. My name is Kaz Ohta and I am Treasure Data’s CTO and co-founder. My expertise is in distributed and parallel computing; my passion is open source technology. I was instrumental in developing MessagePack and Fluentd; have contributed to the Linux kernel, KDE, Memcached and MongoDB; currently curate open source components (e.g. Apache Hadoop); and founded the Japanese Hadoop User Group.

I have worked with complex open source technologies for years and have seen first-hand the time, expense and specialized IT resources that companies need to implement and maintain a big data analytics solution. That experience made me realize that big data analytics was really only available to companies with deep pockets and highly skilled staff. For example, an on-premise Hadoop solution can take a company anywhere from 60 to 160 DevOps days to implement. With top Hadoop consultants charging $1,500 per day, that works out to implementation costs of roughly $90,000 to $240,000.

My vision is to provide a service-based big data solution that eliminates these cost and complexity barriers. The Treasure Data Cloud Data Warehouse service offers an affordable, quick-to-implement and easy-to-use big data solution that does not require specialized IT resources, making big data analytics available to the mass market.

Here’s how we’ve done it. We leverage Hadoop and other open source technologies to keep our costs low and pass the cost savings on to our customers; and we’ve added our own innovative technology to address three critical Hadoop bottlenecks:

• Data-Load. We provide two tools to make data-load faster and easier. For initial data-load, we provide a bulk data-loader that can import any amount of data into our Cloud Data Warehouse. For streaming data collection and load, we provide td-agent, a lightweight data-collection daemon based on our highly successful Fluentd product. Together these cover both batch and continuous data feeds, and they convert structured, semi-structured and unstructured data into standard JSON.

• Columnar Data Processing. We have replaced HDFS, which still suffers from data-management difficulties and a single point of failure (SPOF), with our own columnar database. This enables us to process massive volumes of data much more quickly, making near real-time analysis a reality for Hadoop users. We also use our MessagePack technology and various compression algorithms to achieve 5-10x data storage efficiencies.

• Faster and Easier Querying. On the back-end, we provide an SQL-like query language that distributes and runs your query in parallel. There is no need to learn a complex domain-specific language. We also provide a JDBC driver, which allows you to use your preferred BI / visualization tools (e.g. Jaspersoft, Indicee, Metric Insights), and we will soon offer an ODBC driver that will enable integration with Excel, Tableau, etc.
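
To make the data-load and querying points more concrete, here is a minimal Python sketch. It is not Treasure Data's actual tooling or API. It assumes a hypothetical space-separated access-log format, turns each line into a JSON record of the kind a collector such as td-agent forwards, and then runs an SQL-style aggregate over those records. Python's built-in sqlite3 module is used purely as a local stand-in for the hosted query engine; the log format, the field names and the table name www_access are all illustrative assumptions.

```python
import json
import sqlite3

# Hypothetical raw log lines; this format is an assumption for illustration.
RAW_LINES = [
    "2012-09-27T10:00:00Z GET /index.html 200",
    "2012-09-27T10:00:01Z GET /signup 200",
    "2012-09-27T10:00:02Z GET /missing 404",
]


def to_event(line):
    """Parse one space-separated log line into a JSON-serializable dict."""
    time, method, path, status = line.split(" ")
    return {"time": time, "method": method, "path": path, "status": int(status)}


def load(events):
    """Load events into an in-memory SQLite table (a stand-in for the warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE www_access (time TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO www_access VALUES (:time, :method, :path, :status)", events
    )
    return conn


def count_by_status(conn):
    """Run an SQL aggregate: number of requests per HTTP status code."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM www_access GROUP BY status ORDER BY status"
    )
    return rows.fetchall()


events = [to_event(line) for line in RAW_LINES]
json_lines = [json.dumps(event, sort_keys=True) for event in events]
conn = load(events)
status_counts = count_by_status(conn)

print(json_lines[0])
print(status_counts)
```

The point of the sketch is the shape of the workflow: events are collected as JSON with no up-front schema design, and analysis is expressed in familiar SQL rather than in hand-written MapReduce code.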

The Treasure Data service is hosted on Amazon S3, so we can offer a scalable, reliable and secure production environment without our customers having to pull IT staff from other projects. Processing, storage and network resources are completely elastic and can be scaled up or down as requirements dictate; for example, one of our customers scaled from zero to 50 billion rows in 3 months. Our service is managed 24x7 by an expert operations staff, which eliminates the overhead costs associated with resourcing and managing an on-premise environment.

Treasure Data officially launches on Thursday, September 27th, in San Francisco. If you want to try our service for yourself, you can sign up for free.


Published at DZone with permission of Sadayuki Furuhashi, author and DZone MVB. (source)
