Tharindu holds a first-class honors degree in computer science and engineering from the University of Moratuwa, Sri Lanka. He also received a professional postgraduate diploma in marketing from the CIM, UK, where he is an associate member. Tharindu currently works at WSO2, where he is an Associate Tech Lead and a member of the data technologies management committee, focusing on big data, analytics, and business activity monitoring (BAM). Tharindu is a DZone MVB, is not an employee of DZone, and has posted 15 posts at DZone.

Do You Trust Google BigQuery with Your Big Data?

06.24.2012

Google has come up with a fantastic service for analyzing large amounts of data. It’s called BigQuery, and it lets you run analysis on big data in the cloud. As expected, the tool has a superb, intuitive web UI. The data analysis language uses SQL-like queries. (Hive, anyone ;) ). Have a look at the BigQuery tutorial; it looks pretty neat. So all you need to do to run queries is upload your data to Google using its upload form.

The form lets you upload a file directly or point to one already stored in Google’s cloud storage.

Now, the interesting question is this: to analyze with BigQuery, how much of your data are you willing to give Google? And how long will the upload take? The answer won’t be “Let me quickly upload a 500 GB file and run some queries”. That amount of data would definitely take some time to upload. So, effectively, this SaaS becomes pretty useless as the volume of data that must be uploaded for analysis keeps growing.
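
To put a rough number on "some time", here is a minimal back-of-the-envelope sketch in Python. The uplink speeds and the 80% efficiency factor are my own assumptions for illustration, not measurements, and the function name upload_hours is made up for this example. It only does the arithmetic for a 500 GB file and knows nothing about BigQuery itself.

```python
def upload_hours(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to upload size_gb over an uplink_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention); 0.8 is a guess.
    """
    if size_gb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    size_megabits = size_gb * 1000 * 8          # decimal GB -> megabits
    seconds = size_megabits / (uplink_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed uplink speeds in Mbit/s; pick the one closest to your own link.
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbit/s: {upload_hours(500, mbps):.1f} hours")
```

Under these assumptions, 500 GB takes roughly 139 hours (nearly six days) on a 10 Mbit/s uplink and about 14 hours on a 100 Mbit/s one, before a single query has run.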

Everyone trusts Google ( :) ), so the question of how much data to hand over might be easily ignored. But another potential problem I see is that your own privacy policies could be violated. The data you want to analyze often contains sensitive information, such as user behavior patterns. How comfortable will your customers be if you hand that data over to Google? Even anonymizing this data might not save you from a potential legal breach.
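
For illustration, this is roughly what a first pass at "anonymizing" often looks like in practice: replacing user identifiers with a keyed hash before the data leaves your systems. It is only a sketch using Python's standard hmac and hashlib modules. The names pseudonymize, scrub_record, and the user_id field are hypothetical. Note that this is pseudonymization rather than true anonymization, since the behavior patterns in the remaining fields can still identify people, which is exactly why it might not protect you legally.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC).

    The same user_id and key always give the same token, so records can
    still be grouped per user, but the raw identifier is not uploaded.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, secret_key: bytes, id_field: str = "user_id") -> dict:
    """Return a copy of record with the (hypothetical) id_field pseudonymized."""
    cleaned = dict(record)  # shallow copy; the original record is left untouched
    cleaned[id_field] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"keep-this-secret-and-out-of-the-upload"  # assumed to be stored locally
    event = {"user_id": "alice@example.com", "page": "/pricing", "seconds": 42}
    print(scrub_record(event, key))
```

The key stays on your side. Anyone holding only the uploaded data cannot reverse the tokens directly, but the page and timing fields are uploaded as they are.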

I still believe setting up your own data analysis and monitoring platform is the best way to go. Thoughts? I’d love to hear them.

Published at DZone with permission of Tharindu Mathew, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)