Istvan Szegedi is an IT Technical Architect at Vodafone UK. He has worked at Hewlett-Packard, Nokia Networks, Google, Morgan Stanley and Vodafone. He holds certifications including Sun Certified System Administrator, Sun Certified Java Programmer, Sun Certified Web Component Developer, Salesforce.com Certified Force.com Developer and TOGAF Certified Enterprise Architect. As a big fan of mobile and cloud computing, he likes to believe that these technologies will eventually push aside the desktop/client-server architecture. Istvan is a DZone MVB, is not an employee of DZone, and has posted 38 posts at DZone.

Using AWS Elastic MapReduce Results with Mobile BI Analytics

07.16.2012

So far we have covered the server-side/cloud components: how to process data with MapReduce running in the cloud or on our own Hadoop cluster. This time it is about the client side.

If you have a look at Mary Meeker’s latest brilliant presentation on Internet trends, one of the key messages is the significant increase in mobile 3G subscriptions and the mind-boggling sales figures for tablets (read: iPad) and smartphones (read: iPhone and Android).

The Internet is going mobile and applications follow the trend. This can be seen in mobile business intelligence too, which has gained significant momentum recently. People are on the move with mobile devices whose performance is similar to that of a notebook from a few years ago (see the Geekbench results). It is time to use this power at hand for business intelligence as well. The tools are already out there to analyse big data and then publish the results to mobile devices.

Amazon Elastic MapReduce

In the March post we covered Amazon Elastic MapReduce. Having talked about mobile internet subscriptions and the enormous growth in that area, this time we will analyse mobile subscription data from the World Bank. The data covers subscriptions to a public mobile telephone service using cellular technology, with both postpaid and prepaid subscriptions included.

Running an AWS Elastic MapReduce job requires three steps: upload the input data to an S3 bucket/folder, run an EMR job (e.g. Hive, Pig, custom Java), and download the output from an S3 folder.

For our test, the S3 storage looks like this: there is a mobilesubscriptions bucket containing two folders, one for hive-scripts and one for the mobilesubs data. In the mobilesubs folder there is an input folder where we upload the mobile_subscriptions.csv file. The output will be created as a delimited text file under the s3://mobilesubscriptions/mobilesubs/output folder.
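
The bucket layout and the upload/download steps described above could be scripted roughly as follows. This is only a minimal sketch: it assumes the boto3 library and configured AWS credentials, neither of which is part of the original walkthrough, and the helper names (s3_uri, upload_input, download_output) are hypothetical. The 000000_0 file name is the output file that the Hive job produces in this example.

```python
# Names taken from the article's S3 layout.
BUCKET = "mobilesubscriptions"
INPUT_KEY = "mobilesubs/input/mobile_subscriptions.csv"
OUTPUT_PREFIX = "mobilesubs/output/"


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI that Hive and EMR use for an object or folder."""
    return f"s3://{bucket}/{key}"


def upload_input(local_csv: str) -> None:
    """Step 1: upload the input CSV (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the URI helper works without boto3

    boto3.client("s3").upload_file(local_csv, BUCKET, INPUT_KEY)


def download_output(local_path: str, part_file: str = "000000_0") -> None:
    """Step 3: download the result file written by the Hive job."""
    import boto3

    boto3.client("s3").download_file(BUCKET, OUTPUT_PREFIX + part_file, local_path)


# The locations referenced by the Hive script:
print(s3_uri(BUCKET, "mobilesubs/input/"))
print(s3_uri(BUCKET, OUTPUT_PREFIX))
```

Step 2, running the EMR job itself, is left to the AWS console as in the rest of this post.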

The input file has the following format:

Country Name,Country Code,2010
Afghanistan,AFG,37.80718336
Albania,ALB,141.8972543
Algeria,DZA,92.42180275
American Samoa,ASM,
Andorra,AND,77.17642345
Angola,AGO,46.68902631....

(2010 is the most recent year for which we had data.)

The Hive script that we use for data processing is shown below; it selects the top 100 countries with the highest number of subscriptions:

CREATE EXTERNAL TABLE mobilesubs (
    country_name STRING, country_code STRING, subscriptions FLOAT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://mobilesubscriptions/mobilesubs/input/';

CREATE TABLE top100_mobilesubs (
    country_name STRING, country_code STRING, subscriptions FLOAT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

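-- The WHERE clause drops the CSV header row in case the uploaded file still
-- contains it; its "2010" field would otherwise likely be read as 2010.0 and sort first.
INSERT OVERWRITE TABLE top100_mobilesubs
SELECT country_name, country_code, subscriptions
FROM mobilesubs
WHERE country_code <> 'Country Code'
ORDER BY subscriptions DESC
LIMIT 100;

-- Write the result to S3; the output fields are separated by SOH (Ctrl-A) characters.
INSERT OVERWRITE DIRECTORY 's3://mobilesubscriptions/mobilesubs/output/'
SELECT * FROM top100_mobilesubs;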
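
To cross-check what the query should return, the same "sort descending, keep the first N" logic can be tried locally on a small sample before running the job. The following is a rough sketch rather than part of the EMR workflow: it uses the sample rows shown earlier, the function name top_n is hypothetical, and it assumes that the header row and rows with an empty value should be left out.

```python
import csv
import io

# Sample rows copied from the input file excerpt above.
SAMPLE = """Country Name,Country Code,2010
Afghanistan,AFG,37.80718336
Albania,ALB,141.8972543
Algeria,DZA,92.42180275
American Samoa,ASM,
Andorra,AND,77.17642345
Angola,AGO,46.68902631
"""


def top_n(csv_text, n):
    """Return the n rows with the highest value, skipping the header and empty values."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = [(name, code, float(value)) for name, code, value in reader if value]
    # Same idea as ORDER BY subscriptions DESC LIMIT n in the Hive script.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


for name, code, value in top_n(SAMPLE, 3):
    print(name, code, value)
```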

The job that will process the data using AWS EMR is configured as follows:

Once we run the job, it will create a 000000_0 file under the s3://mobilesubscriptions/mobilesubs/output directory.

This output file needs to be downloaded and processed to replace the SOH characters with commas (,) so that it can be published with Roambi Mobile BI analytics. This can be done with any text processing tool (e.g. Notepad++).
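
If you prefer a script over a text editor, the SOH-to-comma conversion could look something like the sketch below. It is an illustration under a few assumptions: the downloaded file is UTF-8 text, the function names are hypothetical, and the output name mobilesubs_result.csv is simply the file name used in the Roambi import step. Python's csv writer is used so that any value that itself contains a comma is quoted rather than split into two columns.

```python
import csv
import io

SOH = "\x01"  # the Ctrl-A character separating fields in the Hive output


def soh_to_csv(text):
    """Convert SOH-delimited lines into comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if line:  # ignore empty lines
            writer.writerow(line.split(SOH))
    return out.getvalue()


def convert_file(src="000000_0", dst="mobilesubs_result.csv"):
    """Read the downloaded Hive output and write a CSV file (UTF-8 assumed)."""
    with open(src, encoding="utf-8") as f:
        converted = soh_to_csv(f.read())
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(converted)


# Illustrative sample line, not actual job output.
sample = "Albania\x01ALB\x01141.8972543\n"
print(soh_to_csv(sample), end="")
```

Calling convert_file() in the folder that holds the downloaded 000000_0 file would then produce the CSV to import.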

Roambi Analytics

Roambi Analytics has a cloud-based publishing service and a mobile BI visualizer tool available for iPad and iPhone. The application can be installed on mobile devices from the Apple App Store for free.

The Roambi publisher has three versions: Roambi Lite, which is free and has limited functionality (support for CSV, Excel and HTML formats), Roambi Pro (with additional Google Docs and salesforce.com support) and Roambi Enterprise (with support for Oracle, SAP BusinessObjects, SAS, Microsoft, IBM Cognos, etc.).

This demo is based on Roambi Lite. First you need to create an account or log in using a Google Account (OpenID) at https://secure.roambi.com:

Then click on Publish:

Select the appropriate view (e.g. CataList) and import the data. This will be the mobilesubs_result.csv file that we downloaded from the AWS EMR s3://mobilesubscriptions/mobilesubs/output folder and prepared for Roambi Analytics as described above.

You can refine the data if you wish and then publish it:

The file will be pushed to the mobile devices (iPad or iPhone). With Roambi Lite, for example, you can push it to your own device.

Roambi Analytics Visualizer

On the handset you can retrieve the result using Roambi Analytics Visualizer. You can create an email or a screenshot from the report, add it to your favorites, and so on.

iPhone screenshots:

iPad screenshot:

Email sent from Roambi Analytics Visualizer:

As you can see, mobile BI and big data in the cloud can free users from being desktop slaves: no need for datacenter infrastructure and no need for a traditional desktop, just the joy of mobility spiced with the power of cloud computing.

Published at DZone with permission of Istvan Szegedi, author and DZone MVB. (source)