
Matt is a DZone employee and has posted 4 posts at DZone.

O'Reilly Strata Conference: Interview with Rich Dill at SnapLogic

03.15.2013

DZone: Tell us a bit about your solution.

Rich Dill: We are in the integration space.  We’re not a repository, we’re a transport platform that allows customers to use 100% RESTful technology to access data, wherever it is, to put it wherever it needs to be.  It’s a hub and spoke model, so you deploy a SnapLogic server behind a firewall on bare metal, behind a firewall in a private cloud, or up in Amazon.  It’s horizontally scalable, so you can start with one server and then go to two, three, four, twelve, however many are necessary. 

It’s all self-contained.  The development platform is browser-based, so you install it where it needs to be and connect to it with a browser (my favorite is Chrome).  You go in there, log in, and then you’re presented with a visual development environment.

The unique thing about this technology is that we have what we call intelligent endpoints, or “snaps.”  The thing about the snap is that it is a connection that provides introspection against the application or the target that you’re interested in talking to.  So what happens is you build a snap, whether it’s against DB2 or Oracle or SAP or Salesforce.com or Birst, and the snap is built once and then becomes a reusable component.

So you can connect to Salesforce.com using your Salesforce credentials and you would then be presented with a list of all of the objects you have access to with your username and password.  You can say: “okay, I’m interested in the ‘accounts’ object, the ‘region’ object, and the ‘opportunities’ object.”  You click on that, and what happens behind the scenes is that we build the components that do the CRUD (create, read, update, delete).  All the CRUD components are built by the snap, and what happens for the developer or the user is they say “I want to create a new pipeline that’s going to create a new account,” so you drag the “create” component into the pipeline.  Instead of writing the code to create that account, it’s now available to you.
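To make the idea of introspection concrete, here is a minimal sketch in Python.  It is not SnapLogic's implementation and does not talk to Salesforce.com; it assumes a local SQLite database as the target, and the function names `list_objects` and `build_crud` are hypothetical.  It lists the objects (tables) a connection can see, then reads one table's schema and generates the four CRUD operations from it.

```python
import sqlite3


def list_objects(conn):
    """Return the names of the tables visible through this connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def build_crud(conn, table):
    """Introspect one table and return generated create/read/update/delete callables.

    Assumes the table has a single-column primary key.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    columns = [row[1] for row in info]
    key = next(row[1] for row in info if row[5])  # the primary-key column

    def create(record):
        names = [c for c in columns if c in record]
        quoted = ", ".join(f'"{c}"' for c in names)
        marks = ", ".join("?" for _ in names)
        cur = conn.execute(
            f'INSERT INTO "{table}" ({quoted}) VALUES ({marks})',
            [record[c] for c in names],
        )
        return cur.lastrowid

    def read(key_value):
        row = conn.execute(
            f'SELECT * FROM "{table}" WHERE "{key}" = ?', (key_value,)
        ).fetchone()
        return dict(zip(columns, row)) if row else None

    def update(key_value, changes):
        names = [c for c in columns if c in changes and c != key]
        assignments = ", ".join(f'"{c}" = ?' for c in names)
        conn.execute(
            f'UPDATE "{table}" SET {assignments} WHERE "{key}" = ?',
            [changes[c] for c in names] + [key_value],
        )

    def delete(key_value):
        conn.execute(f'DELETE FROM "{table}" WHERE "{key}" = ?', (key_value,))

    return {"create": create, "read": read, "update": update, "delete": delete}
```

The caller never writes SQL for a specific table: picking "accounts" from the list returned by `list_objects` and passing it to `build_crud` is enough to get a working `create` operation, which is roughly the experience described above.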

So let’s say you have the account data in a CSV file.  You drag a CSV component on there, you give it a filename, map the two together, and you have an integration that says “read the data in this file and put it into Salesforce.com,” and you haven’t written any code at all.  You had some basic knowledge about what data you were interested in and that’s it.  That level of abstraction makes the tool highly approachable: a developer can focus on what they want to do, and not worry about having to know Python, Java, or the nuances of the API.  We enable businesses to take advantage of integration technologies without going through long, painful, and expensive learning curves.
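For comparison, the hand-written equivalent of that CSV pipeline might look like the following Python sketch.  Everything here is an assumption made for illustration: the column names, the target field names in `FIELD_MAP`, and the `create` callable, which stands in for whatever component actually writes the record to the target system.  No real Salesforce API is called.

```python
import csv
import io

# Hypothetical mapping from CSV header names to target field names.
FIELD_MAP = {"Company": "Name", "Area": "Region", "Phone Number": "Phone"}


def map_rows(csv_file, field_map):
    """Yield one target record per CSV row, renaming columns per field_map."""
    for row in csv.DictReader(csv_file):
        yield {target: row[source].strip() for source, target in field_map.items()}


def run_pipeline(csv_file, field_map, create):
    """Read every row, map it, and hand it to the 'create' component.

    Returns the number of records passed to create().
    """
    count = 0
    for record in map_rows(csv_file, field_map):
        create(record)
        count += 1
    return count


if __name__ == "__main__":
    # An in-memory file and a plain list stand in for the real source and target.
    sample = io.StringIO("Company,Area,Phone Number\nAcme, West ,555-0100\n")
    created = []
    run_pipeline(sample, FIELD_MAP, created.append)
    print(created)
```

Even this small version needs the developer to know the file layout, the target field names, and how to call the target system, which is the work the drag-and-drop mapping is meant to take over.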

DZ: Are you announcing any new solutions at Strata?

RD: As a matter of fact, if you go to our website, there’s the BDaaS button, or “badass button,” for “Big Data as a Service.”  What we’ve just completed is what I would call a fairly complete stack of functionality against Hadoop.  We’ve had an HDFS snap that allows you to read and write directly to the file system if that’s the way you want to use it.  We have a Hive snap that allows you to use Hive against Hadoop, but what we just completed and are launching is an HBase reader and writer, so you now have the ability to talk to HBase.

On the landing page under the press announcement of BDaaS is a 15-minute video that I built yesterday.  It shows how we connect to Twitter, collect seven days’ worth of tweets with the hashtag #Strata in them, and combine that with user profiles to come up with the total influence of individuals, based on the number of times they tweet versus the number of followers they have.  There are three pipelines.  The first connects to Twitter, collects and aggregates the tweets, and drops them into HBase.  The second extracts the data out of HBase to a CSV file, which is then uploaded in a third pipeline to Birst or can be consumed by R for visualization.
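The first two pipelines can be approximated in a few lines of Python.  This is only a sketch under stated assumptions: the tweets are already-fetched dictionaries with hypothetical `user` and `followers` keys, a plain dictionary stands in for HBase, and the upload to Birst is omitted.  The interview does not give the exact influence formula, so the sketch stops at the two inputs it mentions, tweet count and follower count.

```python
import csv
import io
from collections import defaultdict


def aggregate(tweets):
    """Stage 1: count tweets per user and keep the last-seen follower count."""
    stats = defaultdict(lambda: {"tweets": 0, "followers": 0})
    for tweet in tweets:
        entry = stats[tweet["user"]]
        entry["tweets"] += 1
        entry["followers"] = tweet["followers"]
    return dict(stats)  # stands in for the rows written to the store


def export_csv(stats, out):
    """Stage 2: write one row per user, most active first, ties broken by name."""
    writer = csv.writer(out)
    writer.writerow(["user", "tweets", "followers"])
    ordered = sorted(stats.items(), key=lambda kv: (-kv[1]["tweets"], kv[0]))
    for user, entry in ordered:
        writer.writerow([user, entry["tweets"], entry["followers"]])


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample_tweets = [
        {"user": "ann", "followers": 120},
        {"user": "bob", "followers": 900},
        {"user": "ann", "followers": 125},
    ]
    buffer = io.StringIO()
    export_csv(aggregate(sample_tweets), buffer)
    print(buffer.getvalue())
```

Keeping the stages separate mirrors the three-pipeline layout: the aggregated store can be refreshed on its own schedule, and the CSV export can feed either a BI tool or an R script without touching the collection step.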

DZ: DZone is primarily a developer community.  What are the biggest reasons to use SnapLogic’s BDaaS for smaller developer teams?

RD: We allow developer teams to be more responsive to customer needs.  The fact that you can develop rapidly means that you can be much more responsive and nimble (because we know “agile” is a reserved word these days).  You can build an integration in minutes to hours, as opposed to hours to days.  Developer productivity is the big advantage.  It’s a high-level integration tool that allows developers to be much more productive because, let’s face it, not all developers are fluent in JavaScript, Ruby, Python, firewall configurations, DNS, and all of those portions of the network stack that need to be set up so they can reach out over the cloud to work with data in a secure manner.

DZ: We know Big Data as a term has been growing in popularity in the tech industry, as have frameworks such as Hadoop.  How do you see Big Data progressing as a topic over the next few months or the next year?

RD: What people are doing with Big Data now is creating one view of the customer.  In the master data management space you talk about the “golden record,” the single copy of the data that rationalizes and has a standardized view of the user regardless of where they are in the system.  Master data management is about having a consistent view of the customer record.  Big Data adds additional data on top of that consistent view to create a much more intimate picture of the customer.  Companies are now pulling tweets, LinkedIn profiles, and Facebook postings to create a bigger picture of who their customer is.  Are they happy or sad?  Are they profitable or not?

Years ago, you had structured data warehouses and search engines.  You had vague search indices with a lot of plumbing and the two were very far apart.  Now the warehouses and search engines are coming together.  We’re seeing that with machine data.  You saw it in the supply chain space at companies like Walmart, which use demographic analysis to make sure the clothing in each of their locations will sell, depending on the differences in each market.  Now they’re looking at how people are buying and posting.  Pinterest is of big interest.  Marketing companies are looking at what people are posting on Pinterest to see what’s resonating with customers.  You spend a lot of money talking about your business, your product, and your message, but what do customers hear?  You see them post on social media outlets, and that’s the real gold.

The future is combining transactional data with that unstructured data, putting it all together and mining it for trend analysis.  What does each piece of data mean?  It’s a lot like the old demographic analysis.  McDonald’s would do a lot of demographic analysis to determine where to open new locations, and that was done with a skeleton of data compared to what we have today.  Big Data in general is giving us a deeper, broader, and more complete picture of opportunities, problems, and markets.  All of this information is being put together, and these analyses are going to be more interesting.  They’re going to find things no one discovered or even thought of.  Those findings are just going to emerge because we’re starting to look for them.