I am a Web Science PhD student at the University of Koblenz and the founder of http://www.metalcon.de. Social news streams are my research interest. René is a DZone MVB, is not an employee of DZone, and has posted 13 posts at DZone.

Cascalog for Graph Processing

02.07.2012

Nils Grunwald works at the French startup Linkfluence. Their product is essentially social network analysis and graph processing: they crawl the web and blogs, or obtain other social network data, and provide their customers with statistics and insights.

This scenario obviously involves big data, and the data naturally has the structure of a graph. He said that a system for processing this data has to satisfy the following constraints:

  • The processing should not compromise the rest of the system
  • Low maintenance costs
  • Used for queries and rapid prototyping (so they want a “general” graph processing solution, as customer needs change)
  • Flexible, since it is hard to tell beforehand which fields or metadata will be used

He then introduced their solution, Cascalog, which is based on Hadoop and inspired by Cascading, a workflow management system, and by Datalog, a subset of Prolog. As a declarative, expressive language, it offers a very concise way of writing queries and enables quick prototyping.
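To give a feeling for what a Datalog-style query over a graph expresses, here is a minimal sketch in plain Python. It is not Cascalog code and does not run on Hadoop; the function name `two_hop` and the toy edge list are made up for illustration. It only shows the idea of joining two edge relations on a shared variable, which a declarative language lets you state as a single rule.

```python
from collections import defaultdict


def two_hop(edges):
    """Return all (a, c) pairs such that a -> b and b -> c for some node b."""
    # Index edges by source node so the join on the shared variable b is cheap.
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    # Equivalent Datalog rule: two_hop(A, C) :- edge(A, B), edge(B, C).
    return {
        (a, c)
        for a, b in edges
        for c in successors.get(b, ())
    }


# Hypothetical toy graph, not data from the talk.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
    ("carol", "alice"),
]
result = two_hop(edges)
```

In a declarative language the rule in the comment would be the whole query; the indexing and looping above are what the engine works out for you.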

For me personally this is not a very interesting solution, since it cannot answer queries in real time, which is to be expected given the technologies it is based on. But I guess that for people who have time and just do analysis, this solution will probably work pretty well!

What I really liked about the solution is that after processing the graph you can export the data to Gephi or to Neo4j for fast query processing.
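How the export was done in the talk is not described here, so the following is only a sketch of one plausible route: writing the processed edges as a CSV edge list with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import understands. The helper name `write_edge_csv` and the sample edges are assumptions for illustration.

```python
import csv
import io


def write_edge_csv(edges, out):
    """Write (source, target, weight) tuples as a CSV edge list."""
    writer = csv.writer(out, lineterminator="\n")
    # Header names chosen to match the columns Gephi looks for in an edge table.
    writer.writerow(["Source", "Target", "Weight"])
    for src, dst, weight in edges:
        writer.writerow([src, dst, weight])


# Hypothetical processed edges, not data from the talk.
processed_edges = [("alice", "bob", 2), ("bob", "carol", 1)]

# An in-memory buffer keeps the example self-contained; a real export
# would pass a file opened with open(path, "w", newline="") instead.
buffer = io.StringIO()
write_edge_csv(processed_edges, buffer)
csv_text = buffer.getvalue()
```

A file like this could presumably also serve as the starting point for loading the graph into Neo4j, but that step is not shown.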

He then explained a lot of specific details about the syntax of Cascalog.



Source:  www.rene-pickhardt.de/nils-grunwald-from-linkfluence-talks-at-fosdem-about-cascalog-for-graph-processing/

Published at DZone with permission of René Pickhardt, author and DZone MVB.

