
Java Geek and managing director at comSysto GmbH in Munich ... Spring over JavaEE, Apache Wicket over JSF, Gradle over Maven, Lean over Waterfall, exploring JavaFX, Highcharts, Android, AgileUX, Lean Startup. Daniel is a DZone MVB and is not an employee of DZone.

Real-Time Twitter Heat Map with MongoDB

12.05.2012

Over the last few weeks I have been exploring the fascinating field of data visualisation, which offers great ways to play around with the perception of information.

More formally, data visualisation denotes “the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition”.

Nowadays a huge flood of information hits us every day. Enormous amounts of data collected from various sources are freely available on the internet. One of these data gargoyles is Twitter, which produces around 400 million (400 000 000!) tweets per day!

Tweets basically offer two “layers” of information: the obvious, direct information within the text of the Tweet itself, and a second layer that is not directly perceived, the Tweet’s metadata. Here Twitter offers a large amount of additional information such as user data, retweet count, hashtags, etc. This metadata can be leveraged to experience data from Twitter in a lot of exciting new ways!

So, as a little weekend project, I decided to build a small piece of software that generates real-time heat maps of certain keywords from Twitter data.

This is a first static peek at what it looks like (apparently the friendly floatees use Twitter, too):

See a screencast here: screencast.com

To get this working I have used lots of shiny things:

The app is written in Python and consists mainly of three components:

tstream.py:
A small service based on tweepy that implements a StreamListener, which inserts incoming data into a MongoDB capped collection. Here you can also set filter terms. This example mostly uses terms related to “Big Data”.
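As a rough illustration of the storage side of such a listener, here is a minimal sketch. It is not the code from the repository: the function names, the collection name "tweets", the 64 MB cap and the stored fields are all assumptions made for this example. It also assumes the tweet arrives as a dict whose "coordinates" field is a GeoJSON point in [longitude, latitude] order, and it leaves out the tweepy wiring entirely.

```python
from typing import Optional

# Assumed size limit for the capped collection (not taken from the article).
CAPPED_SIZE_BYTES = 64 * 1024 * 1024


def tweet_to_document(tweet: dict) -> Optional[dict]:
    """Reduce a tweet dict to the fields a heat map needs.

    Assumes a GeoJSON point under "coordinates", i.e.
    {"type": "Point", "coordinates": [lon, lat]}.
    Returns None for tweets without a usable location.
    """
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    if not coords or len(coords) != 2:
        return None
    lon, lat = coords  # GeoJSON order is longitude first
    return {"text": tweet.get("text", ""), "lat": lat, "lon": lon}


def get_capped_collection(db, name: str = "tweets"):
    """Create the capped collection on a pymongo Database, or reuse it."""
    from pymongo.errors import CollectionInvalid  # imported lazily

    try:
        return db.create_collection(name, capped=True, size=CAPPED_SIZE_BYTES)
    except CollectionInvalid:
        # The collection already exists, so just return a handle to it.
        return db[name]


def store_tweet(collection, tweet: dict) -> Optional[dict]:
    """Insert a geotagged tweet; silently skip tweets without coordinates."""
    doc = tweet_to_document(tweet)
    if doc is not None:
        collection.insert_one(doc)
    return doc
```

A stream listener callback would then only need to call store_tweet(collection, tweet) for every incoming tweet.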

tweet_service.py:
A Flask-based web app which gets new data from MongoDB and makes use of the publish-subscribe pattern. Because they are “tailable”, MongoDB’s capped collections come in handy here: there is no need to remember which messages a client has already received, since the cursor itself yields new documents on arrival. Capped collections also have a fixed size, which is appropriate for this use case, but your mileage may vary. Incoming Tweets are published to a redis channel, and a listener on that channel serves connecting clients a response with the “Content-Type” header set to “text/event-stream”.
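The flow described above could be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the channel name, the function names and the "lat"/"lon" document fields are made up for the example, and the clients passed in are assumed to be a pymongo collection and a redis-py client.

```python
import json
import time

CHANNEL = "tweets"  # assumed channel name


def format_sse(data: str) -> str:
    """Wrap a payload as one server-sent event ("data: ..." plus a blank line)."""
    return "data: " + data + "\n\n"


def tail_and_publish(collection, redis_client, channel: str = CHANNEL):
    """Follow a capped collection and publish every document to redis.

    Runs forever. Tailing starts at the oldest document, so existing
    documents are replayed first.
    """
    from pymongo import CursorType  # imported lazily

    while True:
        cursor = collection.find(cursor_type=CursorType.TAILABLE_AWAIT)
        while cursor.alive:
            for doc in cursor:
                payload = json.dumps({"lat": doc["lat"], "lon": doc["lon"]})
                redis_client.publish(channel, payload)
        # The cursor died (for example on an empty collection); retry shortly.
        time.sleep(1)


def event_stream(redis_client, channel: str = CHANNEL):
    """Yield one server-sent event per message published on the channel."""
    pubsub = redis_client.pubsub()
    pubsub.subscribe(channel)
    for message in pubsub.listen():
        if message["type"] != "message":
            continue  # skip subscribe confirmations
        data = message["data"]
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        yield format_sse(data)
```

In a Flask view, the generator can be returned as Response(event_stream(redis_client), mimetype="text/event-stream"), while tail_and_publish would run in its own thread or process.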

map.html:
A few lines of HTML and JavaScript which bring up a Google Maps canvas and a listener for server-sent events. When new data arrives, basically consisting of Lat/Lon tuples, the new point is added in real time to a heat map overlay based on heatmap-gmaps.js.

Rough overview:

Conclusion

With a relatively small amount of code it is possible to turn text data into astonishing visualizations. With only a little more effort, different hashtags could be illustrated by different colors. Or one could count tweets on a topic in certain regions and then compare activity relative to the number of citizens. There are lots of interpretations possible through the underlying data set.
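To make the per-region idea a bit more concrete, here is a small sketch of how such a comparison might look. Everything in it is hypothetical: the regions are plain bounding boxes with placeholder names and invented population figures, chosen only to show the arithmetic.

```python
from collections import Counter

# Hypothetical regions: name -> (min_lat, min_lon, max_lat, max_lon, population).
# Names, boxes and populations are placeholders, not real data.
REGIONS = {
    "region_a": (0.0, 0.0, 10.0, 10.0, 1_000_000),
    "region_b": (10.0, 0.0, 20.0, 10.0, 250_000),
}


def region_of(lat, lon, regions=REGIONS):
    """Return the name of the first bounding box containing the point, or None."""
    for name, (min_lat, min_lon, max_lat, max_lon, _pop) in regions.items():
        if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
            return name
    return None


def tweets_per_100k(points, regions=REGIONS):
    """Count (lat, lon) points per region, normalised per 100,000 citizens."""
    counts = Counter(region_of(lat, lon, regions) for lat, lon in points)
    counts.pop(None, None)  # drop points outside every region
    return {
        name: counts[name] * 100_000 / regions[name][4]
        for name in regions
    }
```

With these placeholder numbers, two tweets in region_a and one in region_b give region_b the higher per-capita activity, even though it has fewer tweets in absolute terms.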

You can find the code on GitHub; feel free to fork it and play around!



Published at DZone with permission of Daniel Bartl, author and DZone MVB. (source)
