Big Data With Twitter API : Twitter Tips – A Baker’s Dozen


I conducted a tutorial at OSCON 2012: “The Art of Social Media Analysis with Twitter & Python”. The slides are on SlideShare and the Python/MongoDB/NetworkX programs are on GitHub. The next day I was fortunate to be interviewed by Mac Slocum. Mac has a way of asking interesting questions. Thanks, Mac!

This is a series of blogs annotating the slides with notes, as required. Some things are detailed in the slides, but the slides miss a lot of what I talked about at the tutorial. I am planning to add the notes in a series of six blogs. This is Part 1 of 6.

The hands-on project, patterns and code ended up handling ~970,000 unique users and a social graph with ~500,000 cliques. Some Twitter REST API runs took 19 hours to complete, and the MongoDB database was ~6 GB on an m2.large AWS instance. I will point out some of the interesting big data patterns related to the Twitter API and the social graph.


  • Aug 4, 2012 – Part 1 Completed
  • Aug 4, 2012 – Part 2 Completed
  • Aug 4, 2012 – Part 3 Completed
  • Aug 5, 2012 – Part 4 Being contemplated


Twitter is at a fork in the road: it has achieved a certain amount of status and popularity, not to mention utility and value to society. We are all slowly adapting to the medium and finding ways to use it. My thoughts on the recent changes in API “branding”:

Twitter Tips – A Baker’s Dozen:

The slides capture the detailed bullet points.

Big Data with TwitterAPI – Twitter Tips

In the next blog, we will look at the Big Data Pipeline for a Twitter API ecosystem and then move on to the APIs and Twitter Object Models.



Twitter 2.0

For some reason, while driving to work, I started thinking about what I want out of Twitter. Maybe it was because I had read the VentureBeat blog on this subject only the other day … I was also talking with Aaron in a different context, and we touched upon the Twitter feature set …

While the VentureBeat blog has a detailed analysis of Twitter, I am not sure about threads, in-line photos and so forth. They are not worth losing the essential nature of the medium.

I use Twitter for three things: to keep current with topics that interest me, to keep in touch with friends & acquaintances, and finally to publish things that I am interested in, many times as a bookmark!

In my humble opinion, the two essential features that can take Twitter to the next level, without sacrificing the essential nature of the medium, are …

Topic Streams a.k.a. TweeTopics

It is almost impossible to follow topics. The List functionality never worked for me. It should be as easy to follow and unfollow topics as it is to follow and unfollow people. And in this day and age, it is not that hard to run the tweets through a set of analytics engines, cluster them by subject and offer the topics with the same semantics as people! The interaction semantics are important, and that is what makes Twitter Twitter. There were some thoughts about tweet threading. I think that defeats the purpose; tweets are stateless, and that attribute is very important.
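To make the “follow a topic like a person” idea concrete, here is a minimal Python sketch. It is only an illustration under a big assumption: hashtags stand in for the topics a real analytics engine would produce by clustering. The names `topic_streams` and `TopicFollower` are hypothetical and not part of any Twitter API.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is a crude proxy for a topic. A real system would
# cluster tweets by subject; this only groups on explicit tags.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lowercase hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


def topic_streams(tweets):
    """Group tweets by hashtag; a tweet with several tags joins several streams."""
    streams = defaultdict(list)
    for text in tweets:
        for tag in hashtags(text):
            streams[tag].append(text)
    return dict(streams)


class TopicFollower:
    """Hypothetical follow/unfollow semantics for topics, mirroring people."""

    def __init__(self):
        self.following = set()

    def follow(self, topic):
        self.following.add(topic.lower())

    def unfollow(self, topic):
        self.following.discard(topic.lower())

    def timeline(self, tweets):
        """Tweets, in original order, that carry at least one followed topic."""
        return [t for t in tweets if self.following & hashtags(t)]


tweets = [
    "Loving #Python at #OSCON",
    "MongoDB tips #bigdata",
    "More #python tricks",
]
me = TopicFollower()
me.follow("Python")
print(me.timeline(tweets))
```

Note that each tweet stays a stand-alone item: the stream is just a filter over tweets, with no threading state attached.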

Topic Spaces a.k.a. TweetSpaces a.k.a. TweetRooms

Twitter is the right platform for ad-hoc, ephemeral spaces to exchange quick notes. IM is too heavyweight and not that easy for quick things like “Where is that meeting room”, “Which seat are you in” or “What should we discuss next”. This would be a one-to-many exchange between people who are spatially (and even temporally) in separate spaces. They might be on a plane, on a call or even in a hallway! It should be easy to add a “!” tag and shout the info. Yep, folks need to know what the ! tag is. Actually, come to think of it, we could have many types of tags using characters such as ‘$’, ‘@’, ‘%’, ‘^’, ‘&’ and ‘*’, each with different semantics! Time for a “Tweet Mark-up Language”?
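As a rough sketch of what such a mark-up could look like to a program, the Python snippet below pulls sigil-prefixed tags out of a tweet. The mapping is an assumption for illustration only: “@” and “#” follow existing Twitter conventions, while treating “!” as the name of a space is the hypothetical tag proposed above. The names `SIGILS` and `parse_tags` are made up for this example.

```python
import re

# Assumed sigil meanings. Only "@" and "#" exist on Twitter today; "!" as a
# space/room marker is the hypothetical tag from this post.
SIGILS = {"!": "space", "@": "mention", "#": "hashtag"}

# A sigil counts only when it starts a token, so "Wow!" or "a@b.com" do not match.
TOKEN = re.compile(r"(?<!\w)([!@#])(\w+)")


def parse_tags(text):
    """Return a dict mapping each tag kind to the tag names found, in order."""
    tags = {}
    for sigil, name in TOKEN.findall(text):
        tags.setdefault(SIGILS[sigil], []).append(name)
    return tags


print(parse_tags("!oscon2012 Where is that meeting room? @mac #python"))
```

Adding another tag type would then be a matter of extending `SIGILS` and the character class in `TOKEN`, which is about as much “language” as a 140-character medium can carry.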

What says thee? How do you use Twitter, and what would you expect it to do next?