Good work guys …
Chronicles of my failed attempt at procuring a GoogleIO Ticket … The Google Wallet ate my GoogleIO 2013 Ticket!
It was the night before GoogleIO … Excitement was in the air … Tweets were in order …
The order of the day was to find all the Easter Eggs on the page …
I clicked and clicked and clicked … and got thru all the Easter Eggs …
And I slept …
It was early AM when I woke up … still 15 min before the GoogleIO store opened …
The wait was agonizing, but all for a good cause, so I thought …
I was there when the GoogleIO Ticket store opened …
I was not disappointed when my first try failed after 6 minutes …
And my optimism paid off when it eventually found me a precious little ticket …
I reviewed the purchase … and gave it to Google Wallet … little did I know that …
But the screen stayed there and the time ticked down …
By now the verdict was clear: Google Wallet was going to eat my lucky GoogleIO Ticket …
And it did …
And soon after, the registration ended … The cold hand of fate …
Can I find a kind soul at Google to help me, or should I wait for GoogleIO 2014? …
I came across a good diagram depicting the Big Data ecosystem companies. I wanted to overlay the products as well as the Data Management/Data Science pipeline, viz. Collect-Store-…. Here is my diagram. Let me know how I can improve it and what it is missing.
Today I installed Tomcat7 on Ubuntu. They have changed the layout of the directories, and the changes are for the better. We are all used to Tomcat’s old layout, and the new layout takes a little time to get used to … At first I couldn’t make head or tail of it. Then I looked for things where they should be … and voilà … it made perfect sense!
Finally, we have our VPC and Mongo replica sets working. I still have to figure out the snapshots. Here are some notes; I would appreciate comments, ideas, insights & wisdom. The full slides are on SlideShare.
I will post my notes from snapshot configuration …
I encountered this error when trying to restore 2.1 mongodumps (it happened only after installing MongoDB 2.2.0):
Wed Sep 19 18:33:26 Assertion failure b.empty() src/mongo/db/json.cpp 645
0x10036b5fb 0x10009ad86 0x1004af6f2 0x100016f85 0x100016944 0x100016944 0x100019e54 0x100313b5d 0x100315697 0x10000126a 0x1000011e4
0 mongorestore 0x000000010036b5fb _ZN5mongo15printStackTraceERSo + 43
1 mongorestore 0x000000010009ad86 _ZN5mongo12verifyFailedEPKcS1_j + 310
2 mongorestore 0x00000001004af6f2 _ZN5mongo8fromjsonEPKcPi + 1634
3 mongorestore 0x0000000100016f85 _ZN7Restore9drillDownEN5boost11filesystem210basic_pathISsNS1_11path_traitsEEEbbb + 4117
4 mongorestore 0x0000000100016944 _ZN7Restore9drillDownEN5boost11filesystem210basic_pathISsNS1_11path_traitsEEEbbb + 2516
5 mongorestore 0x0000000100016944 _ZN7Restore9drillDownEN5boost11filesystem210basic_pathISsNS1_11path_traitsEEEbbb + 2516
6 mongorestore 0x0000000100019e54 _ZN7Restore5doRunEv + 3140
7 mongorestore 0x0000000100313b5d _ZN5mongo8BSONTool3runEv + 1325
8 mongorestore 0x0000000100315697 _ZN5mongo4Tool4mainEiPPc + 5447
9 mongorestore 0x000000010000126a main + 58
10 mongorestore 0x00000001000011e4 start + 52
The offending file is the <dump directory>/<database>/<collection>.metadata.json file. It has a line like this:
{options : { "create" : <database>, undefined, undefined, undefined }, indexes:[{ "v" : 1, "key" : { "_id" : 1 }, "ns" :<database>, "name" : "_id_" }]}
The “undefined,undefined,…” is an artifact from 2.1 beta.
Delete the “, undefined, undefined, undefined”, save the JSON metadata file, and rerun mongorestore.
You might have to do this for a few databases. You can tell which database is affected by looking at the lines just before the error message, like so:
Wed Sep 19 18:33:26 <dump directory>/<database>/<collection>.bson
Wed Sep 19 18:33:26 going into namespace [<database>.<collection>]
Wed Sep 19 18:33:26 Assertion failure b.empty() src/mongo/db/json.cpp 645
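If many collections are affected, editing each metadata file by hand gets tedious. Below is a minimal Python sketch that makes the same edit across a whole dump directory. It assumes the 2.1 artifact always appears as a comma followed by the bare word `undefined`, as in the line above. It also keeps a `.bak` copy of every file it changes. The function names are my own and are not part of the MongoDB tools, so try it on a copy of the dump first.

```python
import re
import shutil
from pathlib import Path

# Matches the ", undefined" fragments left behind by the 2.1 beta
# (assumption: they always appear as a comma followed by the bare word).
UNDEFINED_ARTIFACT = re.compile(r",\s*undefined\b")


def clean_metadata_text(text: str) -> str:
    """Return the metadata text with the ', undefined' artifacts removed."""
    return UNDEFINED_ARTIFACT.sub("", text)


def clean_dump_dir(dump_dir: str) -> list[Path]:
    """Fix every <collection>.metadata.json under dump_dir in place.

    A .bak copy of each modified file is written next to it.
    Returns the list of files that were changed.
    """
    changed = []
    for path in Path(dump_dir).rglob("*.metadata.json"):
        original = path.read_text()
        cleaned = clean_metadata_text(original)
        if cleaned != original:
            # Keep the untouched original before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_text(cleaned)
            changed.append(path)
    return changed
```

Call `clean_dump_dir("<dump directory>")`, then rerun mongorestore against the same directory. Files without the artifact are left alone, so running it twice is harmless.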
Cheers
<k/>
This is a series of blogs annotating my OSCON-2012 slides with notes, as required. Some things are detailed in the slides, but the slides miss a lot of what I talked about at the tutorial. This is Part 3 of 6. Link to Part 1.
I found the Twitter object model simple, intuitive and congruent. There are a couple of terms one has to get used to: tweets are called statuses, and the act of tweeting is a status update. Users whom you follow are friends, and users who follow you are followers.
I have a few slides describing the various objects, as well as a few Python programs (on GitHub) that fetch the JSON via the Twitter API so you can inspect the actual objects. The setup slide lists the libraries needed. As the slides cover the objects in some detail, I will not repeat them here …
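To make the terminology concrete, here is a small Python sketch that pulls the interesting bits out of a status object. The JSON is a hand-written, trimmed-down sample with invented values; it is not output from my programs. The field names `id_str`, `text`, `user`, `screen_name`, `friends_count` and `followers_count` are the ones the Twitter REST API uses in its status and user objects.

```python
import json

# A hand-written, made-up status (tweet) object, trimmed to a few fields.
# Field names follow the Twitter REST API status object; the values are invented.
SAMPLE_STATUS = """
{
  "id_str": "1000000000000000001",
  "text": "Heading to #OSCON",
  "user": {
    "screen_name": "example_user",
    "friends_count": 150,
    "followers_count": 320
  }
}
"""


def summarize_status(raw: str) -> dict:
    """Pull out the fields that map onto the Twitter terminology."""
    status = json.loads(raw)
    user = status["user"]  # every status embeds its author's user object
    return {
        "status_id": status["id_str"],         # a tweet is a "status"
        "text": status["text"],
        "author": user["screen_name"],
        "friends": user["friends_count"],      # users this author follows
        "followers": user["followers_count"],  # users who follow this author
    }


print(summarize_status(SAMPLE_STATUS))
```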
I found the APIs straightforward, albeit with a few glitches and mismatches. You have to assume errors and occasional dropped connections, so program accordingly: command buffers (to catch up from where you stopped), checkpoints (to know where you stopped), control numbers monitored by supervisor processes, handling of rate limits, and so forth.
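Here is a minimal Python sketch of the checkpoint-and-retry idea. It is not the code from my GitHub repo. `fetch_page` is a hypothetical stand-in for whatever Twitter API call you page through; it is assumed to return a batch of items plus the next cursor (`None` once done). `store` is whatever persists a batch, e.g. an insert into MongoDB. The checkpoint is written only after a batch is stored, so a crashed run resumes from the last page it actually saved. Dropped connections get exponential backoff, and `pause` is a crude spacer for rate limits.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch_page(cursor) -> (items, next_cursor),
# where next_cursor is None once the last page has been returned.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def load_checkpoint(path: Path) -> Optional[str]:
    """Return the cursor saved by a previous run, or None to start fresh."""
    if path.exists():
        return json.loads(path.read_text())["cursor"]
    return None


def save_checkpoint(path: Path, cursor: Optional[str]) -> None:
    path.write_text(json.dumps({"cursor": cursor}))


def harvest(fetch_page: FetchPage, checkpoint: Path,
            store: Callable[[List[dict]], None], max_retries: int = 5,
            base_delay: float = 1.0, pause: float = 0.0) -> int:
    """Page through an API, resuming from the last checkpoint after a crash.

    Returns the number of items stored in this run. A completed run leaves
    a None cursor behind, so the next run starts from the beginning.
    """
    cursor = load_checkpoint(checkpoint)
    stored = 0
    while True:
        for attempt in range(max_retries):
            try:
                items, next_cursor = fetch_page(cursor)
                break
            except (ConnectionError, TimeoutError):
                # Dropped connection: back off exponentially, then retry.
                time.sleep(base_delay * 2 ** attempt)
        else:
            raise RuntimeError(f"giving up at cursor {cursor!r}")
        store(items)
        stored += len(items)
        save_checkpoint(checkpoint, next_cursor)  # record progress only after storing
        if next_cursor is None:
            return stored
        cursor = next_cursor
        time.sleep(pause)  # crude spacing between calls to respect rate limits
```

A real harvester would react to the rate-limit information Twitter reports instead of sleeping a fixed amount. A supervisor process could also watch the stored counts (the control numbers) to spot stalled runs.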
Let us move on to the Applications in the next blog …
This is a series of blogs annotating my OSCON-2012 slides with notes, as required. Some things are detailed in the slides, but the slides miss a lot of what I talked about at the tutorial. This is Part 2 of 6. Link to Part 1.
As I wrote earlier, one cannot just take some big data and analyze it. One has to approach the architecture in a systematic way, with a multi-stage pipeline. For a Twitter API system, the big data pipeline has a few more interesting nuances.
While the picture is a little ugly and busy, it captures the essence. Let us go thru each stage and dig a little deeper.
This is the stage where one applies the domain data science, for example NLP for sentiment analysis or graph theory for social network analysis. It is crucial that the principles from the domains are congruent with Twitter. Twitter is not the same as friendship networks like LinkedIn or Facebook. Understanding its underlying characteristics before applying the various domain ideas is very important …
This, of course, is when we see the results and use them.
Now we are ready for Part 3: The API and Object Models for the Twitter API.
Cheers
I conducted a tutorial at OSCON-2012, “The Art of Social Media Analysis with Twitter & Python”. The slides are on SlideShare and the Python/MongoDB/NetworkX programs are on GitHub. The next day I was fortunate to be interviewed by Mac Slocum; Mac has a way of asking interesting questions. Thanks, Mac!
This is a series of blogs annotating the slides with notes, as required. Some things are detailed in the slides, but the slides miss a lot of what I talked about at the tutorial. I am planning on adding the notes in a series of six blogs. This is Part 1 of 6.
The hands-on project, patterns & code ended up handling ~970,000 unique users and a social graph with ~500,000 cliques. Some Twitter REST API runs took 19 hrs to complete, and the MongoDB database was ~6GB on an m2.large AWS instance. I will point out some of the interesting big data patterns related to the Twitter API and the social graph.
Twitter is at a fork: it has achieved a certain amount of status and popularity, not to mention utility and value to society. We are all slowly adapting to the medium and finding ways of utilizing it. My thoughts on the recent changes in API “branding”:
The slides capture the detailed bullet points.
In the next blog, we will look at the Big Data Pipeline for a Twitter API ecosystem and then move on to APIs and Twitter Object Models.
Cheers
<k/>