TIME's Election Commemorative Edition has an interesting article on the role of Data Science, “Inside the Secret World of the Data Crunchers Who Helped Obama Win”. A few quick lessons (of course, you should read the full TIME article):
[Update 2/14/13] InfoWorld has an interesting take on Big Data Analytics and the Obama Campaign. In addition to the four lessons from the TIME article, InfoWorld adds the following:
- Combined efforts of Analysts & Engineers
- Implemented in weeks rather than months
- Built around an unconstrained yet centralized environment (this is important for big data)
- This let the analysts ask questions regardless of where the data originated
- Continuous improvement, with a built-in feedback loop
Note: I discuss the 5 Pragmatic Steps for Data … er … Big Data in another blog post.
[Update 2/28/13] The AWS case study “Obama For America” has interesting details.
1. Elevate Data Science to a First-Class Citizen
- Campaign manager Jim Messina had promised a totally different, metric-driven kind of campaign in which politics was the goal but political instincts might not be the means: “We are going to measure every single thing in this campaign.” He then hired a team of Data Scientists headed by Rayid Ghani.
- Rayid visited Stanford to recruit budding Data Scientists. I wanted to attend but couldn’t; I’m sure they also visited other campuses.
- Exactly what that team of dozens of data crunchers was doing, however, was a closely held secret. The campaign guarded what it believed to be its biggest institutional advantage over Mitt Romney’s campaign, its data: “They are our nuclear codes.”
2. Collect, Unify & Leverage Big Data
- As I had written in one of my earlier blogs, the spectacular results of Data Science (inference and predictions) come from an effective data pipeline.
- The Obama campaign built interesting pipelines for its big data streams
- While the 2008 campaign was very successful, the team realized that it had too many databases and “none of them talked to each other.”
- So over the first 18 months, the campaign merged the information collected from pollsters, fundraisers, field workers and consumer databases, as well as social-media and mobile contacts, with the main Democratic voter files in the swing states. That brings tears to the eyes of a data architect!
- They actually built an awesome data mining infrastructure
- This “megafile” was the foundation for simulation runs for contributions, “persuadability” analysis and so forth
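To make the “megafile” idea a little more concrete, here is a minimal Python sketch of unifying records from several separate databases into one profile per person. The article does not describe how the campaign actually matched records, so everything here is an assumption: the source lists, the field names, and the crude matching key (normalized name plus ZIP code) are hypothetical. Real voter-file matching is usually probabilistic and far messier.

```python
from collections import defaultdict

# Hypothetical records from three separate campaign databases.
# Field names and values are illustrative only.
pollster = [{"name": "Jane Doe ", "zip": "43215", "support_score": 0.62}]
fundraising = [{"name": "jane doe", "zip": "43215", "donated": 50}]
field_ops = [
    {"name": "JANE DOE", "zip": "43215", "contacted": True},
    {"name": "John Roe", "zip": "45202", "contacted": False},
]


def match_key(record):
    """Crude, assumed matching key: lowercased, whitespace-collapsed name plus ZIP."""
    name = " ".join(record["name"].lower().split())
    return (name, record["zip"])


def build_megafile(*sources):
    """Merge every source into a single profile per matching key."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            profile = merged[match_key(record)]
            # Copy every attribute except the fields used for matching.
            for field_name, value in record.items():
                if field_name not in ("name", "zip"):
                    profile[field_name] = value
    return dict(merged)


megafile = build_megafile(pollster, fundraising, field_ops)
for key, profile in megafile.items():
    print(key, profile)
```

The three spellings of “Jane Doe” collapse into one profile carrying the poll score, the donation and the contact flag, which is the “databases talking to each other” payoff in miniature.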
3. Practice Metric-driven Data Science
- Don’t be afraid to create bold models, but back them up with reality
- The Data Scientists developed interesting models and predictions, but tested them against reality: sending e-mails with different subject lines, monitoring results from e-mail and phone campaigns, and so on.
- “… assumptions were rarely left in place without numbers to back them up”
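As an illustration of “backing up” a model with numbers, here is a minimal sketch of one common way to compare two e-mail subject lines: a two-proportion z-test on the response rates. This is a standard textbook test, not necessarily what the campaign used, and the counts in the usage line are made up for illustration.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B share one response rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: donations received from 10,000 e-mails per subject line.
z, p = two_proportion_z_test(successes_a=240, n_a=10_000, successes_b=180, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between subject lines is unlikely to be noise; a large one says the “bold model” has not yet earned its place.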
4. Effective Modeling comes from weaving Big Data & Live Data
- “The analytics team used four streams of polling data to build a detailed picture of voters in key states”
- The polling and voter-contact data were processed and reprocessed nightly to account for every imaginable scenario.
- “We ran the election 66,000 times every night”
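“Running the election” many times is essentially Monte Carlo simulation. Here is a minimal sketch, assuming hypothetical states, win probabilities and a hypothetical count of safe electoral votes; only the 66,000 run count comes from the quote. It also assumes states are independent, which real election models usually are not, since polling errors tend to be correlated across states.

```python
import random

# Hypothetical inputs: state -> (win probability, electoral votes).
STATES = {
    "State A": (0.55, 18),
    "State B": (0.48, 29),
    "State C": (0.62, 13),
    "State D": (0.51, 15),
}
SAFE_VOTES = 237      # hypothetical electoral votes assumed already secure
VOTES_TO_WIN = 270


def simulate_once(rng):
    """Simulate one election night, treating each state as an independent coin flip."""
    votes = SAFE_VOTES
    for win_prob, electoral_votes in STATES.values():
        if rng.random() < win_prob:
            votes += electoral_votes
    return votes


def win_probability(runs=66_000, seed=2012):
    """Estimate the chance of reaching VOTES_TO_WIN across many simulated elections."""
    rng = random.Random(seed)  # seeded so the nightly estimate is reproducible
    wins = sum(simulate_once(rng) >= VOTES_TO_WIN for _ in range(runs))
    return wins / runs


print(f"Estimated win probability: {win_probability():.3f}")
```

Re-running this nightly with updated polling and voter-contact inputs is the simple version of the loop the article describes.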
And finally, the article ends with an insightful statement:
In politics, the era of #BigData has arrived!
I rest my case with one more observation on Obama’s Digital Gurus
The “Inside the Cave” report is a good read.