NOSQL Summer reading is an excellent idea initiated by Tim that has now spread across the world (~380 participants in 27 cities!). The papers are interesting and cover a variety of topics relevant to the NOSQL community.
As I was collecting the papers, I was struck by the parallels with the Design Patterns study group, so I have added a suggested path through the papers and a few opening questions for discussion. If it makes sense, I would be happy to move this to a wiki for collaborative navigation, questions, and notes. Let me know.
P.S.: I also have materials from my OSCON NOSQL tutorial.
Related links (check them out after going through the NOSQL paper list):
- My NOSQL tutorial presentation, OSCON 2010 (dated, but still informative)
- My presentations on Kaggle (OSCON ’11) and on Big Data & Social Network Analysis (OSCON ’12)
Update [10/2/x] : As I see more related papers, I am adding them to the bottom of this list
Update [6/30/xii] : Added new papers on the Bloom Language
Update [11/29/xii] : Added new papers from Google – Dremel/Spanner, Twitter Storm & Cloudera Impala
Update [12/21/12] : Added papers from Werner Vogels’ Back To the Basics Reading of 2012
Navigation:
The papers can be grouped & discussed by topic:
- Core NOSQL (Start with running code)
- CAP Theorem (And now time for some background and underlying discussions)
- The SQL World (Time to take a peek into the SQL world)
- Distributed Storage
- Stasis
- Virtual Synchrony
- [Update 2/12/13]
- Scalable Causal Consistency – Geo-replicated Distributed Data Store
- Transactional Storage for Geo-replicated Systems
- Distributed Time
- Paxos Made Simple
- *Paxos Made Practical
- Time, Clocks,…
- Timestamps in Message Passing
- Virtual Time
- Chubby
- A Simple Totally Ordered Broadcast Protocol – the atomic broadcast protocol behind Yahoo’s ZooKeeper, closely related to Paxos [Update 1/21/13]
- MDCC – Multi-Data Center Consistency work from UC Berkeley
- Algorithmics & Data Structures of interest (Optional)
- Internet-scale systems (End the summer reading with a bang – looking at the big picture)
- Vector Clocks
- Bloom Filter
- Original Paper by Burton H. Bloom
- Scalable Bloom Filters
- HBase BF Impl (0.90) (Also the attached Bloom_Filters_in_Hbase.pdf is very informative)
- Combinatorial Generation
- Cache Efficient Bloom Filter
- Why Bloom Filters work the way they do
- Schemes for the usage of memory & disk
- Gossip
- Consistent Hashing
- Failure Detection
- The ϕ Accrual Failure Detector
- Unreliable Failure Detectors (Tushar, Sam)
- The Weakest Failure Detector (Tushar, Vassos & Sam)
- Implementing the Weakest Failure Detector
- [Update 11/16/11] A good High Scalability blog post on failure detection & gossip protocols
- And this paper “Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems” is very interesting
- Analysis & Query at scale
- Google Percolator - “… a system for incrementally processing updates to a large data set” [Update 10/2/x]
- Google Dremel – “… scalable, interactive ad-hoc query system for analysis of read-only nested data” [Update 11/29/xii]
- Cloudera’s Impala implementation of Dremel primitives over HDFS/HBase data
- Apache Drill (Distributed System Interactive Analysis) is the Open Source implementation of Dremel
- Google Spanner – “… scalable, multi-version, globally-distributed, and synchronously-replicated database”
- Wired : Inside Google Spanner, the Largest Single Database on Earth
- Interesting discussions on Y Combinator’s Hacker News
- Twitter Storm (and here & here) – “… distributed realtime computation system”
- Cloud Theory
- [Update 6/30/xii] Consistency & Logical Monotonicity
- [Update 6/30/xii] Logic and Lattices for Distributed Programming (Bloom Language for a cloud infrastructure)
- ACTORS & Concurrency
- Appendix:
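As a companion to the Bloom filter readings in the Algorithmics section, here is a minimal Python sketch of the data structure. It is illustrative only and is not taken from any of the listed papers or from the HBase implementation. The class and method names are hypothetical, and deriving the k bit positions from two halves of a single SHA-256 digest (double hashing) is an assumption of this sketch, not the scheme any particular paper prescribes. The sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array, k bit positions per item."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Assumed scheme (double hashing): position_i = (h1 + i * h2) mod m,
        # with h1 and h2 taken from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so h2 != 0
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    for key in ("row-1", "row-2", "row-3"):
        bf.add(key)
    print(bf.might_contain("row-2"))  # an added key is always reported present
    print(bf.m, bf.k)
```

A negative answer is definitive while a positive one is only probabilistic. That asymmetry is why a storage engine such as HBase can consult a Bloom filter to skip files that cannot contain a requested row, at the cost of occasionally reading a file for nothing.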
Opening Questions & discussion pointers:
P.S.: I also have a decent number of NOSQL bookmarks; I need to organize them a bit better.
…
Solid summary of the canon, thanks for putting this together. I only see one key paper missing:
GFS: http://portal.acm.org/citation.cfm?id=1165389.945450
Google’s distributed file system is, in my mind, one of the key pieces that enables MR, BigTable, and much of the Apache stack. I’d put it front and center at the beginning of the NoSQL core.
Thanks. Added
Thanks so much for taking the time to compile the list like this. It made it much nicer to gather the papers, organize them on my iPad the way you have them here, and get them ready for reading. Your work made finding some of these references a lot easier.
Just wanted to say that I thought your summer reading list was outstanding. I’ll confess to having used it as inspiration for a few papers to round out my course this semester. Really great stuff.