Inspired by Papers We Love, this post lists selected papers that I have read, or plan to read, systematically in 2016. The list keeps growing and reflects the topics I have explored and conquered this year. Most of the papers fall under distributed computing, covering both classic theoretical research and the design and implementation of well-known systems.

Distributed Computing Theory

1978 Time, Clocks, and the Ordering of Events in a Distributed System
Leslie Lamport
1985 Distributed Snapshots: Determining Global States of Distributed Systems
KM Chandy, L Lamport
2007 Paxos Made Live – An Engineering Perspective
Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone
2012 CAP Twelve Years Later: How the “Rules” Have Changed
Eric Brewer
2014 In Search of an Understandable Consensus Algorithm
D Ongaro, J Ousterhout

Distributed Systems

2003 The Google File System
S Ghemawat, H Gobioff, ST Leung
2004 MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean, Sanjay Ghemawat
2006 The Chubby Lock Service for Loosely-Coupled Distributed Systems
M Burrows
2007 Dynamo: Amazon’s Highly Available Key-value Store
G DeCandia, D Hastorun, M Jampani…
2008 Bigtable: A Distributed Storage System for Structured Data
F Chang, J Dean, S Ghemawat et al.
2008 Bitcoin: A Peer-to-Peer Electronic Cash System
S Nakamoto
2010 The Hadoop Distributed File System
Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler
2010 Hive – A Petabyte Scale Data Warehouse Using Hadoop
A Thusoo, JS Sarma, N Jain, Z Shao
2010 Spark: Cluster Computing with Working Sets
M Zaharia, M Chowdhury, MJ Franklin et al.
2010 ZooKeeper: Wait-free Coordination for Internet-scale Systems
P Hunt, M Konar, FP Junqueira, B Reed
2010 Finding a needle in Haystack: Facebook’s photo storage
D Beaver, S Kumar, HC Li, J Sobel, P Vajgel
2010 Cassandra: A Decentralized Structured Storage System
A Lakshman, P Malik
2011 Kafka: A Distributed Messaging System for Log Processing
J Kreps, N Narkhede, J Rao
2011 Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Benjamin Hindman, Andy Konwinski et al.
2013 Apache Hadoop YARN: Yet Another Resource Negotiator
VK Vavilapalli, AC Murthy, C Douglas…
2013 Scaling Memcache at Facebook
R Nishtala, H Fugal, S Grimm, M Kwiatkowski

Stream Processing and Databases

1993 The Volcano Optimizer Generator: Extensibility and Efficient Search
G Graefe, WJ McKenna
1996 Implementing Data Cubes Efficiently
V Harinarayan, A Rajaraman, JD Ullman
2013 MillWheel: Fault-Tolerant Stream Processing at Internet Scale
T Akidau, A Balikov, K Bekiroğlu et al.
2015 The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
T Akidau, R Bradshaw et al.
2015 Lightweight Asynchronous Snapshots for Distributed Dataflows
P Carbone, G Fóra, S Ewen, S Haridi et al.
2016 SamzaSQL: Scalable Fast Data Management with Streaming SQL
M Pathirage, J Hyde et al.

Functional Programming

1989 Why Functional Programming Matters
J Hughes
1992 The Essence of Functional Programming
P Wadler
1995 Monads for Functional Programming
P Wadler

Algorithms and Data Structures

1985 Random Sampling with a Reservoir
Jeffrey Scott Vitter
1985 Self-Adjusting Binary Search Trees
DD Sleator, RE Tarjan
1995 On-line Construction of Suffix Trees
Esko Ukkonen
2011 A Comprehensive Study of Convergent and Commutative Replicated Data Types
Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski
2015 Efficient Range Minimum Queries using Binary Indexed Trees