Tag
Distributed Systems
2 posts tagged Distributed Systems.
Pregel: What the Large-Scale Graph Processing Paper Actually Says
PageRank on MapReduce reloads the entire graph from disk on every iteration, so its cost scales with iterations × dataset size. Pregel fixes this by keeping the graph in memory across iterations and replacing disk I/O with message passing. The 'think like a vertex' model is the insight; BSP (bulk synchronous parallel) is the implementation.
Cassandra: What the Paper Actually Says
We had a Cassandra cluster where DELETE operations made reads progressively slower until queries timed out. Adding more disk space made it worse. The root cause is described precisely in the 2009 paper — but only if you understand that Cassandra cannot actually delete data.