Tag

Distributed Systems

2 posts tagged Distributed Systems.

Pregel: What the Large-Scale Graph Processing Paper Actually Says

PageRank in MapReduce reloads the full dataset from disk on every iteration. Pregel fixes this by keeping the graph in memory across iterations and replacing disk I/O with message passing. The 'think like a vertex' model is the insight; BSP (bulk synchronous parallel) is the implementation.

Cassandra: What the Paper Actually Says

We had a Cassandra cluster where DELETE operations made reads progressively slower until queries timed out. Adding more disk space made it worse. The root cause is described precisely in the 2009 paper — but only if you understand that Cassandra cannot actually delete data.