Distributed Computing Summary

I just finished CS7210 Distributed Computing this Fall. Don’t be scared by the OMSCentral reviews; the course is wonderful. It does require more time and effort than many other courses, but the schedule and workload have been adjusted since the Spring semester, when it was first offered, and it feels much more balanced now. For the last few projects, it’s not hard to get things working under simple conditions; however, it’s really hard to get full credit with a fault-tolerant version, particularly when messages are delayed or duplicated and servers crash and recover. The majority of the time is spent debugging, and if you are not familiar with Java, as I was not, you had better get started early.

Project 3 primary-backup

This project is to create a primary-backup service with fault tolerance. The system includes one special ViewServer, which selects which of the servers plays the role of the primary and which plays the backup. The implementation is relatively easy; I finished with 19/20.
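To make the ViewServer's role concrete, here is a minimal sketch of how such a server might pick the primary and the backup. Everything in it (the class name `ViewServerSketch`, the `onPing`/`onTimeout` methods, the promotion rules) is my own assumption for illustration, not the course's actual API. It also ignores details that a real design would likely need, such as waiting for the primary to acknowledge a view before moving to the next one.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical view server sketch; names and rules are assumptions, not the course API. */
final class ViewServerSketch {
    // Servers currently believed alive, kept in the order they first pinged.
    private final Set<String> alive = new LinkedHashSet<>();
    private int viewNum = 0;
    private String primary = null;
    private String backup = null;

    /** A ping arrived, so the server is considered alive. */
    void onPing(String server) {
        alive.add(server);
        updateView();
    }

    /** The server missed its pings, so it is considered dead. */
    void onTimeout(String server) {
        alive.remove(server);
        updateView();
    }

    int viewNum() { return viewNum; }
    String primary() { return primary; }
    String backup() { return backup; }

    private void updateView() {
        String newPrimary = primary;
        String newBackup = backup;
        // Drop a dead backup first.
        if (newBackup != null && !alive.contains(newBackup)) {
            newBackup = null;
        }
        // If the primary died, only the backup may take over, because it is
        // the only server that holds a copy of the replicated state.
        if (newPrimary != null && !alive.contains(newPrimary)) {
            newPrimary = newBackup;
            newBackup = null;
        }
        // Very first view: any live server can become the primary.
        if (newPrimary == null && viewNum == 0) {
            newPrimary = firstAliveExcept(null);
        }
        // Fill an empty backup slot with an idle live server, if there is one.
        if (newPrimary != null && newBackup == null) {
            newBackup = firstAliveExcept(newPrimary);
        }
        // Only bump the view number when the roles actually changed.
        if (!Objects.equals(newPrimary, primary) || !Objects.equals(newBackup, backup)) {
            viewNum++;
            primary = newPrimary;
            backup = newBackup;
        }
    }

    private String firstAliveExcept(String excluded) {
        for (String s : alive) {
            if (!s.equals(excluded)) {
                return s;
            }
        }
        return null;
    }
}
```

In this sketch, if the primary dies while there is no backup, the service is left without a primary on purpose: promoting a server that never saw the state would silently lose data.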

Project 4 Paxos

While Paxos Made Simple [1] is the major reference for understanding Paxos, Project 4 is largely based on Paxos Made Moderately Complex (PMMC) [2] for its Multi-Paxos implementation. It’s actually not very hard to implement a simple version, given all the material the paper provides; the difficulty lies in making it fault-tolerant while maintaining reasonable performance. I missed 4 out of 26 points for that reason.
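For a feel of what the "simple version" looks like, here is a rough sketch of a single-decree Paxos acceptor, following the prepare/accept rules described in Paxos Made Simple [1]. It is not the project's code: the class `AcceptorSketch`, its method names, and the use of a plain `long` as the ballot are assumptions made for illustration, and the real project deals with many slots, leaders, and replicas on top of this.

```java
/** Hypothetical single-decree Paxos acceptor; all names here are illustrative assumptions. */
final class AcceptorSketch {

    /** Reply to a prepare: the ballot now promised plus whatever was accepted so far. */
    static final class Promise {
        final long promisedBallot;
        final long acceptedBallot;   // -1 if nothing has been accepted yet
        final String acceptedValue;  // null if nothing has been accepted yet

        Promise(long promisedBallot, long acceptedBallot, String acceptedValue) {
            this.promisedBallot = promisedBallot;
            this.acceptedBallot = acceptedBallot;
            this.acceptedValue = acceptedValue;
        }
    }

    private long promisedBallot = -1;
    private long acceptedBallot = -1;
    private String acceptedValue = null;

    /**
     * Phase 1: raise the promise if the ballot is higher, then always report the
     * current state. The proposer checks whether promisedBallot equals its own
     * ballot, so a delayed or duplicated prepare gets a harmless answer.
     */
    Promise onPrepare(long ballot) {
        if (ballot > promisedBallot) {
            promisedBallot = ballot;
        }
        return new Promise(promisedBallot, acceptedBallot, acceptedValue);
    }

    /**
     * Phase 2: accept unless a higher ballot has already been promised.
     * Re-delivering the same accept request simply stores the same value again.
     */
    boolean onAccept(long ballot, String value) {
        if (ballot >= promisedBallot) {
            promisedBallot = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}
```

A proposer that gathers promises from a majority must propose the value attached to the highest `acceptedBallot` it sees, and is only free to choose its own value when no acceptor in that majority has accepted anything yet.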

Project 5 sharded KV Store

Project 5 is to implement a sharded KV store. The system has multiple shards, each responsible for a portion of the key-value data. Shards are spread as evenly as possible among groups of ShardStore servers. Each ShardStore server runs a local Paxos instance, and all the Paxos instances within a group form quorums and reach consensus on each request. Groups can be added and removed through the ShardMaster group, which itself runs Paxos to decide which shard goes to which ShardStore server group. The project has 5 parts, of which the first two are much simpler. Part 3 introduces Multi-Paxos servers, which means that if you don’t have a perfectly working Project 4, you’ll have to debug against the Gradescope implementation. Parts 4 and 5 further extend the key-value store to support cross-group transactions using two-phase commit. Being in the middle of a career transition, I wasn’t able to finish them all and only got half of the credit.
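The "spread as evenly as possible" part can be illustrated with a small rebalancing routine. This is only a sketch under my own assumptions: the class `ShardBalancerSketch`, the `rebalance` signature, and the use of integer shard and group ids are all hypothetical, and the greedy strategy (hand out orphaned shards first, then move one shard at a time from the fullest group to the emptiest) is just one reasonable way to do it, not necessarily what the project expects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical shard rebalancer; names and strategy are illustrative assumptions. */
final class ShardBalancerSketch {

    /**
     * Returns a new shard -> group assignment in which group loads differ by at
     * most one. Shards that already sit on a surviving group stay put when possible.
     */
    static Map<Integer, Integer> rebalance(Map<Integer, Integer> shardToGroup, Set<Integer> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("need at least one group");
        }
        // Sorted maps keep the result deterministic across replicas.
        TreeMap<Integer, Deque<Integer>> owned = new TreeMap<>();
        for (int g : groups) {
            owned.put(g, new ArrayDeque<>());
        }
        // Shards whose group is gone (or that were never assigned) become orphans.
        Deque<Integer> orphans = new ArrayDeque<>();
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(shardToGroup).entrySet()) {
            Integer g = e.getValue();
            Deque<Integer> q = (g == null) ? null : owned.get(g);
            (q != null ? q : orphans).add(e.getKey());
        }
        // Hand each orphan to the currently least-loaded group.
        while (!orphans.isEmpty()) {
            owned.get(least(owned)).add(orphans.poll());
        }
        // Move one shard at a time from the fullest to the emptiest group.
        while (true) {
            int lo = least(owned);
            int hi = most(owned);
            if (owned.get(hi).size() - owned.get(lo).size() <= 1) {
                break;
            }
            owned.get(lo).add(owned.get(hi).pollLast());
        }
        Map<Integer, Integer> result = new TreeMap<>();
        owned.forEach((g, shards) -> shards.forEach(s -> result.put(s, g)));
        return result;
    }

    // Least-loaded group; ties go to the smallest group id.
    private static int least(TreeMap<Integer, Deque<Integer>> owned) {
        int best = owned.firstKey();
        for (int g : owned.keySet()) {
            if (owned.get(g).size() < owned.get(best).size()) {
                best = g;
            }
        }
        return best;
    }

    // Most-loaded group; ties go to the smallest group id.
    private static int most(TreeMap<Integer, Deque<Integer>> owned) {
        int best = owned.firstKey();
        for (int g : owned.keySet()) {
            if (owned.get(g).size() > owned.get(best).size()) {
                best = g;
            }
        }
        return best;
    }
}
```

Determinism matters here because every replica in a Paxos group applies the same join or leave command and has to end up with the same assignment, which is why the sketch iterates over sorted maps rather than hash maps.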

Exam

I did relatively well in both the midterm and the final exam. They consist entirely of multiple-choice questions plus some simple fill-in-the-blank questions, and none of them requires much calculation. The course videos cover almost everything on the exams.

Summary

This course is easily the top choice for anyone in the Computing Systems specialization: it is well taught, the material is up to date, and both the professor and the TAs are very active and helpful. Best of all, it’s not hard to score an A!

[1] Paxos Made Simple https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
[2] Paxos Made Moderately Complex http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf