A distributed system can be defined as multiple computers (nodes) that communicate over a network to achieve some task together.

Martin Kleppmann’s Course

Notes from Martin Kleppmann’s Distributed Systems course. He has a set of course notes on his teaching site as well.

How do we share data amongst different concurrent entities?

  • Recommended Reading
    • “Distributed Systems” by van Steen & Tanenbaum: heavy on implementation detail, more practical
    • “Introduction to Reliable and Secure Distributed Programming” (2nd ed) by Cachin, Guerraoui & Rodrigues: theory heavy
    • “Designing Data-Intensive Applications” by Kleppmann: more oriented toward distributed databases
    • “Operating Systems: Concurrent and Distributed Software Design” by Bacon & Harris (Addison-Wesley): links to operating systems

Why distributed?

  • Some things are inherently distributed: sending a message from your phone to your friend’s phone
  • Reliability: even if one node fails, the system as a whole keeps functioning
  • Performance: get data from a nearby node rather than one centralized server halfway around the world
  • Solve bigger problems: some amounts of data can’t fit on just one machine

Why not distributed?

  • Communication may fail (and we might not even know it has failed)
  • Processes may crash (and we might not know)
  • All of this can happen nondeterministically
  • Thus we need to think about fault tolerance
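
To make the "we might not even know" point concrete, here is a minimal sketch in Python. It is a toy in-process simulation, not real networking: the `Node` class, the `send_increment` function, and the fault names are all hypothetical and exist only for illustration. A return value of `None` stands in for a timeout at the sender.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical remote node that holds a counter."""
    counter: int = 0
    crashed: bool = False

    def handle_increment(self) -> int:
        self.counter += 1
        return self.counter


def send_increment(node: Node, fault: str) -> Optional[int]:
    """Simulate one request/response exchange with an injected fault.

    Returns the node's reply, or None to model a timeout at the sender.
    """
    if fault == "request_lost":
        return None  # the request never reaches the node
    if fault == "node_crashed" or node.crashed:
        node.crashed = True
        return None  # the node is down and cannot reply
    reply = node.handle_increment()
    if fault == "response_lost":
        return None  # the node did the work, but the reply vanished
    return reply


# Run each fault against a fresh node and record what the sender sees
# (the reply) next to what actually happened (the node's counter).
outcomes = {}
for fault in ("request_lost", "node_crashed", "response_lost"):
    node = Node()
    reply = send_increment(node, fault)
    outcomes[fault] = (reply, node.counter)

for fault, (reply, counter) in outcomes.items():
    print(f"{fault}: sender saw {reply!r}, node counter is {counter}")
```

In all three cases the sender observes the same thing, a timeout, yet in the lost-response case the increment was actually applied. Blindly retrying could therefore apply the operation twice, which is one reason fault tolerance has to be designed in rather than bolted on.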

Notes