A distributed system can be defined as multiple computers (nodes) that communicate over a network to achieve some task together.
Martin Kleppmann's Course
Notes from Martin Kleppmann's Distributed Systems Course. He has a set of course notes on his teaching site as well.
How do we share data amongst different concurrent entities?
- Recommended Reading
- "Distributed Systems" by van Steen & Tanenbaum: implementation-detail heavy, more practical
- "Introduction to Reliable and Secure Distributed Programs" (2nd ed.) by Cachin, Guerraoui & Rodrigues: theory heavy
- "Designing Data-Intensive Applications" by Kleppmann: more oriented toward distributed databases
- "Operating Systems: Concurrent and Distributed Software Design" by Bacon & Harris (Addison-Wesley): connects the material to operating systems
Why distributed?
- Things are inherently distributed: sending a message from your phone to your friend's phone
- Reliability: even if one node fails, the system as a whole keeps functioning
- Performance: get data from a nearby node rather than one centralized server halfway around the world
- Solve bigger problems: some datasets are too large to fit on a single machine
Why not distributed?
- Communication may fail (and we might not even know it has failed)
- Processes may crash (and we might not know)
- All of this can happen nondeterministically
- Thus we need to think about fault tolerance
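To make "we might not even know it has failed" concrete, here is a minimal toy sketch (not from the course) of one request/reply exchange. The names `Fault`, `Node` and `exchange` are hypothetical, and the model assumes the client's only tool is a timeout. It shows that a lost request, a crash, and a lost reply all look identical to the client, even though in some cases the node has already applied the request.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Fault(Enum):
    """Hypothetical failure modes for a single request/reply exchange."""
    NONE = auto()          # everything works
    REQUEST_LOST = auto()  # network drops the request
    CRASH_BEFORE = auto()  # node crashes before processing the request
    CRASH_AFTER = auto()   # node crashes after processing, before replying
    REPLY_LOST = auto()    # node replies, but network drops the reply


@dataclass
class Node:
    counter: int = 0  # state that each processed request modifies


def exchange(node: Node, fault: Fault) -> str:
    """Simulate one request/reply and return what the client observes."""
    if fault in (Fault.REQUEST_LOST, Fault.CRASH_BEFORE):
        return "timeout"  # the request was never applied
    node.counter += 1  # the node applies the request
    if fault in (Fault.CRASH_AFTER, Fault.REPLY_LOST):
        return "timeout"  # applied, but the client never hears back
    return "ok"


if __name__ == "__main__":
    for fault in Fault:
        node = Node()
        seen = exchange(node, fault)
        print(f"{fault.name:<13} client sees {seen:<7} node.counter={node.counter}")
```

Every fault case returns the same `"timeout"` to the client, yet `node.counter` ends up as 0 for some of them and 1 for others. A client that simply retries on timeout could therefore apply the operation twice, which is one reason fault tolerance has to be designed in rather than bolted on.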