《Introduction to Reliable and Secure Distributed Programming》
2.7 Distributed-System Models
组合 (1) a process abstraction, (2)a link abstraction 和 (3) a failure-detector abstraction 三者定义了 a distributed-system model。
2.7.1 Combining Abstractions
本书只会讨论几种不同的组合,通过这些组合可以发现不同的假设对算法设计的影响。
- Fail-stop. crash-stop process abstraction,link 是 perfect 的(Module 2.3),存在 perfect failure detector(Module 2.6)。这些假设将会简化分布式算法的设计。
- Fail-noisy. crash-stop process abstraction,link 是 perfect 的(Module 2.3),存在 eventually perfect failure detector (Module 2.8)或者 eventual leader detector (Module 2.9)。
- Fail-silent. crash-stop process abstraction,link 是 perfect 的(Module 2.3),但是不假设存在 failure detector 或 leader election abstraction。也就是 process 无法获取其它 process crash 的信息。
- Fail-recovery. crash-recovery process abstraction,link 是 stubborn 的(Module 2.2),存在 eventual leader detector(Module 2.9)。
- Fail-arbitrary. fail-arbitrary (Byzantine) process abstraction,link 是 authenticated perfect link abstraction(Module 2.5),也被称为 fail-silent-arbitrary model。 如果再假设存在 Byzantine eventual leader-detector abstraction(Module 2.10),那么就是 fail-noisy-arbitrary model。
- Randomized. The randomized model is of a different nature than the other distributed-system models, and can be thought of being orthogonal to all of them. Randomization is sometimes the only way to solve a problem or to circumvent inherent inefficiencies of deterministic algorithms.
2.7.2 Setup
所有 process 的 identity 是全局已知的,可以启动前静态配置,也可以通过 membership service 自动配置。
cryptographic abstractions 的相关配置也是预先定义好的。
2.7.3 Quorums
A quorum in a system with N crash-fault process abstractions (according to the fail-stop, fail-noisy, fail-silent, or fail-recovery system model) is any majority of processes, i.e., any set of more than N/2 processes.
即使系统中 f < N/2 个 process 发生故障,也会至少存在一个没有故障的 quorum。
在 arbitrary-fault process abstractions 中并不是这样,A Byzantine quorum tolerating f faults is a set of more than (N +f)/2 processes. algorithms tolerating Byzantine faults usually require that only f < N/3 processes may fail.
2.7.4 Measuring Performance
通常使用两个 metric 分析算法的成本:
- the number of messages required to terminate an operation of the abstraction
- the number of communication steps required to terminate such an operation
对于某些算法也需要评估 total communication size,对于 crash-recovery model 还需要考虑 the number of accesses to stable storage。
通常来说需要统计 message 数量、communication steps 次数和 disk accesses 次数。
Performance measurements are often stated in Big-O Notation。
精确的性能研究并不属于本书的内容。