IRSDP Reading Notes - 2. Basic Abstractions - 7

January 14, 2024. Source: 袁世超

《Introduction to Reliable and Secure Distributed Programming》

2.7 Distributed-System Models

A distributed-system model is defined by combining (1) a process abstraction, (2) a link abstraction, and possibly (3) a failure-detector abstraction.

2.7.1 Combining Abstractions

The book discusses only a few of the possible combinations; comparing them shows how different assumptions affect algorithm design.

Fail-stop. Crash-stop process abstraction, perfect links (Module 2.3), and a perfect failure detector (Module 2.6). These assumptions simplify the design of distributed algorithms.
Fail-noisy. Crash-stop process abstraction, perfect links (Module 2.3), and either an eventually perfect failure detector (Module 2.8) or an eventual leader detector (Module 2.9).
Fail-silent. Crash-stop process abstraction and perfect links (Module 2.3), but no failure detector or leader election abstraction is assumed. A process therefore has no way to learn that other processes have crashed.
Fail-recovery. Crash-recovery process abstraction, stubborn links (Module 2.2), and an eventual leader detector (Module 2.9).
Fail-arbitrary. Fail-arbitrary (Byzantine) process abstraction with the authenticated perfect link abstraction (Module 2.5); this is also called the fail-silent-arbitrary model. Adding the Byzantine eventual leader-detector abstraction (Module 2.10) gives the fail-noisy-arbitrary model.
Randomized. The randomized model is of a different nature than the other distributed-system models, and can be thought of being orthogonal to all of them. Randomization is sometimes the only way to solve a problem or to circumvent inherent inefficiencies of deterministic algorithms.
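
As a way to keep the combinations straight, the list above can be written down as a small lookup table. This is only a sketch: the `SystemModel` class, the `MODELS` dictionary, and the `can_detect_failures` helper are names made up for these notes, not anything defined in the book, and the randomized model is left out because it is orthogonal to the others.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SystemModel:
    """One combination of process, link, and detector abstractions (illustrative)."""
    process: str
    link: str
    detector: Optional[str]  # None means no failure/leader detector is assumed


# Combinations as listed in the notes; keys and field values are informal labels.
MODELS = {
    "fail-stop": SystemModel("crash-stop", "perfect", "perfect failure detector"),
    "fail-noisy": SystemModel(
        "crash-stop", "perfect",
        "eventually perfect failure detector or eventual leader detector",
    ),
    "fail-silent": SystemModel("crash-stop", "perfect", None),
    "fail-recovery": SystemModel("crash-recovery", "stubborn", "eventual leader detector"),
    "fail-silent-arbitrary": SystemModel("Byzantine", "authenticated perfect", None),
    "fail-noisy-arbitrary": SystemModel(
        "Byzantine", "authenticated perfect", "Byzantine eventual leader detector"
    ),
}


def can_detect_failures(model_name: str) -> bool:
    """True if the model assumes some failure-detector or leader-detector abstraction."""
    return MODELS[model_name].detector is not None
```

For instance, `can_detect_failures("fail-silent")` is false, which matches the remark that processes in that model cannot learn about crashes of other processes.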

2.7.2 Setup

The identities of all processes are globally known. They can be configured statically before startup or set up automatically through a membership service.

The configuration needed by the cryptographic abstractions is likewise defined in advance.

2.7.3 Quorums

A quorum in a system with N crash-fault process abstractions (according to the fail-stop, fail-noisy, fail-silent, or fail-recovery system model) is any majority of processes, i.e., any set of more than N/2 processes.

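Even if f < N/2 processes fail, the remaining N - f processes still number more than N/2, so at least one quorum of non-faulty processes always exists.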
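
The arithmetic behind majority quorums is easy to check numerically. The sketch below uses helper names invented for these notes (`majority_quorum_size`, `max_crash_faults`, `min_overlap`); it computes the smallest majority, the largest f with f < N/2, and the minimum number of processes that any two quorums of a given size must share.

```python
def majority_quorum_size(n: int) -> int:
    """Smallest set size that is strictly more than n/2."""
    return n // 2 + 1


def max_crash_faults(n: int) -> int:
    """Largest f satisfying f < n/2."""
    return (n - 1) // 2


def min_overlap(n: int, quorum_size: int) -> int:
    """Minimum number of processes shared by two sets of quorum_size out of n."""
    return max(0, 2 * quorum_size - n)


for n in (3, 4, 5, 10):
    q = majority_quorum_size(n)
    f = max_crash_faults(n)
    # With f crashed processes, the n - f survivors still form a quorum,
    # and any two quorums share at least one process.
    print(n, q, f, n - f >= q, min_overlap(n, q))
```

For every N the survivors form a quorum and any two majorities overlap in at least one process, which is the property that quorum-based algorithms typically rely on.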

This does not carry over to arbitrary-fault process abstractions. A Byzantine quorum tolerating f faults is a set of more than (N + f)/2 processes. Algorithms tolerating Byzantine faults usually require that only f < N/3 processes may fail.
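
A quick numerical check shows why the two Byzantine bounds go together. The helpers below are hypothetical names for these notes. Two sets of more than (N + f)/2 processes overlap in more than f processes, so at least one shared process is correct; and the N - f correct processes can form such a quorum on their own only when f < N/3.

```python
def byzantine_quorum_size(n: int, f: int) -> int:
    """Smallest set size that is strictly more than (n + f) / 2."""
    return (n + f) // 2 + 1


def min_overlap(n: int, quorum_size: int) -> int:
    """Minimum number of processes shared by two sets of quorum_size out of n."""
    return max(0, 2 * quorum_size - n)


def correct_quorum_exists(n: int, f: int) -> bool:
    """True if the n - f correct processes alone are enough to form a Byzantine quorum."""
    return n - f >= byzantine_quorum_size(n, f)


# N = 4, f = 1: quorums have 3 processes and overlap in 2 > f of them.
print(byzantine_quorum_size(4, 1), min_overlap(4, byzantine_quorum_size(4, 1)))
# N = 3, f = 1 violates f < N/3: the 2 correct processes cannot form a quorum of 3.
print(correct_quorum_exists(4, 1), correct_quorum_exists(3, 1))
```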

2.7.4 Measuring Performance

Two metrics are typically used to analyze the cost of an algorithm:

the number of messages required to terminate an operation of the abstraction
the number of communication steps required to terminate such an operation

For some algorithms the total communication size also needs to be evaluated, and in the crash-recovery model the number of accesses to stable storage has to be considered as well.

In general, this means counting messages, communication steps, and disk accesses.
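
To make the three counts concrete, here is a tally for a made-up request/acknowledge round, in which one process sends a request to the other N - 1 processes and each of them replies. The `Cost` class, the `request_ack_round` function, and the `log_to_disk` flag are all assumptions for illustration; this pattern is not an algorithm from the book.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    messages: int = 0
    communication_steps: int = 0
    stable_storage_accesses: int = 0


def request_ack_round(n: int, log_to_disk: bool = False) -> Cost:
    """Tally the cost of one hypothetical request/ack round among n processes."""
    cost = Cost()
    # Step 1: the sender transmits one request to each of the other n - 1 processes.
    cost.messages += n - 1
    cost.communication_steps += 1
    # Assumed crash-recovery variant: each receiver logs the request before replying.
    if log_to_disk:
        cost.stable_storage_accesses += n - 1
    # Step 2: each receiver sends one acknowledgement back.
    cost.messages += n - 1
    cost.communication_steps += 1
    return cost


print(request_ack_round(5))
print(request_ack_round(5, log_to_disk=True))
```

Under these assumptions the round costs 2(N - 1) messages, that is O(N), in a constant two communication steps, plus N - 1 stable-storage accesses when logging is enabled.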

Performance measurements are often stated in Big-O notation.

Precise performance studies are outside the scope of the book.

    Original author: 袁世超
    Original URL: https://www.jianshu.com/p/b427d34ce921