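Multithreading – Does "isync" prevent Store-Load reordering on PowerPC CPUs?

As is well known, PowerPC has a weak memory model that permits any speculative reordering: Store-Store, Load-Store, Store-Load, Load-Load.

There are at least 3 fences:

> hwsync or sync – full memory barrier; prevents any reordering
> lwsync – memory barrier that prevents these reorderings: Load-Load, Store-Store, Load-Store
> isync – instruction barrier: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_isync_ics_instrs.htm

For example, can the Store-stwcx. and the Load-lwz be reordered in this code?: https://godbolt.org/g/84t5jM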

    lwarx 9,0,10
    addi 9,9,2
    stwcx. 9,0,10
    bne- 0,.L2
    isync
    lwz 9,8(1)

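It is known that isync prevents reordering lwarx, bne <–> any following instructions.

But does isync prevent reordering stwcx., bne <–> any following instructions?

I.e., can the Store-stwcx. begin earlier than the following Load-lwz and finish later than the Load-lwz?

I.e., can the Store-stwcx. perform its store to the store buffer before the following Load-lwz begins, while the actual store to the cache, which is what makes it visible to all CPU cores, occurs after the Load-lwz has finished?

As we can see from the documents, articles, and books listed below:

> isync is not a memory fence; it is only an instruction fence.
> isync does not force all external accesses to complete with respect to other processors and mechanisms that access memory.
> isync does not wait for all other processors to detect storage accesses.
> isync has very low overhead and is very weak (weaker than lwsync and hwsync).
> isync does not guarantee that previous and future stores will be perceived by other processors in the locally issued order; that requires one of the sync instructions.
> isync is an acquire barrier, but as we know, acquire applies only to load operations, not to a store (stwcx.).
> isync does not affect data accesses and does not wait for all stores to be performed.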

The main question is this. Initially a = 0, b = 0.

> If CPU-Core-0 executes: stwcx. [a] = 1; bne-; isync; lwz [b].
> And CPU-Core-1 executes: hwsync; stw [b] = 1; hwsync; lwz [a]; hwsync.

Then can Core-0 see [b] == 1 while Core-1 sees [a] == 0?
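To make the possible results concrete, here is a minimal Python sketch (not PowerPC code) that enumerates every sequentially consistent interleaving of the two cores' store and load. It ignores the fences and the reservation entirely, and the names `CORE0`, `CORE1`, `interleavings`, `run` and `sc_outcomes` are made up for illustration. It only shows which result pairs are reachable with no reordering at all; any pair outside that set would indicate reordering.

```python
# Illustrative model only: each core is a list of (op, location, arg) steps.
CORE0 = [("store", "a", 1), ("load", "b", "r0")]  # stwcx. [a] = 1 ... lwz [b]
CORE1 = [("store", "b", 1), ("load", "a", "r1")]  # stw [b] = 1 ... lwz [a]


def interleavings(p, q):
    """Yield every merge of p and q that keeps each list's own order."""
    if not p:
        yield list(q)
        return
    if not q:
        yield list(p)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"a": 0, "b": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            memory[loc] = arg
        else:
            regs[arg] = memory[loc]
    # r0 is what Core-0 read from b, r1 is what Core-1 read from a.
    return regs["r0"], regs["r1"]


def sc_outcomes():
    """Set of (r0, r1) pairs reachable without any reordering."""
    return {run(s) for s in interleavings(CORE0, CORE1)}


if __name__ == "__main__":
    for r0, r1 in sorted(sc_outcomes()):
        print(f"Core-0 reads b={r0}, Core-1 reads a={r1}")
```

Under these assumptions the reachable pairs are (0, 1), (1, 0) and (1, 1). The only combination the enumeration excludes is both loads returning 0, so that is the result that would reveal Store-Load reordering.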

Also:

> https://www.ibm.com/developerworks/systems/articles/powerpc.html

The isync prevents speculative execution from accessing the data block
before the flag has been set. And in conjunction with the preceding
load, compare, and conditional branch instructions, the isync
guarantees that the load on which the branch depends (the load of the
flag) is performed prior to any loads that occur subsequent to the
isync (loads from the shared block).
isync is not a memory barrier instruction, but the
load-compare-conditional branch-isync sequence can provide this
ordering property.

> http://www.nxp.com/assets/documents/data/en/application-notes/AN2540.pdf

Unlike isync, sync forces all external accesses to complete with
respect to other processors and mechanisms that access memory.

> Storage in the PowerPC, Janice M. Stone, Robert P. Fitzgerald, 1995: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.4033&rep=rep1&type=pdf

Unlike sync, isync does not wait for all other processors to detect
storage accesses. isync is a less conservative fence than sync because
it does not delay until all processors detect previous loads and
stores.

> http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html

bc;isync: this is a very low-overhead and very weak form of memory
fence.
A specific set of preceding loads on which the bc (branch
conditional) instruction depends are guaranteed to have completed
before any subsequent instruction begins execution. However,
store-buffer and cache-state effects can nevertheless make it appear
that subsequent loads occur before the preceding loads upon which the
twi instruction depends. That said, the PowerPC architecture does not
permit stores to be executed speculatively, so any store following the
twi;isync instruction is guaranteed to happen after any of the loads
on which the bc depends.

> https://books.google.ru/books?id=TKOfDQAAQBAJ&pg=PA264&lpg=PA264&dq=isync+store+load&source=bl&ots=-4FyWvxTwg&sig=r1fitaG-Q3GHOxvSMTgLJMBVGUU&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIOjAC#v=onepage&q=isync%20store%20load&f=false

> https://books.google.ru/books?id=gZZgAQAAQBAJ&pg=PA71&lpg=PA71&dq=isync+store+load&source=bl&ots=bo6nTLdzEZ&sig=vCjoDmUWhn0buN_uMf8XgbDzCf4&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIcTAJ#v=onepage&q=isync%20store%20load&f=false

> https://books.google.ru/books?id=G2fmCgAAQBAJ&pg=PA321&lpg=PA321&dq=isync+store+load&source=bl&ots=YS4mE-4f_F&sig=OVwaJYE-SNnor-KtKrjlkOd6AOs&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIYjAH#v=onepage&q&f=false

> http://www.nxp.com/assets/documents/data/en/application-notes/AN3441.pdf

Note that isync does not affect data accesses and does not wait for
all stores to be performed.

> Page 77: https://www.setphaserstostun.org/power8/POWER8_UM_v1.3_16MAR2016_pub.pdf

3.5.7.2 Instruction Cache Block Invalidate (icbi)

As a result of this and other implementation-specific design
optimizations, instead of requiring the instruction sequence specified
by the Power ISA to be executed on a per cache-line basis, software
must only execute a single sequence of three instructions to make any
previous code modifications become visible: sync, icbi (to any
address), isync.

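Answer:

So isync does not guarantee Store-Load ordering, because "isync is not a memory barrier instruction": it does not guarantee that every previous store will have become visible to the other CPU cores (as sequential consistency would require) before the next instruction completes. The instruction-synchronization command isync guarantees only the order in which instructions start, not the order in which they complete, i.e., not the order in which their effects become visible to other CPU cores. That is, isync permits Store-Load reordering of the visible effects in this code: stwcx. [a] = 1; bne-; isync; lwz [b].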

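Best answer:

As you guessed, and as most of your excellent sources imply, two properties of memory accesses are involved here:

Visibility

Whether other processors can actually see the memory access.
Processor-specific buffers or caches can let a store complete on the processor while keeping it invisible to other processors.

Ordering

When a memory access is performed relative to the other instructions on the same processor.

Ordering is the intra-processor aspect of a memory access; it controls the out-of-order capabilities of the processor.
No ordering can be imposed with respect to the instructions of other processors.

Visibility is the inter-processor aspect; it ensures that the side effects of a memory access are visible to other processors (or, more generally, to other agents).
The main side effect of a store is changing a memory location.

By controlling both aspects it is possible to enforce inter-processor ordering, i.e., the order in which other processors see a sequence of memory accesses.
It goes without saying that the word "ordering" usually refers to this second meaning, unless it is used in a context where no other agents are present.
This is admittedly confusing terminology.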

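Beware that I am not confident with the PowerPC architecture; I am just applying the theory with the help of a few official documents found online and the quotes you provided.

isync, like sc and rfi, is a context-synchronizing instruction. The main purpose of these instructions is to guarantee that subsequent instructions execute in the context established by the preceding ones.
For example, executing a system call changes the context, and we don't want privileged code to execute in an unprivileged context or vice versa.

These instructions wait for all previously dispatched instructions to complete, but not to become visible: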

All previously issued instructions have completed, at least to a point where they can no longer
cause an exception.
However, memory accesses that these instructions cause need not have
completed with respect to other processors and mechanisms.

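So, depending on what you mean by reordering, isync does or does not prevent Load-Load, Load-Store, etc. reordering.
It does prevent any such reordering from the perspective of the executing processor (intra-processor reordering): all previous loads and stores are completed before isync completes, but they are not necessarily visible.
It does not prevent reordering from the perspective of other processors (inter-processor reordering), because it does not ensure the visibility of the previous instructions.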

But does isync prevent reordering stwcx.,bne <–> any following instructions?

Only intra-processor reordering.

I.e. can Store-stwcx. begin earlier than the following Load-lwz, and finish later than Load-lwz?

Not from the perspective of the processor executing them: stwcx. has completed by the time lwz begins. However, to borrow Intel terminology, it has completed locally; other processors may not yet see it as completed when lwz begins.

I.e. can Store-stwcx. perform the Store to the Store-Buffer earlier than the following Load-lwz begins, while the actual Store to the cache that is visible to all CPU-cores occurs later than the Load-lwz finishes?

Yes, exactly.
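As a rough illustration of "completed locally but not yet visible", here is a toy Python model with one private store buffer per core. It is a sketch under loud assumptions, not a model of real PowerPC hardware: the `Core` class, its `drain` method and the `run` function are hypothetical names, isync is modeled as simply not draining the buffer (which is the claim made above), and a full sync is modeled as draining it before the next instruction.

```python
class Core:
    """Toy core with a private FIFO store buffer (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory  # shared memory, visible to every core
        self.buffer = []      # pending (address, value) stores, oldest first

    def store(self, addr, value):
        # The store completes locally but is not yet visible to other cores.
        self.buffer.append((addr, value))

    def load(self, addr):
        # A core sees its own pending stores first, then shared memory.
        for pending_addr, value in reversed(self.buffer):
            if pending_addr == addr:
                return value
        return self.memory[addr]

    def drain(self):
        # Assumed effect of a full barrier: make pending stores visible.
        for addr, value in self.buffer:
            self.memory[addr] = value
        self.buffer.clear()


def run(core0_drains_before_load):
    """Run one fixed schedule and return (Core-0's b, Core-1's a)."""
    memory = {"a": 0, "b": 0}
    core0, core1 = Core(memory), Core(memory)

    core0.store("a", 1)        # stwcx. [a] = 1
    if core0_drains_before_load:
        core0.drain()          # what a sync is assumed to enforce
    r0 = core0.load("b")       # lwz [b] (after isync: no drain assumed)

    core1.store("b", 1)        # stw [b] = 1
    core1.drain()              # hwsync
    r1 = core1.load("a")       # lwz [a]

    core0.drain()              # Core-0's store becomes visible only now
    return r0, r1


if __name__ == "__main__":
    print("no drain before the load:", run(False))
    print("drain before the load:   ", run(True))
```

In this toy schedule, without the drain both cores read 0 even though Core-0's store had already completed locally when its load ran; with the drain, Core-1 reads a == 1. Real hardware is considerably more subtle, so treat this only as a picture of the distinction between local completion and visibility.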
