Binary Indexed Trees

November 4, 2019

Link to the original English article: link

Translator's notes were originally shown in blue; experienced readers can skip them.

Introduction

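We often need some data structure to make our algorithms faster. In this article we discuss the Binary Indexed Tree. According to Peter M. Fenwick, this data structure was first used for data compression. Today it is mostly used for storing frequencies and manipulating cumulative frequency tables.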

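The problem is defined as follows. We have N boxes, and the typical operations are:

1. add balls to box i

2. compute the total number of balls in boxes l through k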

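The naive solution takes O(1) time for operation 1 and O(n) time for operation 2. For m queries, the worst-case running time is therefore O(m * n). With some data structures (for example RMQ) the worst case can be brought down to O(m log n). Another solution is the Binary Indexed Tree, whose worst case is also O(m log n), but which is easier to code and uses less memory than RMQ.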
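A minimal sketch of the naive approach, assuming boxes numbered 1..n. The struct name NaiveBoxes is hypothetical and does not come from the original article. Adding balls touches a single array cell, while a range sum has to walk every box in the range, which is where the O(n) cost per query comes from.

```cpp
#include <cassert>
#include <vector>

// Naive approach: store the number of balls in each box directly.
// Boxes are numbered 1..n; index 0 is unused.
struct NaiveBoxes {
    std::vector<long long> balls;

    explicit NaiveBoxes(int n) : balls(n + 1, 0) {}

    // Operation 1: add `count` balls to box i. O(1).
    void add(int i, long long count) {
        assert(1 <= i && i < (int)balls.size());
        balls[i] += count;
    }

    // Operation 2: total number of balls in boxes l..k. O(n) in the worst case.
    long long sum(int l, int k) const {
        assert(1 <= l && l <= k && k < (int)balls.size());
        long long total = 0;
        for (int i = l; i <= k; ++i) total += balls[i];
        return total;
    }
};
```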

Notation

BIT	Binary Indexed Tree
MaxVal	maximum value that has a non-zero frequency
f[i]	frequency of the value with index i, i = 1 .. MaxVal (think of it as the number of balls in box i)
c[i]	cumulative frequency for index i (f[1] + f[2] + … + f[i])
tree[i]	sum of the frequencies stored in the BIT at index i (what "index" means here is described later); sometimes we write "tree frequency" instead of "sum of frequencies stored in the BIT"
num¯	complement of the integer num (each binary digit is inverted: 0 -> 1, 1 -> 0)
	NOTE: We usually set f[0] = 0, c[0] = 0, tree[0] = 0, so index 0 is sometimes simply ignored.

Basic idea

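Every integer can be represented as a sum of powers of two. In the same way, a cumulative frequency can be represented as a sum of sets of subfrequencies. In our case, each set contains some consecutive, non-overlapping frequencies.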

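Let idx be an index of the BIT, and let r be the position of the last 1 bit (the rightmost 1) in the binary representation of idx, counting from 0 at the least significant bit. (Translator's note: for example, idx = 12 is 1100 in binary, so r = 2; idx = 9 is 1001, so r = 0.) Then tree[idx] is the sum of the frequencies from index (idx - 2^r + 1) to idx (see table 1.1):

$$\mathrm{tree}[idx] = f[idx - 2^r + 1] + \dots + f[idx]$$

We also say that idx is responsible for the indexes from (idx - 2^r + 1) to idx. (Note: this is the key to the algorithm and to how the tree is manipulated.)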

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
f	1	0	2	1	1	3	0	4	2	5	2	2	3	1	0	2
c	1	1	3	4	5	8	8	12	14	19	21	23	26	27	27	29
tree	1	1	2	4	1	4	0	12	2	7	2	11	3	4	0	29

table 1.1

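(Tip: don't try to derive the values of f[i]; they are simply given for this example. The values of c[i] and tree[i] are computed from them, and those are what you should understand.)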
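To check the c and tree rows of table 1.1, here is a small sketch that computes c[i] and tree[idx] directly from their definitions. It assumes the f values copied from the table, and the function names cumulative and buildTreeByDefinition are hypothetical. It is deliberately slow and only meant to reproduce the table.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;
// f[] from table 1.1; index 0 is unused (f[0] = 0).
const std::vector<int> f = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};

// c[i] = f[1] + f[2] + ... + f[i]
std::vector<int> cumulative() {
    assert((int)f.size() == MaxVal + 1);
    std::vector<int> c(MaxVal + 1, 0);
    for (int i = 1; i <= MaxVal; ++i) c[i] = c[i - 1] + f[i];
    return c;
}

// tree[idx] = f[idx - 2^r + 1] + ... + f[idx], where r is the position
// of the last 1 bit of idx, computed straight from the definition.
std::vector<int> buildTreeByDefinition() {
    assert((int)f.size() == MaxVal + 1);
    std::vector<int> tree(MaxVal + 1, 0);
    for (int idx = 1; idx <= MaxVal; ++idx) {
        int r = 0;
        while (((idx >> r) & 1) == 0) ++r;  // position of the last 1 bit
        for (int i = idx - (1 << r) + 1; i <= idx; ++i) tree[idx] += f[i];
    }
    return tree;
}
```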

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
tree	1	1..2	3	1..4	5	5..6	7	1..8	9	9..10	11	9..12	13	13..14	15	1..16

table 1.2: responsibility table
(i.e. tree[i] is the sum of f over the range shown; for example, tree[10] = f[9] + f[10])
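Table 1.2 can be generated from the definition of responsibility. The following sketch returns the first and last f-index covered by tree[idx]. The function name responsibility is hypothetical.

```cpp
#include <cassert>
#include <utility>

// Returns the range [first, last] of f-indexes that tree[idx] is responsible for:
// first = idx - 2^r + 1, last = idx, where r is the position of the last 1 bit.
// Requires idx >= 1 (idx == 0 has no 1 bit, so the loop would never stop).
std::pair<int, int> responsibility(int idx) {
    assert(idx >= 1);
    int r = 0;
    while (((idx >> r) & 1) == 0) ++r;  // count the trailing zero bits
    return {idx - (1 << r) + 1, idx};
}
```

For instance, responsibility(10) gives the range 9..10 and responsibility(12) gives 9..12, matching the table.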

image 1.3: indexes that each tree entry is responsible for (each bar shows the accumulated frequencies)

image 1.4: tree with tree frequencies

Suppose we want the cumulative frequency at index 13. In binary, 13 is 1101, so we compute c[1101] = tree[1101] + tree[1100] + tree[1000].
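Plugging in the values from table 1.1 (tree[13] covers f[13], tree[12] covers f[9..12], and tree[8] covers f[1..8]), the sum works out as:

$$c[13] = \mathrm{tree}[13] + \mathrm{tree}[12] + \mathrm{tree}[8] = 3 + 11 + 12 = 26$$

This matches c[13] = 26 in table 1.1.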

Isolating the last 1 bit

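We will need to extract the last 1 bit of a binary number many times, so we need an efficient way to do it. Suppose we want to isolate the last 1 of num. In binary, num can be written as a1b, where a stands for the digits before the last 1 and b for the zeros after it. In two's complement arithmetic -num = num¯ + 1, so

$$-num = \overline{a1b} + 1 = \bar{a}\,0\,\bar{b} + 1.$$

Since b consists only of zeros, b¯ consists only of ones. Hence

$$-num = \overline{a1b} + 1 = \bar{a}\,0\,\bar{b} + 1 = \bar{a}\,0\,\overline{(0\ldots0)} + 1 = \bar{a}\,0\,(1\ldots1) + 1 = \bar{a}\,1\,(0\ldots0) = \bar{a}\,1\,b.$$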
Now we can isolate the last 1 simply by taking the bitwise AND of num and -num:

   a1b
 & a¯1b
 --------------
 = (0…0)1(0…0)
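The derivation translates directly into code. Below is a sketch with a hypothetical function name, lastOne. It assumes two's complement integers, which C++20 guarantees and mainstream compilers used before that. The second function spells out -num as ~num + 1, exactly as in the derivation, to show that both forms agree.

```cpp
#include <cassert>

// Isolates the last (lowest) 1 bit of num, e.g. 12 = 1100b -> 4 = 100b.
// Requires num > 0 so that -num cannot overflow.
int lastOne(int num) {
    assert(num > 0);
    return num & -num;
}

// Same result, written with the explicit complement: -num == ~num + 1.
int lastOneViaComplement(int num) {
    assert(num > 0);
    return num & (~num + 1);
}
```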

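Reading the cumulative frequency

To read the cumulative frequency at idx, add tree[idx] to sum, then subtract the last 1 bit from idx (in other words, clear the last 1 bit, turning it into 0), and repeat until idx becomes 0. The following C++ function does this: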

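```cpp
// Returns the cumulative frequency c[idx] = f[1] + ... + f[idx].
// Assumes a global array tree[] of size MaxVal + 1.
int read(int idx){
    int sum = 0;
    while (idx > 0){
        sum += tree[idx];
        idx -= (idx & -idx);   // remove the last 1 bit of idx
    }
    return sum;
}
```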

Example: idx = 13, sum = 0:

iteration	idx	position of the last 1 bit	idx & -idx	sum
1	13 = 1101	0	1 (2^0)	3
2	12 = 1100	2	4 (2^2)	14
3	8 = 1000	3	8 (2^3)	26
4	0 = 0	-	-	-

image 1.5: the arrows show the data used during the traversal.

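So the result is 26. The number of loop iterations equals the number of 1 bits in idx, which is at most ⌊log₂ MaxVal⌋ + 1. Time complexity: O(log MaxVal). Code length: the short function above.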
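A self-contained version of read, assuming (as hypothetical test data) that tree[] holds the values of table 1.1 and MaxVal = 16. The hypothetical rangeSum shows how operation 2 from the introduction (balls in boxes l through k) follows from two cumulative reads, c[k] - c[l - 1].

```cpp
#include <cassert>

// tree[] from table 1.1 (tree[0] unused).
const int MaxVal = 16;
int tree[MaxVal + 1] = {0, 1, 1, 2, 4, 1, 4, 0, 12, 2, 7, 2, 11, 3, 4, 0, 29};

// Cumulative frequency c[idx] = f[1] + ... + f[idx].
int read(int idx) {
    assert(0 <= idx && idx <= MaxVal);
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);  // remove the last 1 bit
    }
    return sum;
}

// Operation 2: f[l] + ... + f[k] for 1 <= l <= k <= MaxVal.
int rangeSum(int l, int k) {
    assert(1 <= l && l <= k && k <= MaxVal);
    return read(k) - read(l - 1);
}
```

For example, read(13) returns 26 as in the trace above, and rangeSum(9, 12) returns c[12] - c[8] = 23 - 12 = 11.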

Changing the frequency at a position and updating the tree

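When the frequency at some position changes, every tree entry responsible for that position must be updated. When reading the cumulative frequency at idx, we removed the last 1 bit of idx and continued the loop. To add val to the frequency at idx, we add val to tree[idx], then add the last 1 bit of idx to idx, and repeat while idx is less than or equal to MaxVal. For example, for idx = 6, after tree[6] has been increased by val, the last 1 bit of 6 is 2, so 6 + 2 = 8 and tree[8] must be updated next. The C++ function: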

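```cpp
// Adds val to the frequency at idx and updates every tree entry
// responsible for idx. Assumes globals tree[] and MaxVal.
void update(int idx, int val){
    while (idx <= MaxVal){
        tree[idx] += val;
        idx += (idx & -idx);   // add the last 1 bit of idx
    }
}
```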

For example, idx = 5 (with MaxVal = 16):

iteration	idx	position of the last 1 bit	idx & -idx
1	5 = 101	0	1 (2^0)
2	6 = 110	1	2 (2^1)
3	8 = 1000	3	8 (2^3)
4	16 = 10000	4	16 (2^4)
5	32 = 100000	-	-

image 1.6: order in which tree entries are visited when updating the frequency at idx 5.

With the algorithm above we can update the entire BIT. Time complexity: O(log MaxVal). Code length: up to ten lines.
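A self-contained sketch that uses MaxVal = 16 and the f values of table 1.1 as assumed test data. Starting from an all-zero tree and calling update once per index reproduces the tree row of table 1.1. The hypothetical helper updatePath lists the indexes that update visits. For idx = 5 these are 5, 6, 8 and 16, as in the table above.

```cpp
#include <cassert>
#include <vector>

const int MaxVal = 16;
int tree[MaxVal + 1];  // global, so zero-initialized; tree[0] unused

// Adds val to the frequency at idx and fixes every tree entry responsible for idx.
void update(int idx, int val) {
    assert(1 <= idx && idx <= MaxVal);
    while (idx <= MaxVal) {
        tree[idx] += val;
        idx += (idx & -idx);  // next entry that is also responsible for idx
    }
}

// Hypothetical helper: the tree indexes update(idx, ...) touches, in order.
std::vector<int> updatePath(int idx) {
    std::vector<int> path;
    while (idx <= MaxVal) {
        path.push_back(idx);
        idx += (idx & -idx);
    }
    return path;
}

// Builds the BIT for the frequencies of table 1.1 with one update per index.
void buildFromTable() {
    const int f[MaxVal + 1] = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};
    for (int i = 1; i <= MaxVal; ++i) tree[i] = 0;
    for (int i = 1; i <= MaxVal; ++i) update(i, f[i]);
}
```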

Reading the actual frequency at a position

(not translated)
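This part of the original article was not translated. One simple way to get a single frequency is to subtract two cumulative frequencies: f[idx] = c[idx] - c[idx - 1]. This is not necessarily the method the original article describes. The sketch below assumes the tree[] values of table 1.1, and readSingle is a hypothetical name.

```cpp
#include <cassert>

const int MaxVal = 16;
int tree[MaxVal + 1] = {0, 1, 1, 2, 4, 1, 4, 0, 12, 2, 7, 2, 11, 3, 4, 0, 29};

// Cumulative frequency c[idx], as in read() above.
int read(int idx) {
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

// Frequency f[idx] as the difference of two cumulative frequencies.
// Simple, but it costs two O(log MaxVal) walks.
int readSingle(int idx) {
    assert(1 <= idx && idx <= MaxVal);
    return read(idx) - read(idx - 1);
}
```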

Scaling the entire tree by a constant factor

(not translated)
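This part was not translated either. The following is one straightforward approach, sketched under the same table 1.1 assumptions. It is not necessarily the original article's method, and scale is a hypothetical name. Every tree[i] is a sum of frequencies, so multiplying each tree[i] by a constant multiplies every frequency and every cumulative frequency by that constant. The same loop works for division if tree[] holds floating-point values. With integer division, however, each tree[i] is rounded separately, so the results are no longer exact.

```cpp
#include <cassert>

const int MaxVal = 16;
int tree[MaxVal + 1] = {0, 1, 1, 2, 4, 1, 4, 0, 12, 2, 7, 2, 11, 3, 4, 0, 29};

// Cumulative frequency c[idx], as in read() above.
int read(int idx) {
    assert(0 <= idx && idx <= MaxVal);
    int sum = 0;
    while (idx > 0) {
        sum += tree[idx];
        idx -= (idx & -idx);
    }
    return sum;
}

// Multiplies every frequency by factor in O(MaxVal): each stored sum is scaled.
void scale(int factor) {
    for (int i = 1; i <= MaxVal; ++i) tree[i] *= factor;
}
```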

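Finding the index with a given cumulative frequency (translator's note: after translating this much, we finally reach the part I actually needed)

The most naive solution is to iterate over all indexes, compute the cumulative frequencies, and check whether one of them equals the given value. If negative frequencies are allowed, this is the only solution. But if all frequencies are non-negative (that is, cumulative frequencies never decrease as the index increases), we can use an algorithm that runs in logarithmic time, a modification of binary search. We go through all bits, starting with the highest one, and compare the cumulative frequency at the current index with the given value. Depending on whether it is larger or smaller, we continue in the lower or the upper half of the interval, just like binary search. The C++ function is as follows: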

```cpp
// If more than one index in the tree has the same cumulative
// frequency, this procedure returns one of them (we do not know which).
// Assumes globals tree[] and MaxVal, and non-negative frequencies.
int find(int cumFre){
    // bitMask starts as the greatest power of two <= MaxVal and stores
    // the size of the interval that remains to be searched. It is local,
    // so find can be called more than once.
    int bitMask = 1;
    while (bitMask <= MaxVal / 2) bitMask <<= 1;

    int idx = 0;                         // result of the function
    while (bitMask != 0){
        int tIdx = idx + bitMask;        // midpoint of the current interval
        if (tIdx <= MaxVal){             // never read past the end of tree[]
            if (cumFre == tree[tIdx])    // if it is equal, we just return tIdx
                return tIdx;
            else if (cumFre > tree[tIdx]){
                // the tree frequency "fits" into cumFre, so include it
                idx = tIdx;              // update index
                cumFre -= tree[tIdx];    // remaining frequency for the next loop
            }
        }
        bitMask >>= 1;                   // halve the current interval
    }
    if (cumFre != 0)                     // the given cumulative frequency may not exist
        return -1;
    else
        return idx;
}
```

```cpp
// If more than one index in the tree has the same cumulative
// frequency, this procedure returns the greatest one.
int findG(int cumFre){
    int bitMask = 1;                     // greatest power of two <= MaxVal
    while (bitMask <= MaxVal / 2) bitMask <<= 1;

    int idx = 0;
    while (bitMask != 0){
        int tIdx = idx + bitMask;
        if (tIdx <= MaxVal && cumFre >= tree[tIdx]){
            // even if the current cumulative frequency equals cumFre,
            // we keep looking for a higher index (if one exists)
            idx = tIdx;
            cumFre -= tree[tIdx];
        }
        bitMask >>= 1;
    }
    if (cumFre != 0)
        return -1;
    else
        return idx;
}
```

Calling find with cumFre = 21:

First iteration	tIdx is 16; tree[16] = 29 is greater than 21; halve bitMask and continue
Second iteration	tIdx is 8; tree[8] = 12 is less than 21, so the first 8 indexes belong to the result; remember idx, because we know for sure it is part of the result; subtract tree[8] from cumFre, leaving 9 (we do not want to look for the same cumulative frequency again, but for the remaining cumulative frequency in the rest of the tree); halve bitMask and continue
Third iteration	tIdx is 12; tree[12] = 11 is greater than 9 (in this example the interval 1-8 cannot overlap with any later interval, because only the interval 1-16 could overlap it); halve bitMask and continue
Fourth iteration	tIdx is 10; tree[10] = 7 is less than 9, so we update the values (idx = 10, cumFre = 2); halve bitMask and continue
Fifth iteration	tIdx is 11; tree[11] = 2 is equal to cumFre = 2; return the index (tIdx = 11)

Time complexity: O(log MaxVal). Code length: fewer than 20 lines.
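For comparison, the naive linear search mentioned at the beginning of this section can be sketched as follows. findLinear is a hypothetical name, and f[] holds the table 1.1 frequencies as test data. It takes O(MaxVal) time, returns the smallest matching index, and, unlike find and findG, also works when some frequencies are negative.

```cpp
#include <cassert>

const int MaxVal = 16;
// f[] from table 1.1 (f[0] = 0); negative values would also be allowed here.
const int f[MaxVal + 1] = {0, 1, 0, 2, 1, 1, 3, 0, 4, 2, 5, 2, 2, 3, 1, 0, 2};

// Walks the indexes, accumulating c[idx], and returns the first idx with
// c[idx] == cumFre, or -1 if there is none.
int findLinear(int cumFre) {
    int c = 0;
    for (int idx = 1; idx <= MaxVal; ++idx) {
        c += f[idx];
        if (c == cumFre) return idx;
    }
    return -1;
}
```

On the table 1.1 data, findLinear(21) returns 11 (the same index find returns), and findLinear(27) returns 14, the smaller of the two indexes 14 and 15 whose cumulative frequency is 27.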