B-Tree (B树 / B-树)


Author: disappearedgod
Source: http://blog.csdn.net/disappearedgod/article/details/25365655
Date: 2014-5-9

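Preface

This post mainly follows the textbook Data Structures and Algorithms in C++ by Adam Drozdek, although I find Java better suited for the code. Material from the PDF slides of Princeton University's Coursera course "Algorithms" was added later.
This post is the B-tree part of a longer article on multiway trees; it was split out because of its length.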

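Main text

1.4.1 B-trees

1.4.1.0 Introduction
B-trees are a practical application of balanced search trees: they generalize the 2-3 trees that red-black trees are based on.
In Algorithms, 4th Edition, the B-tree is built on top of a Page API, which makes it possible to build very large trees.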

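1.4.1.1 Definition and applications

The B-tree is a data structure introduced by Bayer and McCreight in 1972, in a paper on organizing and maintaining large ordered indexes.
A B-tree index is a method for storing and looking up the records (keys) of a database file; it is designed for data that lives on disk.

A B-tree is a tree data structure that keeps its data sorted and supports search, sequential access, insertion and deletion in O(log n) time. In short, it generalizes the binary search tree by allowing a node to have more than two children. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It reduces the number of intermediate steps needed to locate a record, which speeds up access. B-trees are widely used in databases and file systems.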

<Algorithms 4th Edition>
Definition. A B-tree of order M (where M is an even positive integer) is a tree that
either is an external k-node (with k keys and associated information) or comprises
internal k-nodes (each with k keys and k links to B-trees representing each of the k
intervals delimited by the keys), having the following structural properties: every
path from the root to an external node must be the same length (perfect balance);
and k must be between 2 and M-1 at the root and between M/2 and M-1 at
every other node.
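
The occupancy rule at the end of this definition is easy to misread, so here is a minimal Java sketch of it. The class and method names are hypothetical, and the requirement M >= 4 is an assumption borrowed from the limitations noted in the BTree.java listing later in this post (with M = 2 the root range 2..M-1 would be empty).

```java
// Hypothetical helper illustrating the occupancy rule of the definition above.
final class BTreeOrderRule {
    /** Returns true if a node with k keys is legal in a B-tree of (even) order M. */
    static boolean isLegalKeyCount(int k, int M, boolean isRoot) {
        // Assumption: M is even and at least 4, as in the BTree.java listing.
        if (M < 4 || M % 2 != 0) {
            throw new IllegalArgumentException("M must be even and >= 4");
        }
        int min = isRoot ? 2 : M / 2;   // the root may be emptier than other nodes
        int max = M - 1;                // a node that reaches M entries must be split
        return k >= min && k <= max;
    }
}
```

For example, with M = 4 a non-root node may hold 2 or 3 keys; a node that reaches 4 entries has overflowed and must be split.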


1.4.1.2 Some properties

According to Knuth's definition (as quoted in Wikipedia's B-tree article), a B-tree of order m is a tree that satisfies the following properties:

Every node has at most m children.
Every non-leaf node (except the root) has at least ⌈m/2⌉ children.
The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k-1 keys.
All leaves appear on the same level and carry no information.

Here the order of a B-tree is the maximum number of children per node. Some authors instead define a node of an order-m B-tree as holding k keys and k+1 pointers with m <= k <= 2m, so that m specifies the minimum occupancy rather than the maximum.

Note: if the root is itself a leaf, the whole tree consists of that single root node.
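
These properties can be checked mechanically. The following is a minimal Java sketch, not taken from either textbook: the class, the Node type and the method name are hypothetical, a leaf is modelled simply as a node without children, and the contents of leaves are not inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for the structural properties listed above.
final class KnuthPropertyCheck {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Returns the height of the subtree if it satisfies the properties, or -1 if it does not. */
    static int check(Node n, int m, boolean isRoot) {
        if (n.isLeaf()) return 0;                       // leaves are not inspected further
        int k = n.children.size();
        int minChildren = isRoot ? 2 : (m + 1) / 2;     // ceil(m/2) for non-root nodes
        if (k > m || k < minChildren) return -1;        // too many or too few children
        if (n.keys.size() != k - 1) return -1;          // k children need k-1 keys
        int height = -1;
        for (Node c : n.children) {
            int h = check(c, m, false);
            if (h < 0) return -1;                       // a subtree is already invalid
            if (height >= 0 && h != height) return -1;  // leaves on different levels
            height = h;
        }
        return height + 1;
    }
}
```

Calling check(root, m, true) on a well-formed tree returns its height in edges; any violation of the child-count, key-count or same-level rule yields -1.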


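1.4.1.3 Some exam points

1.4.1.3.1 B-trees and red-black trees

Difference: a B-tree node can have many children (typically 50-500), so a single node can hold a whole page or block of secondary storage.
Similarity: a B-tree and a red-black tree with n nodes both have height O(lg n). In practice the B-tree is shallower, because each node has many children.

1.4.1.4 Implementation

By these properties, a B-tree is always at least half full, has few levels, and is perfectly balanced.
The node is implemented below as a class that contains:
an array of m-1 cells that stores the keys
an array of m cells that stores pointers to other nodes
possibly some other information that makes the tree easier to maintain (the tree class is declared a friend so that it can access these fields)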

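template <class T, int M> class BTree;  // forward declaration for the friend below

template <class T, int M>
class BTreeNode {
public:
  BTreeNode();
  BTreeNode(const T&);
private:
  bool leaf;                 // true if the node has no children
  int keyTally;              // number of keys currently stored
  T keys[M-1];               // at most M-1 keys
  BTreeNode *pointers[M];    // at most M children
  friend class BTree<T, M>;  // the tree class maintains the nodes
};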
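
Since the full listing later in this post is in Java, the same node layout can also be sketched in Java. This is only an illustration under assumptions: the class name, the accessor methods and the restriction to orders of at least 3 are hypothetical and are not part of Drozdek's code.

```java
// Hypothetical Java counterpart of the C++ BTreeNode above.
final class BTreeNodeSketch<T extends Comparable<T>> {
    boolean leaf = true;                          // a new node starts out as a leaf
    int keyTally = 0;                             // number of keys currently stored
    private final T[] keys;                       // m-1 key slots
    private final BTreeNodeSketch<T>[] pointers;  // m child slots

    @SuppressWarnings("unchecked")
    BTreeNodeSketch(int m) {
        // Assumption: orders below 3 are rejected to keep the sketch simple.
        if (m < 3) throw new IllegalArgumentException("order must be at least 3");
        keys = (T[]) new Comparable[m - 1];       // generic arrays need an unchecked cast
        pointers = (BTreeNodeSketch<T>[]) new BTreeNodeSketch[m];
    }

    int keyCapacity()   { return keys.length; }
    int childCapacity() { return pointers.length; }
    boolean isFull()    { return keyTally == keys.length; }
}
```

Unlike the C++ template, the order is passed at run time here, because Java generics cannot take an integer parameter.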

1.4.1.4.1 Searching a B-tree

Searching in a B-tree

Start at root.
Find interval for search key and take corresponding link.
Search terminates in external node.


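BTreeNode *BTreeSearch(keyType K, BTreeNode *node) {
  if (node != 0) {
    int i;
    // skip every key smaller than K
    for (i = 1; i <= node->keyTally && node->keys[i-1] < K; i++)
      ;
    // K is not in this node: descend into the subtree that could hold it
    if (i > node->keyTally || node->keys[i-1] > K)
      return BTreeSearch(K, node->pointers[i-1]);
    else
      return node;  // K found in this node
  }
  else
    return 0;       // fell off the tree: K is absent
}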
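
The same search can be written in Java. The sketch below mirrors the C++ function step by step; the Node class with int keys and its varargs constructor are hypothetical simplifications introduced only for this example.

```java
// Hypothetical, simplified node with int keys, used only for this illustration.
final class BTreeSearchSketch {
    static final class Node {
        final int[] keys;
        final int keyTally;
        final Node[] pointers;   // pointers[i] leads to the keys between keys[i-1] and keys[i]

        Node(int[] keys, Node... children) {
            this.keys = keys;
            this.keyTally = keys.length;
            // a leaf gets an array of null pointers so that indexing stays valid
            this.pointers = children.length == 0 ? new Node[keys.length + 1] : children;
        }
    }

    /** Returns the node that contains k, or null if k is not in the tree. */
    static Node search(int k, Node node) {
        if (node == null) return null;
        int i = 1;
        while (i <= node.keyTally && node.keys[i - 1] < k) i++;   // skip smaller keys
        if (i > node.keyTally || node.keys[i - 1] > k)
            return search(k, node.pointers[i - 1]);               // descend into child i-1
        return node;                                              // keys[i-1] == k
    }
}
```

With a root holding the key 5 and two leaves holding {1, 3} and {7, 9}, searching for 7 returns the right leaf, searching for 5 returns the root, and searching for 4 returns null.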


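Worst case for searching: every non-root node has the smallest number of pointers allowed, q = ⌈M/2⌉ (M/2 rounded up), and the search has to go all the way down to a leaf, whether or not the key is found.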
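
This worst case can be turned into a bound on the height. The following is a sketch of the usual counting argument under stated assumptions: nodes follow the Knuth-style rules (a non-root node with q children holds q-1 keys), the root has the minimum of two children and one key, h counts the levels of nodes from the root (level 1) down to the leaves, and n is the number of keys. Level j >= 2 then contains at least 2q^{j-2} nodes, which gives:

```latex
n \;\ge\; 1 + (q-1)\sum_{i=0}^{h-2} 2q^{i}
  \;=\; 1 + 2\,(q^{h-1}-1)
  \;=\; 2q^{h-1} - 1
\qquad\Longrightarrow\qquad
h \;\le\; \log_q\frac{n+1}{2} + 1 .
```

For instance, with q = 100 (an illustrative value) a tree holding about two million keys has at most 4 levels, since log_100(10^6) = 3.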

1.4.1.4.2 Insertion into a B-tree

Insertion in a B-tree

Search for new key.
Insert at bottom.
Split nodes with M key-link pairs on the way up the tree.


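A defining feature of the B-tree is that all leaves are on the last level, so insertion and deletion are not trivial.
As a result the tree is built bottom-up, and the root keeps changing: it is only settled once all insertions are done.
There are three insertion cases:

1. The key goes into a leaf that still has room: insert it in sorted position within the leaf.

2. The leaf that should receive the key is full: split the leaf. Create a new leaf, move half of the keys from the full leaf into it, and link the new leaf into the tree, with the middle key moving up to the parent.

3. The root is full (case 2 keeps applying all the way up because every ancestor is full): create a new root and a new sibling of the old root. The height of the tree grows by one.

Pseudocode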

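BTreeInsert(K)
	find a leaf node to insert K into;
	while (true)
		find a proper position in the array keys for K;
		if node is not full
			insert K and increment keyTally;
			return;
		else
			split node into node1 (= node) and node2 (a new node);
			distribute keys and pointers evenly between node1 and node2 and initialize their keyTally properly;
			K = the middle key;
			if node was the root
				create a new root as the parent of node1 and node2;
				put K and pointers to node1 and node2 in the root, and set the root's keyTally to 1;
				return;
			else
				node = its parent;  // process the parent next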
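
The split step of this pseudocode can be illustrated on a single leaf. The following Java sketch is an assumption-laden simplification: it works on plain sorted int arrays instead of real nodes, and the class, the Split type and the method name are hypothetical. It inserts a key into a full leaf, keeps the lower half in place, hands the upper half to a new node, and returns the middle key that would be passed up to the parent.

```java
import java.util.Arrays;

// Hypothetical illustration of "split node into node1 and node2; K = the middle key".
final class LeafSplitSketch {
    /** Result of a split: keys staying in node1, the key promoted to the parent, keys of node2. */
    static final class Split {
        final int[] left;
        final int median;
        final int[] right;
        Split(int[] left, int median, int[] right) {
            this.left = left;
            this.median = median;
            this.right = right;
        }
    }

    /** Inserts key into the sorted keys of a full leaf and splits the result around its middle key. */
    static Split insertAndSplit(int[] fullLeafKeys, int key) {
        int[] all = Arrays.copyOf(fullLeafKeys, fullLeafKeys.length + 1);
        int pos = fullLeafKeys.length;
        while (pos > 0 && all[pos - 1] > key) {   // shift larger keys one slot to the right
            all[pos] = all[pos - 1];
            pos--;
        }
        all[pos] = key;
        int mid = all.length / 2;                 // index of the key that moves up
        return new Split(Arrays.copyOfRange(all, 0, mid),
                         all[mid],
                         Arrays.copyOfRange(all, mid + 1, all.length));
    }
}
```

In a tree of order 5, inserting 25 into the full leaf {10, 20, 30, 40} leaves {10, 20} in the old node, puts {30, 40} in the new node and promotes 25 to the parent, where the same procedure repeats if the parent is full as well.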

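1.4.1.4.3 Deletion from a B-tree

Deletion is to a large extent the reverse of insertion, but it has more special cases. Care must be taken that no node ends up less than half full after a deletion, which means that nodes sometimes have to be merged.
There are two cases:

1. Deleting from a leaf:
- If the leaf is still at least half full after K is deleted, only the keys greater than K are shifted left to fill the gap. (The inverse of insertion case 1.)
- If, after K is deleted, the number of keys in the leaf drops below ⌈m/2⌉-1, the leaf underflows.
  - If a left or right sibling has more keys than the minimum, all keys of the leaf and that sibling are redistributed between the two: the key in the parent that separates them is moved down to join them, and the middle key of the combined keys is moved up to the parent.
  - If the leaf underflows and its sibling has exactly ⌈m/2⌉-1 keys, the leaf and the sibling are merged. The keys of the leaf, the keys of the sibling and the separating key from the parent are all put into the leaf, and the sibling is deleted. The keys in the parent are shifted if this leaves a gap. If the parent now underflows, the same series of operations is triggered with the parent treated as if it were a leaf, possibly all the way up to the root. (The inverse of insertion case 2.)
  - A special case arises when the parent is a root with only one key and a leaf or non-leaf node is merged with its sibling. The keys of the node and its sibling, together with the only key of the root, are put into one new node, which becomes the new root, and the original node and its sibling are discarded. This is the only case in which two nodes disappear in a single operation, and the height of the tree decreases by one. (The inverse of insertion case 3.)
2. Deleting from a non-leaf node (using the deleteByCopying() approach from binary search trees):
- Deleting directly could force a reorganization of the tree, so deletion from a non-leaf node is reduced to deletion from a leaf. The key to delete is replaced by its immediate predecessor (or successor), which can only be found in a leaf. That replacement key is then deleted from the leaf, which brings us back to case 1.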

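BTreeDelete(K)
	node = BTreeSearch(K, root);
	if (node != null)
		if node is not a leaf
			find a leaf with the closest successor S of K;
			copy S over K in node;
			node = the leaf containing S;
			delete S from node;
		else delete K from node;
		while (1)
			if node does not underflow
				return;
			else if node has a sibling with enough keys
				redistribute keys between node and its sibling;
				return;
			else if node's parent is the root
				if the parent has only one key
					merge node, its sibling and the parent to form a new root;
				else
					merge node and its sibling;
				return;
			else
				merge node and its sibling;
				node = its parent;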
1.4.1.4.4 Balance in a B-tree

Balance in B-tree

Proposition. A search or an insertion in a B-tree of order M with N keys requires between log_{M-1} N and log_{M/2} N probes.
Pf. All internal nodes (besides root) have between M/2 and M-1 links.
In practice. Number of probes is at most 4 (M = 1024; N = 62 billion; log_{M/2} N ≤ 4).
Optimization. Always keep root page in memory.
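
The "in practice" figure can be reproduced with a few lines of Java. This is a minimal sketch: the class and method names are hypothetical, and the probe counts are computed as real-valued logarithms rather than rounded integers.

```java
// Hypothetical helper that evaluates the two bounds of the proposition above.
final class ProbeBounds {
    static double logBase(double base, double x) {
        return Math.log(x) / Math.log(base);   // change of base
    }

    /** Lower bound: every node is as full as possible (M-1 links). */
    static double minProbes(int M, double N) {
        return logBase(M - 1, N);
    }

    /** Upper bound: every node is only half full (M/2 links). */
    static double maxProbes(int M, double N) {
        return logBase(M / 2.0, N);
    }

    public static void main(String[] args) {
        System.out.println(minProbes(1024, 62e9));
        System.out.println(maxProbes(1024, 62e9));
    }
}
```

For M = 1024 and N = 62 billion both values lie between 3.5 and 4, which is where the "at most 4 probes" claim comes from; keeping the root page in memory saves one of those probes.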

Java version (Algorithms, 4th Edition)

API for a B-tree page (Java)

public class Page<Key>
    Page(boolean bottom)       create and open a page
    void close()               close a page
    void add(Key key)          put key into the (external) page
    void add(Page p)           open p and put an entry into this (internal) page that associates the smallest key in p with p
    boolean isExternal()       is this page external?
    boolean contains(Key key)  is key in the page?
    Page next(Key key)         the subtree that could contain the key
    boolean isFull()           has the page overflowed?
    Page split()               move the highest-ranking half of the keys in the page to a new page
    Iterable<Key> keys()       iterator for the keys on the page

/*************************************************************************
 *  Compilation:  javac BTree.java
 *  Execution:    java BTree
 *
 *  B-tree.
 *
 *  Limitations
 *  -----------
 *   -  Assumes M is even and M >= 4
 *   -  should b be an array of children or list (it would help with
 *      casting to make it a list)
 *
 *************************************************************************/


public class BTree<Key extends Comparable<Key>, Value>  {
    private static final int M = 4;    // max children per B-tree node = M-1

    private Node root;             // root of the B-tree
    private int HT;                // height of the B-tree
    private int N;                 // number of key-value pairs in the B-tree

    // helper B-tree node data type
    private static final class Node {
        private int m;                             // number of children
        private Entry[] children = new Entry[M];   // the array of children
        private Node(int k) { m = k; }             // create a node with k children
    }

    // internal nodes: only use key and next
    // external nodes: only use key and value
    private static class Entry {
        private Comparable key;
        private Object value;
        private Node next;     // helper field to iterate over array entries
        public Entry(Comparable key, Object value, Node next) {
            this.key   = key;
            this.value = value;
            this.next  = next;
        }
    }

    // constructor
    public BTree() { root = new Node(0); }
 
    // return number of key-value pairs in the B-tree
    public int size() { return N; }

    // return height of B-tree
    public int height() { return HT; }


    // search for given key, return associated value; return null if no such key
    public Value get(Key key) { return search(root, key, HT); }
    private Value search(Node x, Key key, int ht) {
        Entry[] children = x.children;

        // external node
        if (ht == 0) {
            for (int j = 0; j < x.m; j++) {
                if (eq(key, children[j].key)) return (Value) children[j].value;
            }
        }

        // internal node
        else {
            for (int j = 0; j < x.m; j++) {
                if (j+1 == x.m || less(key, children[j+1].key))
                    return search(children[j].next, key, ht-1);
            }
        }
        return null;
    }


    // insert key-value pair
    // add code to check for duplicate keys
    public void put(Key key, Value value) {
        Node u = insert(root, key, value, HT); 
        N++;
        if (u == null) return;

        // need to split root
        Node t = new Node(2);
        t.children[0] = new Entry(root.children[0].key, null, root);
        t.children[1] = new Entry(u.children[0].key, null, u);
        root = t;
        HT++;
    }


    private Node insert(Node h, Key key, Value value, int ht) {
        int j;
        Entry t = new Entry(key, value, null);

        // external node
        if (ht == 0) {
            for (j = 0; j < h.m; j++) {
                if (less(key, h.children[j].key)) break;
            }
        }

        // internal node
        else {
            for (j = 0; j < h.m; j++) {
                if ((j+1 == h.m) || less(key, h.children[j+1].key)) {
                    Node u = insert(h.children[j++].next, key, value, ht-1);
                    if (u == null) return null;
                    t.key = u.children[0].key;
                    t.next = u;
                    break;
                }
            }
        }

        for (int i = h.m; i > j; i--) h.children[i] = h.children[i-1];
        h.children[j] = t;
        h.m++;
        if (h.m < M) return null;
        else         return split(h);
    }

    // split node in half
    private Node split(Node h) {
        Node t = new Node(M/2);
        h.m = M/2;
        for (int j = 0; j < M/2; j++)
            t.children[j] = h.children[M/2+j]; 
        return t;    
    }

    // for debugging
    public String toString() {
        return toString(root, HT, "") + "\n";
    }
    private String toString(Node h, int ht, String indent) {
        String s = "";
        Entry[] children = h.children;

        if (ht == 0) {
            for (int j = 0; j < h.m; j++) {
                s += indent + children[j].key + " " + children[j].value + "\n";
            }
        }
        else {
            for (int j = 0; j < h.m; j++) {
                if (j > 0) s += indent + "(" + children[j].key + ")\n";
                s += toString(children[j].next, ht-1, indent + "     ");
            }
        }
        return s;
    }


    // comparison functions - make Comparable instead of Key to avoid casts
    private boolean less(Comparable k1, Comparable k2) {
        return k1.compareTo(k2) < 0;
    }

    private boolean eq(Comparable k1, Comparable k2) {
        return k1.compareTo(k2) == 0;
    }


   /*************************************************************************
    *  test client
    *************************************************************************/
    public static void main(String[] args) {
        BTree<String, String> st = new BTree<String, String>();

//      st.put("www.cs.princeton.edu", "128.112.136.12");
        st.put("www.cs.princeton.edu", "128.112.136.11");
        st.put("www.princeton.edu",    "128.112.128.15");
        st.put("www.yale.edu",         "130.132.143.21");
        st.put("www.simpsons.com",     "209.052.165.60");
        st.put("www.apple.com",        "17.112.152.32");
        st.put("www.amazon.com",       "207.171.182.16");
        st.put("www.ebay.com",         "66.135.192.87");
        st.put("www.cnn.com",          "64.236.16.20");
        st.put("www.google.com",       "216.239.41.99");
        st.put("www.nytimes.com",      "199.239.136.200");
        st.put("www.microsoft.com",    "207.126.99.140");
        st.put("www.dell.com",         "143.166.224.230");
        st.put("www.slashdot.org",     "66.35.250.151");
        st.put("www.espn.com",         "199.181.135.201");
        st.put("www.weather.com",      "63.111.66.11");
        st.put("www.yahoo.com",        "216.109.118.65");


        StdOut.println("cs.princeton.edu:  " + st.get("www.cs.princeton.edu"));
        StdOut.println("hardvardsucks.com: " + st.get("www.harvardsucks.com"));
        StdOut.println("simpsons.com:      " + st.get("www.simpsons.com"));
        StdOut.println("apple.com:         " + st.get("www.apple.com"));
        StdOut.println("ebay.com:          " + st.get("www.ebay.com"));
        StdOut.println("dell.com:          " + st.get("www.dell.com"));
        StdOut.println();

        StdOut.println("size:    " + st.size());
        StdOut.println("height:  " + st.height());
        StdOut.println(st);
        StdOut.println();
    }

}

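1.4.1.5 Evaluation

By definition a B-tree is guaranteed to be at least half full, so in the worst case as much as 50% of the space is wasted. If that happened too often, the definition of the B-tree would have to be tightened with further restrictions.
Simulations and analysis show, however, that after a large number of random insertions and deletions a B-tree is about 69% full.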

Reference:

Bayer, R.; McCreight, E. (1972), “Organization and Maintenance of Large Ordered Indexes”, Acta Informatica 1 (3): 173–189
