HashMap Source Code Analysis, Part 1

June 27, 2019 · Source: 一条肥鱼

HashMap really feels like the class that brings everything together.

Simplified inheritance diagram

[Figure: simplified inheritance diagram of HashMap]

HashMap class-level Javadoc (rough translation)

Only the key points are picked out and translated.

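A hash-table-based implementation of the Map interface. It permits null values and the null key. HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls. It makes no guarantee about ordering, and the order may change over time.

If the hash function disperses the elements well, the basic operations (such as get and put) run in constant time. Iterating over the map takes time proportional to its capacity plus its size, so if you care about Iterator performance, choose the initial capacity and load factor carefully (do not set the capacity too high or the load factor too low).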

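A HashMap instance has two parameters that affect its performance: initial capacity and load factor. The initial capacity is the number of buckets at the time the hash table is created. The load factor measures how full the table may get before its capacity is increased automatically: once the number of entries exceeds capacity * load factor, the table is rehashed and the number of buckets roughly doubles.

As a general rule, the default load factor of 0.75 offers a good tradeoff between time and space costs. Higher values reduce the space overhead but increase the lookup cost (which shows up in most HashMap operations, including get and put). When setting the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.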
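The sizing rule in the paragraph above can be turned into a small calculation. This is only a sketch and not part of the HashMap source: the class name InitialCapacitySketch and the helper capacityFor are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class InitialCapacitySketch {
    // Hypothetical helper: smallest initial capacity for which
    // expectedEntries <= capacity * loadFactor, so no rehash is needed.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        int initialCapacity = capacityFor(expected, 0.75f);
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(initialCapacity, 0.75f);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println("requested initial capacity: " + initialCapacity);
        System.out.println("size: " + map.size());
    }
}
```

Because the requested capacity is rounded up to a power of two, the table actually allocated may be larger than the value this helper returns.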

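If many entries are to be stored, creating the map with a sufficiently large capacity is more efficient than starting small and letting it grow automatically. Using many keys with the same hashCode() is a sure way to slow down any hash table. To lessen the impact, when the keys are Comparable, HashMap may use the comparison order among them to help break ties.

What counts as a structural modification of a Map? Adding or deleting a key-value pair does; changing the value associated with an existing key does not. Note that this class is not synchronized.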
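One observable consequence of this distinction, documented in the HashMap Javadoc although not covered in the summary above, is that iterators fail fast after a structural modification. The sketch below contrasts the two cases; the class and method names are made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class StructuralModificationSketch {
    // Overwrites values of existing keys while iterating: not structural.
    static boolean updateDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key, 100); // key already present, only the value changes
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Adds new keys while iterating: structural, so the iterator fails fast.
    static boolean insertDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key + "-copy", 0); // new key, structural modification
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("update completed: " + updateDuringIteration());
        System.out.println("insert completed: " + insertDuringIteration());
    }
}
```

The fail-fast check is a best-effort debugging aid, so it should not be relied on for correctness.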

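Fields

// Default initial capacity; must be a power of two
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments.
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor, used when the constructor does not specify one
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When a node is added to a bin that already holds at least this many nodes, the bin's list is converted to a tree
static final int TREEIFY_THRESHOLD = 8;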

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

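I don't yet understand what all of these parameters mean; I'm noting them down for now and may understand more later (2018.6.6).
What is certain is that once a bucket holds more than 8 nodes, it is converted from a linked list into a tree, provided that, according to the MIN_TREEIFY_CAPACITY comment above, the table has at least 64 buckets; otherwise the table is resized instead.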

transient Node<K,V>[] table;

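This is where HashMap keeps its buckets. A key's hash value is computed by the hash function, and the node is placed in this array at the index derived from that hash (analyzed later). If that slot is already occupied, the node is linked into the list there (whether at the head or the tail is also left for later analysis).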
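As the putVal code later in this article shows, the index derived from the hash is (n - 1) & hash, where n is the table length. A minimal sketch, assuming n is a power of two, shows that this mask gives the same result as a non-negative modulo. The class name BucketIndexSketch and the helper indexFor are illustrative and do not come from the JDK.

```java
final class BucketIndexSketch {
    // Index computation for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 5, 17, -1, -17, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : hashes) {
            // Math.floorMod always returns a value in [0, n) for positive n.
            System.out.println(h + " -> mask " + indexFor(h, n)
                    + ", floorMod " + Math.floorMod(h, n));
        }
    }
}
```

The mask is cheaper than a division and never yields a negative index, which is one reason the capacity is kept at a power of two.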

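Linked-list node

Let's look at the data structure of a list node first. It is a node of a singly linked list and holds the key, the value, the hash value, and a reference to the next node.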

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;// hash value
    final K key;// key
    V value;// value
    Node<K,V> next;// singly linked list: points to the next node

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }
    // This computes the hash of the whole key-value pair
    public final int hashCode() {
        // XOR of the key's hash and the value's hash
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&// same key
                Objects.equals(value, e.getValue()))// same value
                return true;
        }
        return false;
    }
}

The Objects methods used in the code above are:

public static int hashCode(Object o) {
    return o != null ? o.hashCode() : 0;
}
public static boolean equals(Object a, Object b) {
    return (a == b) || (a != null && a.equals(b));
}
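The entry hash described above can be observed from outside the class through entrySet(). The following sketch recomputes the XOR with Objects.hashCode and compares it with what each entry reports; the class name EntryHashSketch and the helper expectedEntryHash are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EntryHashSketch {
    // Recomputes the entry hash the same way Node.hashCode() does.
    static int expectedEntryHash(Object key, Object value) {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("first", 1);
        map.put(null, null); // null key and null value both hash to 0
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            boolean same = e.hashCode() == expectedEntryHash(e.getKey(), e.getValue());
            System.out.println(e + " matches: " + same);
        }
    }
}
```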

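Constructors

Three are covered here (the constructor that takes another Map is left out). They let you specify the load factor and the initial capacity:

// Specify both the initial capacity and the load factor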
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
// Specify the initial capacity; default load factor
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
// All defaults
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

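In the first constructor, look closely at the three if statements. The first and third reject invalid arguments; the middle one caps the initial capacity at MAXIMUM_CAPACITY when it exceeds that value. The constructor then computes threshold, which at this point holds the capacity the table will be given when resize() first allocates it. It is computed as follows:

// Smallest power of two greater than or equal to the given capacity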
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

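The bit manipulation is a bit abstract; run a few binary values through it and compare the results to see what it does. In short, after subtracting 1, the unsigned right shifts combined with bitwise OR turn every bit from the highest set bit down to the lowest into 1, giving a value of the form 2^n - 1. Adding 1 then gives 2^n, the smallest power of two that is not less than the original number, which is the capacity used at the next resize. For example: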

Value	Operations
1000	1000 >>> 1 = 0100, 1000 | 0100 = 1100; 1100 >>> 2 = 0011, 1100 | 0011 = 1111; 1111 >>> 4 = 0000, 1111 | 0000 = 1111; …
0100	0100 >>> 1 = 0010, 0100 | 0010 = 0110; 0110 >>> 2 = 0001, 0110 | 0001 = 0111; 0111 >>> 4 = 0000, 0111 | 0000 = 0111; …

Trying a few values gives:

capacity	`tableSizeFor()`
1	1
3	4
9	16
24	32

Looking back, the constructor never sets a capacity on the instance. We can use reflection to see what capacity() reports:

import java.lang.reflect.Method;
import java.util.HashMap;

public class CapacityDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>(12, 0.74f);
        printHashMapCapacity(map);
    }

    // Calls the package-private capacity() method through reflection.
    // On recent JDKs (16 and later) this requires running with
    // --add-opens java.base/java.util=ALL-UNNAMED.
    public static void printHashMapCapacity(HashMap<?, ?> map) {
        if (map == null)
            throw new IllegalArgumentException("map must not be null");
        try {
            Method method = HashMap.class.getDeclaredMethod("capacity");
            method.setAccessible(true);
            System.out.println("capacity : " + (int) method.invoke(map) + ", size : " + map.size());
        } catch (ReflectiveOperationException e) {
            e.printStackTrace();
        }
    }
}
---
capacity : 16, size : 0

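Puzzled? The answer lies in capacity(). As shown below, it returns a value only after a series of checks: the table has not been allocated yet, so what comes back is threshold, which the constructor set to tableSizeFor(12) = 16.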

final int capacity() {
    return (table != null) ? table.length :
        (threshold > 0) ? threshold :
        DEFAULT_INITIAL_CAPACITY;
}

Adding a key-value pair

Start from public V put(K key, V value) and suppose we call map.put("first", 1);. The key's hash is computed and control passes to putVal.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

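Two boolean arguments are passed. The first, onlyIfAbsent, is false, meaning an existing value is overwritten. The second, evict, is true, meaning the table is not in creation mode (I am not yet sure what creation mode is). First, here is how hash() is computed: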

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

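An int in Java is 32 bits. A null key hashes to 0; otherwise the key's hashCode() is taken and XORed with itself shifted right (unsigned) by 16 bits, and the result is the hash used for the key. I don't fully understand why it is computed this way, so I have included the original comment above.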
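The effect described in the quoted comment can be seen with two hash codes that differ only above bit 15. In this sketch, which assumes a table of 16 buckets, both land in bucket 0 when masked directly but in different buckets after the XOR spread. The class and method names are made up for illustration.

```java
final class HashSpreadSketch {
    // Same transform as HashMap.hash() applies to a non-null key's hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000; // differs from h2 only in bits above the mask
        int h2 = 0x20000;
        System.out.println("raw:    " + index(h1, n) + " and " + index(h2, n));
        System.out.println("spread: " + index(spread(h1), n) + " and " + index(spread(h2), n));
    }
}
```

Without the spread, the upper 16 bits would never influence the index until the table grew beyond 65536 buckets.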

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
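    // If the bucket array is empty, resize() is used to initialize it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Why AND the hash with length - 1 here? Is the index not simply the hash?
    // Since n = 2^m, n - 1 is all ones in its low m bits (e.g. 7, 15),
    // so the index is the low m bits of the hash, that is, hash mod n.
    if ((p = tab[i = (n - 1) & hash]) == null)// no node at this index yet
        tab[i] = newNode(hash, key, value, null);// create one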
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;// hash and key both match: keep p in e
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {// key differs: walk the list via next; stop if the key is found, otherwise append at the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {// no further node
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);// may turn the bin into a tree; covered later
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;// found the same key; that node is now held in e
                p = e;// advance p to e, i.e. the node after p
            }
        }
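        if (e != null) { // e is non-null, so the key already exists
            V oldValue = e.value;
            // This is where onlyIfAbsent matters: false was passed in, so the value is replaced.
            // If it were true, the value would still be replaced when the old value is null.
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);// a callback
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();// grows the table; covered below
    afterNodeInsertion(evict);// another callback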
    return null;
}
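The onlyIfAbsent branch in putVal above can be observed through the public API. In the JDK 8 source, putIfAbsent goes through the same putVal with onlyIfAbsent set to true, so the sketch below contrasts it with put. The class name OnlyIfAbsentSketch is made up for illustration.

```java
import java.util.HashMap;

final class OnlyIfAbsentSketch {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();

        // put: onlyIfAbsent is false, so an existing value is always replaced.
        System.out.println(map.put("first", 1)); // no previous mapping
        System.out.println(map.put("first", 2)); // returns the old value

        // putIfAbsent: onlyIfAbsent is true, so a non-null value is kept.
        System.out.println(map.putIfAbsent("first", 3));
        System.out.println(map.get("first"));

        // A key mapped to null is still overwritten (oldValue == null).
        map.put("empty", null);
        System.out.println(map.putIfAbsent("empty", 5));
        System.out.println(map.get("empty"));
    }
}
```

In every case the return value is the old value held by the node, which is why a null return cannot distinguish "no mapping" from "mapped to null".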

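resize() either initializes the table or doubles its capacity. It is confusing at first glance; after reading this blog post it suddenly made sense. The post is very detailed and I recommend it. The code has two parts: the first half computes the new capacity and threshold, and the second half redistributes the nodes of the old bucket array into the new one according to a simple rule.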

/**
 * Initializes or doubles table size. If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
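                if (e.next == null)// the bucket has no further nodes
                    // The index in the new array may differ from the old one.
                    // newCap - 1 is 0b11...11 in binary, with one more 1 bit than oldCap - 1,
                    // so the result depends on the next higher bit of the hash: either the same index or (old index + old capacity)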
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
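                    // Buckets with a chain get separate treatment.
                    // lo is presumably short for low and hi for high, standing for 0 or 1.
                    // 0 or 1 of what? That is the key point; read on.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    // Walk every node in this bucket
                    do {
                        next = e.next;
                        // How does e.hash & oldCap differ from the index computation?
                        // The index was computed by ANDing the hash with oldCap - 1, binary 1111...1;
                        // here the mask is 10000..00, so whether the result is 0
                        // depends on the single hash bit just above the old mask.
                        if ((e.hash & oldCap) == 0) {
                            // Why keep loTail, the tail? The next node is appended via the tail's next, which is what the else branch does.
                            if (loTail == null)// no nodes yet
                                loHead = e;// head
                            else
                                loTail.next = e;// tail->next
                            loTail = e;// tail = e;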
                        }
                        else {
                            // Same logic as above; here the extra bit of the index is 1
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // Forms a new list that stays at the old index, because the extra bit is 0
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Forms a new list at old index + old capacity, because the extra bit is 1
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

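To make the index change easier to follow, here is the figure from the recommended blog post, with thanks to its author:
[Figure: how bucket indices change during resize]

In summary, a bucket holding a single node ends up in the new array either at the same index or at (old index + old capacity). For a bucket with a chain, every node in the list is reassigned in the same way: it either stays at the old index or moves to (old index + old capacity).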
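The split rule can be checked with plain arithmetic. The sketch below, assuming an old capacity of 16, compares the shortcut used in resize() (test the bit hash & oldCap) with recomputing the index against the doubled table. The class name ResizeSplitSketch and the helper newIndex are made up for illustration.

```java
final class ResizeSplitSketch {
    // New index as resize() derives it: keep the old index or add oldCap,
    // depending on the single hash bit selected by oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all share old index 5
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1); // index in the doubled table
            System.out.println("hash " + h + ": old " + (h & (oldCap - 1))
                    + ", new " + newIndex(h, oldCap) + ", direct " + direct);
        }
    }
}
```

Because only one bit decides the outcome, a chain can be split into the lo and hi lists in a single pass without recomputing any hash.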

Looking up an element

Lookups usually go through get(Object key), shown below:

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

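Next, getNode. It takes two arguments, the key's hash and the key itself. Checking both is a double safeguard, and comparing the hash first also speeds things up to some extent. Having read the insertion path, this lookup code reads exactly as one would expect.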

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
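One consequence of get() returning e.value only when getNode finds a node is that a null result is ambiguous: the key may be missing, or it may be mapped to null. A small sketch of telling the two apart with containsKey follows; the class name GetNullSketch is made up for illustration.

```java
import java.util.HashMap;

final class GetNullSketch {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("present", null); // key exists, value is null

        // Both lookups return null...
        System.out.println(map.get("present"));
        System.out.println(map.get("missing"));

        // ...but containsKey distinguishes a null value from a missing key.
        System.out.println(map.containsKey("present"));
        System.out.println(map.containsKey("missing"));
    }
}
```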

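Removing a key-value pair

Removal takes two steps: find the entry, then unlink it. Leaving trees aside, there are two cases: the entry sits directly in the bucket array, or it sits further along the chain. In the latter case, because the list is singly linked, the node preceding the entry has to be remembered.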

final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Search finished; node is null if nothing was found
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)// in the bucket array itself
                tab[index] = node.next;
            else// in the chain: point the previous node's next at this entry's next, which removes the entry
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
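The two non-tree cases in removeNode above (the head of the bucket versus a node further along the chain) follow the usual pattern for unlinking from a singly linked list. The sketch below isolates that pattern with a simplified, hypothetical Node type that holds only a key; it is not the HashMap node class.

```java
final class UnlinkSketch {
    // Simplified stand-in for a bucket chain node (illustration only).
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Removes the first node with the given key and returns the new head.
    static Node remove(Node head, String key) {
        Node prev = null;
        for (Node cur = head; cur != null; prev = cur, cur = cur.next) {
            if (cur.key.equals(key)) {
                if (prev == null) {
                    return cur.next;   // head case, like tab[index] = node.next
                }
                prev.next = cur.next;  // chain case, like p.next = node.next
                return head;
            }
        }
        return head; // key not found, list unchanged
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("a", new Node("b", new Node("c", null)));
        head = remove(head, "b");
        System.out.println(render(head));
        head = remove(head, "a");
        System.out.println(render(head));
    }
}
```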

This wraps up the first part of the HashMap source analysis. The following parts will analyze how red-black trees are used in it.

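    Original author: 一条肥鱼
    Original URL: https://blog.csdn.net/asahinokawa/article/details/80585354
    This article is reposted from the web for the sole purpose of sharing knowledge; if it infringes any rights, please contact the blogger to have it removed.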