HashMap Source Code Analysis (Part 1)

October 25, 2023. Source: EthanPark

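I recently started reading the source of the Collections framework. Having previously compared List in C# with ArrayList in Java, today I'll go through the source of Java's HashMap.

HashMap implements quite a few interfaces; this post focuses on the Map interface.

Map interface

The containsKey method

containsKey is essentially the same as get: one reports whether the key is present, the other returns the value mapped to it.

First, a look at the Javadoc for containsKey: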

   /**
    * Returns <tt>true</tt> if this map contains a mapping for the specified
    * key. More formally, returns <tt>true</tt> if and only if this map contains
    * a mapping for a key <tt>k</tt> such that
    * <tt>(key==null ? k==null : key.equals(k))</tt>. (There can be at most one
    * such mapping.)
    *
    * @param key
    *           key whose presence in this map is to be tested
    * @return <tt>true</tt> if this map contains a mapping for the specified key
    * @throws ClassCastException
    *            if the key is of an inappropriate type for this map (<a
    *            href="{@docRoot}/java/util/Collection.html#optional-restrictions">optional</a>)
    * @throws NullPointerException
    *            if the specified key is null and this map does not permit null
    *            keys (<a href="{@docRoot}/java/util/Collection.html#optional-restrictions">optional</a>)
    */

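Note that the k == null case is compared explicitly: a null key is looked up like any other key, rather than being refused when it is written to the map.

The source is as follows: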

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; 
    Node<K,V> first, e; 
    int n; 
    K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

public boolean containsKey(Object key) {
    return getNode(hash(key), key) != null;
}

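The code consists of three functions: hash, getNode, and the one we are looking at, containsKey.

The hash function

The hash function first handles null, which hashes to 0. Every other key is an Object and so has its own hashCode method, which can be used uniformly for the computation.

Question: key.hashCode() I understand, but it was not obvious to me why the result is then XORed with (h >>> 16).

Here is the comment on the hash function: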

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */

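In short, it reduces collisions. The bucket index is (n - 1) & hash with n a power of two, so only the low bits of the hash select the bucket; XORing in the upper 16 bits lets the high bits influence the index as well.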
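
To make the effect concrete, here is a minimal sketch that compares bucket indices with and without the spreading step. The class SpreadDemo and the method names spread and index are made up for this illustration; only the two expressions h ^ (h >>> 16) and (n - 1) & hash come from the HashMap source above.

```java
// Illustration only: SpreadDemo, spread and index are hypothetical names.
final class SpreadDemo {
    // Same bit-spreading step as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000; // two hash codes that differ only above bit 15
        int b = 0x20000;
        // Without spreading, both land in bucket 0.
        System.out.println(index(a, n) + " " + index(b, n));
        // With spreading, they land in buckets 1 and 2.
        System.out.println(index(spread(a), n) + " " + index(spread(b), n));
    }
}
```

In a 16-slot table the mask keeps only the lowest four bits, so the two raw hash codes collide; after spreading, the bits that distinguished them have been folded into the low half and they separate.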

The getNode function

To explain this function we need to start from the structure of HashMap. A simplified outline:

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    transient Node<K, V>[] table;
    transient Set<Map.Entry<K, V>> entrySet;

    transient int size;
    transient int modCount;
    int threshold;
    final float loadFactor;

    // other part
}

Node<K,V> is the abstraction for a stored key-value pair, and it is what the table indexes.
Each Node also carries a next pointer, as follows:

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

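From these two structures it follows that HashMap resolves collisions by separate chaining.

That also explains why getNode is written the way it is: the lookup has to walk the linked list in the bucket.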
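
A stripped-down sketch of that lookup, without generics, null keys, or tree bins, might look like the following. ChainDemo, its Node class, and the hash values 1 and 17 are invented for the illustration (they are not real String hash codes); the loop shape mirrors the do/while in getNode.

```java
// Illustration only: a toy separate-chaining table, not the JDK implementation.
final class ChainDemo {
    static final class Node {
        final int hash;
        final String key;
        final int value;
        final Node next;

        Node(int hash, String key, int value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain in the bucket selected by (length - 1) & hash.
    static Node find(Node[] tab, int hash, String key) {
        for (Node e = tab[(tab.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                return e;
            }
        }
        return null;
    }

    // A 16-slot table in which "a" (hash 1) and "b" (hash 17) share bucket 1.
    static Node[] sample() {
        Node[] tab = new Node[16];
        tab[1] = new Node(1, "a", 10, new Node(17, "b", 20, null));
        return tab;
    }

    public static void main(String[] args) {
        Node[] tab = sample();
        System.out.println(find(tab, 17, "b").value); // second node in the chain
        System.out.println(find(tab, 33, "c"));       // same bucket, no match
    }
}
```

Comparing the cached hash before calling equals is a cheap filter: most non-matching nodes in a chain are rejected by the int comparison alone.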

The get method

With getNode explained, get is easy to follow. The code:

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

It looks up the matching Node in the table and returns its value; if no Node is found, it returns null.

There is nothing tricky here.
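
One consequence worth seeing in code: because HashMap permits null values, a null result from get does not by itself tell you whether the key is absent, which is where containsKey comes in. The sketch below uses only the public java.util API; the class name NullValueDemo and the sample keys are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: NullValueDemo and its keys are hypothetical.
final class NullValueDemo {
    static Map<String, String> sample() {
        Map<String, String> m = new HashMap<>();
        m.put("present", null);   // key exists, value is null
        m.put(null, "null-key");  // HashMap also accepts a null key
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> m = sample();
        System.out.println(m.get("present"));         // null
        System.out.println(m.get("absent"));          // null as well
        System.out.println(m.containsKey("present")); // true
        System.out.println(m.containsKey("absent"));  // false
        System.out.println(m.get(null));              // null-key
    }
}
```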

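The containsValue method

When I implemented this method myself, I got stuck on how to bring its time complexity down. It was a real puzzle; there seemed to be no option other than a full scan.
I considered keeping an extra table that indexes the values, but then realized that a value can change after insertion. That killed the idea, since keeping such an index up to date would be even more trouble. So I went to read the source, and was surprised to find that it really is just a scan.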

    public boolean containsValue(Object value) {
        Node<K,V>[] tab; V v;
        if ((tab = table) != null && size > 0) {
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                    if ((v = e.value) == value ||
                        (value != null && value.equals(v)))
                        return true;
                }
            }
        }
        return false;
    }

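This is not much different from contains in ArrayList. The cost is O(capacity + size) rather than O(size), because every bucket of the table is visited, including the empty ones.

The put function

Like add in ArrayList, put has to deal with capacity. Judging by the constants, the default initial capacity of HashMap is small: 16.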

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

When the constructor is first called, just as with ArrayList, no resize takes place; it only sets loadFactor to DEFAULT_LOAD_FACTOR.

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

The table is only allocated, through resize, once a put actually happens.

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

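This is where the source author's ingenuity shows. My own HashMap implementation never considered that a bucket could be organized as a binary tree, let alone a red-black tree, but the author did.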
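
Besides the tree bins, the plain-chain branch of resize above has a neat detail of its own: since the capacity doubles, an entry in old bucket j can only end up in bucket j or bucket j + oldCap, and the single bit (e.hash & oldCap) decides which. The sketch below checks that rule against a direct recomputation of the index; SplitDemo and newIndex are names invented for the illustration.

```java
// Illustration only: SplitDemo and newIndex are hypothetical names.
final class SplitDemo {
    // Where an entry goes after doubling, using the lo/hi rule from resize.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                   // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;  // lo list stays, hi list moves
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53};                // all in old bucket 5
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1);         // index recomputed from scratch
            System.out.println(h + " -> " + newIndex(h, oldCap) + " (direct: " + direct + ")");
        }
    }
}
```

Because the rule needs only one AND per node, resize can split a chain into two lists in a single pass, keeping the relative order of the nodes, without recomputing any hash.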

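After a put completes, the size of the map is checked; if size exceeds threshold, a resize is performed.

Stepping into the resize source, we can see that the threshold is essentially loadFactor * capacity; with the defaults that is DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY = 0.75 * 16 = 12.
I am not sure why 0.75 was chosen as the factor. In any case, apart from the initial allocation, the code shows a single reason for an actual resize: size > threshold.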
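
The arithmetic can be traced with a small simulation. This is only a sketch of the doubling rule visible in resize, assuming the default constructor and ignoring MAXIMUM_CAPACITY; ThresholdDemo and capacityAfter are invented names, and nothing here inspects a real HashMap.

```java
// Illustration only: simulates the capacity/threshold doubling, not a real HashMap.
final class ThresholdDemo {
    // Table capacity after inserting `entries` distinct keys (entries >= 1),
    // starting from the defaults: capacity 16, threshold 12.
    static int capacityAfter(int entries) {
        int cap = 16;
        int thr = (int) (0.75f * cap);
        for (int size = 1; size <= entries; size++) {
            if (size > thr) {   // same trigger as after put: size > threshold
                cap <<= 1;      // double capacity
                thr <<= 1;      // double threshold
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25}) {
            System.out.println(n + " entries -> capacity " + capacityAfter(n));
        }
    }
}
```

Under these assumptions the 13th entry pushes the capacity to 32 (threshold 24) and the 25th pushes it to 64 (threshold 48).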

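Test code confirms this: in the debugger you can watch threshold change inside the HashMap. The grow-on-demand pattern matches ArrayList's, although HashMap doubles its capacity while ArrayList grows by about half.

The clear function

This function is simple: it nulls out every slot of the existing array. What I could not work out was how entrySet is maintained after such a clear.
The code is as follows: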

    public void clear() {
        Node<K,V>[] tab;
        modCount++;
        if ((tab = table) != null && size > 0) {
            size = 0;
            for (int i = 0; i < tab.length; ++i)
                tab[i] = null;
        }
    }

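It also follows that if the table has grown large through insertions, clear does not shrink it back; the capacity stays the same.

This code really surprised me. I could not see it call anything to maintain entrySet or values, yet tests show that both are emptied. This question is answered later.

The values function

I did not expect this implementation of values at all. I had assumed that although the declared return type is Collection, in practice it would simply return an ArrayList.

The code is as follows: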

    public Collection<V> values() {
        Collection<V> vs;
        return (vs = values) == null ? (values = new Values()) : vs;
    }

    final class Values extends AbstractCollection<V> {
        public final int size()                 { return size; }
        public final void clear()               { HashMap.this.clear(); }
        public final Iterator<V> iterator()     { return new ValueIterator(); }
        public final boolean contains(Object o) { return containsValue(o); }
        public final Spliterator<V> spliterator() {
            return new ValueSpliterator<>(HashMap.this, 0, -1, 0, 0);
        }
        public final void forEach(Consumer<? super V> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                int mc = modCount;
                for (int i = 0; i < tab.length; ++i) {
                    for (Node<K,V> e = tab[i]; e != null; e = e.next)
                        action.accept(e.value);
                }
                if (modCount != mc)
                    throw new ConcurrentModificationException();
            }
        }
    }

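The implementation uses a new structure, Values, declared as a final class so that it cannot be subclassed. A Values object is created only on the first call and cached in the values field; later calls return that same object. It extends AbstractCollection
and implements a few basic operations.
The values field itself is inherited from AbstractMap, whose own values method is implemented with an anonymous class: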

    public Collection<V> values() {
        if (values == null) {
            values = new AbstractCollection<V>() {
                public Iterator<V> iterator() {
                    return new Iterator<V>() {
                        private Iterator<Entry<K,V>> i = entrySet().iterator();

                        public boolean hasNext() {
                            return i.hasNext();
                        }

                        public V next() {
                            return i.next().getValue();
                        }

                        public void remove() {
                            i.remove();
                        }
                    };
                }

                public int size() {
                    return AbstractMap.this.size();
                }

                public boolean isEmpty() {
                    return AbstractMap.this.isEmpty();
                }

                public void clear() {
                    AbstractMap.this.clear();
                }

                public boolean contains(Object v) {
                    return AbstractMap.this.containsValue(v);
                }
            };
        }
        return values;
    }

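That is fairly involved too; I did not see what was wrong with an ArrayList.
There is a question here:

What surprised me most is that I could not tell what this code is for. I never saw any code assign to the values object, other than setting it to null.
I marked it as an open question for the moment; after debugging several times I still had no idea how values is maintained.

The question is answered below, in the discussion of entrySet.

The entrySet and keySet functions

As the field listing earlier showed, HashMap caches its entrySet; no wonder SonarQube often recommends iterating over entrySet.
The way entrySet is maintained is quite distinctive, though. Like values, it holds no data of its own; it is implemented as an inner class.

An inner class can access the table of its enclosing instance, so when external code later calls Set methods, the methods implemented by EntrySet reach HashMap's internal fields directly.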

final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public final int size()                 { return size; }
    public final void clear()               { HashMap.this.clear(); }
    public final Iterator<Map.Entry<K,V>> iterator() {
        return new EntryIterator();
    }
    public final boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<?,?> e = (Map.Entry<?,?>) o;
        Object key = e.getKey();
        Node<K,V> candidate = getNode(hash(key), key);
        return candidate != null && candidate.equals(e);
    }
    public final boolean remove(Object o) {
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            Object key = e.getKey();
            Object value = e.getValue();
            return removeNode(hash(key), key, value, true, true) != null;
        }
        return false;
    }
    public final Spliterator<Map.Entry<K,V>> spliterator() {
        return new EntrySpliterator<>(HashMap.this, 0, -1, 0, 0);
    }
    public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            for (int i = 0; i < tab.length; ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)
                    action.accept(e);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }
}

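The size function here reads HashMap's internal size field directly. entrySet is just something like a view; it is not an extra structure that needs separate maintenance.

Combined with the transient keyword seen earlier, it does not affect serialization either.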
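
The view behaviour, which also answers the earlier question about clear and values, can be observed through the public API alone. The sketch below is an illustration under that assumption; ViewDemo and its method names are made up, and it relies only on the documented behaviour that the collections returned by values and entrySet are backed by the map.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Illustration only: ViewDemo and its methods are hypothetical names.
final class ViewDemo {
    // Takes the values view while the map is empty, then adds entries.
    static int sizeSeenThroughView() {
        Map<String, Integer> m = new HashMap<>();
        Collection<Integer> vals = m.values(); // no data is copied here
        m.put("a", 1);
        m.put("b", 2);
        return vals.size();                    // the view reads the map's size
    }

    // Clears the map through its entrySet view.
    static boolean clearThroughEntrySet() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.entrySet().clear();                  // delegates to the map's clear
        return m.isEmpty() && m.values().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(sizeSeenThroughView());  // 2
        System.out.println(clearThroughEntrySet()); // true
    }
}
```

Nothing has to be "maintained" when the map changes: each call on a view simply reads or modifies the table of the enclosing HashMap at that moment.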

    Original author: EthanPark
    Original article: https://blog.csdn.net/ethanwhite/article/details/50826122