[Java Basics] Collection Class Source Code Analysis: ArrayList

August 8, 2023. Source: java集合源码分析

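I still haven't finished the part on JVM garbage collection, and I don't feel I understand it well enough yet, so for now I'm turning to JDK source code analysis. This is meant to be a year of leveling up: I want to get my Java fundamentals solid and pay off everything I've been putting off. Without further ado, here begins my source-reading journey. I hope it deepens my understanding of Java's design, and that it is of some small use to others too.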

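Each article in this source-reading series follows the same routine:

First: locate the class in the inheritance hierarchy of the JDK core library.
Then: walk through the class comment at the top of the source file, the one that often runs to dozens or even hundreds of lines. Don't skip these comments: they give a quick picture of the class's basic properties (such as the underlying implementation and whether it is thread-safe) and offer best practices for using it, which helps a great deal in understanding what each class is like.
Next: analyze the class's core fields and methods (mainly methods, since fields are usually private and we can't touch them directly).
Finally: use a few **MethodTest classes to demonstrate the important methods of the class we have just read.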

ArrayList Source Code Walkthrough

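ArrayList extends the abstract class AbstractList and implements the List, RandomAccess, Cloneable, and java.io.Serializable interfaces. As a member of the Collection family, ArrayList shows up constantly in everyday Java web development. It implements a dynamically resizable array and is very accommodating (it can hold any kind of object), but there are a few small details that are very easy to get wrong if we aren't paying attention.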

First, let's go through the class comment in the ArrayList source, one paragraph at a time:

Class comment

 * Resizable-array implementation of the <tt>List</tt> interface.  Implements
 * all optional list operations, and permits all elements, including
 * <tt>null</tt>.  In addition to implementing the <tt>List</tt> interface,
 * this class provides methods to manipulate the size of the array that is
 * used internally to store the list.  (This class is roughly equivalent to
 * <tt>Vector</tt>, except that it is unsynchronized.)

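ArrayList is a resizable-array implementation of the List interface. If that sounds like a mouthful, think of it this way: ArrayList is backed by an array, whereas LinkedList is backed by a linked list. Reading on: it can hold elements of any type, including null. One caveat: "any type" does not include primitive types. If you add a primitive value to an ArrayList (say, the int 56), it is automatically boxed into the corresponding wrapper class (Integer in this case). The reason becomes clear once we look at the fields of the class.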
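To make the boxing point concrete, here is a minimal sketch. The class name AutoboxingDemo and its build helper are made up for illustration; only ArrayList and List come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the JDK.
class AutoboxingDemo {

    // Adds a primitive int, a null, and a String, then returns the list.
    static List<Object> build() {
        List<Object> list = new ArrayList<>();
        list.add(56);      // the int 56 is autoboxed to an Integer
        list.add(null);    // null is a permitted element
        list.add("text");  // any reference type is accepted
        return list;
    }

    public static void main(String[] args) {
        List<Object> list = build();
        System.out.println(list.get(0) instanceof Integer);
        System.out.println(list.get(1) == null);
        System.out.println(list.size());
    }
}
```

The element at index 0 comes back as an Integer object, not as a primitive int.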

 * <p>The <tt>size</tt>, <tt>isEmpty</tt>, <tt>get</tt>, <tt>set</tt>,
 * <tt>iterator</tt>, and <tt>listIterator</tt> operations run in constant
 * time.  The <tt>add</tt> operation runs in <i>amortized constant time</i>,
 * that is, adding n elements requires O(n) time.  All of the other operations
 * run in linear time (roughly speaking).  The constant factor is low compared
 * to that for the <tt>LinkedList</tt> implementation.

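The size, isEmpty, get, set, iterator, and listIterator methods run in constant time: their cost does not grow with the number of elements in the list. That is because they only read a field or access the internal array by index, so the time is fixed.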

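The add method runs in amortized constant time, and the next sentence of the comment explains what that means: adding n elements takes O(n) time in total. In other words, a single add is O(1) on average. Most calls just write into a free slot; the occasional call that triggers a reallocation costs O(n), but because the array grows geometrically, reallocations are rare enough that the copying work averages out to a constant per add. This is also why add is listed separately from "all of the other operations": those (for example remove(int), contains, or indexOf) take linear time for each individual call, whereas add is linear only for n calls taken together.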
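The amortized argument can be checked with a small simulation. This is only a sketch under stated assumptions: it models a list that starts at capacity 10 and grows by 1.5x (the policy of the JDK 1.7 source discussed in this article), and it counts how many element copies n appends would cause. The class and method names are hypothetical.

```java
// Hypothetical simulation, not JDK code: counts the copy work caused by growth.
class AmortizedGrowthDemo {

    // Returns the total number of element copies performed while appending
    // n elements, assuming an initial capacity of 10 and 1.5x growth.
    static long copiesFor(int n) {
        int capacity = 10;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;                         // a reallocation copies every existing element
                capacity = capacity + (capacity >> 1);  // same formula as grow()
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            long copies = copiesFor(n);
            System.out.println(n + " adds -> " + copies + " copies, "
                    + ((double) copies / n) + " copies per add");
        }
    }
}
```

The copies-per-add figure stays bounded by a small constant (roughly 3 for 1.5x growth) no matter how large n gets, which is exactly what "amortized constant time" promises.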

 * <p>Each <tt>ArrayList</tt> instance has a <i>capacity</i>.  The capacity is
 * the size of the array used to store the elements in the list.  It is always
 * at least as large as the list size.  As elements are added to an ArrayList,
 * its capacity grows automatically.  The details of the growth policy are not
 * specified beyond the fact that adding an element has constant amortized
 * time cost.

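Every ArrayList instance has a capacity: the length of the array used to store the list's elements. The capacity is always at least as large as the list's size (size being the number of elements actually stored). As elements are added, the capacity grows automatically when the existing capacity is not enough. The details of the growth policy are not specified, beyond the guarantee that adding an element has amortized constant time cost.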

 * <p>An application can increase the capacity of an <tt>ArrayList</tt> instance
 * before adding a large number of elements using the <tt>ensureCapacity</tt>
 * operation.  This may reduce the amount of incremental reallocation.

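Before inserting a large number of elements, you can call ensureCapacity to enlarge the list's capacity in advance, which reduces the number of reallocations caused by growth. This is easy to understand: the underlying storage is an array, and an array's length is fixed, so growing it necessarily means allocating a new array and copying the elements over. That is the "reallocation" the comment refers to.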
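A minimal usage sketch follows. Note that ensureCapacity is declared on ArrayList, not on the List interface, so the variable has to be typed as ArrayList. The class name and the fill helper are hypothetical.

```java
import java.util.ArrayList;

// Hypothetical demo class showing ensureCapacity before a bulk insert.
class EnsureCapacityDemo {

    // Fills a list with 0..n-1, optionally reserving room for all n elements first.
    static ArrayList<Integer> fill(int n, boolean presize) {
        ArrayList<Integer> list = new ArrayList<>();
        if (presize) {
            list.ensureCapacity(n);   // one allocation up front instead of repeated growth
        }
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(100_000, true).size());
        System.out.println(fill(100_000, false).size());
    }
}
```

Both calls produce the same list; the only difference is how many times the backing array is reallocated along the way. When the final size is known at construction time, passing it to the ArrayList(int) constructor achieves the same thing.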

 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access an <tt>ArrayList</tt> instance concurrently,
 * and at least one of the threads modifies the list structurally, it
 * <i>must</i> be synchronized externally.  (A structural modification is
 * any operation that adds or deletes one or more elements, or explicitly
 * resizes the backing array; merely setting the value of an element is not
 * a structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the list.

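Note that ArrayList is not synchronized. If multiple threads access an ArrayList instance concurrently and at least one of them modifies it structurally (a structural modification adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element does not count), the list must be synchronized externally. This is typically done by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be wrapped when it is created, like this: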

List list = Collections.synchronizedList(new ArrayList(…))
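Here is a rough sketch of the wrapper in use, assuming Java 8 or later for the lambda. The class name, method name, and the thread and element counts are all made up for illustration. One detail worth showing: individual calls such as add are synchronized by the wrapper, but iteration is not atomic, so the Collections.synchronizedList documentation asks callers to hold the list's lock while iterating.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class for Collections.synchronizedList.
class SynchronizedListDemo {

    // Starts `threads` workers that each add `perThread` elements, then
    // returns how many elements ended up in the list.
    static int concurrentAdds(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);          // each call takes the wrapper's lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        synchronized (list) {             // iteration must be guarded by the caller
            for (Integer ignored : list) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(4, 1000));
    }
}
```

With a plain ArrayList in place of the wrapper, the same program could lose elements or throw, because concurrent add calls would race on size and elementData.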

 * <p><a name="fail-fast"/>
 * The iterators returned by this class's {@link #iterator() iterator} and
 * {@link #listIterator(int) listIterator} methods are <em>fail-fast</em>:
 * if the list is structurally modified at any time after the iterator is
 * created, in any way except through the iterator's own
 * {@link ListIterator#remove() remove} or
 * {@link ListIterator#add(Object) add} methods, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of
 * concurrent modification, the iterator fails quickly and cleanly, rather
 * than risking arbitrary, non-deterministic behavior at an undetermined
 * time in the future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw {@code ConcurrentModificationException} on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness:  <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>

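This passage is hard to follow without an example, and the iterator() method is also what we focus on later, so here is a fairly literal reading first: the iterators returned by iterator() and listIterator(int) are fail-fast. Concretely, once an iterator has been created for a list, any structural modification of that list made by anything other than the iterator's own remove and add methods causes the iterator to throw ConcurrentModificationException. So in the face of concurrent modification, the iterator fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined time in the future.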

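Note that the fail-fast behavior cannot be guaranteed: generally speaking, no hard guarantees are possible in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It is therefore wrong to write a program whose correctness depends on this exception; the fail-fast behavior of iterators should be used only to detect bugs.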
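Fail-fast does not require multiple threads. The single-threaded sketch below (class and method names are hypothetical) contrasts removing an element through the list during iteration with removing it through the iterator itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class for the fail-fast behavior.
class FailFastDemo {

    // Removes "a" through the list while a for-each loop is iterating over it.
    // Returns true if the iterator noticed the change and threw.
    static boolean removeThroughList() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s);   // structural change the iterator does not know about
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes "a" through the iterator itself, which keeps the iterator consistent.
    static List<String> removeThroughIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeThroughList());
        System.out.println(removeThroughIterator());
    }
}
```

In the first method the exception comes from the call to next() that follows the removal, not from the removal itself. As the class comment warns, this is a debugging aid only: code should not be written to rely on the exception being thrown.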

OK, that covers the ArrayList class comment in JDK 1.7. In summary:

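ArrayList can hold elements of any type, including null.
ArrayList is backed by an array. It is very similar to Vector, except that ArrayList is not thread-safe.
As an ArrayList grows it expands automatically (which really means allocating a new array), so before a bulk insert you can enlarge it in advance with ensureCapacity to avoid the cost of repeated automatic growth.
Once an iterator has been created for a list, the list must not be structurally modified (except through the iterator's own add and remove methods); otherwise the iterator throws ConcurrentModificationException.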

Fields

With the class comment done, let's step inside ArrayList and see how it is actually implemented. First, its two most important fields:

    /**
     * The array buffer into which the elements of the ArrayList are stored.
     * The capacity of the ArrayList is the length of this array buffer.
     */
    private transient Object[] elementData;

    /**
     * The size of the ArrayList (the number of elements it contains).
     *
     * @serial
     */
    private int size;

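Here is the heart of ArrayList. The elementData array above is where an ArrayList instance ultimately stores its data, and it is the reason we say ArrayList is array-based. It is also why primitives get boxed: an Object[] can only hold object references.
The size field below it is the number of elements the list actually holds; it is always less than or equal to the length of the elementData array.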

Methods

Constructors:

1. The constructor taking an int: the caller specifies the initial capacity of the array

    public ArrayList(int initialCapacity) {
        super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        this.elementData = new Object[initialCapacity];
    }

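The code is simple: it allocates an array of the requested capacity and assigns it to elementData. Since an array length cannot be negative, the argument we pass cannot be negative either.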

2. The no-argument constructor: creates an array of length 10 by default (in the JDK 1.7 source shown here)

    public ArrayList() {
        this(10);
    }

3. The constructor taking a collection: initializes the internal array from the collection's elements

    public ArrayList(Collection<? extends E> c) {
        elementData = c.toArray();
        size = elementData.length;
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    }

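It first converts the collection to an array and assigns it to elementData, then sets size to the array's length. Because c.toArray() may (incorrectly) return an array whose runtime type is not Object[] (the JDK bug 6260652 mentioned in the comment), the constructor checks the array's class and, if it is not Object[], uses Arrays.copyOf to copy the elements into a new Object[] of the same length and assigns that to elementData.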
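Two observable consequences of these constructors can be sketched quickly: a negative capacity is rejected, and the collection constructor copies the elements into its own array, so later changes to the new list do not affect the source. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for the ArrayList constructors.
class ConstructorDemo {

    // Returns true if new ArrayList<>(-1) throws IllegalArgumentException.
    static boolean rejectsNegativeCapacity() {
        try {
            new ArrayList<String>(-1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Copies a two-element collection, adds to the copy, and reports the
    // size of the source afterwards.
    static int sourceSizeAfterCopyIsModified() {
        List<String> source = Arrays.asList("x", "y");
        List<String> copy = new ArrayList<>(source);  // elements copied via toArray()
        copy.add("z");                                // only the copy's array changes
        return source.size();
    }

    public static void main(String[] args) {
        System.out.println(rejectsNegativeCapacity());
        System.out.println(sourceSizeAfterCopyIsModified());
    }
}
```

The copy is shallow: the two lists hold separate arrays, but the arrays refer to the same element objects.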

Growing the capacity

Growth can be manual or automatic. Let's look at manual growth first:

    public void ensureCapacity(int minCapacity) {
        if (minCapacity > 0)
            ensureCapacityInternal(minCapacity);
    }

    private void ensureCapacityInternal(int minCapacity) {
        modCount++;
        // overflow-conscious code
        if (minCapacity - elementData.length > 0)
            grow(minCapacity);
    }

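As the code shows, growth happens only when the requested minimum capacity is greater than the current length of the internal array; otherwise the array is left as it is. (Note that ensureCapacityInternal increments modCount in either case.)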

Automatic growth happens when add is called:

    public boolean add(E e) {
        ensureCapacityInternal(size + 1);  // Increments modCount!!
        elementData[size++] = e;
        return true;
    }

As shown above, add also calls ensureCapacityInternal, and the grow method it invokes is where the growth actually happens. Let's see what grow does:

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

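As the code shows, it first computes a new capacity, newCapacity, equal to 1.5 times the current capacity oldCapacity (oldCapacity + (oldCapacity >> 1), rounded down). If that is still smaller than the capacity we asked for, newCapacity is set to the requested capacity; otherwise the 1.5x value is kept. The result is then compared with MAX_ARRAY_SIZE, which is Integer.MAX_VALUE - 8 (2147483639). If it is larger, hugeCapacity(minCapacity) picks the final value: Integer.MAX_VALUE when the requested capacity itself exceeds MAX_ARRAY_SIZE, and MAX_ARRAY_SIZE otherwise (a requested capacity that has overflowed to a negative number results in an OutOfMemoryError). Finally, Arrays.copyOf allocates a new array of that length, copies the existing elements into it, and the result is assigned to elementData. So the capacity of an ArrayList is limited and cannot grow without bound: the ceiling is Integer.MAX_VALUE (2147483647) elements, and in practice a VM may refuse to allocate an array that large.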
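This also explains the earlier point that an ArrayList's capacity is greater than or equal to its size: adding an element may trigger growth. If the current capacity is 10 and the list is full, the next add grows the capacity to 15 (10 + (10 >> 1)), while size becomes only 11 (10 + 1).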
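The capacity calculation can be pulled out into a standalone function and tried on a few values. The sketch below mirrors the grow method quoted above; the hugeCapacity helper and the MAX_ARRAY_SIZE constant are not shown in this article and are written from my recollection of the JDK 1.7 source, so treat them as assumptions. The class name is hypothetical.

```java
// Hypothetical standalone sketch of the JDK 1.7 growth calculation.
class GrowthPolicySketch {

    // Assumed value of ArrayList.MAX_ARRAY_SIZE in JDK 1.7.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Returns the capacity grow(minCapacity) would choose for an array of
    // length oldCapacity. The subtractions are deliberate: they keep the
    // comparisons correct even when oldCapacity * 1.5 overflows int.
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        return newCapacity;
    }

    // Assumed shape of the private hugeCapacity helper.
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // the requested capacity itself overflowed
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));
        System.out.println(newCapacity(4, 5));
        System.out.println(newCapacity(1, 2));
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001));
    }
}
```

The third call shows why the minCapacity check exists: 1 + (1 >> 1) is still 1, so without it a one-element array could never grow. The last call shows the cap: 1.5 times 1,500,000,000 overflows int, and the result is clamped to MAX_ARRAY_SIZE.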
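Incidentally, ArrayList also supports inserting an element at a given index. That may trigger growth as well, and it additionally has to shift every element after the insertion point (via the native System.arraycopy), so it is a linear-time operation and comparatively expensive, which is why it is used less often. The same applies to the positional addAll, so I won't go into either here.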

Removal

There are two main remove methods.

1. Remove the element at a given position

    public E remove(int index) {
        rangeCheck(index);

        modCount++;
        E oldValue = elementData(index);

        int numMoved = size - index - 1;
        if (numMoved > 0)
            System.arraycopy(elementData, index+1, elementData, index,
                             numMoved);
        elementData[--size] = null; // Let gc do its work

        return oldValue;
    }

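The overall flow is as follows. It first checks that the index is valid (we'll come back to modCount++ later) and reads the element to be returned. It then computes how many elements after the index need to move and, if there are any, shifts them one slot to the left with System.arraycopy. Finally it decrements size and sets the now-unused last slot to null, so the array no longer holds a stale reference and the removed object can be garbage-collected once nothing else refers to it, and returns the removed element.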
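The shifting step is easier to see on a bare array. The sketch below applies the same System.arraycopy call to a plain Object[]; the class name, the removeAt helper, and its explicit bounds check are my own simplifications rather than JDK code.

```java
import java.util.Arrays;

// Hypothetical sketch of the shift-left removal on a plain array.
class ShiftRemoveSketch {

    // Removes the element at `index` from the first `size` slots of `data`
    // by shifting the tail one slot to the left. Returns the new size.
    static int removeAt(Object[] data, int size, int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        int numMoved = size - index - 1;   // elements that sit after the removed one
        if (numMoved > 0)
            System.arraycopy(data, index + 1, data, index, numMoved);
        data[--size] = null;               // clear the vacated last slot
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null};
        int size = removeAt(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data));
    }
}
```

Removing the last element moves nothing (numMoved is 0), which is why removal from the end of an ArrayList is cheap while removal from the front is linear in the list size.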

2. Remove the first occurrence of a given object

    public boolean remove(Object o) {
        if (o == null) {
            for (int index = 0; index < size; index++)
                if (elementData[index] == null) {
                    fastRemove(index);
                    return true;
                }
        } else {
            for (int index = 0; index < size; index++)
                if (o.equals(elementData[index])) {
                    fastRemove(index);
                    return true;
                }
        }
        return false;
    }

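Note that this removes only the first match: the loop scans the internal array from index 0, calls fastRemove on the first element for which o.equals(element) is true, and returns immediately. The comparison uses the argument's equals method, so it is address-based only for classes that inherit Object.equals unchanged; classes such as String and Integer compare by value. If the argument is null, the first null element in the list is removed. If nothing matches, the list is unchanged and the method returns false.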
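The two remove overloads are a classic trap with a List<Integer>, because an int argument selects remove(int index) rather than remove(Object o). A small sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class contrasting remove(Object) and remove(int).
class RemoveOverloadDemo {

    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(10, 20, 10, 30));
    }

    // remove(Object): deletes only the first element equal to 10.
    static List<Integer> removeByValue() {
        List<Integer> list = sample();
        list.remove(Integer.valueOf(10));
        return list;
    }

    // remove(int): deletes whatever sits at index 1, which is 20.
    static List<Integer> removeByIndex() {
        List<Integer> list = sample();
        list.remove(1);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeByValue());
        System.out.println(removeByIndex());
    }
}
```

The second 10 survives removeByValue, which confirms that remove(Object) stops at the first match.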

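The iterator

ArrayList has two kinds of iterator methods, iterator() and listIterator(). Put simply, iterator() returns a forward-only iterator over the whole list, while listIterator() returns a ListIterator that starts at a given index (0 for the no-argument version, the given index for the one-argument version) and can additionally move backward and add or set elements. The two work on the same principle, so we will only look at iterator().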

    public Iterator<E> iterator() {
        return new Itr();
    }

iterator() returns an instance of Itr, which is a private inner (non-static member) class of ArrayList. Here is the code:

    private class Itr implements Iterator<E> {
        int cursor;       // index of next element to return
        int lastRet = -1; // index of last element returned; -1 if no such
        int expectedModCount = modCount;

        public boolean hasNext() {
            return cursor != size;
        }

        @SuppressWarnings("unchecked")
        public E next() {
            checkForComodification();
            int i = cursor;
            if (i >= size)
                throw new NoSuchElementException();
            Object[] elementData = ArrayList.this.elementData;
            if (i >= elementData.length)
                throw new ConcurrentModificationException();
            cursor = i + 1;
            return (E) elementData[lastRet = i];
        }

        public void remove() {
            if (lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();

            try {
                ArrayList.this.remove(lastRet);
                cursor = lastRet;
                lastRet = -1;
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }

        final void checkForComodification() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }
    }

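The class is simple: three fields and four methods. Let's go through them in turn:

cursor: the index of the next element to return
lastRet: the index of the last element returned (-1 if there is none)
expectedModCount: initialized to the list's modCount at the moment the iterator is created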

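expectedModCount deserves a closer look because it is central to the design. As mentioned earlier, iterators are fail-fast: put simply, once an iterator has been created for a list, modifying the list structurally from outside the iterator may cause an exception the next time the iterator is used. That exception is tied directly to expectedModCount, as we'll see in a moment.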

Now for the methods:
hasNext needs no explanation, so let's focus on the other three:

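1. next: the first line calls checkForComodification, whose implementation is very simple: it checks whether expectedModCount equals the modCount field of the enclosing ArrayList, and throws ConcurrentModificationException if they differ. As described above, ArrayList's add and remove methods each increment modCount. So if, after the iterator is created, outside code calls add, remove, or any other method that changes the list's structure, the next call to the iterator's next throws the exception. The rest of the code is straightforward: it returns the element at the current cursor position, advances the cursor by one, and records the returned index in lastRet.
2. remove: as stressed above, calling the enclosing ArrayList's add or remove during iteration leads to an exception, because those methods increment only modCount and leave the iterator's expectedModCount unchanged. The iterator's own remove, by contrast, delegates to ArrayList.this.remove(lastRet) and then copies the new modCount back into expectedModCount, so the two stay equal and no exception is thrown. An analogy: the ArrayList is a money house and the iterator is its bookkeeper. If someone else takes a receipt away while the bookkeeper is tallying the ledger, the numbers no longer add up; if the bookkeeper removes the receipt himself, the books still balance, because he updates the ledger as he goes.
3. checkForComodification: already covered above.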
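To see the modCount bookkeeping in isolation, here is a stripped-down list of my own (TinyList is a made-up class, not JDK code). It keeps only add and a forward iterator, grows by doubling for simplicity, and reuses the same expectedModCount check as Itr.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal list that imitates ArrayList's fail-fast check.
class TinyList implements Iterable<String> {
    private String[] data = new String[8];
    private int size;
    private int modCount;   // incremented on every structural change

    void add(String s) {
        if (size == data.length)
            data = Arrays.copyOf(data, size * 2);   // simplified growth policy
        data[size++] = s;
        modCount++;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int cursor;                             // index of next element to return
            private final int expectedModCount = modCount;  // snapshot taken at creation

            @Override
            public boolean hasNext() {
                return cursor != size;
            }

            @Override
            public String next() {
                if (modCount != expectedModCount)           // the list changed behind our back
                    throw new ConcurrentModificationException();
                if (cursor >= size)
                    throw new NoSuchElementException();
                return data[cursor++];
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();  // left out of this sketch
            }
        };
    }

    public static void main(String[] args) {
        TinyList list = new TinyList();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        System.out.println(it.next());
        list.add("c");                                      // modCount moves on, the snapshot does not
        try {
            it.next();
        } catch (ConcurrentModificationException e) {
            System.out.println("iterator detected the modification");
        }
    }
}
```

The whole mechanism is just an integer comparison, which is also why it is only best-effort under real multithreading: without synchronization, one thread is not guaranteed to see another thread's update to modCount in time.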

That concludes the ArrayList source code analysis. The two examples below illustrate ArrayList's growth mechanism and the iterator's fail-fast behavior:

1. Growth

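import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Note: on recent JDKs (16 and later, as far as I know) reflective access to
// ArrayList's private fields needs --add-opens java.base/java.util=ALL-UNNAMED.
public class SizeAndCapacityTest {

    public static void main(String[] args) throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {

        // Create a list whose internal array has an initial length of 4
        List<String> list = new ArrayList<String>(4);

        // Fill the whole array
        list.add("1");
        list.add("2");
        list.add("3");
        list.add("4");

        // Get the list's size field and its backing array via reflection
        Field sizeField = ArrayList.class.getDeclaredField("size");
        Field capacityField = ArrayList.class.getDeclaredField("elementData");

        sizeField.setAccessible(true);
        capacityField.setAccessible(true);

        int value = (Integer) sizeField.get(list);
        Object[] capacity = (Object[]) capacityField.get(list);

        // The output shows that size and capacity are equal at this point
        System.out.println("size = " + value);
        System.out.println("capacity = " + capacity.length);

        // Add one more element; this triggers growth, by a factor of 1.5
        list.add("5");

        value = (Integer) sizeField.get(list);
        capacity = (Object[]) capacityField.get(list);

        // The internal array had length 4 and now has length 6,
        // but there are only 5 elements, so size is 5
        System.out.println("size = " + value);
        System.out.println("capacity = " + capacity.length);

    }

}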

2. The fail-fast mechanism

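import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FailFastTest {

    private static List<String> list = new ArrayList<String>();

    public static void main(String[] args) {

        // Start two threads that operate on the list at the same time.
        // Each thread only adds elements and never removes any, so the
        // iterator's next method cannot run past the end of the internal array.
        // An exception can still occur, because next only checks whether
        // expectedModCount equals modCount and throws as soon as they differ.
        // The iterator's own remove method (and ListIterator's add) resets
        // expectedModCount to modCount, which is why it avoids the exception.
        // The threads are unsynchronized, so whether and when the exception
        // appears varies from run to run.
        new ThreadOne().start();
        new ThreadTwo().start();
    }

    private static void printAll() {
        Iterator<String> iter = list.iterator();
        while (iter.hasNext()) {
            System.out.print(iter.next());
        }
    }

    /**
     * Adds 1, 2, 3, 4, 5 to the list, traversing the whole list
     * with printAll() after each addition.
     */
    private static class ThreadOne extends Thread {
        public void run() {
            int i = 1;
            while (i < 6) {
                list.add(i + " added by ThreadOne, ");
                printAll();
                i++;
            }
        }
    }

    /**
     * Adds 1, 2, 3, 4, 5 to the list, traversing the whole list
     * with printAll() after each addition.
     */
    private static class ThreadTwo extends Thread {
        public void run() {
            int i = 1;
            while (i < 6) {
                list.add(i + " added by ThreadTwo, ");
                printAll();
                i++;
            }
        }
    }

}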

Next up: LinkedList.

    Original author: java集合源码分析
    Original URL: https://blog.csdn.net/tianma0314/article/details/59538012