Real-World Case: Using a Java Thread Dump to Analyze a Deadlock Caused by ReadWriteLock


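The deadlock in this article

The deadlock described here was caused by jackson-databind, version 2.4.1.
It works like this: one thread in a group acquires the write lock and then spins in an infinite loop, so every other thread that tries to acquire the read lock waits forever and the whole group makes no progress. This differs from the textbook definition of a deadlock, in which threads wait on each other in a cycle.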

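Symptoms

Recently we kept receiving alerts that an application (deployed on Tomcat) had stopped responding to user requests. On the production host, requests sent to the faulty instance with curl got no response, and ps showed CPU usage spiking.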

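Getting the Thread Dump

Use kill -3 <pid>. The JVM prints the thread dump to standard output, and because we run inside the Tomcat container, that output ends up in logs/catalina.out under the Tomcat installation directory.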
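
When sending a signal to the process is not convenient, a similar snapshot can be taken from inside the JVM. The following is only a sketch using the standard java.lang.management API; the class name InProcessThreadDump is made up for illustration, and the output format is simplified compared with what kill -3 prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a simplified thread dump of the current JVM.
final class InProcessThreadDump {

    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false/false: skip locked-monitor and locked-synchronizer details,
        // which are optional and not supported on every JVM.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

This only helps if some thread in the application can still run the code (for example a management endpoint); in the incident described here, kill -3 was the simpler route.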

Analysis

  • First, confirm how many threads are in the WAITING state and what they are doing:
    grep -c 'java.lang.Thread.State: WAITING' case/catalina.out

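In our case, 378 threads were in the WAITING state. Inspecting them showed that a large number of Tomcat NIO threads were waiting for the read lock of a ReadWriteLock, for example: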

"http-nio-9086-exec-40" daemon prio=10 tid=0x00007f8aa402a800 nid=0x7965 waiting on condition [0x00007f8a5ab68000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000d3643028> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at com.fasterxml.jackson.databind.util.LRUMap.get(LRUMap.java:56)
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:707)
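
The counting step can also be done in Java, which makes it easy to group the parked threads by the lock they are waiting for. This is a rough sketch: the class name ThreadDumpStats and its methods are made up for illustration, and the regular expressions assume the HotSpot dump format shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing a HotSpot thread dump.
final class ThreadDumpStats {
    // Matches e.g. "java.lang.Thread.State: WAITING (parking)" and captures "WAITING".
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");
    // Matches e.g. "- parking to wait for  <0x00000000d3643028> (...)" and captures the address.
    private static final Pattern PARKED_ON =
            Pattern.compile("- parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    /** Number of threads per thread state. */
    static Map<String, Integer> countStates(List<String> lines) {
        return countFirstGroup(STATE, lines);
    }

    /** Number of parked threads per lock address. */
    static Map<String, Integer> countParkedByLock(List<String> lines) {
        return countFirstGroup(PARKED_ON, lines);
    }

    private static Map<String, Integer> countFirstGroup(Pattern pattern, List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the file containing the dump, e.g. catalina.out
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Threads by state: " + countStates(lines));
        System.out.println("Parked threads by lock: " + countParkedByLock(lines));
    }
}
```

If most parked threads share a single lock address, as with 0x00000000d3643028 in the excerpt above, that lock is the one to investigate.
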
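  • Since the readers are blocked, some thread must be holding the write lock without releasing it for a long time (otherwise there would not be so many blocked readers). Looking further through the thread dump turns up this thread: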
"http-nio-9086-exec-31" daemon prio=10 tid=0x00007f8ab0026000 nid=0x795c runnable [0x00007f8a5b471000]
   java.lang.Thread.State: RUNNABLE
    at java.util.LinkedHashMap.transfer(LinkedHashMap.java:253)
    at java.util.HashMap.resize(HashMap.java:581)
    at java.util.HashMap.addEntry(HashMap.java:879)
    at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
    at java.util.HashMap.put(HashMap.java:505)
    at com.fasterxml.jackson.databind.util.LRUMap.put(LRUMap.java:68)
    at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:738)
    at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)

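The top frame of this thread is LinkedHashMap.transfer. Line 253 of that source sits inside a loop that walks a doubly linked list. From experience, CPU usage that stays high combined with a loop very likely means an infinite loop. And given how linked lists work, could the list have ended up with a cycle in it?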
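
The stall itself is easy to reproduce in isolation. The sketch below is not the jackson-databind code; WriteLockStallDemo is a made-up class in which a latch stands in for the endless loop, and the reader uses a timed tryLock so that the demo terminates instead of hanging like the real threads did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a writer that never finishes starves every reader.
final class WriteLockStallDemo {

    /** Returns true if a reader got the read lock within the timeout while the writer held the write lock. */
    static boolean readerGetsLockWhileWriterHolds(long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch releaseWriter = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                releaseWriter.await(); // stands in for the loop that never ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        }, "writer");
        writer.start();

        try {
            writerHasLock.await();
            // The real readers called lock(), which waits forever; tryLock gives up after the timeout.
            boolean acquired = lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            releaseWriter.countDown();
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("reader acquired lock: " + readerGetsLockWhileWriterHolds(200));
    }
}
```

With a plain lock() call in place of tryLock, the reader would park in exactly the way the http-nio threads in the dump did.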

  • With that question in mind, I went back to the source of the LRUMap class and looked at its get and put methods:
    @Override
    public V get(Object key) {
        _readLock.lock();
        try {
            return super.get(key);
        } finally {
            _readLock.unlock();
        }
    }

    @Override
    public V put(K key, V value) {
        _writeLock.lock();
        try {
            return super.put(key, value);
        } finally {
            _writeLock.unlock();
        }
    }

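get takes the read lock before calling the superclass get, and put takes the write lock before calling the superclass put. At first glance nothing looks wrong.
LRUMap extends LinkedHashMap, so the next step is the LinkedHashMap source. Before reading JDK source I always read the class-level design comment first, and there I found this passage: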

 * <p>A special {@link #LinkedHashMap(int,float,boolean) constructor} is
 * provided to create a linked hash map whose order of iteration is the order
 * in which its entries were last accessed, from least-recently accessed to
 * most-recently (<i>access-order</i>).  This kind of map is well-suited to
 * building LRU caches.  Invoking the <tt>put</tt> or <tt>get</tt> method
 * results in an access to the corresponding entry (assuming it exists after
 * the invocation completes).  The <tt>putAll</tt> method generates one entry
 * access for each mapping in the specified map, in the order that key-value
 * mappings are provided by the specified map's entry set iterator.  <i>No
 * other methods generate entry accesses.</i> In particular, operations on
 * collection-views do <i>not</i> affect the order of iteration of the backing
 * map.

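In short, LinkedHashMap has a special constructor that makes get and put reorder the map's entries by most recent access. The only constructor LRUMap provides calls exactly this special LinkedHashMap constructor in order to behave as an LRU cache. Its get method takes only the read lock, yet it changes internal state (the iteration order). A read lock is shared, so several threads can run get, and therefore modify the linked list, at the same time. That is certainly not thread-safe.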
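
The access-order behavior described in that Javadoc passage is easy to observe on a single thread. A small sketch (AccessOrderDemo is a made-up class name; the three-argument constructor is the one the Javadoc refers to):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: get() on an access-ordered LinkedHashMap changes iteration order.
final class AccessOrderDemo {

    static List<String> keyOrderAfterGet() {
        // Third argument true = access order instead of insertion order.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // a "read" that moves "a" to the end of the internal list
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keyOrderAfterGet()); // prints [b, c, a]
    }
}
```

Because get is a structural write on the linked list in this mode, guarding it with a shared read lock does not protect the list.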

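  • A close reading of the get source shows that it unlinks an element (here, the entry that was just looked up) from the list and re-inserts it at the end. When several threads run this code concurrently, there is a real chance of creating a cycle: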
        /**
         * Removes this entry from the linked list.
         */
        private void remove() {
            before.after = after;
            after.before = before;
        }

        /**
         * Inserts this entry before the specified existing entry in the list.
         */
        private void addBefore(Entry<K,V> existingEntry) {
            after  = existingEntry;
            before = existingEntry.before;
            before.after = this;
            after.before = this;
        }

        /**
         * This method is invoked by the superclass whenever the value
         * of a pre-existing entry is read by Map.get or modified by Map.set.
         * If the enclosing Map is access-ordered, it moves the entry
         * to the end of the list; otherwise, it does nothing.
         */
        void recordAccess(HashMap<K,V> m) {
            LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
            if (lm.accessOrder) {
                lm.modCount++;
                remove();
                addBefore(lm.header);
            }
        }

A LinkedHashMap that can be used from multiple threads
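
One simple way to make such a map safe is to treat get as a write and guard every operation with the same exclusive lock. The following is a minimal sketch under that assumption, not necessarily the approach the author had in mind and not the fix that jackson-databind adopted; the class name SynchronizedLruMap and its maxEntries parameter are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: every operation, including get, runs under one exclusive lock.
final class SynchronizedLruMap<K, V> {
    private final LinkedHashMap<K, V> delegate;

    SynchronizedLruMap(final int maxEntries) {
        // accessOrder = true: get() and put() move the entry to the end of the list.
        this.delegate = new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    // get mutates the linked list in access-order mode, so it needs the exclusive lock too.
    synchronized V get(K key) {
        return delegate.get(key);
    }

    synchronized V put(K key, V value) {
        return delegate.put(key, value);
    }

    synchronized int size() {
        return delegate.size();
    }
}
```

Usage: with new SynchronizedLruMap<String, Integer>(2), putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently. The price is that readers no longer run in parallel. If read concurrency matters more than strict LRU ordering, a ConcurrentHashMap-based cache is an alternative.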

References

    Original author: java锁
    Original URL: https://blog.csdn.net/u013623728/article/details/72933885