A Real-World Case: Using a Java Thread Dump to Analyze a Deadlock Caused by ReadWriteLock
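The deadlock in this article
The deadlock discussed here was caused by jackson-databind, version 2.4.1.
It works like this: one thread in a group acquires the write lock and then spins in an infinite loop, so every other thread that tries to acquire the read lock waits forever, and the group as a whole cannot make progress. This differs from the conventional definition of a deadlock, in which threads wait on each other in a cycle.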
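The situation described above can be reproduced in miniature. The following is a minimal sketch, not code from jackson-databind: the class name `StuckWriterSketch` and its method are hypothetical, and the busy loop merely stands in for whatever keeps the writer from ever releasing the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of jackson-databind.
final class StuckWriterSketch {

    /**
     * Starts a writer that takes the write lock and spins, then reports whether
     * a reader could acquire the read lock within timeoutMillis.
     */
    static boolean readerAcquiresWhileWriterSpins(long timeoutMillis) {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch writerHoldsLock = new CountDownLatch(1);
        final AtomicBoolean stop = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHoldsLock.countDown();
                while (!stop.get()) {
                    // Busy loop standing in for the endless work done under the write lock.
                }
            } finally {
                lock.writeLock().unlock();
            }
        }, "stuck-writer");
        writer.setDaemon(true);
        writer.start();

        try {
            writerHoldsLock.await();
            // A real reader calls lock() and would park forever; tryLock with a
            // timeout lets this demo observe the blocking and then move on.
            boolean acquired = lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            stop.set(true); // let the writer exit so the demo terminates
        }
    }
}
```

Calling `StuckWriterSketch.readerAcquiresWhileWriterSpins(200)` returns `false`: the reader is never admitted while the writer spins. The writer thread stays RUNNABLE and burns CPU, which matches the symptoms described in the next section.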
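Symptoms
Recently we kept receiving alerts that one of our applications (deployed on tomcat) was not responding to user requests. On the production host, a request sent to the faulty instance with curl got no response, and ps then showed that CPU usage had spiked.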
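Getting the thread dump
Run kill -3 <pid>. The JVM writes the thread dump to its standard output, and because our application runs inside tomcat, that output ends up in logs/catalina.out under the tomcat installation directory.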
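If sending a signal to the process is not an option, a roughly similar dump can be produced from inside the JVM. This is an alternative technique, not what we did in this case; it is a sketch that assumes you can run code in the affected process (for example from an admin endpoint), and the class name `InProcessThreadDump` is hypothetical. It relies only on the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; an in-process alternative to kill -3.
final class InProcessThreadDump {

    /** Returns a textual dump of all live threads in the current JVM. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Ask for monitor and synchronizer details only where the JVM supports them.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            // ThreadInfo.toString() prints name, state, lock info and the top stack frames.
            sb.append(info);
        }
        return sb.toString();
    }
}
```

Note that `ThreadInfo.toString()` truncates deep stacks, so the output is less complete than what kill -3 produces; use `ThreadInfo.getStackTrace()` if the full stack is needed.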
Analysis
- First we need to find out how many threads are in the WAITING state and what they are doing:
grep -c 'java.lang.Thread.State: WAITING' case/catalina.out
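In our case, 378 threads were in the WAITING state. Reviewing them showed that a large number of tomcat nio threads were waiting for the read lock of a ReadWriteLock, like this one: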
"http-nio-9086-exec-40" daemon prio=10 tid=0x00007f8aa402a800 nid=0x7965 waiting on condition [0x00007f8a5ab68000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d3643028> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
at com.fasterxml.jackson.databind.util.LRUMap.get(LRUMap.java:56)
at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:707)
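Counting by hand gets tedious once several states and locks are involved. The sketch below tallies thread states and the addresses of the locks that threads are parked on. It assumes the dump text has the HotSpot layout shown above; the class `ThreadDumpTally` and its method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing a HotSpot thread dump.
final class ThreadDumpTally {
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");
    private static final Pattern PARKED_ON =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-f]+)>");

    /** Counts how often each distinct value of the pattern's first group occurs. */
    private static Map<String, Integer> count(Pattern pattern, String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = pattern.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    /** Thread state (WAITING, RUNNABLE, ...) mapped to the number of threads in it. */
    static Map<String, Integer> countStates(String dump) {
        return count(STATE, dump);
    }

    /** Lock address mapped to the number of threads parked waiting for it. */
    static Map<String, Integer> countParkedLocks(String dump) {
        return count(PARKED_ON, dump);
    }
}
```

To use it, load the log with `new String(Files.readAllBytes(Paths.get("case/catalina.out")))` and pass the text to both methods. If most parked threads share a single lock address, as they did here with 0x00000000d3643028, that lock is the one to investigate. Keep in mind that catalina.out may contain more than one dump, in which case the counts add up across dumps.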
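- Since the read lock was blocked, some thread must have acquired the write lock and held it for a long time (otherwise there would not be so many blocked readers). Looking further through the thread dump, I found this thread: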
"http-nio-9086-exec-31" daemon prio=10 tid=0x00007f8ab0026000 nid=0x795c runnable [0x00007f8a5b471000]
java.lang.Thread.State: RUNNABLE
at java.util.LinkedHashMap.transfer(LinkedHashMap.java:253)
at java.util.HashMap.resize(HashMap.java:581)
at java.util.HashMap.addEntry(HashMap.java:879)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
at java.util.HashMap.put(HashMap.java:505)
at com.fasterxml.jackson.databind.util.LRUMap.put(LRUMap.java:68)
at com.fasterxml.jackson.databind.type.TypeFactory._fromClass(TypeFactory.java:738)
at com.fasterxml.jackson.databind.type.TypeFactory._constructType(TypeFactory.java:387)
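This thread was last executing the transfer method of LinkedHashMap. Line 253 turns out to be a loop that walks a doubly linked list. In my experience, when CPU usage stays high and the stack sits inside a loop, an infinite loop is the likely cause. Given how linked lists work, could the list contain a cycle?
- With that question in mind, I went back to the source of the LRUMap class and looked at its get and put methods: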
@Override
public V get(Object key) {
    _readLock.lock();
    try {
        return super.get(key);
    } finally {
        _readLock.unlock();
    }
}
@Override
public V put(K key, V value) {
    _writeLock.lock();
    try {
        return super.put(key, value);
    } finally {
        _writeLock.unlock();
    }
}
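get takes the read lock before calling the superclass get, and put takes the write lock before calling the superclass put. At first glance nothing looks wrong.
LRUMap extends LinkedHashMap, so I turned to the LinkedHashMap source. Before reading JDK source I usually read the class-level design comment first, and in it I found this passage: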
* <p>A special {@link #LinkedHashMap(int,float,boolean) constructor} is
* provided to create a linked hash map whose order of iteration is the order
* in which its entries were last accessed, from least-recently accessed to
* most-recently (<i>access-order</i>). This kind of map is well-suited to
* building LRU caches. Invoking the <tt>put</tt> or <tt>get</tt> method
* results in an access to the corresponding entry (assuming it exists after
* the invocation completes). The <tt>putAll</tt> method generates one entry
* access for each mapping in the specified map, in the order that key-value
* mappings are provided by the specified map's entry set iterator. <i>No
* other methods generate entry accesses.</i> In particular, operations on
* collection-views do <i>not</i> affect the order of iteration of the backing
* map.
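In short, LinkedHashMap has a special constructor that makes get and put reorder the map's entries by most recent access. The only constructor LRUMap provides calls exactly this superclass constructor in order to get LRU cache behavior. Its get method takes only the read lock, yet it modifies internal state (the iteration order), so it is definitely not thread-safe.
- Reading the source behind get closely, it removes an element from the linked list (here, the entry that get found) and re-inserts it at the end. If several threads run this code at the same time, there is indeed a chance of creating a cycle: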
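The Javadoc passage is easy to verify. The following small sketch (the class name `AccessOrderDemo` is hypothetical) builds an access-ordered LinkedHashMap through the three-argument constructor and shows that a plain get changes the iteration order, which means get is a write as far as the linked list is concerned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class showing that get() mutates an access-ordered map.
final class AccessOrderDemo {

    /** Inserts a, b, c, reads a, and returns the resulting iteration order. */
    static List<String> keysAfterGet() {
        // Third argument true selects access order instead of insertion order.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // moves "a" to the end of the internal linked list
        return new ArrayList<>(map.keySet());
    }
}
```

`AccessOrderDemo.keysAfterGet()` returns `[b, c, a]`: the entry that was read has moved to the most-recently-used end. A read lock allows many threads to perform this relinking concurrently.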
/**
 * Removes this entry from the linked list.
 */
private void remove() {
    before.after = after;
    after.before = before;
}
/**
 * Inserts this entry before the specified existing entry in the list.
 */
private void addBefore(Entry<K,V> existingEntry) {
    after = existingEntry;
    before = existingEntry.before;
    before.after = this;
    after.before = this;
}
/**
 * This method is invoked by the superclass whenever the value
 * of a pre-existing entry is read by Map.get or modified by Map.set.
 * If the enclosing Map is access-ordered, it moves the entry
 * to the end of the list; otherwise, it does nothing.
 */
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();
        addBefore(lm.header);
    }
}
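To see how a cycle can form, the interleaving can be replayed deterministically on a single thread. The sketch below is a simulation under stated assumptions: `Node` is a hypothetical stand-in that copies only the remove() and addBefore() link logic quoted above, and the chosen interleaving (two threads calling get on the same key) is one possible schedule, not necessarily the one that occurred in production.

```java
// Hypothetical single-threaded replay of two interleaved recordAccess() calls.
final class AccessOrderRaceSketch {

    /** Stand-in for the linked-list part of the JDK 7 LinkedHashMap entry. */
    static final class Node {
        final String name;
        Node before, after;
        Node(String name) { this.name = name; }

        void remove() {
            before.after = after;
            after.before = before;
        }

        void addBefore(Node existing) {
            after = existing;
            before = existing.before;
            before.after = this;
            after.before = this;
        }
    }

    /** Builds the circular list header, entries[0], entries[1], ... and returns the header. */
    static Node buildList(Node... entries) {
        Node header = new Node("header");
        header.before = header.after = header;
        for (Node e : entries) {
            e.addBefore(header);
        }
        return header;
    }

    /** Walks the list the way a header-terminated loop does; false means header was not reached. */
    static boolean traversalTerminates(Node header, int maxSteps) {
        int steps = 0;
        for (Node e = header.after; e != header; e = e.after) {
            if (++steps > maxSteps) {
                return false;
            }
        }
        return true;
    }

    /** Two accesses to A, one after the other: the list stays intact. */
    static boolean sequentialAccesses() {
        Node a = new Node("A"), b = new Node("B");
        Node header = buildList(a, b);
        a.remove(); a.addBefore(header); // first access completes
        a.remove(); a.addBefore(header); // second access completes
        return traversalTerminates(header, 1000);
    }

    /** The same two accesses, interleaved as two threads under a shared read lock could be. */
    static boolean interleavedAccesses() {
        Node a = new Node("A"), b = new Node("B");
        Node header = buildList(a, b);
        a.remove();          // thread 2: remove(), then it is descheduled
        a.remove();          // thread 1: remove()
        a.addBefore(header); // thread 1: addBefore(header), A is now the last entry
        a.addBefore(header); // thread 2 resumes: header.before is A itself, so A.after becomes A
        return traversalTerminates(header, 1000);
    }
}
```

`sequentialAccesses()` returns `true`, while `interleavedAccesses()` returns `false`: after the second addBefore, A's after pointer refers to A itself, so a loop that follows after pointers until it reaches the header never ends. That is consistent with the writer thread spinning inside transfer while holding the write lock.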