A look at Redis slowlog and latency monitor

This article looks at Redis's slowlog and latency monitor.

slowlog

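Redis introduced slowlog in version 2.2.12 to record commands whose execution time exceeds a configured threshold. The measured time excludes I/O such as communicating with the client; it is purely the time spent executing the command. The main operations are as follows: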

View the number of slowlog entries

127.0.0.1:6379> slowlog len
(integer) 1024

View the slowlog execution-time threshold (in microseconds)

127.0.0.1:6379> config get slowlog-log-slower-than
1) "slowlog-log-slower-than"
2) "1000"

Set the slowlog execution-time threshold

127.0.0.1:6379> config set slowlog-log-slower-than 1000
OK

View the maximum number of slowlog entries kept

127.0.0.1:6379> config get slowlog-max-len
1) "slowlog-max-len"
2) "1024"

Set the maximum number of slowlog entries kept

127.0.0.1:6379> config set slowlog-max-len 1024
OK
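How the two settings above interact can be pictured with a small toy model. This is only an illustrative sketch, not Redis's actual implementation: the names `ToySlowlog`, `SlowlogEntry` and `record` are made up for this example. It assumes a command is logged when its duration reaches the threshold, and that the oldest entry is dropped once the log is full.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SlowlogEntry:
    entry_id: int
    duration_us: int
    command: list


class ToySlowlog:
    """Toy model of the slowlog: a bounded, newest-first list of slow commands."""

    def __init__(self, slower_than_us=1000, max_len=1024):
        self.slower_than_us = slower_than_us  # plays the role of slowlog-log-slower-than
        # deque(maxlen=...) discards the entry at the opposite end when a new
        # one is pushed onto a full deque, so the oldest entry falls off.
        self.entries = deque(maxlen=max_len)  # plays the role of slowlog-max-len
        self.next_id = 0

    def record(self, command, duration_us):
        # A negative threshold disables logging; 0 logs every command.
        if self.slower_than_us < 0 or duration_us < self.slower_than_us:
            return
        self.entries.appendleft(SlowlogEntry(self.next_id, duration_us, command))
        self.next_id += 1


log = ToySlowlog(slower_than_us=1000, max_len=2)
log.record(["GET", "a"], 500)        # below the threshold, ignored
log.record(["KEYS", "*"], 1500)      # logged
log.record(["SORT", "big"], 48296)   # logged
log.record(["COMMAND"], 2000)        # logged, pushes out the KEYS entry
print([(e.entry_id, e.command) for e in log.entries])
```

A small `slowlog-max-len` therefore keeps only the most recent slow commands, which is why older entries can disappear between two `slowlog get` calls on a busy instance.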

Query the slowlog

127.0.0.1:6379> slowlog get 1
1) 1) (integer) 76016
   2) (integer) 1537250266
   3) (integer) 48296
   4) 1) "COMMAND"

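Each entry has four fields: the first is the entry ID, the second is the Unix timestamp, the third is the execution time in microseconds, and the fourth is the command with its arguments.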
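The four fields are easier to read once they are given names. The following is a minimal sketch that assumes the raw entry arrives as a plain Python list in the order shown above; `parse_slowlog_entry` is a hypothetical helper name, not part of any Redis client library.

```python
from datetime import datetime, timezone


def parse_slowlog_entry(raw):
    """Turn one raw SLOWLOG GET entry into a dict with named fields."""
    # Only the first four fields are used; any extra trailing fields are ignored.
    entry_id, timestamp, duration_us, args = raw[:4]
    return {
        "id": entry_id,
        "time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "duration_ms": duration_us / 1000,  # microseconds to milliseconds
        "command": " ".join(args),
    }


# The entry from the transcript above.
entry = parse_slowlog_entry([76016, 1537250266, 48296, ["COMMAND"]])
print(entry["id"], entry["time"].isoformat(), entry["duration_ms"], entry["command"])
```

Converting the duration to milliseconds makes it directly comparable with the latency monitor figures, which are reported in milliseconds.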

Clear the slowlog

127.0.0.1:6379> slowlog reset
OK

latency monitor

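Redis introduced latency monitoring in version 2.8.13, mainly to track latency spikes. Monitoring is event-based: the command event tracks the latency of command execution, the fast-command event tracks the latency of commands with O(1) and O(log N) time complexity, and the fork event tracks the latency of the fork(2) system call made by Redis. The main operations are as follows: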

Set/enable the latency monitor (threshold in milliseconds)

127.0.0.1:6379> config set latency-monitor-threshold 100
OK

Read the latency monitor configuration

127.0.0.1:6379> config get latency-monitor-threshold
1) "latency-monitor-threshold"
2) "100"
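The role of the threshold can be sketched with a toy model. This is an assumption-laden illustration rather than Redis's real data structure: `ToyLatencyMonitor`, `add_sample` and `latest` are names invented here, and the model only captures two ideas, that a threshold of 0 means monitoring is off and that only samples reaching the threshold are kept per event.

```python
class ToyLatencyMonitor:
    """Toy model: keep per-event latency samples that reach a threshold."""

    def __init__(self, threshold_ms=0):
        self.threshold_ms = threshold_ms  # plays the role of latency-monitor-threshold
        self.history = {}        # event name -> list of (timestamp, latency_ms)
        self.all_time_max = {}   # event name -> largest latency seen so far

    def add_sample(self, event, timestamp, latency_ms):
        # A threshold of 0 means monitoring is switched off.
        if self.threshold_ms == 0 or latency_ms < self.threshold_ms:
            return False
        self.history.setdefault(event, []).append((timestamp, latency_ms))
        self.all_time_max[event] = max(self.all_time_max.get(event, 0), latency_ms)
        return True

    def latest(self):
        # One row per event: name, timestamp, latest latency, all-time max.
        return [
            [event, samples[-1][0], samples[-1][1], self.all_time_max[event]]
            for event, samples in self.history.items()
        ]


monitor = ToyLatencyMonitor(threshold_ms=100)
monitor.add_sample("command", 1537268064, 1010)  # kept
monitor.add_sample("command", 1537268070, 250)   # kept
monitor.add_sample("command", 1537268071, 50)    # below the threshold, dropped
print(monitor.latest())
```

Picking the threshold is a trade-off: too low and ordinary commands flood the history, too high and real spikes go unrecorded.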

Get the most recent latency samples

127.0.0.1:6379> debug sleep 1
OK
(1.01s)
127.0.0.1:6379> debug sleep .25
OK
127.0.0.1:6379> latency latest
1) 1) "command"
   2) (integer) 1537268070
   3) (integer) 250
   4) (integer) 1010

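Each entry contains the event name, the Unix timestamp of the latest spike, the latest latency (in milliseconds), and the maximum latency (in milliseconds).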
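A reply in this shape can be turned into a per-event lookup table. The sketch below assumes the reply has already been received as nested Python lists with the four fields in the order described above; `parse_latency_latest` and `events_over` are hypothetical helper names used only for illustration.

```python
def parse_latency_latest(reply):
    """Map each event name to its latest and maximum latency (ms)."""
    result = {}
    for row in reply:
        # Use only the first four fields in case a server returns more.
        event, timestamp, latest_ms, max_ms = row[:4]
        result[event] = {
            "timestamp": timestamp,
            "latest_ms": latest_ms,
            "max_ms": max_ms,
        }
    return result


def events_over(parsed, limit_ms):
    """Return the names of events whose maximum latency exceeds limit_ms."""
    return sorted(name for name, info in parsed.items() if info["max_ms"] > limit_ms)


# The reply from the transcript above.
parsed = parse_latency_latest([["command", 1537268070, 250, 1010]])
print(parsed["command"]["latest_ms"], parsed["command"]["max_ms"])
print(events_over(parsed, 500))
```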

View the latency history of an event

127.0.0.1:6379> latency history command
1) 1) (integer) 1537268064
   2) (integer) 1010
2) 1) (integer) 1537268070
   2) (integer) 250
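The history is a list of (timestamp, latency) pairs, so simple statistics are easy to derive from it. The sketch below uses the textbook definitions of average and mean absolute deviation; it makes no claim to reproduce the exact arithmetic behind the figures that `latency doctor` prints, and `summarize_history` is a name invented for this example.

```python
def summarize_history(samples):
    """Summarize a list of [timestamp, latency_ms] pairs; None if it is empty."""
    if not samples:
        return None
    latencies = [latency for _, latency in samples]
    average = sum(latencies) / len(latencies)
    # Mean absolute deviation: the average distance of each sample from the mean.
    mean_deviation = sum(abs(value - average) for value in latencies) / len(latencies)
    return {
        "count": len(latencies),
        "average_ms": average,
        "mean_deviation_ms": mean_deviation,
        "worst_ms": max(latencies),
        "span_s": samples[-1][0] - samples[0][0],  # time between first and last sample
    }


# The history from the transcript above.
print(summarize_history([[1537268064, 1010], [1537268070, 250]]))
```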

View the latency graph of an event

127.0.0.1:6379> latency reset command
(integer) 0
127.0.0.1:6379> debug sleep .1
OK
127.0.0.1:6379> debug sleep .2
OK
127.0.0.1:6379> debug sleep .3
OK
127.0.0.1:6379> debug sleep .4
OK
127.0.0.1:6379> debug sleep .5
OK
(0.50s)
127.0.0.1:6379> latency graph
(error) ERR syntax error
127.0.0.1:6379> latency graph command
command - high 500 ms, low 100 ms (all time high 500 ms)
--------------------------------------------------------------------------------
   _#
  _||
 _|||
_||||

22117
4062s
ssss

Reset/clear event data

127.0.0.1:6379> latency reset command
(integer) 1
127.0.0.1:6379> latency history command
(empty list or set)
127.0.0.1:6379> latency latest
(empty list or set)

Diagnostic advice

127.0.0.1:6379> latency doctor
Dave, I have observed latency spikes in this Redis instance. You don't mind talking about it, do you Dave?

1. command: 6 latency spikes (average 257ms, mean deviation 142ms, period 3.83 sec). Worst all time event 500ms.

I have a few advices for you:

- Check your Slow Log to understand what are the commands you are running which are too slow to execute. Please check http://redis.io/commands/slowlog for more information.
- Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.

Summary

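  • Redis introduced slowlog in version 2.2.12 and the latency monitor in version 2.8.13
  • slowlog records only the pure execution time of commands; it excludes I/O with the client and the time taken by operations such as fork
  • The latency spikes tracked by the latency monitor cover a wider range: not only command execution, but also the fork(2) system call, key expiration, and similar operations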

doc

    Original author: codecraft
    Original URL: https://segmentfault.com/a/1190000016444920