Two Examples of Using redis-port

June 9, 2019 · Source: 董泽润 (Dong Zerun)

[Figure: a Swiss Army knife]

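In the previous post, 《7月，redis迷情》, I mentioned that we rely heavily on redis-port in day-to-day operations. Here I share the scenarios where we use it and some lessons learned. redis-port started out as a companion tool of the codis project, used to sync redis data into codis; it has since been split out and is maintained on its own. Thanks to the author, @斯宾洛克 (presumably a transliteration of "spinlock").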

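Start from the requirements

Requirements are scenarios. Operations work looks much the same across systems, so the examples below are not specific to redis: anything that stores data runs into them. mysql, pg, mongodb: stateful storage is broadly alike.

1. Scaling a redis cluster out and in, the most classic requirement

2. Heterogeneous data sync, for example from redis to mysql

3. Splitting an existing redis cluster into several clusters along business lines

4. Analyzing current redis memory usage and the share taken by each group of keys

5. Detecting and removing useless data

6. Backing up rdb files

Having read the list, you will probably smile knowingly: replace redis with any other database and it still holds. Sina uses redis heavily and has given talks about it; a Google search will find them.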

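First, how redis-port works

In short, it disguises itself as a slave and tricks the master into streaming its data to it.

send the sync command -> receive rdb -> parse rdb -> filter -> replay rdb -> replay the replication stream pushed by the master

That is the whole flow, and it is easy to follow. Every step after the rdb is parsed can be heavily customized. Does this look familiar to DBAs? It closely resembles Taobao's canal, which consumes incremental data from the binlog. Compared with mysql replication, redis is far simpler: the incremental data is just ordinary commands, and the only thing that needs parsing is the rdb file. The article "Redis RDB Dump File Format" explains the format very well and is worth reading.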

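In summary, the rdb format consists of the following blocks:

1. The file starts with the rdb version, for example REDIS0005
2. FE, followed by the redis DB number; a normal slave issues select db at this point during replay
3. FD|FC, followed by the expiry time in seconds (FD) or milliseconds (FC) for keys that have one, then value-type, string-encoded-key, encoded-value. value-type gives the type of the value: set, hash (map), sorted set, and so on
4. FE, followed by a redis DB number, to sync the data of another DB
5. Repeat step 3
6. FF marks the end of the rdb
7. An 8-byte crc64 checksum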
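The block layout above can be sketched in a few lines of Go. This is only an illustration of the markers listed in the article, not a full parser: `parseRDBVersion` and `describeOpcode` are hypothetical names, and newer rdb versions define further opcodes that this sketch would simply report as a value type.

```go
package main

import (
	"errors"
	"strconv"
)

// Marker bytes named in the list above.
const (
	opExpireMS  = 0xFC // expiry in milliseconds follows
	opExpireSec = 0xFD // expiry in seconds follows
	opSelectDB  = 0xFE // DB number follows
	opEOF       = 0xFF // end of rdb, checksum follows
)

// parseRDBVersion checks the 9-byte header ("REDIS" plus four digits)
// and returns the version number, e.g. 5 for "REDIS0005".
func parseRDBVersion(header []byte) (int, error) {
	if len(header) < 9 || string(header[:5]) != "REDIS" {
		return 0, errors.New("not an rdb file: bad magic")
	}
	return strconv.Atoi(string(header[5:9]))
}

// describeOpcode names the block that starts with byte b.
// Any byte that is not one of the markers above is treated as a value-type.
func describeOpcode(b byte) string {
	switch b {
	case opExpireMS:
		return "expire-ms"
	case opExpireSec:
		return "expire-sec"
	case opSelectDB:
		return "select-db"
	case opEOF:
		return "eof"
	default:
		return "value-type"
	}
}
```

A real reader loops over the file, calls something like `describeOpcode` on the next byte, and decodes the key and value according to the value type.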

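Reading the redis and redis-port source code side by side gives a deeper understanding of rdb. The whole flow corresponds to the cmd.SyncRDBFile and cmd.SyncCommand functions in cmd/sync.go.

Solving the problems

I assume you have some Go background, so the steps for installing golang and downloading redis-port are skipped.

Take scaling out as an example. Suppose the existing cluster runs in twemproxy mode: build a new, empty cluster with a multiple of the original number of backend instances. redis-port syncs to codis by default and uses slotsrestore, which has to be changed to the restore command.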

The restoreRdbEntry function in cmd/utils.go

[Figure: restoreRdbEntry with slotsrestore replaced]
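The screenshot of this change has not survived, so here is a rough sketch of the idea only. It is not the redis-port code: `restoreCommand` and its parameters are hypothetical. The argument order follows the documented `RESTORE key ttl serialized-value` form, and the sketch assumes slotsrestore takes the same key, ttl, value triple.

```go
package main

import "strconv"

// restoreCommand returns the command name and arguments used to write one
// rdb entry to the target. Hypothetical helper for illustration.
func restoreCommand(forCodis bool, key []byte, ttlMS int64, dump []byte) (string, [][]byte) {
	name := "restore" // plain redis or twemproxy target
	if forCodis {
		name = "slotsrestore" // codis-specific variant
	}
	// ttl is sent in milliseconds; 0 means the key never expires.
	ttl := []byte(strconv.FormatInt(ttlMS, 10))
	return name, [][]byte{key, ttl, dump}
}
```

In the article's scenario the change is simply hard-coded: the target is twemproxy, so only the restore branch is needed.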

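Recompile to produce the redis-port executable. With two twemproxy clusters the backend hashing differs, so the data distribution differs as well. When syncing, point the source at a master instance behind the old cluster and the target at the proxy address of the new cluster. The command is:

redis-port sync --parallel=100 --from=master_host:master_port --target=proxy:proxy_port

The command runs in the foreground; for a long run, start it under nohup. parallel is the number of concurrent goroutines used while syncing rdb events.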

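Now an example of splitting a cluster: migrate the keys whose prefix is user_info to a new cluster. Given how redis replication works, it is enough to keep only the keys with the user_info prefix during both sync rdb and sync command. The code changes are as follows:

Besides replacing slotsrestore, the restoreRdbEntry function in cmd/utils.go also gets a key filter

[Figure: restoreRdbEntry with key filtering added]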
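Since the screenshot is missing, the following is a minimal sketch of what such a filter might look like, not the author's actual patch. `keyPrefix`, `acceptKey` and `filterKeys` are hypothetical names; only the user_info prefix comes from the article.

```go
package main

import "bytes"

// keyPrefix is the prefix of the keys to migrate (the article's example).
var keyPrefix = []byte("user_info")

// acceptKey reports whether an rdb entry with this key should be restored
// to the target. Entries that fail the check are skipped.
func acceptKey(key []byte) bool {
	return bytes.HasPrefix(key, keyPrefix)
}

// filterKeys keeps only the keys that pass acceptKey, preserving order.
func filterKeys(keys [][]byte) [][]byte {
	kept := make([][]byte, 0, len(keys))
	for _, k := range keys {
		if acceptKey(k) {
			kept = append(kept, k)
		}
	}
	return kept
}
```

Inside the restore path, the check would sit before the restore command is sent, returning early for keys that do not match.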

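The code in the figure above adds key filtering, which completes the change for sync rdb. sync command has to be changed as well.

Add key filtering to the SyncCommand function in cmd/sync.go

[Figure: SyncCommand with key filtering added]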
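For the command stream the same idea applies to each replicated command. The sketch below is an assumption-laden illustration rather than the redis-port code: `acceptCommand` and the `passThrough` list are hypothetical, the key is assumed to be the first argument, and multi-key commands such as mset or del with several keys are not handled.

```go
package main

import "strings"

// passThrough lists commands that carry no key and are forwarded as-is.
// The contents of this list are an assumption made for the sketch.
var passThrough = map[string]bool{"select": true, "ping": true}

// acceptCommand reports whether a replicated command should be forwarded
// to the target, based on the prefix of its first argument.
func acceptCommand(prefix, cmd string, args [][]byte) bool {
	if passThrough[strings.ToLower(cmd)] {
		return true
	}
	if len(args) == 0 {
		return false // nothing to match against
	}
	return strings.HasPrefix(string(args[0]), prefix)
}
```

A production filter would need a per-command table of key positions; checking only the first argument is the simplest case.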

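Finally, run the same command as for scaling out. Once user_info has been migrated, the keys that are no longer valid in the old cluster need to be filtered out and deleted. Modify the restoreRdbEntry function in cmd/utils.go to issue del instead of restore, then sync the old cluster to itself.

These two examples are fairly typical. If you plan to use the tool, read the source code carefully, and pick up some Go along the way ~_~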

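The following notes were added by the author, @斯宾洛克:

1. psync is supported, e.g. --psync
2. If fetching rdb+backlog from the master is too slow, the master may close the connection. You mentioned one fix; in addition, you can pass --sockfile=buffer.tmp --filesize=64GB, which uses a file of up to 64GB as a buffer (written circularly, freed automatically). This speeds up fetching rdb+backlog and works even better.
3. The speed of restoring to the slave can be raised by using more CPUs and more concurrent connections: --ncpu=4 --parallel=32
4. In fact, with psync there is a notion of position between port and master, which reduces the number of failed syncs; redis-port retries automatically until it no longer can.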

Caveats

1. Two redis parameters need attention while syncing

repl-backlog-size

Set the replication backlog size. The backlog is a buffer that accumulates slave data when slaves are disconnected for some time, so that when a slave wants to reconnect again, often a full resync is not needed, but a partial resync is enough, just passing the portion of data the slave missed while disconnected.
The bigger the replication backlog, the longer the time the slave can be disconnected and later be able to perform a partial resynchronization.

This is the size of the replication buffer. The default is 1mb; adjust it to suit the current data volume, for example 10mb.

client-output-buffer-limit

The client output buffer limits can be used to force disconnection of clients that are not reading data from the server fast enough for some reason (a common reason is that a Pub/Sub client can't consume messages as fast as the publisher can produce them). The limit can be set differently for the three different classes of clients:

normal -> normal clients including MONITOR clients
slave -> slave clients
pubsub -> clients subscribed to at least one pubsub channel or pattern

The syntax of every client-output-buffer-limit directive is the following:

client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>

A client is immediately disconnected once the hard limit is reached, or if the soft limit is reached and remains reached for the specified number of seconds (continuously).
So for instance if the hard limit is 32 megabytes and the soft limit is 16 megabytes / 10 seconds, the client will get disconnected immediately if the size of the output buffers reach 32 megabytes, but will also get disconnected if the client reaches 16 megabytes and continuously overcomes the limit for 10 seconds.

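The buffer for the slave client class needs adjusting too, for example client-output-buffer-limit slave 256mb 128mb 60

2. restore fails fatally as soon as the specified key already exists in the target cluster. If overwriting or skipping is acceptable, remove the err check in the code.

3. Because redis-port poses as a slave, always sync instances one after another; several bgsave runs at the same time are no fun.

4. Turn your code changes into a general-purpose tool. Editing the logic every time invites mistakes, and always double-check from and target.

5. With a very large source data set the tool may get disconnected during sync. Adjusting the redis-port concurrency and the two redis buffers mentioned above takes care of it.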

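Shortcomings

redis-port was originally meant only as a temporary migration tool: any exception during stream processing is a straight panic, so long-running syncs are not recommended. This is also tied to how redis replication is implemented, which has nothing like the binlog and position of mysql.

So here is the question: clever reader, how would you solve it?

It is a fun tool. Have fun, enjoy …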

    Original author: 董泽润
    Original URL: https://www.jianshu.com/p/a5eec15de485