java – Poor Aerospike performance for batch requests

I have an application that performs a lot of batch gets (most of them with ~2000 keys). This is the code I use:

AerospikeClient aerospike = new AerospikeClient("10.0.240.2", port);

public byte[][] getBatch(byte[][] keys) {
    Key[] aeroKeys = new Key[keys.length];
    for (int i = 0; i < keys.length; i++) {
        aeroKeys[i] = new Key(NAMESPACE, setName, keys[i]);
    }
    Record[] records = aerospike.get(batchPolicy, aeroKeys);
    byte[][] response = new byte[keys.length][];

    for (int i = 0; i < keys.length; i++) {
        if (records[i] != null) {
            response[i] = (byte[]) records[i].getValue(DEFAULT_BIN_NAME);
        }
    }
    return response;
}

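With a single request at a time, this code runs correctly and fast. But when I run several parallel threads doing batch gets, it becomes very slow. The degradation is linear in the number of threads (for example, 4 threads = 4x slower, 8 threads = 8x slower). Monitoring shows little CPU or I/O usage, so I suspect something is waiting, but I don't know what.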
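One way to quantify this slowdown is a small timing harness like the sketch below. It is not part of my application: the class name `BatchScalingProbe`, the method `meanLatencyMillis`, and the sleeping stand-in for the batch call are all hypothetical and use only the JDK. To measure the real client, the `getBatch` method above would be passed in place of the stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BatchScalingProbe {

    /**
     * Runs callsPerThread batch calls on each of `threads` workers in parallel
     * and returns the mean latency of a single call in milliseconds.
     */
    public static double meanLatencyMillis(int threads, int callsPerThread,
                                           Function<byte[][], byte[][]> batchGet,
                                           byte[][] keys) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    long total = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        long start = System.nanoTime();
                        batchGet.apply(keys);
                        total += System.nanoTime() - start;
                    }
                    return total; // nanoseconds spent inside batchGet by this worker
                });
            }
            long totalNanos = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                totalNanos += f.get();
            }
            return totalNanos / 1e6 / ((double) threads * callsPerThread);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[][] keys = new byte[2000][8]; // ~2000 dummy keys, as in the question
        // Hypothetical stand-in for getBatch: sleeps 5 ms instead of calling Aerospike.
        Function<byte[][], byte[][]> fakeBatchGet = k -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new byte[k.length][];
        };
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads: %.2f ms per batch%n",
                    threads, meanLatencyMillis(threads, 20, fakeBatchGet, keys));
        }
    }
}
```

If the mean latency per batch grows roughly in proportion to the thread count while CPU stays idle, that points to requests being serialized somewhere rather than to a lack of compute.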

I have tried many different configurations. This is my current one (16-core server):

    service-threads 16
    transaction-queues 16
    transaction-threads-per-queue 16
    batch-index-threads 16
    batch-max-buffers-per-queue 1000
    proto-fd-max 15000
    batch-max-requests 2000000

Any ideas about what is going on?

Edit 1:
Namespace configuration:

namespace test {
    replication-factor 2
    memory-size 5G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true

    storage-engine device {
           file /data/aerospike.dat
           filesize 300G
           disable-odirect true
           write-block-size 1M
           max-write-cache 1G
        }
}

And the latency statistics:

$asadm -e 'show latency' 
~~~~~~~~~~~~~~~~~~~~~~~~proxy Latency~~~~~~~~~~~~~~~~~~~~~~~~
Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
   .                     Span         .      .      .       .   
p      23:00:09-GMT->23:00:19       0.0    0.0    0.0     0.0   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~query Latency~~~~~~~~~~~~~~~~~~~~~~~~
Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
   .                     Span         .      .      .       .   
p      23:00:09-GMT->23:00:19       0.0    0.0    0.0     0.0   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~reads Latency~~~~~~~~~~~~~~~~~~~~~~~~~
Node                     Time   Ops/Sec    >1Ms    >8Ms   >64Ms   
   .                     Span         .       .       .       .   
p      23:00:09-GMT->23:00:19   15392.1   92.67   62.89    6.03   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~udf Latency~~~~~~~~~~~~~~~~~~~~~~~~~
Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
   .                     Span         .      .      .       .   
p      23:00:09-GMT->23:00:19       0.0    0.0    0.0     0.0   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~writes_master Latency~~~~~~~~~~~~~~~~~~~~
Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
   .                     Span         .      .      .       .   
p      23:00:09-GMT->23:00:19       0.0    0.0    0.0     0.0   
Number of rows: 1

Edit 2:

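I think the problem is somehow related to the Java client.
I ran the following experiment: I created two instances of my application on different machines, both accessing a single Aerospike server.
I put a load balancer in front to spread requests across these two application servers.
With this setup I got twice the throughput.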

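Compared with when all requests came from a single application server, the Aerospike server now receives and correctly serves twice the traffic. However, when I look at my Java application server, it is not consuming much CPU, so I am not CPU bound. During the requests the network appears to be heavily used: it shows 5 Gbps on the server.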

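So with 5 application servers, each with a single CPU, I can send 1 Gbps of network traffic to the server and it works. But if I have a single application instance on a machine with 8 cores, it seems to queue the requests to the server.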

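My code uses a single AerospikeClient instance for all requests, as the documentation recommends. I never close this client connection; I keep it open for as long as the system is running.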

Edit 3:

$asloglatency -h reads
  reads
  Jan 04 2016 19:12:55
               % > (ms)
  slice-to (sec)      1      8     64  ops/sec
  -------------- ------ ------ ------ --------
19:13:25    10  87.27  14.87   0.00    242.7
19:13:35    10  89.44  21.90   0.00   3432.0
19:13:45    10  89.68  26.87   0.00   4981.6
19:13:55    10  89.61  25.62   0.00   5469.9
19:14:05    10  89.89  27.56   0.00   6190.8
19:14:15    10  90.59  33.84   0.30   6138.2
19:14:25    10  89.79  29.44   0.00   5393.2


ubuntu@aerospike1:~$asloglatency -h batch_index_reads
batch_index_reads
Jan 04 2016 19:30:36
               % > (ms)
slice-to (sec)      1      8     64  ops/sec
-------------- ------ ------ ------ --------
19:30:46    10 100.00 100.00   3.33      3.0
19:30:56    10 100.00 100.00  23.40      9.4
19:31:06    10 100.00 100.00  27.59     11.6
19:31:16    10 100.00 100.00  31.30     13.1
19:31:26    10 100.00 100.00  30.00     13.0
19:31:36    10 100.00 100.00  27.14     14.0

Edit 4:

$asadm -e "show distribution"
~~~~~~~~~~~~~~~~~~~test - TTL Distribution in Seconds~~~~~~~~~~~~~~~~~~
      Percentage of records having ttl less than or equal to value
                          measured in Seconds
      Node   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
aerospike1     0     0     0     0     0     0     0     0     0      0
aerospike2     0     0     0     0     0     0     0     0     0      0
Number of rows: 2

~~~~~~~~~~~~test - Object Size Distribution in Record Blocks~~~~~~~~~~~
        Percentage of records having objsz less than or equal to
                    value measured in Record Blocks
      Node   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
aerospike1     3     3     3     3     3    65    94    97   100    100
aerospike2     3     3     3     3     3    65    94    97   100    100

Edit 5:

$asadm -e "show stat like batch"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                         :   aerospike1                                                                                                                           aerospike2
batch_errors                 :   0                                                                                                                                    0
batch_index_complete         :   403423                                                                                                                               3751948
batch_index_created_buffers  :   8425500                                                                                                                              169997886
batch_index_destroyed_buffers:   8423984                                                                                                                              169994324
batch_index_errors           :   3                                                                                                                                    8305
batch_index_huge_buffers     :   7075094                                                                                                                              64191339
batch_index_initiate         :   403428                                                                                                                               3760270
batch_index_queue            :   0:0,0:0,0:0,1:99,1:205,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0   1:212,0:0,1:25,1:87,0:0,1:13,1:33,1:66,1:199,1:183,1:221,1:256,1:198,1:39,1:0,0:0,0:0,0:0,1:26,0:0,0:0,0:0,0:0,0:0,1:53,0:0,0:0,0:0,0:0,1:172,1:206,0:0
batch_index_timeout          :   0                                                                                                                                    0
batch_index_unused_buffers   :   1210                                                                                                                                 1513
batch_initiate               :   0                                                                                                                                    0
batch_queue                  :   0                                                                                                                                    0
batch_timeout                :   0                                                                                                                                    0
batch_tree_count             :   0                                                                                                                                    0

$iostat -x 1 3
Linux 4.2.0-18-generic (aerospike1)     01/05/2016  _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.31    0.00    7.56    5.50    0.00   84.64

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.79    4.69    0.97   152.23    21.28    61.31     0.01    2.37    2.43    2.07   1.46   0.83
sdb               0.03  1828.07 1955.80   62.88 87448.89  8924.42    95.48     3.81    1.88    0.55   43.26   0.08  15.45

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.57    0.00   18.91    0.00    0.00   76.52

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.91    0.00   16.27    0.00    0.00   79.82

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    1.00    4.00     4.00    16.00     8.00     0.01    2.40   12.00    0.00   2.40   1.20
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

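Best answer: Your disk appears to be the bottleneck. In the asadm latency output, more than 60% of reads take longer than 8 ms. You can cross-check this with the iostat command. Given these latencies, I would guess you are using spinning drives.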

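According to your configuration, your data is not held in memory. Every read therefore has to go to disk, and all of those reads are random reads on the disk. That is a poor access pattern for spinning drives.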

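When data lives only on disk, Aerospike recommends SSDs. If you use the data-in-memory option for the namespace, you can keep the data on spinning drives. Please read more about Aerospike's storage options.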
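As a rough illustration of that option, the namespace from the question might look like the fragment below. This is only a sketch, assuming a server version that still supports `data-in-memory` inside the `storage-engine device` block. The `memory-size` value is copied unchanged from the question and would have to be large enough to hold the full data set; the other storage settings from the question are omitted here for brevity.

```
namespace test {
    replication-factor 2
    memory-size 5G    # assumption: must be raised to fit all records in RAM
    default-ttl 0

    storage-engine device {
        file /data/aerospike.dat
        filesize 300G
        data-in-memory true    # serve reads from RAM; the file is kept for persistence
    }
}
```

With this setting reads are served from memory, so the random-read cost of the spinning drive no longer sits on the read path.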
