An investigation of an HBase region split

This started with a table that had already been pre-split, yet one of its regions split anyway. Three things to figure out:

  1. When did it happen? Was it during the HFile bulk load?

  2. Why did it happen? Was a single HFile too large?

  3. How can it be avoided in the future?

The analysis below is based on the HBase branch-1.2 source.

1. Locating when it happened

Comparing the time the HFiles were bulk loaded with the time the split occurred makes it clear that the problem did not happen during the load. The regionserver log shows the following lines:

2018-12-04 01:43:50,505 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: Completed compaction xxx
2018-12-04 01:43:50,571 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region xxx
2018-12-04 01:43:51,252 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Preparing to split 1 storefiles for region xxx
2018-12-04 01:43:52,669 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent= xxx

So the split check was triggered right after a compaction finished; the check decided the region should split, which produced the log lines above. The relevant source code follows.

After a compaction request is received, a CompactionRunner is submitted to a thread pool to run the actual compaction:

// org.apache.hadoop.hbase.regionserver.CompactSplitThread
private synchronized CompactionRequest requestCompactionInternal(final Region r, final Store s, final String why, int priority, CompactionRequest request, boolean selectNow, User user) {
    // ....
    ThreadPoolExecutor pool = (selectNow && s.throttleCompaction(compaction.getRequest().getSize()))
      ? longCompactions : shortCompactions;
    pool.execute(new CompactionRunner(s, r, compaction, pool, user));
    // ....
}

Inside doCompaction, the compaction runs, and on completion the split check is performed:

// CompactionRunner
private void doCompaction(User user) {
    // ....
    long start = EnvironmentEdgeManager.currentTime();
    boolean completed =
        region.compact(compaction, store, compactionThroughputController, user);
    long now = EnvironmentEdgeManager.currentTime();
    LOG.info(((completed) ? "Completed" : "Aborted") + " compaction: " +
          this + "; duration=" + StringUtils.formatTimeDiff(now, start));
    if (completed) {
      // degenerate case: blocked regions require recursive enqueues
      if (store.getCompactPriority() <= 0) {
        requestSystemCompaction(region, store, "Recursive enqueue");
      } else {
        // see if the compaction has caused us to exceed max region size
        requestSplit(region);
      }
    }
    // ....
}

From the compaction thread, the split itself is handed off to another thread pool:

public synchronized void requestSplit(final Region r, byte[] midKey, User user) {
    // ...
    this.splits.execute(new SplitRequest(r, midKey, this.server, user));
    // ...
}

2. Locating the cause

From the above we know the split was triggered after a compaction; the question now is why the split check passed.

public synchronized boolean requestSplit(final Region r) {
  // don't split regions that are blocking
  if (shouldSplitRegion() && ((HRegion)r).getCompactPriority() >= Store.PRIORITY_USER) {
    byte[] midKey = ((HRegion)r).checkSplit();
    if (midKey != null) {
      requestSplit(r, midKey);
      return true;
    }
  }
  return false;
}

To reach requestSplit(r, midKey), two conditions must hold:

  1. shouldSplitRegion() && ((HRegion)r).getCompactPriority() >= Store.PRIORITY_USER
  2. midKey != null (midKey = ((HRegion)r).checkSplit())

Since the split did happen, both conditions must have been satisfied.

1. Checking shouldSplitRegion

private boolean shouldSplitRegion() {
  if(server.getNumberOfOnlineRegions() > 0.9*regionSplitLimit) {
    LOG.warn("Total number of regions is approaching the upper limit " + regionSplitLimit + ". "
        + "Please consider taking a look at http://hbase.apache.org/book.html#ops.regionmgt");
  }
  return (regionSplitLimit > server.getNumberOfOnlineRegions());
}

This code involves two variables:

  1. server.getNumberOfOnlineRegions(): the number of regions currently online on this regionserver
  2. regionSplitLimit: the limit on the number of online regions per regionserver beyond which no further splits are requested; regionSplitLimit = conf.getInt(REGION_SERVER_REGION_SPLIT_LIMIT, DEFAULT_REGION_SERVER_REGION_SPLIT_LIMIT)

Because our cluster sets this parameter to a very large value, this check passes.
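
For reference, here is a minimal standalone sketch of that guard with made-up numbers (the limit and the online-region count below are illustrative, not the incident's actual values):

// Illustrative only: reproduces the shouldSplitRegion() guard outside of HBase.
public class SplitLimitGuardDemo {
  public static void main(String[] args) {
    int regionSplitLimit = 100000; // hbase.regionserver.regionSplitLimit, set very high on our cluster
    int onlineRegions = 120;       // hypothetical number of online regions on this regionserver
    if (onlineRegions > 0.9 * regionSplitLimit) {
      System.out.println("Total number of regions is approaching the upper limit " + regionSplitLimit);
    }
    System.out.println("split allowed: " + (regionSplitLimit > onlineRegions)); // prints: split allowed: true
  }
}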

2. Checking the midKey returned by checkSplit

public byte[] checkSplit() {
  // Can't split META
  if (this.getRegionInfo().isMetaTable() ||
      TableName.NAMESPACE_TABLE_NAME.equals(this.getRegionInfo().getTable())) {
    if (shouldForceSplit()) {
      LOG.warn("Cannot split meta region in HBase 0.20 and above");
    }
    return null;
  }

  // Can't split region which is in recovering state
  if (this.isRecovering()) {
    LOG.info("Cannot split region " + this.getRegionInfo().getEncodedName() + " in recovery.");
    return null;
  }

  if (!splitPolicy.shouldSplit()) {
    return null;
  }

  byte[] ret = splitPolicy.getSplitPoint();

  if (ret != null) {
    try {
      checkRow(ret, "calculated split");
    } catch (IOException e) {
      LOG.error("Ignoring invalid split", e);
      return null;
    }
  }
  return ret;
}

The key is splitPolicy.shouldSplit(). Because no split policy was specified when the table was created, the default one is used: IncreasingToUpperBoundRegionSplitPolicy.

@Override
protected boolean shouldSplit() {
  boolean force = region.shouldForceSplit();
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  int tableRegionsCount = getCountOfCommonTableRegions();
  // Get size to check
  long sizeToCheck = getSizeToCheck(tableRegionsCount);

  for (Store store : region.getStores()) {
    // If any of the stores is unable to split (eg they contain reference files)
    // then don't split
    if (!store.canSplit()) {
      return false;
    }

    // Mark if any store is big enough
    long size = store.getSize();
    if (size > sizeToCheck) {
      LOG.debug("ShouldSplit because " + store.getColumnFamilyName() + " size=" + size
                + ", sizeToCheck=" + sizeToCheck + ", regionsWithCommonTable="
                + tableRegionsCount);
      foundABigStore = true;
    }
  }

  return foundABigStore | force;
}

This checks every store in the region to decide whether to split. The decision involves three variables:

  1. tableRegionsCount: the number of regions of this table hosted on the current regionserver
  2. sizeToCheck: the size threshold; it depends not only on the cluster's maximum HFile size but also on initialSize and tableRegionsCount
  3. size: the size of the store (the total size of its HFiles)

protected long getSizeToCheck(final int tableRegionsCount) {
  // safety check for 100 to avoid numerical overflow in extreme cases
  return tableRegionsCount == 0 || tableRegionsCount > 100
             ? getDesiredMaxFileSize()
             : Math.min(getDesiredMaxFileSize(),
                        initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
}

This method deserves close attention:

  1. getDesiredMaxFileSize(): normally hbase.hregion.max.filesize, default 10 GB

  2. initialSize: normally 2 × hbase.hregion.memstore.flush.size; the flush size defaults to 128 MB, and on our cluster it is 256 MB, so initialSize is 512 MB

  3. tableRegionsCount × tableRegionsCount × tableRegionsCount: this cubic factor is the tricky part; it is 1 when tableRegionsCount = 1, but grows very quickly once tableRegionsCount > 1

Plugging in the cluster's parameters at the time:

  1. getDesiredMaxFileSize(): 10 GB
  2. initialSize: 2 × 256 MB = 512 MB
  3. tableRegionsCount: 2

That works out to sizeToCheck = min(10 GB, 512 MB × 2³) = 4 GB, while the store size seen with hdfs commands was only about 1.4 GB, far below the split threshold.

But there is a mistake here: the tableRegionsCount I used was the count after the split, when it should be the count before the split. A look at the HBase UI revealed the problem:

(Figure: hbase_region_split_vs_nonsplit_v2.jpg)

It shows that regionserver B, where the split happened, had only 1 region of this table before the split. Redoing the calculation with tableRegionsCount = 1 gives sizeToCheck = min(10 GB, 512 MB × 1³) = 512 MB, which the ~1.4 GB store easily exceeds, so the split fired.

And once a regionserver hosts 2 or more regions of the table, splits generally stop happening, because the threshold is then 4 GB or more.
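
To make the cubic growth concrete, here is a standalone sketch that replicates the getSizeToCheck() formula with our cluster's values (10 GB max file size, 512 MB initialSize); the class and method names below are mine, not HBase's:

// Replicates the threshold formula from IncreasingToUpperBoundRegionSplitPolicy#getSizeToCheck.
public class SizeToCheckDemo {
  static final long MB = 1024L * 1024L;
  static final long GB = 1024L * MB;

  static long sizeToCheck(int tableRegionsCount, long desiredMaxFileSize, long initialSize) {
    return tableRegionsCount == 0 || tableRegionsCount > 100
        ? desiredMaxFileSize
        : Math.min(desiredMaxFileSize,
                   initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
  }

  public static void main(String[] args) {
    long desiredMaxFileSize = 10 * GB;  // hbase.hregion.max.filesize
    long initialSize = 512 * MB;        // 2 x hbase.hregion.memstore.flush.size (256 MB)
    for (int count = 1; count <= 4; count++) {
      System.out.printf("tableRegionsCount=%d -> sizeToCheck=%d MB%n",
          count, sizeToCheck(count, desiredMaxFileSize, initialSize) / MB);
    }
    // Prints 512, 4096, 10240 (capped), 10240 (capped) MB: a ~1.4 GB store only exceeds
    // the threshold while the regionserver hosts a single region of the table.
  }
}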

3. How to avoid it in the future

The simplest option is to set the split policy to DisabledRegionSplitPolicy when creating the table; combined with pre-splitting, this works best:

alter 'table', {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}}
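
The same thing can be done at table-creation time with the Java client; below is a sketch against the HBase 1.x API, where the table name, column family, and split keys are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
      desc.addFamily(new HColumnDescriptor("cf"));
      // Disable automatic splitting; region boundaries come only from the pre-split keys.
      desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
      byte[][] splitKeys = {Bytes.toBytes("1"), Bytes.toBytes("2"), Bytes.toBytes("3")};
      admin.createTable(desc, splitKeys);
    }
  }
}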

If the table grows a lot every day, disabling splits is not a good fit; in that case, use the split analysis above to choose the number of pre-split regions and the split policy together.

ref: https://issues.apache.org/jira/browse/HBASE-16076?attachmentOrder=asc

    Original author: 丧诗
    Original article: https://www.jianshu.com/p/a601fac23af8