Implementation of ElasticSearch's Routing Hash Algorithm

     “All problems in computer science can be solved by another level of indirection.” – David J. Wheeler


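To learn ElasticSearch, you first have to understand how ElasticSearch relates to Lucene: ElasticSearch is an abstraction layer built on top of a set of Lucene indexes, and each of its shards (whether primary or replica) is a complete, independent Lucene index instance. Put plainly, it adds "another level of indirection" on top of a set of Lucene indexes, and what this indirection buys is a distributed, scalable, and highly fault-tolerant index cluster. A Lucene index is itself a level of indirection: the actual index content is stored in units called segments. A Lucene index consists of multiple segments. Segments are immutable, meaning they cannot be modified once created, and the various caches of index content are also kept per segment.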

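So, when ElasticSearch receives a request to index a document, the first decision it has to make is which shard should index the document and store the result. In the implementation, ElasticSearch applies the djb2 hash algorithm to the document's specified (or default) key, then takes the hash modulo n, where n is the number of shards in the ElasticSearch index. The formula is: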

        

$$\mathrm{hash}(\mathit{key}) \bmod n$$
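
The routing step described above can be sketched in Java roughly as follows. This is a minimal illustration, not Elasticsearch's actual routing code: `ShardRoutingSketch`, `djb2`, and `shardFor` are hypothetical names, the shard count of 5 is made up, and the use of `Math.floorMod` is an assumption made here so that a negative hash still maps into the range 0..n-1.

```java
// Hypothetical sketch of hash(key) mod n; not taken from the Elasticsearch source.
class ShardRoutingSketch {

    // djb2 ("Times33") over the characters of the key, truncated to an int.
    static int djb2(String key) {
        long hash = 5381;
        for (int i = 0; i < key.length(); i++) {
            hash = ((hash << 5) + hash) + key.charAt(i); // hash * 33 + c
        }
        return (int) hash;
    }

    // Assumption: floorMod keeps the result in 0..n-1 even for a negative hash.
    static int shardFor(String key, int numberOfShards) {
        return Math.floorMod(djb2(key), numberOfShards);
    }

    public static void main(String[] args) {
        int n = 5; // hypothetical shard count
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> shard " + shardFor(key, n));
        }
    }
}
```

The same key always lands on the same shard as long as n stays the same, which is what lets a later lookup by that key go straight to one shard.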


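djb2, commonly known as the "Times33" algorithm, is not complicated, and that simplicity is what makes it cheap to compute, as shown below. A question to consider: what are the benefits of a hash algorithm like this, and why multiply by 33?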

    unsigned long hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while (c = *str++)
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }
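
One part of the question has an answer visible in the code itself: `(hash << 5) + hash` is `hash * 32 + hash`, that is `hash * 33`, computed with a shift and an add instead of a multiplication. The Java sketch below checks that the two forms agree at every step, including after the 64-bit value has overflowed. The class and method names are hypothetical and chosen only for illustration. It does not explain why 33 spreads keys well in practice, which is commonly described as an empirical observation rather than a proven property.

```java
// Hypothetical sketch: the shift-and-add step is the same as multiplying by 33.
class Times33Sketch {

    // One djb2 step written as in the listing above.
    static long stepShift(long hash, int c) {
        return ((hash << 5) + hash) + c;
    }

    // The same step written as an explicit multiplication.
    static long stepMultiply(long hash, int c) {
        return hash * 33 + c;
    }

    // Runs both forms side by side and reports whether they ever diverge.
    static boolean formsAgree(String s) {
        long a = 5381;
        long b = 5381;
        for (int i = 0; i < s.length(); i++) {
            a = stepShift(a, s.charAt(i));
            b = stepMultiply(b, s.charAt(i));
            if (a != b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Long enough that the 64-bit accumulator wraps around several times.
        String longKey = "a-routing-key-long-enough-to-overflow-a-64-bit-accumulator";
        System.out.println(formsAgree(longKey)); // prints "true"
    }
}
```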


Elasticsearch's implementation of djb2 is in the file DjbHashFunction.java:

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.cluster.routing.operation.hash.djb;

import org.elasticsearch.cluster.routing.operation.hash.HashFunction;

/**
 * This class implements the efficient hash function
 * developed by <i>Daniel J. Bernstein</i>.
 */
public class DjbHashFunction implements HashFunction {

    public static int DJB_HASH(String value) {
        long hash = 5381;

        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }

        return (int) hash;
    }

    public static int DJB_HASH(byte[] value, int offset, int length) {
        long hash = 5381;

        final int end = offset + length;
        for (int i = offset; i < end; i++) {
            hash = ((hash << 5) + hash) + value[i];
        }

        return (int) hash;
    }

    @Override
    public int hash(String routing) {
        return DJB_HASH(routing);
    }

    @Override
    public int hash(String type, String id) {
        long hash = 5381;

        for (int i = 0; i < type.length(); i++) {
            hash = ((hash << 5) + hash) + type.charAt(i);
        }

        for (int i = 0; i < id.length(); i++) {
            hash = ((hash << 5) + hash) + id.charAt(i);
        }

        return (int) hash;
    }
}
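
One consequence of the two-argument `hash(type, id)` above is that it carries a single running hash across both strings, so it returns the same value as hashing the concatenation `type + id`. The sketch below copies the two loops into a standalone class to show this. `DjbConcatSketch` and the sample values are hypothetical, and the code is a copy made for illustration, not the Elasticsearch class itself.

```java
// Hypothetical standalone copy of the two hashing loops, for illustration only.
class DjbConcatSketch {

    // Single-string djb2, truncated to the low 32 bits by the cast.
    static int djb(String value) {
        long hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return (int) hash;
    }

    // Two-string variant: the hash state flows from type straight into id.
    static int djb(String type, String id) {
        long hash = 5381;
        for (int i = 0; i < type.length(); i++) {
            hash = ((hash << 5) + hash) + type.charAt(i);
        }
        for (int i = 0; i < id.length(); i++) {
            hash = ((hash << 5) + hash) + id.charAt(i);
        }
        return (int) hash;
    }

    public static void main(String[] args) {
        String type = "tweet"; // hypothetical document type
        String id = "42";      // hypothetical document id
        System.out.println(djb(type, id) == djb(type + id)); // prints "true"
    }
}
```

It also follows that the boundary between the two strings does not affect the result: `("ab", "c")` and `("a", "bc")` hash to the same value. Because the final cast keeps only the low 32 bits, the returned `int` can be negative.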

  

    Original author: 哈希算法
    Original URL: https://blog.csdn.net/quicknet/article/details/42117891