java数据结构和算法(哈希表)

2019年3月19日 173次阅读来源: 哈希算法

什么是哈希表？

哈希表是一种数据结构，提供快速的插入和查找操作。
优点：
插入、查找、删除的时间级为O(1)；
数据项占哈希表长的一半，或者三分之二时，哈希表的性能最好。
缺点：
基于数组，数组创建后难于扩展，某些哈希表被基本填满时性能下降的非常严重；
没有一种简单的方法可以以任何一种顺序（如从小到大）遍历整个数据项；
用途：
不需要遍历数据并且可以提前预测数据量的大小，此时哈希表的速度和易用性无与伦比。

哈希化

就是把关键字转化为数组的下标，在哈希表中通过哈希函数来完成。对于某些关键字并不需要哈希函数进行哈希化，如员工编号。

哈希函数：作用是将大的多整数范围转化为数组的下标范围。一般通过取余%来完成。key%arraySize。

冲突：哈希函数不能保证每个每个关键字都映射到数组的空白元素的位置。解决冲突的方法分为：开放地址法和链地址法。

开放地址法

若数据不能直接放在由哈希函数计算出来的数组下标所致的单元时，就要寻找数组的其他位置。该方法会发生聚集，那些哈希化后的落在聚集内的数据项，都要一步一步的移动，并且插入在聚集的最后，因此使聚集越来越大。聚集越大，它增长的越快。

线性探测

在线性探测中，线性的查找空白单元，如52的位置被占用了，就找53、54。。。的位置直到找到空位。

聚集：一串连续的已填充单元叫做填充序列。增加越来越多的数据项时，填充序列变的越来越长，这就叫聚集。

如果把哈希表填的太满，那么在表中每填一个数据都要花费很长的时间。

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：哈希表中存储的数据项
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·d·许
 * @since 1.0
 */
public class DataItem {
	private int item;// 数据项key

	public DataItem(int item) {
		this.item = item;
	}

	public int getKey() {
		return item;
	}
}

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：使用线性探测法实现哈希表
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·d·许
 * @since 1.0
 */
public class Hash {
	// 保存哈希表中的数据
	private DataItem[] hashArray;
	// 哈希表的长度
	private int arraySize;
	// 可以被删除的哈希项
	private DataItem nonItem;

	/**
	 * 初始化哈希表
	 * 
	 * @param size
	 */
	public Hash(int size) {
		arraySize = size;
		hashArray = new DataItem[arraySize];
		nonItem = new DataItem(-1);// 被删除的项为-1
	}

	// /////////////////////////////////////////////////////////
	/**
	 * 遍历哈希表
	 */
	public void displayTable() {
		System.out.print("Table: ");
		for (int i = 0; i < arraySize; i++) {
			if (hashArray[i] != null) {
				System.out.print(hashArray[i].getKey() + "\t");
			} else {
				System.out.print("**\t");
			}
		}
		System.out.println("");
	}

	// /////////////////////////////////////////////////////
	/**
	 * 哈希函数
	 * 
	 * @param key
	 * @return
	 */
	public int hashFunc(int key) {
		return key % arraySize;
	}

	// //////////////////////////////////////////////////////
	public void insert(DataItem item) {
		int key = item.getKey();
		int hashVal = hashFunc(key);

		while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
			++hashVal;// 直到找到空白元素
			hashVal %= arraySize;// 保证不越界
		}
		// 插入数据
		hashArray[hashVal] = item;
	}

	// //////////////////////////////////////////////
	public DataItem delete(int key) {
		int hashVal = hashFunc(key);
		while (hashArray[hashVal] != null) {
			if (hashArray[hashVal].getKey() == key) {
				DataItem temp = hashArray[hashVal];
				hashArray[hashVal] = nonItem;
				return temp;
			}
			++hashVal;
			hashVal %= arraySize;
		}
		return null;
	}

	// ////////////////////////////////////////////
	public DataItem find(int key) {
		int hashVal = hashFunc(key);
		while (hashArray[hashVal] != null) {
			if (hashArray[hashVal].getKey() == key) {
				return hashArray[hashVal];
			} else {
				++hashVal;
				hashVal %= arraySize;
			}

		}
		return null;
	}

	// //////////////////////////////////////////////////////
	/**
	 * 得到一个长度为质数的数组长度
	 * 
	 * @param min
	 * @return
	 */
	@SuppressWarnings("unused")
	private int getPrime(int min) {
		for (int i = 0; true; i++) {
			if (isPrime(i)) {
				return i;
			}
		}
	}

	// ///////////////////////////////////
	/**
	 * 判断一个数是否为质数
	 * 
	 * @param n
	 * @return
	 */
	private boolean isPrime(int n) {
		for (int i = 2; i * i < n; i++) {
			if (n % i == 0) {
				return false;
			}
		}
		return true;
	}

}

扩展数组

哈希函数是根据数组的大小来计算给定数据项的大小的，所以不能简单的从一个数组向另一个数组拷贝数据。需要按照遍历老数组，用insert方法向新数组中插入每个数据项。

二次探测

二次探测试防止聚集产生的一种尝试，思想史探测相隔较远的单元，而不是和原始位置相邻的单元。

步骤是是步数的平方：x+1^2、x+2^2、x+3^2，而不是x+1、x+2。

但是当二次探测的搜索变长时，好像它变得越来越绝望，因为步长平方级增长，可能会飞出整个空间。造成二次聚集。

再哈希法

是一种依赖关键字的探测序列，而不是每个关键字都是一样的。

二次聚集产生的原因是，二次探测的算法产生的探测序列步长总是固定的1,4,9,16.

方法是：把关键字用不同的哈希函数再哈希一遍，用这个结果作为步长，对指定的关键字，步长在整个探测中是不变的，不过不同的关键字使用不同的步长。

第二个哈希函数的特点：

和第一个哈希函数不同。
不能输出0(否则将没有步长：每次探测都是原地踏步，算法将陷入死循环)。

专家发现下面形式的哈希函数工作的非常好：

stepSize=constant-（key%constant）;其中constant是质数，且小于数组容量。如stepSize=5-(key%5);

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：使用再哈希法实现哈希表 <br/>
 * 要求：表的容量是一个质数，否则再哈希时可能会取到0导致算法崩溃。
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·d·许
 * @since 1.0
 */
public class HashDouble {
	// 保存哈希表中的数据
	private DataItem[] hashArray;
	// 哈希表的长度
	private int arraySize;
	// 可以被删除的哈希项
	private DataItem nonItem;

	/**
	 * 初始化哈希表
	 * 
	 * @param size
	 */
	public HashDouble(int size) {
		arraySize = size;
		hashArray = new DataItem[arraySize];
		nonItem = new DataItem(-1);// 被删除的项为-1
	}

	// /////////////////////////////////////////////////////////
	/**
	 * 遍历哈希表
	 */
	public void displayTable() {
		System.out.print("Table: ");
		for (int i = 0; i < arraySize; i++) {
			if (hashArray[i] != null) {
				System.out.print(hashArray[i].getKey() + "\t");
			} else {
				System.out.print("**\t");
			}
		}
		System.out.println("");
	}

	// //////////////////////////////////////////////////////
	public void insert(DataItem item) {
		int key = item.getKey();
		int hashVal = hashFunc1(key);
		int stepSize = hashFunc2(key);
		while (hashArray[hashVal] != null && hashArray[hashVal].getKey() != -1) {
			hashVal += stepSize;// 直到找到空白元素
			hashVal %= arraySize;// 保证不越界
		}
		// 插入数据
		hashArray[hashVal] = item;
	}

	// //////////////////////////////////////////////
	public DataItem delete(int key) {
		int hashVal = hashFunc1(key);
		int stepSize = hashFunc2(key);
		while (hashArray[hashVal] != null) {
			if (hashArray[hashVal].getKey() == key) {
				DataItem temp = hashArray[hashVal];
				hashArray[hashVal] = nonItem;
				return temp;
			}
			hashVal += stepSize;
			hashVal %= arraySize;
		}
		return null;
	}

	// ////////////////////////////////////////////
	public DataItem find(int key) {
		int hashVal = hashFunc1(key);
		int stepSize = hashFunc2(key);
		while (hashArray[hashVal] != null) {
			if (hashArray[hashVal].getKey() == key) {
				return hashArray[hashVal];
			} else {
				hashVal += stepSize;
				hashVal %= arraySize;
			}
		}
		return null;
	}

	// //////////////////////////////////////////////////////
	/**
	 * 得到一个长度为质数的数组长度
	 * 
	 * @param min
	 * @return
	 */
	@SuppressWarnings("unused")
	private int getPrime(int min) {
		for (int i = 0; true; i++) {
			if (isPrime(i)) {
				return i;
			}
		}
	}

	// /////////////////////////////////////////////////////
	/**
	 * 哈希函数
	 * 
	 * @param key
	 * @return
	 */
	public int hashFunc1(int key) {
		return key % arraySize;
	}

	// /////////////////////////////////////////////////////
	/**
	 * 再哈希函数:生成步长
	 * 
	 * @param key
	 * @return
	 */
	public int hashFunc2(int key) {
		// 再哈希的余数必须是一个小于数组长度，不和hF1一样
		// 不能为为零
		// 是一个质数
		return 5 - (key % 5);
	}

	// ///////////////////////////////////
	/**
	 * 判断一个数是否为质数
	 * 
	 * @param n
	 * @return
	 */
	private boolean isPrime(int n) {
		for (int i = 2; i * i < n; i++) {
			if (n % i == 0) {
				return false;
			}
		}
		return true;
	}

}

链地址法

在哈希表中每个单元中设置链表。某个数据项的关键字还是像通常一样映射到哈希表的单元，而数据项本身插入到这个单元的链表中。其他同样映射到这个位置的数据项只需要加入到链表中，不需要在原始数组中寻找空位。

《java数据结构和算法(哈希表)》

优缺点：

有序链表不能加快成功的搜索，但可以减少不成功搜索中的一半的时间。

删除的时间级也减少一半

插入的时间延长，因为数据项不能只插在表头；插入前必须找到有序表中的正确位置。

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：链表中的节点
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·D·许
 * @since 1.0
 */
public class LinkItem {
	private int data;// 链表中的数据项
	public LinkItem nextLink;// 下一个节点

	public LinkItem(int data) {
		this.data = data;
	}

	public int getKey() {
		return data;
	}

	public void dispalyLink() {
		System.out.println(data + "\t");
	}
}

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：从小到大的有序链表
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·d·许
 * @since 1.0
 */
public class SortLink {
	// 链表的第一个元素
	private LinkItem first;

	// ////////////////////////////////
	public SortLink() {
		first = null;
	}

	// /////////////////////////////////////
	public void insert(LinkItem theLink) {
		int key = theLink.getKey();
		LinkItem previousItem = null;
		// 从第一个开始
		LinkItem currentItem = first;
		while (currentItem != null && key > currentItem.getKey()) {
			previousItem = currentItem;
			currentItem = currentItem.nextLink;
		}
		if (previousItem == null) {
			first = theLink;
		} else {
			previousItem.nextLink = theLink;
			theLink.nextLink = currentItem;
		}
	}

	// /////////////////////////////////////////
	public void delete(int key) {
		LinkItem previous = null;
		LinkItem current = first;
		while (current != null && current.getKey() != key) {
			previous = current;
			current = current.nextLink;
		}
		if (previous == null) {
			// 说明是第一个
			if (first != null) {
				first = first.nextLink;
			}
		} else {
			previous.nextLink = current.nextLink;
		}
	}

	// ////////////////////////////////////////////////
	public LinkItem find(int key) {
		LinkItem currentItem = first;
		while (currentItem != null && currentItem.getKey() <= key) {
			if (currentItem.getKey() == key) {
				return currentItem;
			}
			currentItem = currentItem.nextLink;
		}
		return null;
	}
	///////////////////////////////////////////////
	public void displayList(){
		LinkItem currentItem=first;
		while(currentItem!=null){
			currentItem.dispalyLink();
			currentItem=currentItem.nextLink;
		}
		System.out.println();
	}

}

package cn.xyc.dataStructure.hash;

/**
 * 
 * 描述：使用链地址法实现哈希表
 * 
 * <pre>
 * HISTORY
 * ****************************************************************************
 *  ID   DATE           PERSON          REASON
 *  1    2016年10月2日        80002253         Create
 * ****************************************************************************
 * </pre>
 * 
 * @author 蒙奇·d·许
 * @since 1.0
 */
public class HashLink {
	// 保存哈希表中的数据
	private SortLink[] hashArray;
	// 哈希表的长度
	private int arraySize;
	// 可以被删除的哈希项
	private SortLink nonItem;

	/**
	 * 初始化哈希表
	 * 
	 * @param size
	 */
	public HashLink(int size) {
		arraySize = size;
		hashArray = new SortLink[arraySize];
		// 填充数组
		for (int i = 0; i < hashArray.length; i++) {
			hashArray[i] = new SortLink();
		}
	}

	// /////////////////////////////////////////////////////////
	/**
	 * 遍历哈希表
	 */
	public void displayTable() {
		System.out.print("Table: ");
		for (int i = 0; i < arraySize; i++) {
			if (hashArray[i] != null) {
				hashArray[i].displayList();
			} else {
				System.out.print("**\t");
			}
		}
		System.out.println("");
	}

	// /////////////////////////////////////////////////////
	/**
	 * 哈希函数
	 * 
	 * @param key
	 * @return
	 */
	public int hashFunc(int key) {
		return key % arraySize;
	}

	// //////////////////////////////////////////////////////
	public void insert(LinkItem item) {
		int key = item.getKey();
		int hashVal = hashFunc(key);
		// 插入数据
		hashArray[hashVal].insert(item);
	}

	// //////////////////////////////////////////////
	public void delete(int key) {
		int hashVal = hashFunc(key);
		hashArray[hashVal].delete(key);
	}

	// ////////////////////////////////////////////
	public LinkItem find(int key) {
		int hashVal = hashFunc(key);
		LinkItem item = hashArray[hashVal].find(key);
		return item;
	}

	// //////////////////////////////////////////////////////
	/**
	 * 得到一个长度为质数的数组长度
	 * 
	 * @param min
	 * @return
	 */
	@SuppressWarnings("unused")
	private int getPrime(int min) {
		for (int i = 0; true; i++) {
			if (isPrime(i)) {
				return i;
			}
		}
	}

	// ///////////////////////////////////
	/**
	 * 判断一个数是否为质数
	 * 
	 * @param n
	 * @return
	 */
	private boolean isPrime(int n) {
		for (int i = 2; i * i < n; i++) {
			if (n % i == 0) {
				return false;
			}
		}
		return true;
	}

}

桶

另一种方法类似于链接地址法，它是在哈希表的每个单元中使用数组，而不是链表。这样的数组称为桶。

缺点：桶的容量太小会溢出，太大浪费空间。

开放地址法和链地址法的比较：

《java数据结构和算法(哈希表)》

项数未知：使用链地址法。

《java数据结构和算法(哈希表)》

    原文作者：哈希算法
    原文地址: https://blog.csdn.net/baidu_35761016/article/details/52723312
    本文转自网络文章，转载此文章仅为分享知识，如有侵权，请联系博主进行删除。