Python and Data Structures [4] -> Hash Table [0] -> A Python Implementation of Hash Tables and Hash Functions

February 24, 2019    Source: StackLike

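Hash Table

Hash Tables and Hash Functions

A hash table is a data structure that maps keys to specific positions in an array. The function that maps a key to an index in the range 0 to TableSize-1 is called the hash function.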

Hash Table:
        [0] -> A
        [1] -> B
        [2] -> C
        [3] -> D
        [4] -> E

The following implements a hash table using a simple hash function, Hash(Key) = Key mod TableSize.

Note: For convenience, a non-prime TableSize is used here; a suitable TableSize should be a prime number.
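To illustrate why a prime TableSize is usually preferred, here is a small sketch. The helper bucket_counts and the key set (multiples of 10) are hypothetical, chosen only to show what can happen when keys share a factor with the table size.

```python
def bucket_counts(keys, table_size):
    """Count how many keys land in each slot under Key mod TableSize."""
    counts = [0] * table_size
    for key in keys:
        counts[key % table_size] += 1
    return counts


# Hypothetical key set: multiples of 10, which share a factor with size 10.
keys = list(range(0, 100, 10))          # 0, 10, ..., 90

composite = bucket_counts(keys, 10)     # every key falls into slot 0
prime = bucket_counts(keys, 11)         # no slot receives more than one key

print(composite)
print(prime)
```

With a table size of 10 all ten keys collide in one slot, while with 11 each key gets its own slot. Real key distributions vary, so this is only an illustration of the tendency.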

Complete Code

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10


class CollisionError(Exception):
    pass


class HashTable:
    """
    Hash Table:
        [0] -> A
        [1] -> B
        [2] -> C
        [3] -> D
        [4] -> E
    """
    def __init__(self, size, fn):
        self._array = [None for _ in range(size)]
        self._hashing = fn

    def __str__(self):
        return '\n'.join('[%d] %s' % (index, item) for index, item in enumerate(self._array))

    def find(self, item):
        hash_code = self._hashing(item)
        value = self._array[hash_code]
        # Returns a (value-or-None, hash_code) tuple.
        return (value if value == item else None), hash_code

    def insert(self, *args):
        for i in args:
            if isinstance(i, Iterable):
                for j in i:
                    self._insert(j)
            else:
                self._insert(i)

    def _insert(self, item):
        if item is None:
            return
        hash_code = self._hashing(item)
        value = self._array[hash_code]
        # Compare against None explicitly so a stored 0 counts as occupied;
        # re-inserting an existing value is a no-op.
        if value is not None and value != item:
            raise CollisionError('Hashing value collided!')
        self._array[hash_code] = item

    def delete(self, item):
        hash_code = self._hashing(item)
        if self._array[hash_code] != item:
            raise KeyError('Key error with %s' % item)
        self._array[hash_code] = None

    def show(self):
        print(self)

    @property
    def size(self):
        return len(self._array)

    @property
    def load_factor(self):
        element_num = sum(item is not None for item in self._array)
        return element_num / self.size

    def make_empty(self):
        self._array = [None for _ in range(self.size)]


def kmt_hashing(size):
    # Hash(Key) = Key mod TableSize
    return lambda x: x % size


def test(h):
    print('\nShow hash table:')
    h.show()

    print('\nInsert values:')
    h.insert(7, 8, 9)
    h.insert(range(7))
    h.show()
    print('\nInsert values (existed):')
    h.insert(1)
    h.show()
    print('\nInsert value (collided):')
    try:
        h.insert(11)
    except CollisionError as e:
        print(e)

    print('\nFind value:')
    print(h.find(7))
    print('\nFind value (not existed):')
    print(h.find(77))

    print('\nDelete value:')
    h.delete(7)
    h.show()
    print('\nDelete value (not existed):')
    try:
        h.delete(111)
    except KeyError as e:
        print(e)

    print('\nLoad factor is:', h.load_factor)
    print('\nClear hash table:')
    h.make_empty()
    h.show()


if __name__ == '__main__':
    test(HashTable(10, kmt_hashing(10)))


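Step-by-Step Explanation

First, import the Iterable abstract base class, which is used to check argument types, and define an exception class for hash collisions.

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10


class CollisionError(Exception):
    pass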

Next, define the hash table class. The constructor takes two arguments: one sets the size of the table, the other sets the hash function.

Note: A Python list has no fixed-size declaration like a C array, so the list is pre-filled with None.

class HashTable:
    """
    Hash Table:
        [0] -> A
        [1] -> B
        [2] -> C
        [3] -> D
        [4] -> E
    """
    def __init__(self, size, fn):
        self._array = [None for _ in range(size)]
        self._hashing = fn
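As a side note on pre-filling the list, the sketch below shows an equivalent shorthand and one pitfall to keep in mind if the slots are later changed to hold mutable objects. The variable names are illustrative and not part of the original code.

```python
size = 5

# For an immutable filler such as None, both forms build the same list.
by_comprehension = [None for _ in range(size)]
by_repetition = [None] * size

# With a mutable filler, repetition copies the reference, not the object,
# so every slot points at the same list.
shared = [[]] * 3
shared[0].append('x')

# A comprehension creates a fresh list per slot.
separate = [[] for _ in range(3)]
separate[0].append('x')

print(by_comprehension == by_repetition)
print(shared)
print(separate)
```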

Then override the __str__ method so the hash table prints in a more readable form.

    def __str__(self):
        return '\n'.join('[%d] %s' % (index, item) for index, item in enumerate(self._array))

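Define the find method. find runs in O(1) time: it only needs to compute the hash of the key and read the element stored at that slot. It returns the value found together with its hash code; if the element is not found, it returns None and the slot that was examined.

Note: O(1) assumes the hash function itself is simple and fast.

    def find(self, item):
        hash_code = self._hashing(item)
        value = self._array[hash_code]
        # Returns a (value-or-None, hash_code) tuple.
        return (value if value == item else None), hash_code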
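The return convention of find can be exercised on its own. The following is a minimal sketch that mirrors the method as a standalone function; find_in, slots, and mod10 are hypothetical names introduced only for this illustration.

```python
def find_in(array, hashing, item):
    """Standalone mirror of find: returns (value-or-None, hash_code)."""
    hash_code = hashing(item)
    value = array[hash_code]
    return (value if value == item else None), hash_code


def mod10(key):
    return key % 10


slots = [None] * 10
slots[0] = 0
slots[7] = 7

found, slot = find_in(slots, mod10, 7)        # (7, 7)
missing, probed = find_in(slots, mod10, 77)   # (None, 7)
zero, zero_slot = find_in(slots, mod10, 0)    # (0, 0)

# A stored 0 is falsy, so test the result against None rather than truthiness.
zero_was_found = zero is not None

print(found, slot)
print(missing, probed)
print(zero_was_found)
```

Because a stored key can be 0, callers should check `is not None` on the first element of the tuple instead of relying on its truth value.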

Define the insert method. It first inspects each argument: an iterable is inserted element by element, and anything else is inserted directly. The private _insert method hashes the value and stores it in the corresponding slot; if that slot is already occupied by a different value, a collision exception is raised.

    def insert(self, *args):
        for i in args:
            if isinstance(i, Iterable):
                for j in i:
                    self._insert(j)
            else:
                self._insert(i)

    def _insert(self, item):
        if item is None:
            return
        hash_code = self._hashing(item)
        value = self._array[hash_code]
        # Compare against None explicitly so a stored 0 counts as occupied;
        # re-inserting an existing value is a no-op.
        if value is not None and value != item:
            raise CollisionError('Hashing value collided!')
        self._array[hash_code] = item
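One thing to be aware of with the Iterable check is that strings are iterable too, so a string argument would be split into characters. The sketch below shows one possible way to flatten arguments while treating strings as single items; iter_items is a hypothetical helper, not part of the original class.

```python
from collections.abc import Iterable


def iter_items(*args):
    """Yield each argument, expanding iterables except str and bytes."""
    for arg in args:
        if isinstance(arg, Iterable) and not isinstance(arg, (str, bytes)):
            yield from arg
        else:
            yield arg


flattened = list(iter_items(7, 8, range(3), 'ab'))
print(flattened)   # [7, 8, 0, 1, 2, 'ab']
```

This only matters if the table is ever used with string keys; with the integer keys used in this article, the original check behaves as described.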

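Define the delete method. To delete a value, hash it first to find its slot. If the value stored there differs from the value to delete (which includes the case where the slot is empty), a KeyError is raised; if they match, the slot is reset to None.

    def delete(self, item):
        hash_code = self._hashing(item)
        if self._array[hash_code] != item:
            raise KeyError('Key error with %s' % item)
        self._array[hash_code] = None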

Next, define a few basic methods: displaying the table, getting its size, computing the load factor, and clearing the table.

    def show(self):
        print(self)

    @property
    def size(self):
        return len(self._array)

    @property
    def load_factor(self):
        # Fraction of slots that currently hold an element.
        element_num = sum(item is not None for item in self._array)
        return element_num / self.size

    def make_empty(self):
        self._array = [None for _ in range(self.size)]

Finally, define the simple hash function Hash(Key) = Key mod TableSize.

def kmt_hashing(size):
    # Hash(Key) = Key mod TableSize
    return lambda x: x % size
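Since the table accepts any function as fn, other key types can be supported by swapping the hash function. As a rough sketch, assuming string keys and an arbitrarily chosen base of 31, a polynomial hash could look like this; string_hashing is a hypothetical factory that follows the same shape as kmt_hashing.

```python
def string_hashing(size, base=31):
    """Return a hash function mapping a string to an index in [0, size)."""
    def hashing(key):
        code = 0
        for ch in key:
            # Horner's rule, reduced modulo size at every step.
            code = (code * base + ord(ch)) % size
        return code
    return hashing


hash_str = string_hashing(11)
indices = [hash_str(word) for word in ('ab', 'ba', 'hash', '')]
print(indices)
```

Every result falls in the range 0 to size-1, so the function can be passed as fn wherever kmt_hashing(size) is used, provided the keys are strings.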

A test function is also defined to exercise the hash table.

It first displays the initial, empty table.

def test(h):
    print('\nShow hash table:')
    h.show()

Output:

Show hash table:
[0] None
[1] None
[2] None
[3] None
[4] None
[5] None
[6] None
[7] None
[8] None
[9] None

Next, test the insert method by inserting elements into the table.

    print('\nInsert values:')
    h.insert(7, 8, 9)
    h.insert(range(7))
    h.show()

Output:

Insert values:
[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5
[6] 6
[7] 7
[8] 8
[9] 9

Inserting an element that already exists has no effect, while inserting an element that collides with an existing one raises a collision exception.

    print('\nInsert values (existed):')
    h.insert(1)
    h.show()
    print('\nInsert value (collided):')
    try:
        h.insert(11)
    except CollisionError as e:
        print(e)

Output:

Insert values (existed):
[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5
[6] 6
[7] 7
[8] 8
[9] 9

Insert value (collided):
Hashing value collided!

Look up one element that exists and one that does not.

    print('\nFind value:')
    print(h.find(7))
    print('\nFind value (not existed):')
    print(h.find(77))

Output:

Find value:
(7, 7)

Find value (not existed):
(None, 7)

Delete one element that exists and one that does not.

    print('\nDelete value:')
    h.delete(7)
    h.show()
    print('\nDelete value (not existed):')
    try:
        h.delete(111)
    except KeyError as e:
        print(e)

Output:

Delete value:
[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5
[6] 6
[7] None
[8] 8
[9] 9

Delete value (not existed):
'Key error with 111'

Check the load factor, then clear the table.

    print('\nLoad factor is:', h.load_factor)
    print('\nClear hash table:')
    h.make_empty()
    h.show()

Output:

Load factor is: 0.9

Clear hash table:
[0] None
[1] None
[2] None
[3] None
[4] None
[5] None
[6] None
[7] None
[8] None
[9] None

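This completes a basic hash table, but insertion collisions remain unresolved. The two main ways to resolve them are separate chaining and open addressing; see the related reading for details.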
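To give a feel for the two approaches, here is a minimal sketch of each, assuming integer keys and the Key mod TableSize hash used above. ChainedHashTable and linear_probe_insert are hypothetical names; the articles listed under related reading may implement these techniques differently.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of the items hashed to it."""

    def __init__(self, size):
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, item):
        return self._buckets[item % len(self._buckets)]

    def insert(self, item):
        bucket = self._bucket(item)
        if item not in bucket:      # ignore duplicates
            bucket.append(item)

    def find(self, item):
        return item in self._bucket(item)


def linear_probe_insert(array, item):
    """Open addressing with linear probing; returns the slot that was used."""
    size = len(array)
    start = item % size
    for step in range(size):
        slot = (start + step) % size
        if array[slot] is None or array[slot] == item:
            array[slot] = item
            return slot
    raise OverflowError('hash table is full')


chained = ChainedHashTable(10)
chained.insert(1)
chained.insert(11)                        # same slot as 1, stored in the same chain

slots = [None] * 10
first = linear_probe_insert(slots, 1)     # lands in slot 1
second = linear_probe_insert(slots, 11)   # slot 1 is taken, moves on to slot 2
print(chained.find(11), first, second)
```

Separate chaining keeps colliding items together in one slot, while linear probing searches forward for the next free slot. Deletion under open addressing needs extra care (for example, marking slots as deleted rather than emptying them), which this sketch leaves out.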

Related Reading

1. Separate chaining

2. Open addressing

    Original author: StackLike
    Original URL: https://www.cnblogs.com/stacklike/p/8298353.html