1. Naive: set intersection
Store the words of each string in a set, then take the intersection of the two sets.
def common_words_naive(str1, str2):
    # Split each string on whitespace and collect its words in a set
    str1_set = set(str1.strip().split())
    str2_set = set(str2.strip().split())
    return str1_set & str2_set  # set intersection

>>> common_words_naive('I love word', 'I love China')
{'I', 'love'}
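2. Using a hash
Using a string hash algorithm, compute the hash value of each word in string 1 and store the hash values in a set. Both the time and the space complexity of this step are O(n).
Then iterate over the words in string 2, compute the hash of each one, and check whether it is in the hash set built from string 1. If it is, the word is a common word.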
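def bkdr_hash(word, seed=131):
    # Multiply the running value by the seed and add each character code.
    # Names are chosen so that the built-ins str and hash are not shadowed.
    h = 0
    for ch in word:
        h = h * seed + ord(ch)
    return h & 0x7fffffff  # keep only the low 31 bits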
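To make the arithmetic concrete, here is a small sketch that records the running value after each character. The helper name `bkdr_hash_steps` is hypothetical and introduced only for illustration; it applies the same update rule and the same final mask as `bkdr_hash`.

```python
def bkdr_hash_steps(word, seed=131):
    """Return (running values after each character, final masked hash).

    Hypothetical helper for illustration; same update rule as bkdr_hash.
    """
    h = 0
    steps = []
    for ch in word:
        h = h * seed + ord(ch)  # shift the old value by the seed, add the new character code
        steps.append(h)
    return steps, h & 0x7FFFFFFF  # mask to the low 31 bits, as in bkdr_hash


steps, final = bkdr_hash_steps('ab')
print(steps)  # [97, 12805]  since ord('a') = 97 and 97 * 131 + ord('b') = 12805
print(final)  # 12805
```

For short words the mask changes nothing. It only matters once the running value exceeds 31 bits, which happens after a handful of characters with a seed of 131.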
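For methods of hashing a string to an integer value and a comparison among them, see the article "种字符串Hash函数比较" (a comparison of string hash functions).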
def common_words_hash(str1, str2):
    # Hash every word of str1 once and keep the hashes in a set
    str1_hashset = set(bkdr_hash(word) for word in str1.strip().split())
    # Collect matches in a set so the result has the same type as the naive version
    common_words = set()
    for word in str2.strip().split():
        if bkdr_hash(word) in str1_hashset:
            common_words.add(word)
    return common_words

>>> common_words_hash('I love word', 'I love China')
{'I', 'love'}
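One caveat with the hash approach: because the hash is reduced to 31 bits, two different words can in principle share a hash value, and a membership test on hashes alone would then report a word that is not actually common. The following is a minimal sketch of one way to guard against that, assuming it is acceptable to keep the words of string 1 alongside their hashes. The name `common_words_hash_checked` and the `hash_fn` parameter are hypothetical and not part of the original code.

```python
from collections import defaultdict


def bkdr_hash(word, seed=131):
    # Same hash as above, repeated so this sketch runs on its own
    h = 0
    for ch in word:
        h = h * seed + ord(ch)
    return h & 0x7FFFFFFF


def common_words_hash_checked(str1, str2, hash_fn=bkdr_hash):
    """Hash lookup followed by a string comparison (hypothetical variant)."""
    # Map each hash value to the set of str1 words that produced it
    buckets = defaultdict(set)
    for word in str1.split():
        buckets[hash_fn(word)].add(word)

    common = set()
    for word in str2.split():
        # .get() avoids creating empty buckets; the `in` test compares actual strings
        if word in buckets.get(hash_fn(word), ()):
            common.add(word)
    return common


print(common_words_hash_checked('I love word', 'I love China') == {'I', 'love'})  # True
# Even with a degenerate hash that maps every word to 0, only real matches are returned
print(common_words_hash_checked('a b', 'c a', hash_fn=lambda w: 0))  # {'a'}
```

The extra comparison costs memory for the stored words, so whether it is worth it depends on how much a false positive matters for the use case.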