字符串算法 —— 两字符串相同的单词

2023年6月8日 129次阅读来源: Inside_Zhang

1. navie：集合 intersect

以集合的形式分别存放两字符串，然后求集合交集。

def common_words_naive(str1, str2):
	str1_set = set(str1.strip().split())
	str2_set = set(str2.strip().split())
	return str1_set & str2_set			# 集合 intersect

>> common_words_naive('I love word', 'I love China')
{'I', 'love'}

2. 使用 hash

根据字符串hash算法，对字符串1的单词分别求其hash值，时间空间复杂度均为 O ( n ) O(n) O(n)，并将hash值，存放在集合中

遍历字符串2中的单词，求其hash值，判断是否在字符串1的hash集合中，如果是，则为 common words

def bkdr_hash(str, seed=131):
    hash = 0
    for s in str:
        hash = hash*seed + ord(s)
    return hash & 0x7fffffff

将字符串hash为整数值的方法及其对比见种字符串Hash函数比较

def common_words_hash(str1, str2):
    words = str1.strip().split(' ')
    str1_hashset = set(bkdr_hash(word) for word in words)
    common_words = []
    for word in str2.strip().split(' '):
        if bkdr_hash(word) in str1_hashset:
            common_words.append(word)
    return common_words
    
>> common_words_hash('I love word', 'I love China')
{'I', 'love'}

references

https://www.glassdoor.com/Interview/Given-2-strings-find-the-common-words-along-with-the-time-and-space-complexity-How-would-you-optimize-the-algorithm-QTN_341530.htm#

    原文作者：Inside_Zhang
    原文地址: https://blog.csdn.net/lanchunhui/article/details/82861733
    本文转自网络文章，转载此文章仅为分享知识，如有侵权，请联系博主进行删除。