字符串算法 —— 两字符串相同的单词

1. navie:集合 intersect

以集合的形式分别存放两字符串,然后求集合交集。

def common_words_naive(str1, str2):
	str1_set = set(str1.strip().split())
	str2_set = set(str2.strip().split())
	return str1_set & str2_set			# 集合 intersect

>> common_words_naive('I love word', 'I love China')
{'I', 'love'}

2. 使用 hash

  • 根据字符串hash算法,对字符串1的单词分别求其hash值,时间空间复杂度均为 O ( n ) O(n) O(n),并将hash值,存放在集合中

  • 遍历字符串2中的单词,求其hash值,判断是否在字符串1的hash集合中,如果是,则为 common words

    def bkdr_hash(str, seed=131):
        hash = 0
        for s in str:
            hash = hash*seed + ord(s)
        return hash & 0x7fffffff
    

    将字符串hash为整数值的方法及其对比见种字符串Hash函数比较

    def common_words_hash(str1, str2):
        words = str1.strip().split(' ')
        str1_hashset = set(bkdr_hash(word) for word in words)
        common_words = []
        for word in str2.strip().split(' '):
            if bkdr_hash(word) in str1_hashset:
                common_words.append(word)
        return common_words
        
    >> common_words_hash('I love word', 'I love China')
    {'I', 'love'}
    

references

https://www.glassdoor.com/Interview/Given-2-strings-find-the-common-words-along-with-the-time-and-space-complexity-How-would-you-optimize-the-algorithm-QTN_341530.htm#

    原文作者:Inside_Zhang
    原文地址: https://blog.csdn.net/lanchunhui/article/details/82861733
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞