python – 我的算法的运行时间复杂度 – 我如何计算并进一步优化算法？

2019年8月4日 147次阅读

我设计了一个递归算法并用
Python写下来.当我用不同的参数测量运行时间时,它似乎需要指数时间.此外;以50这样的小数字结束需要半个多小时.(我没等到它完成,但它似乎没有在合理的时间内完成,猜测它是指数级的).

所以,我很好奇这个算法的运行时复杂性.有人可以帮我推导出方程T(n,m)吗？或者计算大哦？

算法如下：

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

我不知道是否必须理解问题和算法才能得出最坏情况下的运行时间,但这是我试图解决的问题：

问题：

给定源字符串SRC和搜索字符串SEA,在SRC中找到子序列SEA并返回在SRC中找到SEA的每个字符的索引.如果SEA中的字符可以位于SRC中的多个位置,则为该字符位置返回-1.

例如;
如果源字符串是BRRBRBR(N = 7)并且搜索字符串是BBB：
然后’BBB’中的第一个’B’可以出现在搜索字符串中的索引0处.第二个“B”可以位于搜索字符串的索引3处,而最后一个“B”可以位于第5个位置.此外;对于字符’BBB’的位置没有其他选择,因此算法返回[0,3,5].

在另一种情况下,源字符串是BRRBRB(N = 6),搜索字符串是RBR：
‘RBR’的第一个’R’可以位于1或2位置.这只留下’B’的位置3和最后’R’的位置4.然后,第一个’R’可以在多个地方,它的位置是不明确的.另外两个字符B和R只有一个地方.所以算法返回[-1,4,5].

算法没有完成并永远占用的情况是源字符串是[‘B’,’R’,’R’,’B’,’R’,’B’,’R’,’B’ ,’B’,’B’,’R’,’B’,’R’,’B’,’B’,’B’,’R’,’B’,’R’,’B’,’ B’,’B’,’R’,’B’,’B’,’B’,’B’,’B’,’R’,’B’,’R’,’B’,’B’ ,’B’,’B’,’B’,’R’,’B’,’B’,’B’,’R’,’B’,’R’,’B’,’B’,’ B’,’R’,’B’,’B’,’B’,’B’,’B’,’R’,’B’,’B’,’B’,’B’,’B’ ](N = 58)
搜索字符串是RBRRBRBBRBRRBBRRBBBRRBBBRR.它应该返回[-1,-1,-1,-1,-1,-1,-1,-1,17,18,19,23,-1,-1,-1,-1,-1,-1 ,-1,-1,-1,-1,-1,-1,-1,47,53],但遗憾的是它不=(

优化：

我想在’索引’列表完全充满-1时停止搜索.但这只会影响最佳情况(或平均情况),但不会影响最坏情况.如何进一步优化该算法.我知道存在这个问题的多项式解决方案.

比优化更重要的是,我真的很好奇运行时间的T(n,m)方程,其中n和m是源和搜索字符串的长度.

如果你能够在这里阅读,非常感谢！ =)

编辑 – IVIad的解决方案实施：

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

最佳答案您可以在O(n m)中执行此操作,其中m是SEA中的字符数,n是SRC中的字符数：

last = 1
for i = 1 to m do
    while SRC[last] != SEA[i]
        ++last

    print last
    ++last (skip this match)

基本上,对于SEA中的每个角色,找到它在SRC中的位置,但只能在找到前一个角色的位置后扫描.

For instance; if the source string is BRRBRBR (N=7) and the search string is BBB
然后：在SRC中找到B：在last = 1处找到
打印1,设置last = 2.
在SRC中找到B：在最后找到= 4,打印4,设置最后= 5.
在SRC中找到B：找到最后= 6,打印6,设置最后= 7.完成.
至于算法的复杂性,我无法提供非常正式的分析,但我会尝试解释我是如何进行的.
假设SRC和SEA中以及它们之间的所有字符都相等.因此,我们可以消除while循环中的条件.另请注意,while循环执行n次.
请注意,对于第一个字符,您将调用find(1,1),… find(m,n).但是这些也会启动while循环并进行自己的递归调用.对于i = 1到n,每个find(i,j)将在其while循环中进行O(m)递归调用.但是这些反过来会自己进行更多的递归调用,从而产生一种导致指数复杂性的“雪崩效应”.
所以你有了：
i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
       find(2, 2) calls find(3, 3), ..., find(m, n)
       find(3, 3) calls find(4, 4), ..., find(m, n)
       find(4, 4) calls find(5, 5), ..., find(m, n)
       ...
       total calls: O(m^m)
i = 2: same, but start from find(2, 3).
...
i = n: same
总复杂度因此看起来像O(n * m ^ m).我希望这是有道理的,我没有犯任何错误.