POJ 2406 Consecutively Repeated Strings (KMP) and Suffix Arrays

March 17, 2019 | Source: KMP算法

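Problem: a string L is known to be formed by repeating some string S R times. Find the maximum possible R (every string is trivially itself repeated once, so R ≥ 1).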
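A brute-force version of the statement may help pin down the definitions before the two faster methods. This sketch is an illustration under stated assumptions, not part of the original post: `max_repeats_brute` is a hypothetical helper that tries each block length p dividing n, shortest first, and accepts p when s has period p. It costs O(n · d(n)), where d(n) is the number of divisors of n, which makes it handy for cross-checking the other methods on small inputs.

```cpp
#include <cassert>
#include <string>

// Brute-force reference: try each block length p that divides n, shortest
// first, and accept it when s has period p (s[k] == s[k - p] for all k >= p).
// max_repeats_brute is a hypothetical helper, not from the original post.
int max_repeats_brute(const std::string& s) {
    const int n = static_cast<int>(s.size());
    assert(n >= 1);
    for (int p = 1; p <= n; ++p) {
        if (n % p != 0) continue;
        bool ok = true;
        for (int k = p; k < n && ok; ++k)
            ok = (s[k] == s[k - p]);
        if (ok) return n / p;   // shortest block gives the most repeats
    }
    return 1;                   // not reached: p == n always passes
}
```

For example, "aabaab" fails for p = 1 and p = 2 but passes for p = 3, so the answer is 2.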

Method 1: suffix array.

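Enumerate block lengths i from 1 to n. Whenever n % i == 0, check whether LCP(suff(i+1), suff(1)) equals n - i, where suff(k) is the suffix starting at position k (1-based) and LCP is the length of their longest common prefix. The condition holds exactly when s[k] = s[k+i] for every valid k, i.e. when s consists of n / i copies of its first i characters, so the first (smallest) i that passes gives the maximum R = n / i.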
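The criterion can be checked on its own, without a suffix array, by comparing characters directly. A minimal sketch, assuming 0-based indexing (so suff(1) starts at index 0 and suff(i+1) at index i); `naive_lcp` and `max_repeats_by_lcp` are hypothetical names. Each LCP here costs O(n) character comparisons; the suffix array and height array are what replace this direct comparison.

```cpp
#include <cassert>
#include <string>

// Length of the longest common prefix of the suffixes starting at a and b.
int naive_lcp(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Method 1's test with a direct LCP: block length i works iff
// n % i == 0 and LCP(suffix at 0, suffix at i) == n - i.
int max_repeats_by_lcp(const std::string& s) {
    const int n = static_cast<int>(s.size());
    assert(n >= 1);
    for (int i = 1; i <= n / 2; ++i)   // candidate block lengths, shortest first
        if (n % i == 0 && naive_lcp(s, 0, i) == n - i)
            return n / i;
    return 1;                          // only the trivial block s itself
}
```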

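The LCP can be read off the height array h, where h[r] is the LCP of the suffixes of rank r and r - 1: assuming rank[i] < rank[j], lcp(i, j) = min{h[rank[i]+1], …, h[rank[j]]}.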
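The range-minimum formula can be sanity-checked on a tiny string. The sketch below is hypothetical (`SuffixInfo` is not from the post): it keeps the text's 1-based conventions, builds the suffix array by plain sorting (O(n² log n), for illustration only), fills h with the same Kasai-style recurrence used in the code further down, and answers lcp(i, j) by scanning h between the two ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// 1-based like the text: suff(i) starts at s[i-1]; sa[r] is the suffix of rank r;
// h[r] = LCP(suff(sa[r]), suff(sa[r-1])), h[1] = 0.
struct SuffixInfo {
    std::vector<int> sa, rank, h;

    explicit SuffixInfo(const std::string& s) {
        const int n = static_cast<int>(s.size());
        sa.resize(n + 1);
        rank.resize(n + 1);
        h.assign(n + 1, 0);
        for (int i = 1; i <= n; ++i) sa[i] = i;
        // naive construction: sort suffix start positions lexicographically
        std::sort(sa.begin() + 1, sa.end(), [&](int a, int b) {
            return s.compare(a - 1, std::string::npos, s, b - 1, std::string::npos) < 0;
        });
        for (int r = 1; r <= n; ++r) rank[sa[r]] = r;
        // Kasai: h[rank[i]] >= h[rank[i-1]] - 1, so k drops by at most one per step
        int k = 0;
        for (int i = 1; i <= n; ++i) {
            if (rank[i] == 1) { k = 0; continue; }
            if (k > 0) --k;
            const int j = sa[rank[i] - 1];
            while (i + k <= n && j + k <= n && s[i + k - 1] == s[j + k - 1]) ++k;
            h[rank[i]] = k;
        }
    }

    // LCP of suff(i) and suff(j), i != j: minimum of h over (lower rank, higher rank]
    int lcp(int i, int j) const {
        assert(i != j);
        const int lo = std::min(rank[i], rank[j]);
        const int hi = std::max(rank[i], rank[j]);
        int best = h[hi];
        for (int r = lo + 1; r < hi; ++r) best = std::min(best, h[r]);
        return best;
    }
};
```

For s = "abab" (n = 4), lcp(1, 3) is 2 = n - 2, so block length 2 passes and R = 2.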

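Surprisingly, this approach exceeds the time limit (likely because qsort-based prefix doubling runs in O(n log² n) on strings of up to 10^6 characters), but here is the code anyway: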

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

const int MAXN = 1000005;   // |s| <= 10^6 in POJ 2406
char m[MAXN];               // string stored in m[1..n]; scanf leaves '\0' at m[n+1]
int sa[MAXN];               // sa[r] = start position of the suffix with rank r
int rk[MAXN];               // rk[i] = rank of suff(i); not called rank (clashes with std::rank)
int lstrank[MAXN];          // ranks from the previous doubling round
int h[MAXN];                // h[r] = LCP(suff(sa[r]), suff(sa[r-1])), h[1] = 0
int step;                   // doubling length; not called pow (clashes with pow())
int n;                      // length of the current string

// round 0: order suffixes by their first character
int cmp1(const void *a, const void *b) {
    return m[*(const int *)a] - m[*(const int *)b];
}

// previous-round rank of suff(p); a suffix starting past the end ranks lowest
int prev_rank(int p) { return p <= n ? lstrank[p] : 0; }

// order by (rank of the first step chars, rank of the next step chars)
int cmp2(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    if (lstrank[x] != lstrank[y]) return lstrank[x] - lstrank[y];
    return prev_rank(x + step) - prev_rank(y + step);
}

void suffix_array() {
    int i, j;
    // compute sa and rank for length-1 prefixes
    for (i = 1; i <= n; i++) sa[i] = i;
    qsort(sa + 1, n, sizeof(int), cmp1);
    for (i = 1, j = 0; i <= n; i++) {
        // equal characters (e.g. the runs in "aabbbaa") share a rank
        if (i == 1 || m[sa[i]] != m[sa[i - 1]]) j++;
        rk[sa[i]] = j;
    }
    // double the compared length each round using the previous ranks;
    // stop early once all n ranks are distinct
    for (step = 1; step < n && j < n; step <<= 1) {
        memcpy(lstrank + 1, rk + 1, sizeof(int) * n);
        qsort(sa + 1, n, sizeof(int), cmp2);
        for (i = 1, j = 0; i <= n; i++) {
            if (i == 1 || cmp2(&sa[i], &sa[i - 1]) != 0) j++;
            rk[sa[i]] = j;
        }
    }
}

// LCP between rank-adjacent suffixes (Kasai).
// h[rk[i]] is the LCP of suff(i) and suff(sa[rk[i]-1]), and
// h[rk[i]] >= h[rk[i-1]] - 1, because dropping the first char of suff(i-1)
// gives suff(i). So compute h in the order i = 1..n, reusing k.
void cal_height() {
    int k = 0;
    for (int i = 1; i <= n; i++) {
        if (rk[i] == 1) { h[1] = k = 0; continue; }
        if (k > 0) k--;
        int j = sa[rk[i] - 1];
        // the first k chars already match; '\0' at m[n+1] stops the scan
        while (m[i + k] == m[j + k]) k++;
        h[rk[i]] = k;
    }
}

void solve() {
    for (int i = 1; i <= n / 2; i++) {   // try block lengths from 1 upward
        if (n % i != 0) continue;
        // s is made of copies of a length-i block iff LCP(suff(1), suff(i+1)) == n - i;
        // then suff(i+1) is a proper prefix of suff(1), so it must rank lower
        if (rk[1] < rk[i + 1]) continue;
        // LCP(a, b) = min{ h[rk[a]+1], ..., h[rk[b]] } when rk[a] < rk[b]
        int lcp = n;
        for (int j = rk[i + 1] + 1; j <= rk[1]; j++)
            if (h[j] < lcp) lcp = h[j];   // range minimum, not maximum
        if (lcp == n - i) { printf("%d\n", n / i); return; }
    }
    printf("%d\n", 1);
}

int main() {
    // one string per line; a line containing only "." ends the input
    while (scanf("%s", m + 1) == 1 && strcmp(m + 1, ".") != 0) {
        n = (int)strlen(m + 1);
        suffix_array();
        cal_height();
        solve();
    }
    return 0;
}
```

Method 2: the KMP next array (my own fault for never learning KMP properly; I would never have come up with this myself).

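For the string s[0..n-1], compute next[0..n], i.e. one entry more than the string length. Here next[i] is the length of the longest proper prefix of s[0..i-1] that is also a suffix of it (its longest border), and next[0] = -1.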

Look at next[n] and let t = n - next[n]. If n % t == 0, the answer is n / t; otherwise the answer is 1.
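A minimal sketch of this rule, assuming 0-based strings; `kmp_next` and `max_repeats_kmp` are hypothetical names. `kmp_next` fills next[0..n] with the same (unoptimized) next recurrence as the solution code at the end of this post, then the answer is taken from next[n] alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// next[0] = -1; next[i] = length of the longest proper border of s[0..i-1].
// Computes one entry past the string, next[n], which the answer needs.
std::vector<int> kmp_next(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1);
    next[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || s[i] == s[j]) {   // extend the current border by one char
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];                 // fall back to the next shorter border
        }
    }
    return next;
}

int max_repeats_kmp(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    const int t = n - kmp_next(s)[n];    // smallest shift that maps s onto itself
    return n % t == 0 ? n / t : 1;
}
```

For example, "aba" has next[3] = 1, so t = 2 does not divide 3 and the answer is 1.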

To see why, think of it this way.

Take the string "abababab", where * marks the position just past its end:

index   0  1  2  3  4  5  6  7  8
s       a  b  a  b  a  b  a  b  *
next   -1  0  0  1  2  3  4  5  6
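To double-check a next row like this one, it can be recomputed straight from the definition of next[i] as the longest proper prefix of s[0..i-1] that is also its suffix. This is a hypothetical, deliberately slow checker (`next_by_definition`, roughly O(n³)) meant only for verifying small tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Recomputes next[0..n] from the definition, for checking small tables only.
// next[0] = -1; for i >= 1, next[i] is the length of the longest proper prefix
// of s[0..i-1] that is also a suffix of s[0..i-1] (0 if there is none).
std::vector<int> next_by_definition(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n + 1, 0);
    next[0] = -1;
    for (int i = 1; i <= n; ++i) {
        // try border lengths from the longest proper one (i - 1) down to 1
        for (int len = i - 1; len > 0; --len) {
            if (s.compare(0, len, s, i - len, len) == 0) {
                next[i] = len;
                break;
            }
        }
        assert(next[i] < i);  // a border is always proper
    }
    return next;
}
```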

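Treat this as a pattern match with "abababab#" as the text and "abababab*" as the pattern. Matching proceeds up to position n (n = 8), where the first mismatch occurs: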

text     abababab#
pattern  abababab*

The pattern therefore falls back to next[8] = 6. Everything before that position still matches the text, so the pattern slides right by d = n - next[n] = 8 - 6 = 2:

position  0 1 2 3 4 5 6 7 8 9 10
text      a b a b a b a b #
pattern       a b a b a b a b *

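In the overlap, pattern position k sits under text position k + 2, so s[k] = s[k+2] for every k < 6. Chaining these equalities gives s[0..1] = s[2..3] = s[4..5] = s[6..7].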

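In general, the shift d = n - next[n] is the length of a block whose repetition builds s whenever n % d == 0, and the number of repetitions is then n / d. Because next[n] is the longest proper border, d is the smallest such shift, so n / d is the largest possible count. If n % d != 0, no block length dividing n works (by the Fine–Wilf periodicity lemma, any such period would have to be a multiple of d), so the answer is 1.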
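The argument can be restated as two small checks, sketched here under the assumption of 0-based indices; `has_border`, `has_period` and `repeat_block` are hypothetical helpers. A border of length b means s[0..b-1] equals s[n-b..n-1], which is the same as s having period d = n - b; when d also divides n, repeating s[0..d-1] n / d times rebuilds s.

```cpp
#include <cassert>
#include <string>

// has_border(s, b): the proper prefix of length b equals the suffix of length b.
bool has_border(const std::string& s, int b) {
    const int n = static_cast<int>(s.size());
    assert(b >= 0 && b < n);
    return s.compare(0, b, s, n - b, b) == 0;
}

// has_period(s, d): s[k] == s[k + d] for every k with k + d < n.
// has_border(s, b) and has_period(s, n - b) always agree.
bool has_period(const std::string& s, int d) {
    assert(d >= 1 && d <= static_cast<int>(s.size()));
    for (int k = 0; k + d < static_cast<int>(s.size()); ++k)
        if (s[k] != s[k + d]) return false;
    return true;
}

// Concatenates `times` copies of `block`.
std::string repeat_block(const std::string& block, int times) {
    std::string out;
    for (int r = 0; r < times; ++r) out += block;
    return out;
}
```

For "abababa" (n = 7), next[7] = 5 gives d = 2: the string has period 2, but 7 % 2 != 0, so no block tiles it and the answer is 1.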

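```cpp
#include <cstdio>
#include <cstring>

const int MAXN = 1000005;
int nxt[MAXN];   // next array; not called next to avoid clashing with std::next
char s[MAXN];

// nxt[i] = length of the longest proper border of s[0..i-1]; nxt[0] = -1.
// Computes one entry past the string, nxt[n], which the answer needs.
void get_next(int n) {
    int i = 0, j = -1;
    nxt[0] = -1;
    while (i < n) {
        if (j == -1 || s[i] == s[j]) {
            i++;
            j++;
            nxt[i] = j;
        } else {
            j = nxt[j];   // fall back to the next shorter border
        }
    }
}

int main() {
    // one string per line; a line containing only "." ends the input
    while (scanf("%s", s) == 1 && strcmp(s, ".") != 0) {
        int len = (int)strlen(s);
        get_next(len);
        int d = len - nxt[len];   // smallest shift that maps s onto itself
        printf("%d\n", len % d == 0 ? len / d : 1);
    }
    return 0;
}
```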

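    Original author: KMP算法
    Original URL: https://blog.csdn.net/clearriver/article/details/4690869
    This article is reposted from the web and shared only to spread knowledge; in case of infringement, please contact the blogger to have it removed.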