【HDU5510 2015沈阳赛区B】【KMP or strstr for循环剪枝】Bazinga 循环处思维灵活转化时间复杂度均摊思想

2019年3月17日 392次阅读来源: KMP算法

Bazinga

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 65536/65536 K (Java/Others)
Total Submission(s): 235 Accepted Submission(s): 98

Problem Description Ladies and gentlemen, please sit up straight.
Don’t tilt your head. I’m serious.
《【HDU5510 2015沈阳赛区B】【KMP or strstr for循环剪枝】Bazinga 循环处思维灵活转化时间复杂度均摊思想》
For n given strings S1,S2,⋯,Sn, labelled from 1 to n, you should find the largest i (1≤i≤n) such that there exists an integer j (1≤j<i) and Sj is not a substring of Si.

A substring of a string Si is another string that occurs in Si. For example, “ruiz” is a substring of “ruizhang”, and “rzhang” is not a substring of “ruizhang”.
Input The first line contains an integer t (1≤t≤50) which is the number of test cases.
For each test case, the first line is the positive integer n (1≤n≤500) and in the following n lines list are the strings S1,S2,⋯,Sn.
All strings are given in lower-case letters and strings are no longer than 2000 letters.
Output For each test case, output the largest label you get. If it does not exist, output −1.
Sample Input

4 5 ab abc zabc abcd zabcd 4 you lovinyou aboutlovinyou allaboutlovinyou 5 de def abcd abcde abcdef 3 a ba ccc
Sample Output

Case #1: 4 Case #2: -1 Case #3: 4 Case #4: 3
Source 2015ACM/ICPC亚洲区沈阳站-重现赛（感谢东北大学）

strstr写法：

#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<math.h>
#include<iostream>
#include<string>
#include<set>
#include<map>
#include<vector>
#include<queue>
#include<bitset>
#include<algorithm>
#include<time.h>
using namespace std;
void fre(){freopen("c://test//input.in","r",stdin);freopen("c://test//output.out","w",stdout);}
#define MS(x,y) memset(x,y,sizeof(x))
#define MC(x,y) memcpy(x,y,sizeof(x))
#define MP(x,y) make_pair(x,y)
#define ls o<<1
#define rs o<<1|1
typedef long long LL;
typedef unsigned long long UL;
typedef unsigned int UI;
template <class T> inline void gmax(T &a,T b){if(b>a)a=b;}
template <class T> inline void gmin(T &a,T b){if(b<a)a=b;}
const int N=0,M=0,Z=1e9+7,ms63=1061109567;
int casenum,casei;
int n;
char s[505][2020];
bool e[505];
int bf()
{
	for(int i=n;i>=1;i--)
	{
		for(int j=1;j<i;j++)if(!strstr(s[i],s[j]))return i;
	}
	return -1;
}
int solve()
{
	MS(e,1);int ans=-1;
	for(int i=1;i<n;i++)//枚举子串
	{
		for(int j=i+1;j<=n;j++)if(e[j])//枚举还不一定是满足串的母串
		{
			//如果j是i之后的第一个满足要求的串，使得s[i]是s[j]的子串，那么s[i]对于后串的意义s[j]都能起到。使s[i] break即可。
			if(strstr(s[j],s[i]))break;
			//否则s[i]不是s[j]的子串，那么s[j]就是满足要求的串，以后就不用再比较了。
			else {e[j]=0;gmax(ans,j);}
		}
	}
	return ans;
}
int main()
{
	scanf("%d",&casenum);
	for(casei=1;casei<=casenum;casei++)
	{
		scanf("%d",&n);
		for(int i=1;i<=n;i++)scanf("%s",s[i]);
		//printf("Case #%d: %d\n",casei,bf());
		printf("Case #%d: %d\n",casei,solve());
	}
	return 0;
}
/*
【题意】
有共计T（[1,50]）组数据
给你n（[1,500]）个串，每个串都是小写字符串，长度可达2000。
定义：s[i]是满足要求的串的条件是——至少存在一个j(1<=j<i)，s[j]不是s[i]的子串。
让你找到最大的i(1<=i<=n)，使得s[i]是满足要求的串。

【类型】
for循环处思维灵活转化 时间复杂度均摊思想 

【分析】
暴力做法有两种。
首先，最直接最暴力的做法是——
int bf()
{
	for(int i=n;i>=1;i--)
	{
		for(int j=1;j<i;j++)
		{
			if(!strstr(s[j],s[i]))return i;
		}
	}
	return -1;
}
这个时间复杂度是O(Tnnlen)，可达50*500*500*2000=250e8，即250亿，爆炸。
但是转念一想，这种数据所对应的输出可达50*500*2000=5e7，即50MB，是不可能的，读入就爆炸了。于是，如果常数小，这样做是可以AC的。
而事实上，对于随机数据，这种暴力其实比下面的暴力，效率更高，更容易AC。

然后，另外一种暴力，是我想要针对特定构造数据（其实是针对了自己TwT）所写。
以每个串为子串筛后面的所有串：
如果它不是后串的子串，那么break掉。答案不会比当前后串的编号小。
如果它是后串的子串，那么我们可以把后串删掉。
（就是这里想错了。我们应当删掉的串不是后串，而是这个串——如果顺着这里想下去，改变for循环的顺序，for j=i+1 to n，也许就能很快做出来了。唉，还是自己思考的时间太少了，思维太不灵活了。）

正确的做法是什么呢？
参照代码，先升序枚举i，再升序枚举j。
如果j是i之后的第一个满足要求的串，使得s[i]是s[j]的子串，那么s[i]对于后串的意义s[j]都能起到。使s[i] break即可。
否则s[i]不是s[j]的子串，那么s[j]就是满足要求的串，以后就不用再比较"某个串是不是s[j]的子串"了。
这个时间复杂度是什么呢？
对于每次（s[i]是不是s[j]）的子串：如果是，break掉，s[i]不再匹配；如果不是，s[j]被确定满足要求，s[j]不再匹配。
所以每次比较都会有一个串失去后序匹配作用。
于是时间复杂度是O(T(n^2+nlen))，最大不过50*500*2000=5e7，就是一个完全可以AC的时间复杂度了。

【时间复杂度&&优化】
O(Tnnlen)->O(T(n^2+nlen))

【trick&&吐槽】
1，题目：做题要看题目名称暗示。B题题目bazinga是"逗你玩"的意思，然后我们真的被这题捉弄了。
2，读题：不要太依赖队友的读题，做一道题之前一定要自己读一遍，形成一个独立、系统的认知。很多时候，水题做不出来，都是队友开题，然后甩给我，我的思维附带了他们之前的错误思路，也就很难走出去。
3，策略：不要让队友卡题，尤其这种傻X题，不如让自己来卡。不要对自己生疏的算法有所恐惧，要挑起队伍的旗帜。
4，思维：思维要灵活。这题其实关键就是两个for循环的顺序，只要我都试着思考下，尝试下，很快就能做出来的。
5，我一开始的暴力做法，是想要剪枝的，但是在细节地方没有想清楚，可是如果抓住问题，思维严谨有序地想下去，也会很快出解。思考时间（而不是编码时间）应该是解决问题的大头，想清楚细节再做题是非常重要的。
6，strstr(母串，子串)返回的是NULL或者母串的匹配首位点指针。这个实际比KMP都要快。用这个的话这道题也不会卡住了。

总结——所以对于水题：
1，我来做
2，重新系统读遍题
3，灵活地做思维转化

【数据】
input
4
5
ab
abc
zabc
abcd
zabcd
4
you
lovinyou
aboutlovinyou
allaboutlovinyou
5
de
def
abcd
abcde
abcdef
3
a
ba
ccc

output
Case #1: 4
Case #2: -1
Case #3: 4
Case #4: 3

*/

KMP写法

#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<math.h>
#include<iostream>
#include<string>
#include<set>
#include<map>
#include<vector>
#include<queue>
#include<bitset>
#include<algorithm>
#include<time.h>
using namespace std;
void fre(){freopen("c://test//input.in","r",stdin);freopen("c://test//output.out","w",stdout);}
#define MS(x,y) memset(x,y,sizeof(x))
#define MC(x,y) memcpy(x,y,sizeof(x))
#define MP(x,y) make_pair(x,y)
#define ls o<<1
#define rs o<<1|1
typedef long long LL;
typedef unsigned long long UL;
typedef unsigned int UI;
template <class T> inline void gmax(T &a,T b){if(b>a)a=b;}
template <class T> inline void gmin(T &a,T b){if(b<a)a=b;}
const int N=0,M=0,Z=1e9+7,ms63=1061109567;
int casenum,casei;
int n;
char s[505][2020];
bool e[505];
int len[505];
int nxt[505][2020];
//求得模板串的fail指针
void getnxt(int u)
{
	int j=-1;nxt[u][0]=-1;
	for(int i=1;i<len[u];i++)
	{
		while(j>=0&&s[u][j+1]!=s[u][i])j=nxt[u][j];
		if(s[u][j+1]==s[u][i])j++;
		nxt[u][i]=j;
	}
}
//查询匹配串s[v]中是否含有匹配串s[u]
bool kmp(int v,int u)
{
	int j=-1;
	for(int i=0;i<len[v];i++)
	{
		while(j>=0&&s[u][j+1]!=s[v][i])j=nxt[u][j];
		if(s[u][j+1]==s[v][i])j++;
		if(j==len[u]-1)return 1;
	}
	return 0;
}
int solve()
{
	MS(e,1);int ans=-1;
	for(int i=1;i<n;i++)//枚举子串
	{
		for(int j=i+1;j<=n;j++)if(e[j])//枚举还不一定是满足串的母串
		{
			//如果j是i之后的第一个满足要求的串，使得s[i]是s[j]的子串，那么s[i]对于后串的意义s[j]都能起到。使s[i] break即可。
			if(kmp(j,i))break;
			//否则s[i]不是s[j]的子串，那么s[j]就是满足要求的串，以后就不用再比较了。
			else {e[j]=0;gmax(ans,j);}
		}
	}
	return ans;
}
int main()
{
	scanf("%d",&casenum);
	for(casei=1;casei<=casenum;casei++)
	{
		scanf("%d",&n);
		for(int i=1;i<=n;i++)
		{
			scanf("%s",s[i]);
			len[i]=strlen(s[i]);
			getnxt(i);
		}
		printf("Case #%d: %d\n",casei,solve());
	}
	return 0;
}
/*
【题意】
有共计T（[1,50]）组数据
给你n（[1,500]）个串，每个串都是小写字符串，长度可达2000。
定义：s[i]是满足要求的串的条件是——至少存在一个j(1<=j<i)，s[j]不是s[i]的子串。
让你找到最大的i(1<=i<=n)，使得s[i]是满足要求的串。

【类型】
for循环处思维灵活转化 时间复杂度均摊思想 

【分析】
暴力做法有两种。
首先，最直接最暴力的做法是——
int bf()
{
	for(int i=n;i>=1;i--)
	{
		for(int j=1;j<i;j++)
		{
			if(!strstr(s[j],s[i]))return i;
		}
	}
	return -1;
}
这个时间复杂度是O(Tnnlen)，可达50*500*500*2000=250e8，即250亿，爆炸。
但是转念一想，这种数据所对应的输出可达50*500*2000=5e7，即50MB，是不可能的，读入就爆炸了。于是，如果常数小，这样做是可以AC的。
而事实上，对于随机数据，这种暴力其实比下面的暴力，效率更高，更容易AC。

然后，另外一种暴力，是我想要针对特定构造数据（其实是针对了自己TwT）所写。
以每个串为子串筛后面的所有串：
如果它不是后串的子串，那么break掉。答案不会比当前后串的编号小。
如果它是后串的子串，那么我们可以把后串删掉。
（就是这里想错了。我们应当删掉的串不是后串，而是这个串——如果顺着这里想下去，改变for循环的顺序，for j=i+1 to n，也许就能很快做出来了。唉，还是自己思考的时间太少了，思维太不灵活了。）

正确的做法是什么呢？
参照代码，先升序枚举i，再升序枚举j。
如果j是i之后的第一个满足要求的串，使得s[i]是s[j]的子串，那么s[i]对于后串的意义s[j]都能起到。使s[i] break即可。
否则s[i]不是s[j]的子串，那么s[j]就是满足要求的串，以后就不用再比较"某个串是不是s[j]的子串"了。
这个时间复杂度是什么呢？
对于每次（s[i]是不是s[j]）的子串：如果是，break掉，s[i]不再匹配；如果不是，s[j]被确定满足要求，s[j]不再匹配。
所以每次比较都会有一个串失去后序匹配作用。
于是时间复杂度是O(T(n^2+nlen))，最大不过50*500*2000=5e7，就是一个完全可以AC的时间复杂度了。

【时间复杂度&&优化】
O(Tnnlen)->O(T(n^2+nlen))

【trick&&吐槽】
1，题目：做题要看题目名称暗示。B题题目bazinga是"逗你玩"的意思，然后我们真的被这题捉弄了。
2，读题：不要太依赖队友的读题，做一道题之前一定要自己读一遍，形成一个独立、系统的认知。很多时候，水题做不出来，都是队友开题，然后甩给我，我的思维附带了他们之前的错误思路，也就很难走出去。
3，策略：不要让队友卡题，尤其这种傻X题，不如让自己来卡。不要对自己生疏的算法有所恐惧，要挑起队伍的旗帜。
4，思维：思维要灵活。这题其实关键就是两个for循环的顺序，只要我都试着思考下，尝试下，很快就能做出来的。
5，我一开始的暴力做法，是想要剪枝的，但是在细节地方没有想清楚，可是如果抓住问题，思维严谨有序地想下去，也会很快出解。思考时间（而不是编码时间）应该是解决问题的大头，想清楚细节再做题是非常重要的。
6，strstr(母串，子串)返回的是NULL或者母串的匹配首位点指针。这个实际比KMP都要快。用这个的话这道题也不会卡住了。

总结——所以对于水题：
1，我来做
2，重新系统读遍题
3，灵活地做思维转化

【数据】
input
4
5
ab
abc
zabc
abcd
zabcd
4
you
lovinyou
aboutlovinyou
allaboutlovinyou
5
de
def
abcd
abcde
abcdef
3
a
ba
ccc

output
Case #1: 4
Case #2: -1
Case #3: 4
Case #4: 3

*/

    原文作者：KMP算法
    原文地址: https://blog.csdn.net/snowy_smile/article/details/49535087
    本文转自网络文章，转载此文章仅为分享知识，如有侵权，请联系博主进行删除。