poj_3080 Blue Jeans && poj_3450 Corporate Identity (KMP)

N3verL4nd
Published 2017/03/25 10:20

Blue Jeans
Time Limit:1000MS Memory Limit:65536K
Total Submissions:9476 Accepted:3983

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT
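
Approach: enumerate every substring of the first sequence, then use KMP to check whether it occurs in all of the other sequences.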

#include<iostream>
#include<cstdio>
#include<cstring>
using namespace std;
#pragma warning(disable : 4996)
int n, m;
int Next[1000];
char str[15][65];
char ans[65];

void get_next(char *s, int m)
{
	char pat[65] = {0};
	strcpy(pat + 1, s);
	Next[1] = 0;
	int i, j = 0;
	for(i = 2; i <= m; i++)
	{
		while(j > 0 && pat[j+1] != pat[i])
		{
			j = Next[j];
		}
		if(pat[j+1] == pat[i])
		{
			j += 1;
		}
		Next[i] = j;
	}
}
bool kmp(char *A, char *B, int n, int m)
{
	char pat[65] = {0};
	char text[65] = {0};
	strcpy(text + 1, A);
	strcpy(pat + 1, B);
	int i, j = 0;
	for(i = 1; i <= n; i++)
	{
		while(j > 0 && pat[j+1] != text[i])
		{
			j = Next[j];
		}
		if(pat[j+1] == text[i])
		{
			j += 1;
		}
		if(j == m)
		{
			return true;
		}
	}
	return false;
}

bool check(char *s, int tot) // s is the candidate pattern (tmp in main); tot is the number of sequences
{
	int n, m;
	m = strlen(s);
	for (int i = 2; i <= tot; i++)
	{
		n = strlen(str[i] + 1);
		if(!kmp(str[i] + 1, s, n, m))
		{
			return false;
		}
	}
	return true;
}

int main()
{
	// freopen("in.txt", "r", stdin); // local testing only; keep disabled when submitting
	int t, n, length, len;
	char tmp[65] = {0};
	scanf("%d", &t);
	while (t--)
	{
		scanf("%d", &n);
		for(int i = 1; i <= n; i++)
		{
			scanf("%s", str[i] + 1);
		}
		len = strlen(str[1] + 1);
		length = 0;
		for(int i = 1; i <= len; i++)
		{
			for(int j = 1; j <= len - i + 1; j++)
			{
				strncpy(tmp, str[1] + j, i);
				memset(Next, 0, sizeof(Next));
				get_next(tmp, i);
				if(check(tmp, n))
				{
					// Keep a longer match, or an equally long one that sorts earlier.
					if(i > length || (i == length && strcmp(tmp, ans) < 0))
					{
						strcpy(ans, tmp);
						length = i;
					}
				}
				memset(tmp, 0, sizeof(tmp));
			}
		}
		if(length >= 3)
		{
			printf("%s\n", ans);
		}
		else
		{
			printf("no significant commonalities\n");
		}
	}
	return 0;
}
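
The enumerate-then-KMP idea can also be written with 0-indexed std::string, which avoids the fixed-size buffers and the repeated strcpy calls. The following is a minimal sketch under that assumption, not the code submitted above; buildFailure, kmpContains and bestCommon are hypothetical helper names introduced only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Failure table: fail[i] is the length of the longest proper prefix of
// pat[0..i] that is also a suffix of pat[0..i].
std::vector<int> buildFailure(const std::string& pat) {
    std::vector<int> fail(pat.size(), 0);
    for (std::size_t i = 1, k = 0; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = static_cast<int>(k);
    }
    return fail;
}

// Returns true if pat occurs somewhere in text (KMP scan).
bool kmpContains(const std::string& text, const std::string& pat,
                 const std::vector<int>& fail) {
    if (pat.empty()) return true;
    std::size_t k = 0;  // number of pattern characters matched so far
    for (char c : text) {
        while (k > 0 && c != pat[k]) k = fail[k - 1];
        if (c == pat[k]) ++k;
        if (k == pat.size()) return true;
    }
    return false;
}

// Longest substring of seqs[0] found in every sequence; ties are broken
// by taking the lexicographically smallest. Returns "" if nothing is shared.
std::string bestCommon(const std::vector<std::string>& seqs) {
    std::string best;
    if (seqs.empty()) return best;
    const std::string& first = seqs[0];
    for (std::size_t len = 1; len <= first.size(); ++len) {
        for (std::size_t start = 0; start + len <= first.size(); ++start) {
            std::string cand = first.substr(start, len);
            // An equally long candidate that does not sort earlier cannot win.
            if (cand.size() == best.size() && cand >= best) continue;
            std::vector<int> fail = buildFailure(cand);
            bool ok = true;
            for (std::size_t s = 1; s < seqs.size() && ok; ++s) {
                ok = kmpContains(seqs[s], cand, fail);
            }
            if (ok) best = cand;
        }
    }
    return best;
}
```

For Blue Jeans, the caller would print the result when its length is at least 3 and "no significant commonalities" otherwise.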

Corporate Identity
Time Limit:3000MS Memory Limit:65536K
Total Submissions:3998 Accepted:1514

Description

Beside other services, ACM helps companies to clearly state their “corporate identity”, which includes company logo but also other signs, like trademarks. One of such companies is Internet Building Masters (IBM), which has recently asked ACM for a help with their new identity. IBM do not want to change their existing logos and trademarks completely, because their customers are used to the old ones. Therefore, ACM will only change existing trademarks instead of creating new ones.

After several other proposals, it was decided to take all existing trademarks and find the longest common sequence of letters that is contained in all of them. This sequence will be graphically emphasized to form a new logo. Then, the old trademarks may still be used while showing the new identity.

Your task is to find such a sequence.

Input

The input contains several tasks. Each task begins with a line containing a positive integer N, the number of trademarks (2 ≤ N ≤ 4000). The number is followed by N lines, each containing one trademark. Trademarks will be composed only from lowercase letters, the length of each trademark will be at least 1 and at most 200 characters.

After the last trademark, the next task begins. The last task is followed by a line containing zero.

Output

For each task, output a single line containing the longest string contained as a substring in all trademarks. If there are several strings of the same length, print the one that is lexicographically smallest. If there is no such non-empty string, output the words “IDENTITY LOST” instead.

Sample Input

3
aabbaabb
abbababb
bbbbbabb
2
xyz
abc
0

Sample Output

abb
IDENTITY LOST


#include<iostream>
#include<cstdio>
#include<cstring>
using namespace std;
#pragma warning(disable : 4996)
char str[4005][205];
char tmp[205], ans[205];
int Next[4005];
void get_next(int m, char *B)
{
	char pat[205];
	strcpy(pat + 1, B);
	Next[1] = 0;
	int i, j = 0;
	for(i = 2; i <= m; i++)
	{
		while(j > 0 && pat[j+1] != pat[i])
		{
			j = Next[j];
		}
		if(pat[j+1] == pat[i])
		{
			j += 1;
		}
		Next[i] = j;
	}
}
bool kmp(char *A, char *B, int n, int m) // text, pattern, text length, pattern length
{
	char text[205], pat[205];
	strcpy(text + 1, A);
	strcpy(pat + 1, B);
	int i, j = 0;
	for(i = 1; i <= n; i++)
	{
		while(j > 0 && pat[j+1] != text[i])
		{
			j = Next[j];
		}
		if(pat[j+1] == text[i])
		{
			j += 1;
		}
		if(j == m)
		{
			return true;
		}
	}
	return false;
}

bool check(char *s, int t)
{
	for(int i = 2; i <= t; i++)
	{
		int n = strlen(str[i] + 1);
		int m = strlen(s);
		if(!kmp(str[i] + 1, s, n, m))
		{
			return false;
		}
	}
	return true;
}

int main()
{
	// freopen("in.txt", "r", stdin); // local testing only; keep disabled when submitting
	int t, length;
	while (scanf("%d", &t) != EOF)
	{
		if(t == 0)
		{
			break;
		}
		length = 0;
		for(int i = 1; i <= t; i++)
		{
			scanf("%s", str[i] + 1);
		}
		int len = strlen(str[1] + 1);
		for(int i = 1; i <= len; i++) // i is the length of the candidate substring
		{
			for(int j = 1; j <= len - i + 1; j++)
			{
				strncpy(tmp, str[1] + j, i);
				memset(Next, 0, sizeof(Next));
				get_next(i, tmp);
				if(check(tmp, t))
				{
					if(i >= length)
					{
						if(strcmp(tmp, ans) < 0 && i == length)
						{
							strcpy(ans, tmp);
						}
						if(i > length)
						{
							strcpy(ans, tmp);
						}
						length = strlen(ans);
					}
				}
				memset(tmp, 0, sizeof(tmp));
			}
		}
		if(strlen(ans) >= 1)
		{
			printf("%s\n", ans);
			memset(ans, 0, sizeof(ans));
		}
		else
		{
			printf("IDENTITY LOST\n");
		}
	}

	return 0;
}
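
One possible way to reduce the work for Corporate Identity is to take candidates from the shortest trademark and try lengths from longest to shortest, stopping at the first length that produces a match. The sketch below assumes that idea and swaps KMP for std::string::find to stay short; it is not the code above, and the name commonLogo is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Longest string contained in every trademark; ties are broken by taking
// the lexicographically smallest. Returns "" when nothing is shared.
std::string commonLogo(const std::vector<std::string>& marks) {
    if (marks.empty()) return "";
    // Any common substring must fit inside the shortest trademark.
    const std::string& base = *std::min_element(
        marks.begin(), marks.end(),
        [](const std::string& a, const std::string& b) {
            return a.size() < b.size();
        });
    // Try the longest candidates first so the search can stop early.
    for (std::size_t len = base.size(); len >= 1; --len) {
        std::string best;
        bool found = false;
        for (std::size_t start = 0; start + len <= base.size(); ++start) {
            std::string cand = base.substr(start, len);
            // Skip candidates that cannot beat the current best of this length.
            if (found && cand >= best) continue;
            bool ok = true;
            for (const std::string& m : marks) {
                if (m.find(cand) == std::string::npos) { ok = false; break; }
            }
            if (ok) { best = cand; found = true; }
        }
        if (found) return best;
    }
    return "";
}
```

A caller would print "IDENTITY LOST" when the returned string is empty and the string itself otherwise.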



