The first part of the script works (it removes http:// and www.). Next I need to check whether the words in source.txt exist in exists.txt.
source = open('/net/sign/temp/python_tmp/script1/source.txt','r')
exists = open('/net/sign/temp/python_tmp/script1/exists.txt','r')
with source as f:
    lines = f.read()
    lines = lines.replace('http://','')
    lines = lines.replace('www.','')
for a in open('/net/sign/temp/python_tmp/script1/exists.txt'):
    if a == lines:
        print("ok")
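The loop above compares each raw line of exists.txt (still containing "www." and its trailing newline) against the entire normalized contents of source.txt held in one string, so it cannot match once source.txt has more than one line. A minimal sketch of the problem, assuming in-memory strings with the same contents as the files listed in the question in place of the real files:

```python
# In-memory stand-ins for source.txt and exists.txt (assumption: same contents as in the question)
source_text = "www.yahoo.it\nwww.yahoo.com\nwww.google.com\nhttp://www.libero.it\n"
exists_lines = ["www.yahoo.com\n"]

# What the script does: `lines` is the whole file as ONE string
lines = source_text.replace('http://', '').replace('www.', '')
print(repr(lines))  # 'yahoo.it\nyahoo.com\ngoogle.com\nlibero.it\n'

for a in exists_lines:
    # `a` keeps 'www.' and '\n', and `lines` holds every source line at once
    print(a == lines)  # False

# Comparing line by line, after applying the same normalization to both sides, finds the match
source_set = {line.strip() for line in lines.splitlines()}
for a in exists_lines:
    word = a.replace('http://', '').replace('www.', '').strip()
    if word in source_set:
        print("ok")  # printed once, for yahoo.com
```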
Contents of source.txt:
www.yahoo.it
www.yahoo.com
www.google.com
http://www.libero.it
Contents of exists.txt:
www.yahoo.com
Best answer: something like this should work:
source_words = set()
with open('source.txt') as source:
    for word in source:
        # drop "http://" and "www." plus the trailing newline
        source_words.add(word.replace('http://','').replace('www.','').strip())

exist_words = set()
with open('exists.txt') as exist:
    for word in exist:
        exist_words.add(word.replace('http://','').replace('www.','').strip())

print("There {} words from 'source.txt' in 'exists.txt'".format(
    "are" if exist_words.intersection(source_words) else "aren't"
))
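The same normalize-then-intersect idea can be factored so that both files go through one code path. A sketch, assuming the hypothetical helper names `normalize` and `load_words`, and using `io.StringIO` in place of `open('source.txt')` / `open('exists.txt')` so it runs without the files:

```python
import io

def normalize(line):
    """Remove every 'http://' and 'www.' substring and surrounding whitespace (hypothetical helper)."""
    return line.replace('http://', '').replace('www.', '').strip()

def load_words(fileobj):
    """Return the set of normalized lines in an open text file, skipping blank lines (hypothetical helper)."""
    return {normalize(line) for line in fileobj if line.strip()}

# io.StringIO stands in for the real files here (assumption: same contents as in the question)
source = io.StringIO("www.yahoo.it\nwww.yahoo.com\nwww.google.com\nhttp://www.libero.it\n")
exists = io.StringIO("www.yahoo.com\n")

source_words = load_words(source)
exist_words = load_words(exists)

common = exist_words & source_words  # same result as exist_words.intersection(source_words)
print("There {} words from 'source.txt' in 'exists.txt'".format("are" if common else "aren't"))
```

Skipping blank lines keeps an empty string out of both sets; otherwise an empty line in each file would count as a shared "word".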
If you need the exact words that appear in both files, they are in the result of the intersection:
print("These words are in both files:")
for word in exist_words.intersection(source_words):
    print(word)
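Sets have no guaranteed iteration order, so the loop above may print the shared words in any order. If stable output matters, sorting the intersection is one option. A sketch, assuming hypothetically that exists.txt also listed www.google.com so that more than one word is shared:

```python
# Normalized word sets (hypothetical: google.com added to exists.txt for illustration)
source_words = {'yahoo.it', 'yahoo.com', 'google.com', 'libero.it'}
exist_words = {'yahoo.com', 'google.com'}

# sorted() turns the unordered intersection into an alphabetically ordered list
common = sorted(exist_words & source_words)

print("These words are in both files:")
for word in common:
    print(word)  # google.com, then yahoo.com
```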