python – Remove strings from a list that are substrings of other strings in the same list

I have a text file, lists.txt, that looks like this:

HI family what are u doing ?
HI Family
what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu

I want to clean it up by removing every line that is entirely contained in another line. That is, I want something like this:

HI family what are u doing ?
The best Pokemon is Pikachu
Channel 5 is very cheap

I have tried counting the strings and then comparing them with grep, looking up the output of sort results.txt in the large results.txt, but it did not work well.

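Best answer: If I understand your question correctly, you want to take a list of strings and remove every string that is a substring of another string in the list.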

In pseudocode:

outer: for string s in l
    for string s2 in l
        if s != s2 and s substringOf s2
            continue outer
    print s

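That is, for each string, loop over the list again, and abandon that iteration of the outer loop as soon as any test in the inner loop matches. The s != s2 check stops a string from matching itself.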
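A minimal Python sketch of the same nested-loop idea, assuming the lines are already loaded into a list; the function name remove_contained is hypothetical. any() stops at the first match, which plays the role of "continue outer".

```python
def remove_contained(strings):
    """Return the strings that are not a substring of some *other* string.

    Direct translation of the nested-loop pseudocode: O(n^2) comparisons.
    """
    result = []
    for s in strings:
        # any() short-circuits on the first containing string,
        # like `continue outer` in the pseudocode.
        if any(s != s2 and s in s2 for s2 in strings):
            continue
        result.append(s)
    return result


lines = [
    "HI family what are u doin?",
    "HI family what are",
    "Channel 5 is very cheap",
    "Channel 5 is",
    "Channel 5 is very",
    "Pokemon",
    "The best Pokemon is Pikachu",
]

for line in remove_contained(lines):
    print(line)
```

Because of the s != s2 test, exact duplicate lines never eliminate each other, so both copies survive, just as in the bash version.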

Here is an implementation of this algorithm in bash. Note that the file (list.txt) is read through the redirection operator (<):

$ cat list.txt
HI family what are u doin?
HI family what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu
$ while IFS= read -r line; do while IFS= read -r line2; do [[ $line2 != "$line" && $line2 == *"$line"* ]] && continue 2; done <list.txt; echo "$line"; done <list.txt
HI family what are u doin?
Channel 5 is very cheap
The best Pokemon is Pikachu
$
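The question's own sample expects "HI Family" to be removed, but the longer line spells it "HI family", so a case-sensitive test (as in the bash loop) keeps it. The Python sketch below assumes case-insensitive matching is wanted and that blank and duplicate lines should be dropped. It also swaps in a small optimization that is not part of the original answer: it visits lines longest-first and compares each one only against lines it has already kept. The helper names clean_lines and clean_file and the UTF-8 encoding are hypothetical choices.

```python
from pathlib import Path


def clean_lines(lines, ignore_case=True):
    """Keep the lines that are not contained in another line, in input order.

    Assumptions (not from the original answer): blank lines and exact
    duplicates are dropped, and matching is case-insensitive by default.
    """
    # Drop blank lines and exact duplicates, keeping first-seen order.
    unique = list(dict.fromkeys(line for line in lines if line.strip()))

    def key(s):
        # casefold() is a stronger lower() intended for caseless matching.
        return s.casefold() if ignore_case else s

    # A string can only be contained in one whose key is at least as long,
    # so visit the longest first (sorted() is stable, ties keep input order)
    # and compare only against lines that were already kept.
    kept = []
    for s in sorted(unique, key=lambda t: len(key(t)), reverse=True):
        if not any(key(s) in key(k) for k in kept):
            kept.append(s)

    kept_set = set(kept)
    return [s for s in unique if s in kept_set]


def clean_file(path, ignore_case=True):
    """Read a text file (UTF-8 assumed) and clean its lines."""
    text = Path(path).read_text(encoding="utf-8")
    return clean_lines(text.splitlines(), ignore_case)


SAMPLE = [
    "HI family what are u doing ?",
    "HI Family",
    "what are",
    "Channel 5 is very cheap",
    "Channel 5 is",
    "Channel 5 is very",
    "Pokemon",
    "The best Pokemon is Pikachu",
]

for line in clean_lines(SAMPLE):
    print(line)
```

With the question's lists.txt, clean_file("lists.txt") should return "HI family what are u doing ?", "Channel 5 is very cheap" and "The best Pokemon is Pikachu", in file order. Comparing only against kept lines is safe because containment is transitive: a line inside a discarded line is also inside whichever kept line swallowed it. The worst case is still quadratic.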