[Python Challenge Walkthrough] Level 2: ocr

May 19, 2019. Source: jianggushi


recognize the characters. maybe they are in the book,

but MAYBE they are in the page source.

Challenge link: click here

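Analysis

The hint points to the page source. Right-click the page and view its source, and you will find a short hint followed by a large block of characters: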

<!--
find rare characters in the mess below:
-->

<!--
%%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{*
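
Instead of copying the block by hand, it can be pulled out of the page source with a regular expression. The following is a minimal sketch, assuming the mess is the last HTML comment on the page; `sample_html` and `last_comment` are hypothetical names, and the sample is a shortened stand-in for the real page source.

```python
import re

# Hypothetical stand-in for the page source; the real mess is far longer.
sample_html = """<html>
<!--
find rare characters in the mess below:
-->

<!--
%%$@_$^__#)^)&!_+]!*@&^}@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*}{}}!}_]$[%}@[{_@#_^{*
-->
</html>"""


def last_comment(html: str) -> str:
    """Return the body of the last HTML comment, stripped of outer whitespace."""
    # DOTALL lets '.' match newlines, since the comment spans several lines.
    comments = re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)
    return comments[-1].strip()


text = last_comment(sample_html)
print(len(text))
```

The resulting `text` can then be used in place of the pasted string in the counting step.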

We need to find the rare characters in this mess. Start by counting how often each character occurs, using Counter from the collections module:

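#!/usr/bin/env python3

from collections import Counter

# Paste the whole block of characters from the page source here.
text = '''paste the block of characters here'''
c = Counter(text)
# most_common() lists (character, count) pairs from most to least frequent.
print(c.most_common())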

Output:

[(')', 6186), ('@', 6157), ('(', 6154), (']', 6152), ('#', 6115), ('_', 6112), ('[', 6108), ('}', 6105), ('%', 6104), ('!', 6079), ('+', 6066), ('$', 6046), ('{', 6046), ('&', 6043), ('*', 6034), ('^', 6030), ('\n', 1219), ('e', 1), ('q', 1), ('u', 1), ('a', 1), ('l', 1), ('i', 1), ('t', 1), ('y', 1)]

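The result shows several characters that appear only once. These should be the rare characters, and together they look like a word. Filter them out to check: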

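#!/usr/bin/env python3

from collections import Counter

# Paste the whole block of characters from the page source here.
text = '''paste the block of characters here'''
c = Counter(text)
# Keep only the characters that occur exactly once, in first-seen order.
print(''.join(ch for ch, n in c.items() if n == 1))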

Output:

equality
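
The rare characters come out as a readable word because Counter (as a dict subclass, from Python 3.7 on) remembers the order in which keys were first seen. The sketch below shows this on a small made-up string; `sample` is a hypothetical input, not the challenge data.

```python
from collections import Counter

# Made-up sample: the noise characters repeat, the letters of "hi" occur once each.
sample = "%%$h@@$%i$@"

counts = Counter(sample)
# Keys are kept in first-seen order, so the rare characters come out in reading order.
rare = "".join(ch for ch, n in counts.items() if n == 1)
print(rare)
```

With the real data the same filter yields the word shown above.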

As in the previous level, replace the last part of the current page's URL with equality to reach the next level: http://www.pythonchallenge.com/pc/def/equality.html
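
The URL substitution can also be done in code. This is a small sketch using `urljoin` from the standard library; the value of `current_url` is an assumption inferred from the level name and is not stated in the text.

```python
from urllib.parse import urljoin

# Assumed URL of the current level (inferred from the level name "ocr").
current_url = "http://www.pythonchallenge.com/pc/def/ocr.html"
answer = "equality"

# Joining with a relative file name replaces the last path segment.
next_url = urljoin(current_url, answer + ".html")
print(next_url)
```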

Addendum

A plain dict can also be used for the counting.

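#!/usr/bin/env python3

# Paste the whole block of characters from the page source here.
text = '''paste the block of characters here'''

# Count each character by hand with a plain dict.
wordcounts = {}
for c in text:
    wordcounts[c] = wordcounts.get(c, 0) + 1
print(wordcounts)
# Keep only the characters that occur exactly once.
print(''.join(ch for ch, n in wordcounts.items() if n == 1))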

References:

    Original author: jianggushi
    Original URL: https://www.jianshu.com/p/63a2201f558e