python-3.x – Experience with the textstat / readability packages on Python 3

August 28, 2023

Has anyone here used the readability 0.2 or textstat 0.3.1 packages in Python? I couldn't find anything on SO that covers this topic, nor any good documentation on it.

My code so far is below. It iterates over a bunch of locally stored txt files and prints the results (readability metrics) to a master text file.

from textstat.textstat import textstat
import os
import glob
import contextlib


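@contextlib.contextmanager
def stdout2file(fname):
    # Temporarily redirect everything printed to stdout into the file fname.
    import sys
    f = open(fname, 'w', encoding="utf-8")
    sys.stdout = f
    try:
        yield
    finally:
        # Restore stdout and close the file even if the with-body raises.
        sys.stdout = sys.__stdout__
        f.close()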


def readability():
    os.chdir(r"F:\Level1\Level2")
    with stdout2file("Results_readability.txt"):
        for file in glob.iglob("*.txt"):  # iterates over all files in the directory ending in .txt
            with open(file, encoding="utf8") as fin:
                contents = fin.read()
            name = file.split(os.path.sep)[-1]
            # One "filename | score" line per metric.
            print(name, end=" | ")
            print(textstat.flesch_reading_ease(contents))
            print(name, end=" | ")
            print(textstat.smog_index(contents))
            print(name, end=" | ")
            print(textstat.gunning_fog(contents))


if __name__ == '__main__':
    readability()

This works fine, but I have two questions:

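1. Is it possible to store my master file in a different directory? With the code above, the master file is created in the same directory as the files I iterate over, which makes no sense...
2. Does anyone have experience with how accurate these packages are? I just tested the same string in textstat and on http://www.webpagefx.com/tools/read-able/check.php and http://gunning-fog-index.com/, and got significantly different results on every metric.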

Any help is appreciated.
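On the first question, here is a minimal sketch of one way to keep the master file out of the input directory: skip `os.chdir` and pass full paths for both the input folder and the output file. The helper name `write_results` and its parameters are hypothetical, chosen only for illustration, and `metric` stands for any function that takes the text and returns a score.

```python
import glob
import os


def write_results(input_dir, output_path, metric):
    """Write one 'filename | score' line per .txt file found in input_dir.

    Hypothetical helper: output_path may point to any directory, so the
    master file never has to live next to the files being scanned.
    """
    pattern = os.path.join(input_dir, "*.txt")
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.iglob(pattern)):
            with open(path, encoding="utf-8") as fin:
                contents = fin.read()
            # print(..., file=out) writes to the master file without
            # touching sys.stdout.
            print(os.path.basename(path), metric(contents), sep=" | ", file=out)
```

Assuming textstat is imported as in the code above, a call could look like `write_results(r"F:\Level1\Level2", r"F:\Results\Results_readability.txt", textstat.flesch_reading_ease)`, where the `F:\Results` folder is only an example and has to exist beforehand.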

Best answer: I suspect textstat uses different coefficients. A simple check: run it on a sentence made up of a single one-syllable word. I used the text "No.":

In: textstat.flesch_kincaid_grade("No.")
Out: -4.6

But according to the formula in the literature, the answer should be -3.4
(that is, 0.39 * 1 + 11.8 * 1 - 15.59).
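The same check can be written out as a small sketch that applies the published Flesch-Kincaid grade formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59, to counts supplied by hand. The function below is an illustration only and is not textstat's implementation; the counts are passed in explicitly so that no syllable-counting heuristic is involved.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# "No." is one sentence containing one word of one syllable.
print(round(flesch_kincaid_grade(words=1, sentences=1, syllables=1), 2))  # -3.4
```

Comparing this value with the library's output for a few hand-counted sentences is a quick way to see whether a gap comes from the coefficients or from how words, sentences and syllables are counted.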