python – How many leaves come before a subtree?

July 20, 2019 · 248 views

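I am using NLTK trees to read the Stanford syntactic parse of a text (via Tree.fromstring()), and I am looking for a way to find the leaf position of a given subtree within the larger tree. Basically, I want the reverse of leaf_treeposition().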

In a tree t, I have a subtree np, and what I want is the index x such that:

t.leaves()[x] == np.leaves()[0] # x = ???(t, np)

I don't want to use t.leaves().index(…) because np may occur several times in the sentence, and I need the correct occurrence rather than the first one.
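To illustrate the problem, here is a minimal sketch on a made-up sentence (the tree below is an assumption chosen for the example, not my actual parse). The word "the" occurs twice, so looking up the first leaf of the second NP by value returns the wrong index:

```python
from nltk.tree import ParentedTree

# Illustrative sentence (an assumption): "the" occurs twice.
t = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))"
)
second_np = t[(1, 1)]          # the NP "the cat"
leaves = t.leaves()            # ['the', 'dog', 'chased', 'the', 'cat']

# list.index() compares by value, so it stops at the first "the".
first_match = leaves.index(second_np.leaves()[0])
print(first_match)             # 0, although "the cat" starts at index 3
```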

What I do have is the tree position of np in t (which is a ParentedTree), np.treeposition(), such that:

t[np.treeposition()] == np

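I guess one tedious solution would be to sum up the number of leaves of all left siblings of np, at every level of the tree. Or I could iterate over all leaves until leaf_treeposition(leaf) equals np.treeposition() + "[0]"*, but that sounds suboptimal.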
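For reference, the "tedious" left-sibling approach might look roughly like the sketch below. The helper name leaves_before and the sample tree are assumptions made for illustration; the function walks down the tree position and, at each level, adds up the leaves of every child to the left of the path.

```python
from nltk.tree import Tree

def leaves_before(tree, position):
    """Count the leaves that precede the subtree at `position` (hypothetical helper)."""
    count = 0
    node = tree
    for child_index in position:
        # Add the leaves of every sibling to the left at this level.
        for j in range(child_index):
            sibling = node[j]
            # A child is either a subtree or a bare leaf (a string).
            count += len(sibling.leaves()) if isinstance(sibling, Tree) else 1
        node = node[child_index]
    return count

# Illustrative tree (an assumption, not the original parse).
t = Tree.fromstring("(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))")
print(leaves_before(t, (1, 1)))   # 3
```

It works, but it needs explicit bookkeeping at every level, which is why I am hoping for something more direct.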

Is there a better way?

Best answer (edited): there is a simple solution after all:

1. Construct the tree position of the subtree's first leaf.
2. Look it up in the list of all leaf tree positions.

Setup:

>>> from nltk.tree import ParentedTree
>>> t = ParentedTree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
>>> np_pos = (1,1)
>>> np = t[np_pos]
>>> print(np)
(NP (D the) (N cat))

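For step 1, I concatenate the tree position of np within t with the position of the first leaf inside np. The list of all leaf tree positions (step 2) eluded me until I looked closely and realized that it is actually implemented in the Tree API, if somewhat obscurely: it is a special value, "leaves", of the order parameter of treepositions(). The x you are after is simply the index of target_leafpos in this list.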

>>> target_leafpos = np.treeposition() + np.leaf_treeposition(0) # Step 1
>>> all_leaf_treepositions = t.treepositions("leaves")           # Step 2
>>> x = all_leaf_treepositions.index(target_leafpos)
>>> print(x)
3

If you don't mind unreadable code, you can even write it as a one-liner:

x = t.treepositions("leaves").index(np.treeposition() + np.leaf_treeposition(0))
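If you need this in several places, the two steps could be wrapped in a small helper. The sketch below is only one way to do it: the name first_leaf_index is made up, and it assumes the subtree is a ParentedTree so that root() and treeposition() are available. The sample sentence deliberately contains two identical NPs to show that each occurrence gets its own index.

```python
from nltk.tree import ParentedTree

def first_leaf_index(subtree):
    """Index, within the whole sentence, of the first leaf of `subtree` (hypothetical helper)."""
    root = subtree.root()
    # Step 1: absolute tree position of the subtree's first leaf.
    target_leafpos = subtree.treeposition() + subtree.leaf_treeposition(0)
    # Step 2: find that position among all leaf positions of the full tree.
    return root.treepositions("leaves").index(target_leafpos)

# Illustrative tree (an assumption) with the same NP occurring twice.
t = ParentedTree.fromstring(
    "(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N dog))))"
)
print(first_leaf_index(t[(0,)]), first_leaf_index(t[(1, 1)]))   # 0 3
```

Because treeposition() follows parent pointers rather than comparing subtrees by value, the two equal-looking NPs are still told apart.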