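I want to convert the parse tree produced by the R package coreNLP into the format used by the data.tree R package. I generate the parse tree with the following code: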
options( java.parameters = "-Xmx2g" )
library(NLP)
library(coreNLP)
#initCoreNLP() # change this if downloaded to non-standard location
initCoreNLP(annotators = "tokenize,ssplit,pos,lemma,parse")
## Some text.
s <- c("A rare black squirrel has become a regular visitor to a suburban garden.")
s <- as.String(s)
anno<-annotateString(s)
parse_tree <- getParse(anno)
parse_tree
The output parse tree is as follows:
> parse_tree
[1] "(ROOT\r\n (S\r\n (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n (VP (VBZ has)\r\n (VP (VBN become)\r\n (NP (DT a) (JJ regular) (NN visitor))\r\n (PP (TO to)\r\n (NP (DT a) (JJ suburban) (NN garden)))))\r\n (. .)))\r\n\r\n"
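I found the post "Visualize Parse Tree Structure",
which converts a parse tree generated by the openNLP package into a tree format. However, that parse tree differs from the one coreNLP generates, and the solution does not convert to the data.tree format I want.
Edit
Adding the following two lines lets us use the parse2graph function provided in the post "Visualize Parse Tree Structure":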
# this step modifies coreNLP parse tree to mimic openNLP parse tree
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)
library(igraph)
library(NLP)
# parse2graph() is the function defined in the linked post
parse2graph(parse_tree, # plus optional graphing parameters
            title = sprintf("'%s'", s), margin=-0.05,  # s is the sentence defined above
            vertex.color=NA, vertex.frame.color=NA,
            vertex.label.font=2, vertex.label.cex=1.5, asp=0.5,
            edge.width=1.5, edge.color='black', edge.arrow.size=0)
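For reference, the effect of the two gsub calls above can be checked on a shortened parse string. This is only a sketch: the string below is a hand-written fragment in the style of the coreNLP output, not actual output, and the names raw and flat are made up for the example.

```r
# Hand-written fragment imitating coreNLP's output layout (assumption).
raw <- "(ROOT\r\n  (S\r\n    (NP (DT A) (NN squirrel))))\r\n\r\n"

flat <- gsub("[\r\n]", "", raw)    # drop carriage returns and newlines
flat <- gsub("ROOT", "TOP", flat)  # the openNLP-style tree uses TOP as its root label
print(flat)
```

If the sentence itself might contain the word ROOT, anchoring the pattern to the start of the string (for example replacing "^\\(ROOT" with "(TOP") would leave the text untouched.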
But what I actually want is to convert the parse tree into the data.tree format provided by the data.tree package.
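Best answer: Once you have the edge list, converting to data.tree is trivial. Only the last part of the parse2graph function needs replacing, and the styling moves out of the function: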
parse2tree <- function(ptext) {
    stopifnot(require(NLP) && require(data.tree))  # Tree_parse() and FromDataFrameNetwork()
    ## Replace words with unique versions
    ms <- gregexpr("[^() ]+", ptext)                                      # match every token (anything but spaces and brackets)
    words <- regmatches(ptext, ms)[[1]]                                   # the tokens themselves
    regmatches(ptext, ms) <- list(paste0(words, seq.int(length(words))))  # append a running id to each token
    ## Going to construct an edgelist and pass that to data.tree
    ## allocate here since we know the size (number of nodes - 1) and -1 more to exclude 'TOP'
    edgelist <- matrix('', nrow=length(words)-2, ncol=2)
    ## Function to fill in edgelist in place
    edgemaker <- (function() {
        i <- 0                                       # row counter
        g <- function(node) {                        # the recursive function
            if (inherits(node, "Tree")) {            # only recurse subtrees
                if ((val <- node$value) != 'TOP1') { # skip 'TOP' node (added '1' above)
                    for (child in node$children) {
                        childval <- if(inherits(child, "Tree")) child$value else child
                        i <<- i+1
                        edgelist[i,1:2] <<- c(val, childval)
                    }
                }
                invisible(lapply(node$children, g))
            }
        }
        g                                            # return the recursive function
    })()
    ## Create the edgelist from the parse tree
    edgemaker(Tree_parse(ptext))
    ## Convert the (parent, child) edgelist to a data.tree
    tree <- FromDataFrameNetwork(as.data.frame(edgelist, stringsAsFactors = FALSE))
    return (tree)
}
parse_tree <- "(ROOT\r\n (S\r\n (NP (DT A) (JJ rare) (JJ black) (NN squirrel))\r\n (VP (VBZ has)\r\n (VP (VBN become)\r\n (NP (DT a) (JJ regular) (NN visitor))\r\n (PP (TO to)\r\n (NP (DT a) (JJ suburban) (NN garden)))))\r\n (. .)))\r\n\r\n"
parse_tree <- gsub("[\r\n]", "", parse_tree)
parse_tree <- gsub("ROOT", "TOP", parse_tree)
library(data.tree)
tree <- parse2tree(parse_tree)
tree
SetNodeStyle(tree, style = "filled,rounded", shape = "box", fillcolor = "GreenYellow")
plot(tree)
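To see the final conversion step in isolation, here is a minimal sketch that builds a data.tree from a small hand-written edge list. The edge list is made up for illustration (a fragment of the parse above, with numeric ids appended the way parse2tree does); it puts the parent in the first column and the child in the second, matching the layout parse2tree produces.

```r
library(data.tree)

# Hypothetical edge list: parent in column 1, child in column 2.
# The numeric suffixes keep every node name unique.
edges <- data.frame(
  from = c("S2", "S2", "NP3", "NP3"),
  to   = c("NP3", "VP6", "DT4", "NN5"),
  stringsAsFactors = FALSE
)

# Build the tree; the root is the one label that never appears as a child.
tree <- FromDataFrameNetwork(edges)
print(tree)
```

The unique suffixes matter because a parse tree repeats labels such as NP and DT, and an edge list identifies nodes by name only.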