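I am trying to read a table into R that I have been given in HTML format. rvest is very useful for getting all of the text out of the table, but I would like to keep the
inline formatting from the HTML table.
For example, the text in a cell might be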
"This is a sentence <BR> this is another sentence"
and I would like to keep the BR.
I have tried reading in the whole table:
my_table <- my_table_html %>%
html_nodes("table") %>%
html_table(fill=TRUE)
I have also tried selecting a specific column of the table:
my_column <- my_table_html %>%
html_nodes(".Tabletitle:nth-child(2)") %>%
html_text()
Any ideas would be much appreciated.
Best answer
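library(rvest)
library(xml2)  # xml_find_all() and friends; recent rvest versions no longer attach xml2

pg <- read_html("This is a sentence <BR> this is another sentence")

# Add a node holding a newline after each <br>, then remove the <br> itself
xml_find_all(pg, ".//br") %>% xml_add_sibling("p", "\n")
xml_find_all(pg, ".//br") %>% xml_remove()

html_text(pg)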
## [1] "This is a sentence \n this is another sentence"
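
The same idea can probably be carried over to a whole table. The following is only a minimal sketch, not part of the original answer: the sample HTML, the column names `id` and `note`, and the `[[BR]]` placeholder are all made up for illustration. It swaps each `<br>` for a placeholder before calling `html_table()`, then turns the placeholder back into a literal `<BR>` in the resulting column. A placeholder is used instead of `"\n"` on the assumption that table extraction may trim or collapse whitespace.

```r
library(rvest)
library(xml2)

# Hypothetical sample table; replace with your own parsed document
html <- '<table>
  <tr><th>id</th><th>note</th></tr>
  <tr><td>1</td><td>This is a sentence<br>this is another sentence</td></tr>
</table>'
pg <- read_html(html)

# Put a placeholder node after every <br>, then drop the <br> nodes
br_nodes <- xml_find_all(pg, ".//br")
xml_add_sibling(br_nodes, "span", "[[BR]]")
xml_remove(br_nodes)

# Extract the table as usual, then restore the tag text in the column
tbl <- html_table(pg)[[1]]
tbl$note <- gsub("[[BR]]", "<BR>", tbl$note, fixed = TRUE)
tbl$note
```

If other inline tags matter as well, the XPath in `xml_find_all()` would need to be widened, and the placeholder has to be a string that cannot occur in the real cell text.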