I'm confused by some tidyr behavior. I can unnest a single response column like this:
library(tidyr)
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%   # turn resp1 into a list column
  unnest()                                       # one row per element of resp1
# Source: local data frame [6 x 3]
#
# resp2 resp3 resp1
# (chr) (chr) (chr)
# 1 C; D; F NA A
# 2 NA NA B
# 3 NA NA A
# 4 C; F G; H; I B
# 5 D H; I NA
# 6 E I B
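For a single column, unnesting amounts to splitting each cell, repeating every row once per piece, and writing the pieces back into the column. The base-R sketch below illustrates that idea; unnest_delim() is a hypothetical helper written for this illustration, not a tidyr function. Unlike the older unnest() output above, it keeps the columns in their original order.

```r
# Example data from the question
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Hypothetical helper (not part of tidyr): expand one "; "-delimited
# character column into one row per value, repeating the other columns.
unnest_delim <- function(df, col, sep = "; ") {
  pieces <- strsplit(df[[col]], sep, fixed = TRUE)  # an NA cell becomes a single NA
  n <- lengths(pieces)                               # number of pieces per original row
  out <- df[rep(seq_len(nrow(df)), times = n), , drop = FALSE]  # repeat each row n times
  out[[col]] <- unlist(pieces, use.names = FALSE)    # one piece per new row
  rownames(out) <- NULL
  out
}

one <- unnest_delim(data, "resp1")
one
```

The result has the same six rows as the unnest() output above; only the column order differs.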
But I need to unnest multiple columns in the data set, and the columns have different numbers of NAs. I tried this, and it throws an error:
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.
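The error message points at the cause. When several list columns are unnested together, their elements are matched up position by position, so each row has to split into the same number of pieces in every column. Counting the pieces per cell (plain base R, same data as in the question; an NA cell still counts as one piece) shows that only the last row meets that condition:

```r
# Example data from the question
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Number of pieces each cell splits into (an NA cell gives one NA piece)
counts <- sapply(data, function(x) lengths(strsplit(x, "; ", fixed = TRUE)))
counts
#      resp1 resp2 resp3
# [1,]     1     3     1
# [2,]     2     1     1
# [3,]     1     2     3
# [4,]     1     1     2
# [5,]     1     1     1

# Rows in which all three columns split into the same number of pieces
same_length <- apply(counts, 1, function(r) length(unique(r)) == 1)
which(same_length)  # only row 5
```

So the NAs are not the problem in themselves; the mismatch in the number of values per cell is.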
I expected the code above to give me the same output as the following:
# unnesting multiple responses (desired output; is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# resp1 resp2 resp3
# (chr) (chr) (chr)
# 1 A C NA
# 2 A D NA
# 3 A F NA
# 4 B NA NA
# 5 A NA NA
# 6 B C G
# 7 B C H
# 8 B C I
# 9 B F G
# 10 B F H
# 11 B F I
# 12 NA D H
# 13 NA D I
# 14 B E I
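The 14 rows follow from the fact that chaining one unnest() per column produces, within each original row, every combination of that row's pieces. Writing n_{i,j} for the number of pieces in row i, column j after splitting on "; " (an NA cell counts as one piece), the row count works out as:

```latex
N = \sum_{i=1}^{5} n_{i,1}\, n_{i,2}\, n_{i,3}
  = (1 \cdot 3 \cdot 1) + (2 \cdot 1 \cdot 1) + (1 \cdot 2 \cdot 3) + (1 \cdot 1 \cdot 2) + (1 \cdot 1 \cdot 1)
  = 3 + 2 + 6 + 2 + 1 = 14
```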
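I'm new to R, but this feels clunky and makes me wonder whether I'm misusing something I shouldn't be. What is going wrong with my attempt to unnest several columns at once?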
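Best answer: Check this link, which walks through different cases of unnesting multiple columns. Going by the documentation and that link, unless there is some clever way around it, unnest() can only unnest several list columns in one call when they have the same number of elements in every row, because it pairs the elements position by position. When the lengths differ there is no unambiguous way to pair them, so it stops with an error instead of guessing.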
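To see the ambiguity concretely, take the third row of the data: it splits into 1, 2 and 3 pieces, so there is no position-by-position pairing. The only natural result is every combination of the pieces, which base R's expand.grid() produces here as a stand-in for chaining one unnest() per column. These are the same six combinations as rows 6 to 11 of the final output in this answer, though expand.grid() lists them in a different order.

```r
# Third row of the example data, split into its pieces
row3 <- list(resp1 = "B", resp2 = "C; F", resp3 = "G; H; I")
pieces <- lapply(row3, function(x) strsplit(x, "; ", fixed = TRUE)[[1]])
lengths(pieces)  # 1, 2 and 3 pieces: they cannot be zipped element by element

# Every combination of the pieces: 1 * 2 * 3 = 6 rows
combos <- expand.grid(pieces, stringsAsFactors = FALSE)
combos
```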
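So you will probably have to unnest your columns one at a time. The code below may still be a little cumbersome, but it simplifies things somewhat.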
> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
> data
resp1 resp2 resp3
1 A C; D; F <NA>
2 B; A <NA> <NA>
3 B C; F G; H; I
4 <NA> D H; I
5 B E I
library(tidyr)
library(dplyr)
data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest(resp1) %>%   # unnest one list column per call
  unnest(resp2) %>%
  unnest(resp3)
resp1 resp2 resp3
1 A C <NA>
2 A D <NA>
3 A F <NA>
4 B <NA> <NA>
5 A <NA> <NA>
6 B C G
7 B C H
8 B C I
9 B F G
10 B F H
11 B F I
12 <NA> D H
13 <NA> D I
14 B E I
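If the number of columns grows, the chain of one-column unnesting steps can be folded over a vector of column names with Reduce(). The sketch below stays in base R so it runs without tidyr; unnest_delim() is a hypothetical helper written for this illustration, not a tidyr function. With tidyr installed, the same fold could call unnest() on a single column at each step instead.

```r
# Example data from the question
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

# Hypothetical helper (not part of tidyr): one row per "; "-separated value
# in column `col`, with the other columns repeated alongside.
unnest_delim <- function(df, col, sep = "; ") {
  pieces <- strsplit(df[[col]], sep, fixed = TRUE)  # an NA cell becomes a single NA
  out <- df[rep(seq_len(nrow(df)), times = lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces, use.names = FALSE)
  rownames(out) <- NULL
  out
}

# Left fold: unnest_delim(unnest_delim(unnest_delim(data, "resp1"), "resp2"), "resp3")
result <- Reduce(unnest_delim, c("resp1", "resp2", "resp3"), data)
result
```

The fold gives the same 14 rows, in the same order, as the chained unnest() calls above.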