tidyr: multiple unnesting with varying NA counts

I'm confused by some tidyr behavior. I can unnest a single response column like this:

library(tidyr)

resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)

tidy <- data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest()

# Source: local data frame [6 x 3]
#
#      resp2   resp3 resp1
#      (chr)   (chr) (chr)
# 1 C; D; F      NA     A
# 2      NA      NA     B
# 3      NA      NA     A
# 4    C; F G; H; I     B
# 5       D    H; I    NA
# 6       E       I     B

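But my dataset has multiple columns that need unnesting, and the columns contain different numbers of NAs. I tried this, and it throws an error: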

data %>%
  transform(resp1 = strsplit(resp1, "; "),
            resp2 = strsplit(resp2, "; "),
            resp3 = strsplit(resp3, "; ")) %>%
  unnest()
# Error: All nested columns must have the same number of elements.

I expected the code above to give me the same output as the following:

# unnesting multiple response (desired output / is there a better way?)
data %>%
  transform(resp1 = strsplit(resp1, "; ")) %>%
  unnest() %>%
  transform(resp2 = strsplit(resp2, "; ")) %>%
  unnest() %>%
  transform(resp3 = strsplit(resp3, "; ")) %>%
  unnest()

#     resp1 resp2 resp3
#     (chr) (chr) (chr)
# 1      A     C    NA
# 2      A     D    NA
# 3      A     F    NA
# 4      B    NA    NA
# 5      A    NA    NA
# 6      B     C     G
# 7      B     C     H
# 8      B     C     I
# 9      B     F     G
# 10     B     F     H
# 11     B     F     I
# 12    NA     D     H
# 13    NA     D     I
# 14     B     E     I

I'm new to R, but this feels clunky and makes me wonder whether I'm misusing something. Why does the attempt to unnest several columns at once fail?

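Best answer:
Check this link, which shows different cases of unnesting multiple columns. Judging by the documentation and that link, unless there is some clever way to do this, the function is probably only well defined for a single column at a time, to avoid ambiguity.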
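
To see where the ambiguity comes from, it can help to count how many pieces each cell splits into. This is only a diagnostic sketch in base R; the names piece_counts and mismatched are made up for illustration, and the reading of the error message (that every row needs the same number of elements in each nested column) is an assumption based on its wording.

```r
# Same example data as in the question.
resp1 <- c("A", "B; A", "B", NA, "B")
resp2 <- c("C; D; F", NA, "C; F", "D", "E")
resp3 <- c(NA, NA, "G; H; I", "H; I", "I")

# Split each column on "; " and count the pieces per row.
# strsplit() returns a single NA for an NA input, so NA cells count as 1.
piece_counts <- data.frame(
  resp1 = lengths(strsplit(resp1, "; ")),
  resp2 = lengths(strsplit(resp2, "; ")),
  resp3 = lengths(strsplit(resp3, "; "))
)
print(piece_counts)

# Rows in which the three columns do not all have the same number of pieces.
mismatched <- which(apply(piece_counts, 1, function(n) length(unique(n)) > 1))
print(unname(mismatched))
```

Only the last row has matching counts across all three columns, so there is no obvious way to line the pieces up row by row in a single call.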

So you will probably have to unnest your columns one by one. The code below may still be a little cumbersome, but it is somewhat simpler.

> resp1 <- c("A", "B; A", "B", NA, "B")
> resp2 <- c("C; D; F", NA, "C; F", "D", "E")
> resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
> data <- data.frame(resp1, resp2, resp3, stringsAsFactors = FALSE)
> data
  resp1   resp2   resp3
1     A C; D; F    <NA>
2  B; A    <NA>    <NA>
3     B    C; F G; H; I
4  <NA>       D    H; I
5     B       E       I
library(tidyr)
library(dplyr)
data %>%
transform(resp1 = strsplit(resp1, "; "),
          resp2 = strsplit(resp2, "; "),
          resp3 = strsplit(resp3, "; ")) %>%
unnest(resp1) %>% unnest(resp2) %>% unnest(resp3)
   resp1 resp2 resp3
1      A     C  <NA>
2      A     D  <NA>
3      A     F  <NA>
4      B  <NA>  <NA>
5      A  <NA>  <NA>
6      B     C     G
7      B     C     H
8      B     C     I
9      B     F     G
10     B     F     H
11     B     F     I
12  <NA>     D     H
13  <NA>     D     I
14     B     E     I
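
For comparison, the same one-column-at-a-time idea can be sketched in base R without tidyr. This is not the method used above: unnest_one is a hypothetical helper written for this example, not a function from any package, and it assumes character columns and a fixed "; " separator.

```r
# Same example data as in the question.
data <- data.frame(
  resp1 = c("A", "B; A", "B", NA, "B"),
  resp2 = c("C; D; F", NA, "C; F", "D", "E"),
  resp3 = c(NA, NA, "G; H; I", "H; I", "I"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one column and repeat each row once per piece.
unnest_one <- function(df, col, sep = "; ") {
  pieces <- strsplit(df[[col]], sep, fixed = TRUE)  # NA stays a single NA
  out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

# Apply the helper to each column in turn, feeding each result into the next call.
result <- Reduce(unnest_one, c("resp1", "resp2", "resp3"), data)
print(result)
```

Because each step repeats whole rows, the columns are effectively crossed within each original row, which is why the result has the same 14 rows as the chained unnest() calls.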