I have data like the following, where each bucket can hold a different number of items:
Bucket A | Item 1
Bucket A | Item 2
Bucket A | Item 3
Bucket B | Item 3
Bucket B | Item 4
Bucket C | Item 1
Bucket C | Item 5
Bucket C | Item 2
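I want to find the item overlap between every pair of buckets, so that I get a result in the following format (base bucket on the left; each row is the share of that bucket's items also found in the column bucket):
         | Bucket A | Bucket B | Bucket C
Bucket A | 100%     | 33%      | 66%
Bucket B | 50%      | 100%     | 0%
Bucket C | 66%      | 0%       | 100%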
Best answer: here is one way to do it using dplyr:
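library(dplyr)

# For each item, pair every bucket that contains it with every such bucket
temp <- df %>%
  group_by(V2) %>%
  do(expand.grid(.$V1, .$V1, stringsAsFactors = FALSE)) %>%
  ungroup() %>%
  select(Var1, Var2) %>%
  table()  # number of shared items for each pair of buckets

# The diagonal holds each bucket's own item count, so each row is divided by its base bucket's total
temp / diag(temp)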
           Var2
Var1        Bucket A  Bucket B  Bucket C
  Bucket A 1.0000000 0.3333333 0.6666667
  Bucket B 0.5000000 1.0000000 0.0000000
  Bucket C 0.6666667 0.0000000 1.0000000
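For comparison, the same overlap table can also be sketched in base R with a cross product of the item-by-bucket incidence table. This is an alternative to the dplyr approach above, not part of the original answer. The names `m`, `shared`, and `overlap` are made up for illustration, and the sample data is retyped here without the stray spaces found in the original `dput` output.

```r
# Sample data from the question, retyped with clean labels (an assumption;
# the original dput output carries extra spaces in the strings).
df <- data.frame(
  V1 = c("Bucket A", "Bucket A", "Bucket A", "Bucket B", "Bucket B",
         "Bucket C", "Bucket C", "Bucket C"),
  V2 = c("Item 1", "Item 2", "Item 3", "Item 3", "Item 4",
         "Item 1", "Item 5", "Item 2"),
  stringsAsFactors = FALSE
)

# Item-by-bucket incidence table; unique() guards against duplicated
# bucket/item rows, which would otherwise inflate the counts.
m <- table(unique(df[c("V2", "V1")]))

# t(m) %*% m: entry [i, j] counts the items shared by bucket i and bucket j,
# and the diagonal holds each bucket's own item count.
shared <- crossprod(m)

# Division recycles diag(shared) down the columns, so row i is divided by
# the item count of base bucket i.
overlap <- shared / diag(shared)

# Rounded percentages, closer to the layout asked for in the question.
round(100 * overlap)
```

With this data the proportions should match the dplyr result above; for example, `overlap["Bucket B", "Bucket A"]` is 0.5 because one of Bucket B's two items also appears in Bucket A.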
Data
df <- structure(list(
  V1 = c("Bucket A ", "Bucket A ", "Bucket A ", "Bucket B ", "Bucket B ",
         "Bucket C ", "Bucket C ", "Bucket C "),
  V2 = c(" Item 1", " Item 2", " Item 3", " Item 3", " Item 4",
         " Item 1", " Item 5", " Item 2")),
  .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -8L))