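I was looking at the example code in
R element frequency and column name
and I am wondering whether there is any way in R to show, in addition to the ranking and frequency, the index (row position) of each element in each column. For example, the input and desired output would be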
df <- read.table(header=T, text='A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA',stringsAsFactors=F)
and the output:
   element frequency columns ranking  A  B  C  D
1        a         3   A,B,D       1  1  1 NA  3
3        c         3   A,B,D       1  3  2 NA  1
2        b         2     A,C       2  2 NA  1 NA
4        d         2     A,B       2  4  3 NA NA
5        e         2     A,D       2  5 NA NA  2
6        f         1       A       3  6 NA NA NA
8        x         1       C       3 NA NA  2 NA
9        y         1       C       3 NA NA  3 NA
10       z         1       D       3 NA NA NA  4
Thanks.
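For just the index part of the question, a minimal base-R sketch (not the approach used in the answer) is to look up each distinct element in every column with `match()`. It assumes each element occurs at most once per column, because `match()` reports only the first occurrence. The names `elements` and `positions` are illustrative.

```r
# Input data from the question
df <- read.table(header = TRUE, text = 'A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA', stringsAsFactors = FALSE)

# All distinct non-NA elements, sorted alphabetically
elements <- sort(unique(na.omit(unlist(df))))

# For each column, match() gives the row of the first occurrence of each
# element, or NA when the element does not appear in that column
positions <- sapply(df, function(col) match(elements, col))
rownames(positions) <- elements
positions
```

Each row of `positions` is one element and each column is one of A to D; this matrix could then be joined to the frequency/ranking summary.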
Best answer: There may be a way to do this in one step, but nothing comes to mind right now. So, building on my previous answer:
library(dplyr)
library(tidyr)

step1 <- df %>%
  gather(var, val, everything()) %>%          ## Make a long dataset
  na.omit %>%                                 ## We don't need the NA values
  group_by(val) %>%                           ## All calculations grouped by val
  summarise(column = toString(var),           ## Collapse the column names into one string
            freq = n()) %>%                   ## Count occurrences
  mutate(ranking = dense_rank(desc(freq)))    ## Rank by frequency (ties share a rank)

step2 <- df %>%
  mutate(ind = 1:nrow(df)) %>%                ## Add an indicator column (row index)
  gather(var, val, -ind) %>%                  ## Go long
  na.omit %>%                                 ## Remove NA
  spread(var, ind)                            ## Go wide: one column of row indices per original column

inner_join(step1, step2)
# Joining by: "val"
# Source: local data frame [9 x 8]
#
#   val  column freq ranking  A  B  C  D
# 1   a A, B, D    3       1  1  1 NA  3
# 2   b    A, C    2       2  2 NA  1 NA
# 3   c A, B, D    3       1  3  2 NA  1
# 4   d    A, B    2       2  4  3 NA NA
# 5   e    A, D    2       2  5 NA NA  2
# 6   f       A    1       3  6 NA NA NA
# 7   x       C    1       3 NA NA  2 NA
# 8   y       C    1       3 NA NA  3 NA
# 9   z       D    1       3 NA NA NA  4
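In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same two steps with those functions follows, assuming tidyr 1.0.0 or later. It builds the long table once and reuses it for both steps. The names `long` and `result` are illustrative, and passing `by = "val"` only makes the join key explicit.

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, text = 'A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA', stringsAsFactors = FALSE)

# Long format, keeping the original row number as the within-column index
# and dropping the NA cells
long <- df %>%
  mutate(ind = row_number()) %>%
  pivot_longer(-ind, names_to = "var", values_to = "val",
               values_drop_na = TRUE)

# Frequency, contributing columns and dense ranking per element
step1 <- long %>%
  group_by(val) %>%
  summarise(column = toString(var), freq = n()) %>%
  mutate(ranking = dense_rank(desc(freq)))

# One column per original column, holding the element's row index
step2 <- long %>%
  pivot_wider(id_cols = val, names_from = var, values_from = ind)

result <- inner_join(step1, step2, by = "val")
result
```

Because `long` keeps the row index from the start, both summaries come from a single reshaped table instead of reshaping `df` twice.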