How can I convert frequencies to text using R?

I have a data frame like this (an ID column followed by the frequencies of A, B, C, D and E):

ID A B C D E    
1  5 3 2 1 0  
2  3 2 2 1 0  
3  4 2 1 1 1

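I want to convert this data frame into a text-based document like the one below, where each ID's frequencies for A through E are expanded into words in a single column. I can then use an LDA algorithm to identify the top topics for each ID.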

ID                     Text
1   "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2   "A" "A" "A" "B" "B" "C" "C" "D"
3   "A" "A" "A" "A" "B" "B" "C" "D" "E"

Best answer: we can use data.table.

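library(data.table)
# setDT() converts df1 to a data.table by reference; for each ID, repeat each
# frequency column name by its count and keep the result as a list element
DT <- setDT(df1)[, .(list(rep(names(df1)[-1], unlist(.SD)))), ID]
DT$V1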
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"

#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"

#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"

Or a base R option using split:

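# assumes df1 is a plain data.frame (on a data.table, df1[-1] drops the
# first row rather than the ID column)
# split the frequency columns by ID, then repeat the column names by each row's counts
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])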
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"

#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"

#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
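
Since the question asks for a single text column, the words for each ID can also be collapsed into one string. The following is a minimal base R sketch, assuming the example data from the question is rebuilt as df1. The names words, counts and docs are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question (assumed to match the poster's df1)
df1 <- data.frame(ID = 1:3,
                  A = c(5, 3, 4), B = c(3, 2, 2), C = c(2, 2, 1),
                  D = c(1, 1, 1), E = c(0, 0, 1))

words  <- names(df1)[-1]      # "A" "B" "C" "D" "E"
counts <- as.matrix(df1[-1])  # one row of frequencies per ID

# For every row, repeat each word by its frequency and join with spaces
docs <- data.frame(
  ID   = df1$ID,
  Text = apply(counts, 1, function(n) paste(rep(words, n), collapse = " ")),
  stringsAsFactors = FALSE
)
print(docs)
```

Each row of docs then holds one ID and one space-separated string, which is the shape most text-mining tools expect as input.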

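If we want to write 'lst' to a CSV file, one option is to convert the list into a rectangular, data.frame-like structure by appending NA to the end of the shorter elements so that all lengths are equal (a data.frame is a list of columns of equal length).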

res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
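
To see what the padding step does, here is a small base R sketch on a toy list. The list contents and the names n and padded are made up for illustration. Assigning a longer length to a vector fills the new positions with NA, which is what `length<-` does in the one-liner above.

```r
# Toy list with elements of unequal length (illustrative data only)
lst <- list(`1` = c("A", "A", "B"), `2` = "A", `3` = c("A", "B"))

n <- max(lengths(lst))  # length of the longest element

# Extending a vector's length pads it with NA
padded <- lapply(lst, function(x) { length(x) <- n; x })

# Row-bind the equal-length vectors into a character matrix
res <- do.call(rbind, padded)
print(res)
```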

Or use a convenience function from stringi:

library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)

Then use write.csv:

write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
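
By default the padded NA cells are written as the literal string NA. If empty cells are preferred, the na argument of write.csv can be set to an empty string. This is a hedged sketch on a toy matrix that writes to a temporary file rather than "yourdata.csv". The variable names out and csv_lines are illustrative.

```r
# Toy padded matrix, standing in for the 'res' built above
res <- rbind(c("A", "A", "B"),
             c("A", NA, NA))

out <- tempfile(fileext = ".csv")  # temporary path used only for this sketch

# na = "" writes missing cells as empty fields instead of "NA"
write.csv(res, out, quote = FALSE, row.names = FALSE, na = "")

csv_lines <- readLines(out)
print(csv_lines)
```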