I have a dataset of origin-destination data with some associated variables. It looks like this:
"Origin","Destination","distance","volume"
"A01" "A01" 0.0 10
"A02" "A01" 1.2 9
"A03" "A01" 1.4 15
"A01" "A02" 1.2 16
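Then, for each origin-destination pair, I want to compute additional variables from the data in that row and in selected other rows. For example, the number of other origin zones travelling to the same destination whose volume is greater than that of the focal pair. For this example, I would end up with the following for destination A01.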
"Origin","Destination","distance","volume","greater_flow"
"A01" "A01" 0.0 10 1
"A02" "A01" 1.2 9 2
"A03" "A01" 1.4 15 0
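I have been trying to work this out with group_by and apply, but I can't figure out how to a) 'fix' the data I want to use as the reference (the volume from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat this for all origin-destination pairs.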
Best answer: Here is an answer in base R (using the apply family of functions):
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)
# extracting entries with destination = A01
d2 <- d[d[, "Destination"] == "A01", ]
# calculating number of rows satisfying your condition
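# iterate over the numeric volume column: apply() over data-frame rows would
# coerce each row to character, so the comparison would be made on strings.
# The strict '>' already excludes the focal row, so no "- 1" correction is needed.
greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))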
# sticking things back together
data.frame(d2, greater_flow)
# Origin Destination distance volume greater_flow
# 1 A01 A01 0.0 10 1
# 2 A02 A01 1.2 9 2
# 3 A03 A01 1.4 15 0
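One thing to keep in mind with row-wise apply() on a data frame: the data frame is converted to a matrix first, and if any column is character, each row reaches FUN as a character vector. The following minimal sketch illustrates this on a toy data frame with values copied from the example (the variable names row_classes, string_cmp and numeric_cmp are only for illustration):

```r
# Toy data frame: values copied from the example, names are illustrative
d2 <- data.frame(Origin = c("A01", "A02", "A03"), volume = c(10, 9, 15))

# apply() converts the data frame to a matrix first; because a character
# column is present, every row is handed to FUN as a character vector
row_classes <- apply(d2, 1, function(x) class(x))

# string comparison is not the same as numeric comparison
string_cmp  <- "10" > "9"   # compared character by character
numeric_cmp <- 10 > 9       # compared as numbers

# iterating over the numeric column keeps the comparison numeric
greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))
```

Working directly on the volume column avoids the coercion altogether, which is why it tends to be the safer choice for counts like this.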
If you need to do the calculation for every possible destination, you can loop over unique(d[, "Destination"]):
output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
  # keep only the rows for this destination
  d2 <- d[d[, "Destination"] == dest, ]
  # for each row, count the rows in the same group with a strictly greater volume
  greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))
  data.frame(d2, greater_flow)
})
Then, if needed, the output can be glued back together with do.call(rbind, output).
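As an alternative to splitting and re-binding, the same per-destination count can probably be written with base R's ave(), which applies a function within groups and returns the result in the original row order. This is only a sketch under the assumption that "greater" means strictly greater volume within the same destination; count_greater is a hypothetical helper name, not part of the original answer.

```r
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# hypothetical helper: for each element of v, count the elements of v
# that are strictly greater than it
count_greater <- function(v) sapply(v, function(x) sum(v > x))

# ave() calls count_greater once per destination group and puts the
# results back in the original row order
d$greater_flow <- ave(d$volume, d$Destination, FUN = count_greater)
```

Because the result is assigned as a new column of d, no do.call(rbind, ...) step is needed and the rows stay in their original order.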