Row-wise comparison of data in R

October 30, 2023

I have a dataset containing origin-destination data and some associated variables. It looks like this:

    "Origin"  "Destination"  "distance"  "volume"
    "A01"     "A01"          0.0         10
    "A02"     "A01"          1.2          9
    "A03"     "A01"          1.4         15
    "A01"     "A02"          1.2         16

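Then, for each origin-destination pair, I want to compute additional variables based on the data in that row and in selected other rows. For example: the number of other origin zones travelling to the same destination whose traffic volume is greater than that of the focal pair. In this example, I would end up with the following for destination A01.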

    "Origin"  "Destination"  "distance"  "volume"  "greater_flow"
    "A01"     "A01"          0.0         10        1
    "A02"     "A01"          1.2          9        2
    "A03"     "A01"          1.4         15        0

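I have been trying to solve this with group_by and apply, but I can't work out how to a) "fix" the value I want to use as the reference (for example, the volume from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat this for every origin-destination pair.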

Best answer

Here is an answer in base R (using the apply family):

    d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                    Destination = c("A01", "A01", "A01", "A02"),
                    distance = c(0.0, 1.2, 1.4, 1.2),
                    volume = c(10, 9, 15, 16))

    # extract the rows with destination A01
    d2 <- d[d[, "Destination"] == "A01", ]

    # for each row, count the other rows with a strictly greater volume
    # (the comparison runs on the numeric column; apply() over the rows would
    # coerce every value to character before comparing)
    greater_flow <- sapply(d2[, "volume"], function(v) sum(d2[, "volume"] > v))

    # stick things back together
    data.frame(d2, greater_flow)

    #   Origin Destination distance volume greater_flow
    # 1    A01         A01      0.0     10            1
    # 2    A02         A01      1.2      9            2
    # 3    A03         A01      1.4     15            0
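
A caveat when iterating over the rows of a data frame with apply(): the data frame is first converted to a matrix, and because this one mixes character and numeric columns, every value reaches the function as a string. The sketch below only illustrates that coercion on the same example data; the names `m` and `vol` are mine and not part of the original answer.

```r
# Same example data, restricted to destination A01
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))
d2 <- d[d$Destination == "A01", ]

# apply() operates on as.matrix(d2); with mixed column types
# this is a character matrix, so the volumes become strings
m <- as.matrix(d2)
is.character(m[, "volume"])

# Comparing on the numeric column directly avoids string comparison
vol <- d2$volume
greater_flow <- vapply(vol, function(v) sum(vol > v), integer(1))
greater_flow
```

Working on the numeric column keeps the comparison numeric, so no correction for "the row compared with itself" is needed: a strict `>` never counts the focal row.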

If you need to do the calculation for every possible destination, you can loop over unique(d[, "Destination"]):

    output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
        # rows for this destination only
        d2 <- d[d[, "Destination"] == dest, ]
        # count the other rows with a strictly greater volume
        greater_flow <- sapply(d2[, "volume"], function(v) sum(d2[, "volume"] > v))
        data.frame(d2, greater_flow)
    })

Then, if needed, the pieces can be glued back together with do.call(rbind, output).
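
Putting the last two steps together, a minimal sketch might look like the following. It rebuilds the example data so that it runs on its own. The helper name `add_greater_flow` is hypothetical, and split() is swapped in for the explicit filter on each destination; treat it as one possible arrangement rather than the answer's exact method.

```r
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# Hypothetical helper: given the rows for one destination, count for each
# row how many other origins send a strictly larger volume
add_greater_flow <- function(rows) {
  rows$greater_flow <- vapply(rows$volume,
                              function(v) sum(rows$volume > v),
                              integer(1))
  rows
}

# One data frame per destination, then bind them into a single data frame
output <- lapply(split(d, d$Destination), add_greater_flow)
result <- do.call(rbind, output)
rownames(result) <- NULL  # drop the "A01.1"-style row names added by rbind
result
```

With the four example rows, `result` has one row per origin-destination pair, and the single A02 row gets a `greater_flow` of 0 because no other origin travels there.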