Equi-join and rolling join in data.table

I want to join two tables on two fields: one with an equi-join, the other with a rolling join. The data I am working with is as follows:

library(data.table)
dt <- data.table(Date = as.Date(c("2015-12-29", "2015-12-29", "2015-12-29", "2015-12-29", "2016-01-30", "2016-01-30", "2016-01-30", "2016-01-30", "2016-02-29", "2016-02-29", "2016-02-29", "2016-02-29", "2016-03-26", "2016-03-26", "2016-03-26", "2016-03-26")), 
                   ID = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"), 
                Value = c("A201512", "B201512", "C201512", "D201512", "A201601", "B201601", "C201601", "D201601", "A201602", "B201602", "C201602", "D201602", "A201603", "B201603", "C201603", "D201603"), key = c('Date', 'ID'))

dtes <- data.table(Date=as.Date(c("2015-12-31", "2016-01-31", "2016-02-29", "2016-03-31")), key="Date")

dte <- CJ(Date=dtes$Date, ID=unique(dt$ID))

I want to join the tables 'dt' and 'dte' on ID (using an equi-join) AND Date (using a rolling join). This:

dt[dte, roll=T]

gives me

#           Date ID   Value
#  1: 2015-12-31  A      NA
#  2: 2015-12-31  B      NA
#  3: 2015-12-31  C      NA
#  4: 2015-12-31  D      NA
#  5: 2016-01-31  A      NA
#  6: 2016-01-31  B      NA
#  7: 2016-01-31  C      NA
#  8: 2016-01-31  D      NA
#  9: 2016-02-29  A A201602
# 10: 2016-02-29  B B201602
# 11: 2016-02-29  C C201602
# 12: 2016-02-29  D D201602
# 13: 2016-03-31  A      NA
# 14: 2016-03-31  B      NA
# 15: 2016-03-31  C      NA
# 16: 2016-03-31  D      NA

The result I am after looks like this:

# Date      ID        Value
# 2016-03-31    A   A201603
# 2016-02-29    A   A201602
# 2016-01-31    A   A201601
# 2015-12-31    A   A201512
# 2016-03-31    B   B201603
# 2016-02-29    B   B201602
# 2016-01-31    B   B201601
# 2015-12-31    B   B201512
# 2016-03-31    C   C201603
# 2016-02-29    C   C201602
# 2016-01-31    C   C201601
# 2015-12-31    C   C201512
# 2016-03-31    D   D201603
# 2016-02-29    D   D201602
# 2016-01-31    D   D201601
# 2015-12-31    D   D201512

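Is this possible in data.table?

Best answer: Yes. Set the keys in the opposite order; the roll is applied to the last column of the join, so Date has to come after ID: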

setkey(dt, ID, Date)
setkey(dte, ID, Date)
dt[dte, roll=TRUE][order(ID, -Date)]


          Date ID   Value
 1: 2016-03-31  A A201603
 2: 2016-02-29  A A201602
 3: 2016-01-31  A A201601
 4: 2015-12-31  A A201512
 5: 2016-03-31  B B201603
 6: 2016-02-29  B B201602
 7: 2016-01-31  B B201601
 8: 2015-12-31  B B201512
 9: 2016-03-31  C C201603
10: 2016-02-29  C C201602
11: 2016-01-31  C C201601
12: 2015-12-31  C C201512
13: 2016-03-31  D D201603
14: 2016-02-29  D D201602
15: 2016-01-31  D D201601
16: 2015-12-31  D D201512

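Alternatively, instead of using setkey, just use X[Y, on = cols, roll = TRUE] with cols written in the right order (assuming the bug mentioned in the comments above has been fixed).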
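
As a minimal sketch of that on= variant (assuming a data.table version in which on= supports rolling joins and the bug referred to above is fixed): the tables are rebuilt compactly here with CJ() and paste0(), which is my shorthand rather than the question's original construction, and the result name res is only illustrative. ID is listed first so it is matched exactly, and Date is listed last so it is the column that rolls.

```r
library(data.table)

# Observation dates per ID, rebuilt compactly (same values as in the question)
obs_dates <- as.Date(c("2015-12-29", "2016-01-30", "2016-02-29", "2016-03-26"))
dt <- CJ(Date = obs_dates, ID = c("A", "B", "C", "D"))
dt[, Value := paste0(ID, format(Date, "%Y%m"))]

# Target dates crossed with every ID
target_dates <- as.Date(c("2015-12-31", "2016-01-31", "2016-02-29", "2016-03-31"))
dte <- CJ(Date = target_dates, ID = c("A", "B", "C", "D"))

# Equi-join on ID, rolling join on Date (the last column named in on=)
res <- dt[dte, on = c("ID", "Date"), roll = TRUE]

# Sort as in the desired output: by ID, newest date first
setorder(res, ID, -Date)
print(res)
```

With on=, any existing keys on dt and dte are left alone, so no setkey call is needed just for this join.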
