Say I have the following historical league results:
Season <- c(1,1,2,2,3,3,4,4,5,5)
Team <- c("Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton")
End.Rank <- c(8,17,4,15,3,6,4,16,3,17)
PLRank <- cbind(Season,Team,End.Rank)
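I would like to (efficiently) create a one-year lagged variable for each team, based on two criteria:
- Lag End.Rank by season (i.e. t-1, with Season as the time variable)
- Keep the teams separate (Deverton's lagged End.Rank vs. Diverpool's lagged End.Rank)
Basically, I would like the output to look like this: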
l.End.Rank <- c(NA,NA,8,17,4,15,3,6,4,16)
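I have tried lag(), and at the moment I am getting lost trying to do this in a for() loop.
Best answer: You can try one of the following...
Note that I have used a data.frame rather than the matrix you get from cbind, which would coerce every column to character: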
PLRank <- data.frame(Season, Team, End.Rank)
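The difference between the two containers can be checked directly. The sketch below rebuilds the question's vectors and compares the storage type of End.Rank in each; the names as_matrix, as_df, matrix_type and rank_type are only illustrative.

```r
Season <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
Team <- rep(c("Diverpool", "Deverton"), times = 5)
End.Rank <- c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)

# A matrix holds a single type, so cbind() on mixed numeric and
# character vectors coerces everything to character.
as_matrix <- cbind(Season, Team, End.Rank)
matrix_type <- typeof(as_matrix)

# A data.frame keeps each column's own type, so End.Rank stays numeric.
as_df <- data.frame(Season, Team, End.Rank)
rank_type <- typeof(as_df$End.Rank)
```

With the matrix, any arithmetic or lagging on End.Rank would be working on strings, which is why the answers here start from the data.frame.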
Using "data.table":
library(data.table)
setDT(PLRank)[, l.End.Rank := shift(End.Rank), by = .(Team)][]
#     Season      Team End.Rank l.End.Rank
#  1:      1 Diverpool        8         NA
#  2:      1  Deverton       17         NA
#  3:      2 Diverpool        4          8
#  4:      2  Deverton       15         17
#  5:      3 Diverpool        3          4
#  6:      3  Deverton        6         15
#  7:      4 Diverpool        4          3
#  8:      4  Deverton       16          6
#  9:      5 Diverpool        3          4
# 10:      5  Deverton       17         16
Or, using "dplyr":
library(dplyr)
PLRank %>%
  group_by(Team) %>%
  mutate(l.End.Rank = lag(End.Rank))
# Source: local data frame [10 x 4]
# Groups: Team [2]
#
#    Season      Team End.Rank l.End.Rank
#     (dbl)    (fctr)    (dbl)      (dbl)
# 1       1 Diverpool        8         NA
# 2       1  Deverton       17         NA
# 3       2 Diverpool        4          8
# 4       2  Deverton       15         17
# 5       3 Diverpool        3          4
# 6       3  Deverton        6         15
# 7       4 Diverpool        4          3
# 8       4  Deverton       16          6
# 9       5 Diverpool        3          4
# 10      5  Deverton       17         16
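Both grouped approaches shift values by row position within each team, so they give the t-1 value only if the rows are already in Season order. As a rough sketch of the same idea in base R with the ordering made explicit (lag_one is a hypothetical helper written for this example, not a function from any package):

```r
PLRank <- data.frame(
  Season = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Team = rep(c("Diverpool", "Deverton"), times = 5),
  End.Rank = c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
)

# Sort by Team, then Season, so that "previous row within a team"
# really means "previous season".
PLRank <- PLRank[order(PLRank$Team, PLRank$Season), ]

# Hypothetical helper: shift a vector down one position, padding with NA.
lag_one <- function(x) c(NA, head(x, -1))

# ave() applies lag_one() within each Team and puts the results back
# in the matching row positions.
PLRank$l.End.Rank <- ave(PLRank$End.Rank, PLRank$Team, FUN = lag_one)

# Restore the original row order using the row names kept from the input.
PLRank <- PLRank[order(as.integer(rownames(PLRank))), ]
```

For the data in the question this should reproduce the l.End.Rank vector requested there. It assumes every team has a row for every season; a gap would make the previous row belong to an earlier season than t-1.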
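Update
I honestly completely misread that you wanted to group by season.
If you are lagging by season, perhaps you should consider reshaping the data to wide form, so that there is only one row per season. Lagging by season is then easy.
Example:
Here, we use dcast from "data.table" to spread the values of "End.Rank" out by "Team". Then we just lag the newly created columns.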
library(data.table)
teams <- as.character(unique(PLRank$Team))
dcast(as.data.table(PLRank), Season ~ Team, value.var = "End.Rank")[
  , (teams) := lapply(.SD, shift), .SDcols = teams][]
#    Season Deverton Diverpool
# 1:      1       NA        NA
# 2:      2       17         8
# 3:      3       15         4
# 4:      4        6         3
# 5:      5       16         4
Or, if you want both the team names and the values in wide form, you can try the following:
dcast(as.data.table(PLRank)[, ind := sequence(.N), by = Season],
      Season ~ ind, value.var = c("Team", "End.Rank"))[
  , c("End.Rank_1", "End.Rank_2") := lapply(.SD, shift),
  .SDcols = c("End.Rank_1", "End.Rank_2")][]
#    Season    Team_1   Team_2 End.Rank_1 End.Rank_2
# 1:      1 Diverpool Deverton         NA         NA
# 2:      2 Diverpool Deverton          8         17
# 3:      3 Diverpool Deverton          4         15
# 4:      4 Diverpool Deverton          3          6
# 5:      5 Diverpool Deverton          4         16
The approach in "dplyr" is similar. Since you are going to wide form, you also need to load "tidyr".
library(dplyr)
library(tidyr)
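PLRank %>%
  spread(Team, End.Rank) %>%
  # mutate_each() and funs() are deprecated in current dplyr;
  # across() (dplyr >= 1.0.0) applies lag to every column except Season
  mutate(across(-Season, lag))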
#   Season Deverton Diverpool
# 1      1       NA        NA
# 2      2       17         8
# 3      3       15         4
# 4      4        6         3
# 5      5       16         4
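All of the above lag by row position, which equals t-1 only when every team has a row for every consecutive season. If that might not hold, one possible alternative is to match on the season number itself. This is a minimal base R sketch, assuming Season is numeric; the names previous and lagged are only illustrative.

```r
PLRank <- data.frame(
  Season = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  Team = rep(c("Diverpool", "Deverton"), times = 5),
  End.Rank = c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
)

# Relabel each rank with the season it should be attached to as a lag:
# the rank from season t becomes the lagged rank for season t + 1.
previous <- data.frame(
  Season = PLRank$Season + 1,
  Team = PLRank$Team,
  l.End.Rank = PLRank$End.Rank
)

# Left join on Season and Team; a season with no predecessor gets NA.
lagged <- merge(PLRank, previous, by = c("Season", "Team"), all.x = TRUE)
```

Note that merge() returns the rows sorted by the key columns, so the row order differs from the input even though each Season and Team pair carries the same lagged value as before.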