我有这样一个数据框架
0 weekday day month year hour basal bolus carb period.h
1 Tuesday 01 03 2016 0.0 0.25 NA NA 0
2 Tuesday 01 03 2016 10.9 NA NA 67 10
3 Tuesday 01 03 2016 10.9 NA 4.15 NA 10
4 Tuesday 01 03 2016 12.0 0.30 NA NA 12
5 Tuesday 01 03 2016 17.0 0.50 NA NA 17
6 Tuesday 01 03 2016 17.6 NA NA 33 17
7 Tuesday 01 03 2016 17.6 NA 1.35 NA 17
8 Tuesday 01 03 2016 18.6 NA NA 44 18
9 Tuesday 01 03 2016 18.6 NA 1.80 NA 18
10 Tuesday 01 03 2016 18.9 NA NA 17 18
11 Tuesday 01 03 2016 18.9 NA 0.70 NA 18
12 Tuesday 01 03 2016 22.0 0.40 NA NA 22
13 Wednesday 02 03 2016 0.0 0.25 NA NA 0
14 Wednesday 02 03 2016 9.7 NA NA 39 9
15 Wednesday 02 03 2016 9.7 NA 2.65 NA 9
16 Wednesday 02 03 2016 11.2 NA NA 13 11
17 Wednesday 02 03 2016 11.2 NA 0.30 NA 11
18 Wednesday 02 03 2016 12.0 0.30 NA NA 12
19 Wednesday 02 03 2016 12.0 NA NA 16 12
20 Wednesday 02 03 2016 12.0 NA 0.65 NA 12
如果你看第2行和第3行,你会发现它们完全符合同一天的情况.时间:仅对于第2行,“碳水化合物”不是NA,“推注”不是NA(这些是关于糖尿病的数据).
我想将这些行合并为一个:
2 Tuesday 01 03 2016 10.9 NA NA 67 10
3 Tuesday 01 03 2016 10.9 NA 4.15 NA 10
– >
2 Tuesday 01 03 2016 10.9 NA 4.15 67 10
我当然可以在每一行上做一个残酷的双循环,但我寻找一种更聪明,更快捷的方式.
最佳答案 您可以在此处按公共标识符列工作日,日,月,年,小时,期间h对数据框进行分组,然后对要合并的其余列中的第一个元素进行排序,默认情况下排序()函数将删除要排序的向量中的NA,因此每个组中的每个列最终都会得到非NA元素;如果列中的所有元素都是NA,则sort(col)[1]返回NA:
library(dplyr)
df %>%
group_by(weekday, day, month, year, hour, period.h) %>%
summarise_all(funs(sort(.)[1]))
# weekday day month year hour period.h basal bolus carb
# <fctr> <int> <int> <int> <dbl> <int> <dbl> <dbl> <int>
# 1 Tuesday 1 3 2016 0.0 0 0.25 NA NA
# 2 Tuesday 1 3 2016 10.9 10 NA 4.15 67
# 3 Tuesday 1 3 2016 12.0 12 0.30 NA NA
# 4 Tuesday 1 3 2016 17.0 17 0.50 NA NA
# 5 Tuesday 1 3 2016 17.6 17 NA 1.35 33
# 6 Tuesday 1 3 2016 18.6 18 NA 1.80 44
# 7 Tuesday 1 3 2016 18.9 18 NA 0.70 17
# 8 Tuesday 1 3 2016 22.0 22 0.40 NA NA
# 9 Wednesday 2 3 2016 0.0 0 0.25 NA NA
# 10 Wednesday 2 3 2016 9.7 9 NA 2.65 39
# 11 Wednesday 2 3 2016 11.2 11 NA 0.30 13
# 12 Wednesday 2 3 2016 12.0 12 0.30 0.65 16
而不是sort(),这里使用的更合适的函数是na.omit():
df %>% group_by(weekday, day, month, year, hour, period.h) %>%
summarise_all(funs(na.omit(.)[1]))