Key order with gather() versus the order of the original columns

November 25, 2023

Does the order of the keys depend on whether I list the columns to gather before or after the columns to exclude?

Here is my data.frame:

library(tidyr)
wide_df <- data.frame(c("a", "b"), c("oh", "ah"), c("bla", "ble"), stringsAsFactors = FALSE)
colnames(wide_df) <- c("first", "second", "third")
wide_df

  first second third
1     a     oh   bla
2     b     ah   ble

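First, I gather all columns in a specific order. My ordering is respected in the key column (second, first), even though the columns in the data.frame are ordered first, second: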

long_01_df <- gather(wide_df, my_key, my_value, second, first, third)
long_01_df

  my_key my_value
1 second       oh
2 second       ah
3  first        a
4  first        b
5  third      bla
6  third      ble

Then I decided to exclude one column from the gathering:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_02_df

  third my_key my_value
1   bla second       oh
2   ble second       ah
3   bla  first        a
4   ble  first        b

Again, the keys are ordered second, first. Then I wrote it like this, believing it would do the same thing:

long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)
long_03_df

This time the keys follow the actual column order of the original data.frame:

  third my_key my_value
1   bla  first        a
2   ble  first        b
3   bla second       oh
4   ble second       ah

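This behavior does not change even when I call the function with factor_key = TRUE. What am I missing?

Best answer: summary

The reason is that you cannot mix negative and positive indices. (And you should not: it simply makes no sense.) If you do, gather() ignores some of the indices.

Detailed answer

Standard indexing does not allow mixing positive and negative indices either: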

x <- 1:10
x[c(4, -2)]
## Error in x[c(4, -2)] : only 0's may be mixed with negative subscripts

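This makes sense: indexing with 4 tells R to keep only the fourth element. There is no need to tell it explicitly to discard the second element as well.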
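As a minimal base R sketch of the same point, each sign works on its own and only the combination fails. The variable names keep_fourth, drop_second and mixed are made up for illustration, and tryCatch() is used only so that the script keeps running after the error:

```r
x <- 1:10

# A positive subscript names the element to keep.
keep_fourth <- x[4]

# A negative subscript names the element to drop.
drop_second <- x[-2]

# Mixing both signs in one subscript vector raises an error;
# tryCatch() converts it into a marker value instead of stopping.
mixed <- tryCatch(x[c(4, -2)], error = function(e) "error")
```

The exact wording of the error message differs between R versions, which is why the sketch checks only that an error occurs.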

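According to the documentation of gather(), selecting columns works the same way as in dplyr's select(). So let's experiment with that. I will use a subset of mtcars: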

mtcars <- mtcars[1:2, 1:5]
mtcars
##                mpg cyl disp  hp drat
## Mazda RX4     21.0   6  160 110 3.90
## Mazda RX4 Wag 21.0   6  160 110 3.90

You can use select() with either positive or negative indices:

library(dplyr)
select(mtcars, mpg, cyl)
##               mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

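With select(), too, mixing positive and negative indices makes no sense. But instead of throwing an error, select() appears to ignore every index whose sign differs from that of the first one: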

select(mtcars, mpg, -hp, cyl)
##               mpg cyl
## Mazda RX4      21   6
## Mazda RX4 Wag  21   6

select(mtcars, -mpg, hp, -cyl)
##               disp  hp drat
## Mazda RX4      160 110  3.9
## Mazda RX4 Wag  160 110  3.9

As you can see, the results are exactly the same as before.
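The comparison can also be made programmatically rather than by eye. This is only a sketch, assuming an installed dplyr that behaves as in the outputs shown here; cars_subset, same_keep and same_drop are hypothetical names introduced for the example:

```r
library(dplyr)

# cars_subset is a hypothetical name for the same two-row, five-column subset.
cars_subset <- mtcars[1:2, 1:5]

# Positive selection: adding -hp should not change the result.
same_keep <- identical(
  select(cars_subset, mpg, -hp, cyl),
  select(cars_subset, mpg, cyl)
)

# Negative selection: adding hp should not change the result.
same_drop <- identical(
  select(cars_subset, -mpg, hp, -cyl),
  select(cars_subset, -mpg, -cyl)
)
```

If both values are TRUE, the mixed calls returned the same columns in the same order as the unmixed ones.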

In your gather() example, you used the following two lines:

long_02_df <- gather(wide_df, my_key, my_value, second, first, -third)
long_03_df <- gather(wide_df, my_key, my_value, -third, second, first)

Based on what I showed above, these lines are equivalent to:

long_02_df <- gather(wide_df, my_key, my_value, second, first)
long_03_df <- gather(wide_df, my_key, my_value, -third)

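Note that nothing in the second line indicates your preferred ordering of the keys. It only says that third should be omitted, so the remaining columns are gathered in their original order.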
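One way to get both the exclusion and the custom key order, then, is to use positive indices only. The following is a sketch rather than the only possible approach; long_df is a hypothetical name, and the data.frame is rebuilt so that the block runs on its own:

```r
library(tidyr)

# Same data as in the question, rebuilt so the block is self-contained.
wide_df <- data.frame(
  first = c("a", "b"),
  second = c("oh", "ah"),
  third = c("bla", "ble"),
  stringsAsFactors = FALSE
)

# List only the columns to gather, in the key order you want.
# third is not mentioned, so it stays out of the gathering.
long_df <- gather(wide_df, my_key, my_value, second, first)

# Order in which the keys appear in the result.
unique(long_df$my_key)
```

Because third is never named, it is kept as an ordinary column, and the keys should appear in the order second, first, as in long_02_df from the question.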