Why does my RNN always output 1?

October 4, 2023

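I am using a recurrent neural network (RNN) for forecasting, but for some strange reason it always outputs 1. Here is a toy example that shows the problem:

Example
Consider a matrix M of dimension (360, 5) and a vector Y that contains the row sums of M. Using an RNN, I want to predict Y from M. With the rnn R package, I trained the model as follows: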

library(rnn)
M <- matrix(c(1:1800), ncol = 5, byrow = TRUE) # feature matrix
Y <- apply(M, 1, sum) # output equals the row sums of M
mt <- array(c(M), dim = c(NROW(M), 1, NCOL(M))) # format as [samples, timesteps, features]
yt <- array(c(Y), dim = c(NROW(M), 1, NCOL(Y))) # same format for the response
model <- trainr(X = mt, Y = yt, learningrate = 0.5, hidden_dim = 10, numepochs = 1000) # training

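One strange thing I observed during training is that the epoch error is always 4501. Ideally, the epoch error should decrease as the number of epochs increases.

Next, I created a test data set with the same structure as above: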

M2 <- matrix(c(1:15),nrow=3,byrow = TRUE)
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)

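The prediction always comes out as 1.
What could cause the constant epoch error and the identical outputs?

Update #1

The answer provided by @Barker does not work for my problem. To keep the question open, I am sharing minimal data via Dropbox links (traindata and testdata) together with my R code:

Data details: the column 'power' is the response variable. It is a function of the temperature, the humidity, and the power consumed on previous days, from day 1 to day 14.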

normalize_data <- function(x){
  normalized = (x-min(x))/(max(x)-min(x))
  return(normalized)
}

#read test and train data
traindat <- read.csv(file = "train.csv")
testdat <- read.csv(file = "test.csv")
# column "power" is response variable and remaining are predictors
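# predictors in the training data: every column except the last one
# (parentheses are needed because 1:n-1 parses as (1:n)-1)
trainX <- traindat[, 1:(ncol(traindat) - 1)]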
# response of train data
trainY <- traindat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx <- array(as.matrix(trainX), dim=c(NROW(trainX), 1, NCOL(trainX)))
tx <- normalize_data(tx) # normalize data in range of [0,1]
ty <- array(trainY, dim=c(NROW(trainY), 1, NCOL(trainY))) # arrange response acc. to predictors
# train model
model <- trainr(X = tx, Y = ty, learningrate = 0.08, hidden_dim = 6, numepochs = 400)

# predictors in the test data: every column except the last one
testX <- testdat[, 1:(ncol(testdat) - 1)]
testX <- normalize_data(testX) # normalize data in range of [0,1]
#testY <- testdat$power
# arrange data acc. to RNN as [samples,time steps, features]
tx2 <- array(as.matrix(testX), dim=c(NROW(testX), 1, NCOL(testX)))
# predict
pred <- predictr(model,tx2)
pred

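I varied the parameters learningrate, hidden_dim, and numepochs, but the result is still either 0.9 or 1.

Best answer

Most RNNs do not like data that lack a constant mean. One strategy for dealing with this is to difference the data. To see how this works, let's use the base R time series co2. It is a time series with a nice, smooth seasonality and trend, so we should be able to forecast it.

For our model, the input matrix will be the "seasonal" and "trend" components of the co2 time series, created with the stl decomposition. So let's build the training and test data as you did before and train the model (note that I reduced numepochs to cut the run time). I will train on all the data up to the last year and a half, and then test on the last year and a half: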

#Create the STL decomposition
sdcomp <- stl(co2, s.window = 7)$time.series[,1:2]

Y <- window(co2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))
#Taken from OP's code
mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y))) 
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)
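
As a side check on the inputs built above, the following minimal sketch looks at what the stl decomposition returns and confirms that the reshape taken from the question keeps each sample's features together. It uses only base R; the variable names `dec` and `recon` are introduced here for illustration and are not part of the original answer.

```r
# Decompose co2 the same way as in the answer.
dec <- stl(co2, s.window = 7)$time.series

# The component columns are named seasonal, trend and remainder;
# the answer keeps the first two as predictors.
print(colnames(dec))

# The three components add back up to the original series.
recon <- rowSums(dec)
print(all.equal(as.numeric(recon), as.numeric(co2)))

# Reshape the two predictors into [samples, timesteps, features].
M <- window(dec[, 1:2], end = c(1996, 6))
mt <- array(c(M), dim = c(NROW(M), 1, NCOL(M)))

# Each sample keeps its own seasonal and trend value after the reshape.
print(dim(mt))
print(all.equal(mt[, 1, 1], as.numeric(M[, 1])))
print(all.equal(mt[, 1, 2], as.numeric(M[, 2])))
```

Because R stores matrices column by column, flattening M with c(M) and refilling an array with one time step leaves feature k of sample i at position [i, 1, k].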

Now we can create our predictions for the final year and a half, which is the test data:

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
predictr(model,mt2)

output:
      [,1]
 [1,]    1
 [2,]    1
 [3,]    1
 [4,]    1
 [5,]    1
 [6,]    1
 [7,]    1
 [8,]    1
 [9,]    1
[10,]    1
[11,]    1
[12,]    1
[13,]    1
[14,]    1
[15,]    1
[16,]    1
[17,]    1
[18,]    1

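Ew, it is all ones again, just like in your example. Now let's try again, but this time we will difference the data. Since we are trying to predict a year and a half ahead, we will use 18 as the differencing lag, because those are values we would know 18 months ahead of time.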

dco2 <- diff(co2, 18)
sdcomp <- stl(dco2, s.window = "periodic")$time.series[,1:2]
plot(dco2)
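
To make the differencing step concrete, here is a small base R sketch of what diff(co2, 18) computes, along with a rough check of how much the level of the series drifts before and after differencing. The helper `half_mean_gap` and the names `k` and `manual` are hypothetical and introduced only for this illustration.

```r
k <- 18
dco2 <- diff(co2, k)

# diff(x, k) subtracts from each value the value k steps earlier,
# so the result is k observations shorter than the input.
x <- as.numeric(co2)
n <- length(x)
manual <- x[(k + 1):n] - x[1:(n - k)]
print(all.equal(as.numeric(dco2), manual))
print(length(co2) - length(dco2))

# Hypothetical helper: distance between the means of the first and
# second half of a series, as a crude check for a drifting level.
half_mean_gap <- function(v) {
  v <- as.numeric(v)
  mid <- floor(length(v) / 2)
  abs(mean(v[1:mid]) - mean(v[(mid + 1):length(v)]))
}

print(half_mean_gap(co2))   # raw series
print(half_mean_gap(dco2))  # differenced series
```

A smaller gap for the differenced series is what "the trend is gone" amounts to numerically. It is only a rough diagnostic, not a formal stationarity test.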

Great, the trend is now gone, so our neural network should be better able to find the pattern. Let's try again with the new data.

Y <- window(dco2, end = c(1996, 6))
M <- window(sdcomp, end = c(1996, 6))

mt <- array(c(M),dim=c(NROW(M),1,NCOL(M)))
yt <- array(c(Y),dim=c(NROW(M),1,NCOL(Y)))
model <- trainr(X=mt,Y=yt,learningrate=0.5,hidden_dim=10,numepochs=100)

M2 <- window(sdcomp, start = c(1996,7))
mt2 <- array(c(M2),dim=c(NROW(M2),1,NCOL(M2)))
(preds <- predictr(model,mt2))

output:
              [,1]
 [1,] 9.999408e-01
 [2,] 9.478496e-01
 [3,] 6.101828e-08
 [4,] 2.615463e-08
 [5,] 3.144719e-08
 [6,] 1.668084e-06
 [7,] 9.972314e-01
 [8,] 9.999901e-01
 [9,] 9.999916e-01
[10,] 9.999916e-01
[11,] 9.999916e-01
[12,] 9.999915e-01
[13,] 9.999646e-01
[14,] 1.299846e-02
[15,] 3.114577e-08
[16,] 2.432247e-08
[17,] 2.586075e-08
[18,] 1.101596e-07

OK, now we have something! Let's see how it compares to what we were trying to predict, dco2.
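
The comparison plot from the original answer is not included here. A minimal sketch of one way to line the two series up is given below. It copies the predictions from the output printed above and, because those values all lie between 0 and 1, rescales the actual dco2 values to the same range before comparing. The rescaling step and the helper `rescale01` are assumptions made for this illustration, not something the answer states.

```r
# Predictions copied from the output printed above.
preds <- c(9.999408e-01, 9.478496e-01, 6.101828e-08, 2.615463e-08,
           3.144719e-08, 1.668084e-06, 9.972314e-01, 9.999901e-01,
           9.999916e-01, 9.999916e-01, 9.999916e-01, 9.999915e-01,
           9.999646e-01, 1.299846e-02, 3.114577e-08, 2.432247e-08,
           2.586075e-08, 1.101596e-07)

# Actual differenced values over the same 18 test months.
dco2 <- diff(co2, 18)
actual <- as.numeric(window(dco2, start = c(1996, 7)))

# Hypothetical helper: min-max rescale a vector to [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
actual01 <- rescale01(actual)

# Side-by-side table of rescaled actual values and predictions.
comparison <- data.frame(month = seq_along(preds),
                         actual01 = round(actual01, 3),
                         pred = round(preds, 3))
print(comparison)
```

For a visual version, `matplot(cbind(actual01, preds), type = "b")` overlays the two columns in a single plot.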

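Not ideal, but we did capture the general "up and down" pattern of the data. Now all you have to do is tinker with your learning rate and start optimizing all those lovely hyperparameters that make working with neural networks such a joy. Once it works the way you want, you can take the final output and add back the last 18 months of your training data.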
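
The last step, adding back the final 18 months of training data, can be sketched as follows. Since no tuned model is available here, the true test-period differences stand in for the model output, purely so that the round trip can be verified. The names `train_tail`, `pred_diff`, and `forecast` are hypothetical.

```r
k <- 18
dco2 <- diff(co2, k)

# Last 18 months of the training period, on the original scale.
train_tail <- as.numeric(window(co2, start = c(1995, 1), end = c(1996, 6)))

# Stand-in for model output on the differenced scale. The true
# test-period differences are used only to check the round trip.
pred_diff <- as.numeric(window(dco2, start = c(1996, 7)))

# Undo the lag-18 differencing: each test month equals the value
# 18 months earlier plus the predicted difference.
forecast <- train_tail + pred_diff

# With perfect predictions this recovers the held-out co2 values.
actual <- as.numeric(window(co2, start = c(1996, 7)))
print(all.equal(forecast, actual))
```

With real model output, any rescaling applied to the response before training would have to be reversed first, so that `pred_diff` is on the same scale as dco2 before the training values are added back.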