What do these R glm error messages mean: "Error: no valid set of coefficients has been found: please supply starting values"

June 10, 2023

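There are two related questions, but mine is not a duplicate of either: the first has a solution specific to its data set, and the second concerns glm failing when start is supplied together with an offset.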

https://stackoverflow.com/questions/31342637/error-please-supply-starting-values
https://stackoverflow.com/questions/8212063/r-glm-starting-values-not-accepted-log-link

I have the following data set:

library(data.table)
df <- data.frame(names = factor(1:10))
set.seed(0)
df$probs <- c(0, 0, runif(8, 0, 1))
df$response = lapply(df$probs, function(i){
  rbinom(50, 1, i)  
})



dt <- data.table(df)

dt <- dt[, list(response = unlist(response)), by = c('names', 'probs')]

so that dt is:

> dt
     names     probs response 
  1:     1 0.0000000        0 
  2:     1 0.0000000        0 
  3:     1 0.0000000        0 
  4:     1 0.0000000        0 
  5:     1 0.0000000        0 
 ---                                     
496:    10 0.9446753        0 
497:    10 0.9446753        1 
498:    10 0.9446753        1 
499:    10 0.9446753        1 
500:    10 0.9446753        1

I am trying to fit a logistic regression model with an identity link using lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link = 'identity')). This gives an error:

Error: no valid set of coefficients has been found: please supply starting values

I tried to fix it by supplying a start argument, but then I got a different error.

> lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='identity'), start = c(0, 1))
Error: cannot find valid starting values: please specify some

At this point these errors make no sense to me, and I do not know what to do.

Edit: @iraserd has shed more light on the problem. Using start = c(0.5, 0.5), I get:

> lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='identity'), start = c(0.5, 0.5))
There were 25 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: step size truncated: out of bounds
2: step size truncated: out of bounds
3: step size truncated: out of bounds
4: step size truncated: out of bounds
5: step size truncated: out of bounds
6: step size truncated: out of bounds
7: step size truncated: out of bounds
8: step size truncated: out of bounds
9: step size truncated: out of bounds
10: step size truncated: out of bounds
11: step size truncated: out of bounds
12: step size truncated: out of bounds
13: step size truncated: out of bounds
14: step size truncated: out of bounds
15: step size truncated: out of bounds
16: step size truncated: out of bounds
17: step size truncated: out of bounds
18: step size truncated: out of bounds
19: step size truncated: out of bounds
20: step size truncated: out of bounds
21: step size truncated: out of bounds
22: step size truncated: out of bounds
23: step size truncated: out of bounds
24: step size truncated: out of bounds
25: glm.fit: algorithm stopped at boundary value

and

> summary(lm2)

Call:
glm(formula = response ~ probs, family = binomial(link = "identity"), 
    data = dt, start = c(0.5, 0.5))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4023  -0.6710   0.3389   0.4641   1.7897  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.486e-08  1.752e-06   0.008    0.993    
probs       9.995e-01  2.068e-03 483.372   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 69312  on 49999  degrees of freedom
Residual deviance: 35984  on 49998  degrees of freedom
AIC: 35988

Number of Fisher Scoring iterations: 24

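I strongly suspect this is related to the fact that some responses are generated with a true probability of zero, which causes problems because the coefficient of probs is close to 1.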

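Best answer

There are two places in the glm.fit code where it terminates with the error "no valid set of coefficients has been found: please supply starting values". One is reached when a computed deviance becomes infinite; the other appears to be reached when invalid etastart and mustart options are supplied.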
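One way to look at these spots yourself is to search the deparsed body of stats::glm.fit for the message text. This is only a rough sketch: the names src, no_coef and no_start are made up for illustration, and the exact positions and surrounding code depend on your R version.

```r
# Deparse glm.fit into a character vector, one element per line of code.
src <- deparse(stats::glm.fit)

# Lines that raise the "no valid set of coefficients" error.
no_coef <- grep("no valid set of coefficients", src, fixed = TRUE)

# Lines that raise the "cannot find valid starting values" error.
no_start <- grep("cannot find valid starting values", src, fixed = TRUE)

# Print a little context around the first hit to see the condition guarding it.
if (length(no_coef) > 0) {
  ctx <- seq(max(1, no_coef[1] - 4), min(length(src), no_coef[1] + 1))
  cat(src[ctx], sep = "\n")
}
```

Reading the lines around each hit shows which condition has to fail before the error is raised.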

See also this answer, which goes into more detail: How do I use a custom link function in glm?

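Since you are trying to regress on probabilities (values between 0 and 1), I guess you need to specify starting values that are not equal to 0 or 1: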

lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='identity'), start=c(0.5,0.5))

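This throws a lot of warnings and ends with glm.fit stopping at a boundary value, probably because of the artificial nature of the example.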
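To see why start = c(0, 1) is rejected while start = c(0.5, 0.5) is accepted, it may help to run the validity check that the binomial family applies to the fitted means. The sketch below rebuilds only the ten distinct probs values from the question; X, mu_bad and mu_ok are illustrative names, and the assumption is that glm.fit tests the starting means with the family's validmu function.

```r
set.seed(0)
probs <- c(0, 0, runif(8, 0, 1))   # same distinct values as in the question
X <- cbind(1, probs)               # design matrix: intercept and probs

fam <- binomial(link = "identity")

# With the identity link the mean equals the linear predictor X %*% start.
mu_bad <- fam$linkinv(drop(X %*% c(0, 1)))      # equals probs, so it contains an exact 0
mu_ok  <- fam$linkinv(drop(X %*% c(0.5, 0.5)))  # 0.5 + 0.5 * probs, strictly inside (0, 1)

fam$validmu(mu_bad)  # FALSE: the binomial family requires 0 < mu < 1
fam$validmu(mu_ok)   # TRUE
```

So the starting values themselves only need to produce means strictly between 0 and 1 for every row; they do not have to be close to the final estimates.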

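Changing the formula to use the logit link (since, according to your question, you want a logistic regression) eliminates the warnings (and no start argument is needed):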

lm2 <- glm(data = dt, formula = response ~ probs, family = binomial(link='logit'))
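As a minimal self-contained sketch of the logit fit, the code below simulates data in the same way as the question but with base R only; the names dat, fit and fitted_p are illustrative. Note that this is a different model from the identity-link one: the log-odds, not the probability itself, are linear in probs.

```r
set.seed(0)
probs <- c(0, 0, runif(8, 0, 1))

# 50 Bernoulli draws per probability, stored in long format.
dat <- data.frame(probs = rep(probs, each = 50))
dat$response <- rbinom(nrow(dat), 1, dat$probs)

# Logit link: no start argument is needed.
fit <- glm(response ~ probs, family = binomial(link = "logit"), data = dat)

# Fitted probabilities always stay strictly inside (0, 1) under the logit link.
fitted_p <- predict(fit, type = "response")
range(fitted_p)
coef(fit)
```

Because the inverse logit maps any linear predictor into (0, 1), the optimizer cannot step outside the valid range, which is why the "out of bounds" warnings disappear.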