In Tibshirani's "The Elements of Statistical Learning", when comparing least squares/linear models and kNN, two scenarios are considered:
Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.
Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.
The idea is that the first scenario suits the least squares/linear model, while the second suits higher-variance models such as kNN, since kNN looks only at the nearest points rather than all of them.
How can I simulate data for both scenarios in R?
The end goal is to reproduce both scenarios in order to show that the linear model explains the first one better than the second.
Thanks!
Best answer: This could be Scenario 1:
library(mvtnorm)

# sample sizes and number of classes
N1 = 50
N2 = 50
K = 2

# class means
mu1 = c(-1, 3)
mu2 = c(2, 0)

# uncorrelated components: zero covariance, variance 2 per coordinate
cov1 = 0
v11 = 2
v12 = 2
Sigma1 = matrix(c(v11, cov1, cov1, v12), nrow = 2)
cov2 = 0
v21 = 2
v22 = 2
Sigma2 = matrix(c(v21, cov2, cov2, v22), nrow = 2)

# draw each class from its bivariate Gaussian
x1 = rmvnorm(N1, mu1, Sigma1)
x2 = rmvnorm(N2, mu2, Sigma2)
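To connect this to the stated goal, here is a minimal sketch (my own addition, not part of the original answer) that stacks the two classes, fits a least-squares classifier, and draws the decision boundary where the fitted value crosses 0.5:

y <- c(rep(0, N1), rep(1, N2))   # class labels
X <- rbind(x1, x2)
fit <- lm(y ~ X)
plot(X, col = y + 1, pch = 19, xlab = "x1", ylab = "x2")
# boundary: b0 + b1*x1 + b2*x2 = 0.5
b <- coef(fit)
abline(a = (0.5 - b[1]) / b[3], b = -b[2] / b[3])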
And this could be a candidate for simulating from the Gaussian mixture:
BartSimpson <- function(x, n = 100){
  # 10 component means, themselves drawn from a standard Gaussian
  means <- as.matrix(sort(rnorm(10)))
  # equal-weight mixture density of 10 low-variance (sd = .1) Gaussians
  dens <- .1 * rowSums(apply(means, 1, dnorm, x = x, sd = .1))
  # n/10 draws from each component
  rBartSimpson <- c(apply(means, 1, rnorm, n = n/10, sd = .1))
  return(list("thedensity" = dens, "draws" = rBartSimpson))
}
x <- seq(-5,5,by=.01)
plot(x, BartSimpson(x)$thedensity, type = "l", lwd = 4, col = "yellow2",
     xlim = c(-4, 4), ylim = c(0, 0.6))
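The function above is univariate; for the bivariate setup the question actually asks about, a sketch along the lines of the book's recipe (10 mixture means per class drawn from a bivariate Gaussian around the class mean, then each observation picking one of those means at random and adding low-variance Gaussian noise) might look like this. The helper simMixtureClass, its defaults, and the noise variance of 1/5 are my own naming and assumptions:

library(mvtnorm)
simMixtureClass <- function(n, muClass, nMeans = 10, sd = sqrt(1/5)) {
  # nMeans component means drawn around the class mean
  means <- rmvnorm(nMeans, muClass, diag(2))
  # each observation picks a component at random, then adds low-variance noise
  picks <- sample(nMeans, n, replace = TRUE)
  means[picks, ] + rmvnorm(n, sigma = diag(sd^2, 2))
}
x1 <- simMixtureClass(100, c(1, 0))   # class 1
x2 <- simMixtureClass(100, c(0, 1))   # class 2
plot(rbind(x1, x2), col = rep(c(1, 2), each = 100), pch = 19,
     xlab = "x1", ylab = "x2")

With these two classes stacked the same way as in Scenario 1, you can run the same linear-model fit on both scenarios and compare how well it separates the classes.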