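When working with hierarchical, multilevel or panel datasets, it can be very useful to have a package that returns the within-group and between-group standard deviations of the available variables.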
In Stata this is easy to do with the following command:
xtsum, i(momid)
I searched around, but I could not find any R package that does this.
Edit:
Just to clarify, an example of a hierarchical dataset might look like this:
son_id mom_id hispanic mom_smoke son_birthweigth
1 1 1 1 3950
2 1 1 0 3890
3 1 1 0 3990
1 2 0 1 4200
2 2 0 1 4120
1 3 0 0 2975
2 3 0 1 2980
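To follow along in R, the example table can be read straight into a data frame. The name `dat` is an assumption, chosen to match the `melt(dat, ...)` call used later in the answer; the column names, including the spelling `son_birthweigth`, are kept exactly as in the table.

```r
# Read the example table into a data frame called `dat`
# (the name matches the melt(dat, ...) call used later on)
dat <- read.table(header = TRUE, text = "
son_id mom_id hispanic mom_smoke son_birthweigth
1 1 1 1 3950
2 1 1 0 3890
3 1 1 0 3990
1 2 0 1 4200
2 2 0 1 4120
1 3 0 0 2975
2 3 0 1 2980
")
str(dat)
```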
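The "multilevel" structure comes from the fact that each mother (the higher level) has two or more sons (the lower level). Each mother therefore defines a group of observations.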
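Each variable in the dataset can therefore vary both between and within mothers, or only between mothers. son_birthweigth varies across mothers but also within the same mother, whereas hispanic is fixed for a given mother.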
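The "varies within" versus "fixed within" distinction can be checked mechanically: a variable is fixed within mother when it takes a single distinct value inside every mother's group. A minimal base-R sketch, with the columns typed in as plain vectors so the block stands on its own (in this data mom_smoke also varies within mothers, e.g. mother 1 has the values 1, 0, 0):

```r
mom_id <- c(1, 1, 1, 2, 2, 3, 3)
vars <- list(
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)
# For each variable: TRUE if it has exactly one distinct value inside
# every mother's group, i.e. it can only vary between mothers
fixed_within <- sapply(vars, function(v)
  all(tapply(v, mom_id, function(x) length(unique(x)) == 1)))
fixed_within
```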
For example, the within-mother variance of son_birthweigth is:
# mean birth weight for each mother
bwt_mean1 <- (3950+3890+3990)/3
bwt_mean2 <- (4200+4120)/2
bwt_mean3 <- (2975+2980)/2
# Within-mother variance for birthweigth
((3950-bwt_mean1)^2 + (3890-bwt_mean1)^2 + (3990-bwt_mean1)^2 +
(4200-bwt_mean2)^2 + (4120-bwt_mean2)^2 +
(2975-bwt_mean3)^2 + (2980-bwt_mean3)^2)/(7-1)
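The same within-mother variance can be computed without writing out every term. `ave()` returns, for each son, the mean of his own mother's group, so the squared deviations can be summed in one line; the denominator stays `N - 1`, as in the hand calculation. xtsum reports standard deviations, so `sqrt()` of the result is what would be compared; that this is exactly xtsum's definition is an assumption, not something verified here.

```r
bwt <- c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
mom <- c(1, 1, 1, 2, 2, 3, 3)
# ave() replaces each value by the mean of its own mother's group
within_var <- sum((bwt - ave(bwt, mom))^2) / (length(bwt) - 1)
within_var
sqrt(within_var)  # within-mother standard deviation
```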
The between-mother variance, on the other hand, is:
# overall mean of birthweigth:
# bwt_mean <- mean(data$son_birthweigth)
bwt_mean <- (3950+3890+3990+4200+4120+2975+2980)/7
# between variance:
((bwt_mean1-bwt_mean)^2 + (bwt_mean2-bwt_mean)^2 + (bwt_mean3-bwt_mean)^2)/(3-1)
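The between-mother variance can be written with `tapply()`. One subtlety: the formula above centres the mother means on the mean of all seven sons, while `var(group_means)` would centre them on the mean of the three mother means. The two agree only when every mother has the same number of sons, which is not the case here, so the sketch keeps the question's formula and prints `var()` alongside for comparison.

```r
bwt <- c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
mom <- c(1, 1, 1, 2, 2, 3, 3)
grand_mean  <- mean(bwt)                 # mean over all 7 sons
group_means <- tapply(bwt, mom, mean)    # one mean per mother
between_var <- sum((group_means - grand_mean)^2) / (length(group_means) - 1)
between_var
# var() centres on mean(group_means) instead of grand_mean,
# so with unequal group sizes it gives a (smaller) different number
var(group_means)
```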
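Best answer: I don't know what your Stata command is supposed to reproduce, but to answer the second part of your question, about the hierarchical structure: this is easy to do with lists.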
For example, you can define a structure like this:
tree = list(
"var1" = list(
"panel" = list(type ='p',mean = 1,sd=0)
,"cluster" = list(type = 'c',value = c(5,8,10)))
,"var2" = list(
"panel" = list(type ='p',mean = 2,sd=0.5)
,"cluster" = list(type="c",value =c(1,2)))
)
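Once such a nested list exists, individual pieces are reached with `$` or `[[`, and `sapply()` can pull the same field out of every variable at once. A short sketch using the `tree` literal above:

```r
tree <- list(
  "var1" = list(
    "panel"   = list(type = 'p', mean = 1, sd = 0),
    "cluster" = list(type = 'c', value = c(5, 8, 10))),
  "var2" = list(
    "panel"   = list(type = 'p', mean = 2, sd = 0.5),
    "cluster" = list(type = "c", value = c(1, 2)))
)
tree$var1$cluster$value              # c(5, 8, 10)
tree[["var2"]][["panel"]][["sd"]]    # 0.5
# The same field from every variable, returned as a named vector
sapply(tree, function(v) v$panel$mean)   # var1 = 1, var2 = 2
```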
To create this, lapply works nicely together with list:
tree <- lapply(list('var1','var2'),function(x){
  list(panel   = list(type ='p',mean = rnorm(1),sd=0),  ## tags written without quotes this time;
       cluster = list(type = 'c',value = rnorm(3)))     ## R accepts both forms
})
names(tree) <-c('var1','var2')
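As an alternative to `lapply()` followed by `names<-`, `sapply(..., simplify = FALSE)` over a character vector returns a list that is already named after the vector's elements. A sketch of the same construction (the `set.seed()` call is an addition, only there to make the random draws repeatable):

```r
set.seed(1)  # only so the rnorm() draws are reproducible
vars <- c('var1', 'var2')
# With a character vector and simplify = FALSE, sapply() returns a list
# whose names are taken from `vars`, so no separate names<- step is needed
tree <- sapply(vars, function(x) {
  list(panel   = list(type = 'p', mean = rnorm(1), sd = 0),
       cluster = list(type = 'c', value = rnorm(3)))
}, simplify = FALSE)
names(tree)
str(tree)
```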
You can inspect its structure with str (the numbers will differ on every run, since they come from rnorm):
str(tree)
List of 2
$var1:List of 2
..$panel :List of 3
.. ..$type: chr "p"
.. ..$mean: num 0.284
.. ..$sd : num 0
..$cluster:List of 2
.. ..$type : chr "c"
.. ..$value: num [1:3] 0.0722 -0.9413 0.6649
$var2:List of 2
..$panel :List of 3
.. ..$type: chr "p"
.. ..$mean: num -0.144
.. ..$sd : num 0
..$cluster:List of 2
.. ..$type : chr "c"
.. ..$value: num [1:3] -0.595 -1.795 -0.439
Edit after the OP's clarification
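I think the reshape2 package is what you want. I'll demonstrate it here.
The idea is that, to do a multilevel analysis, we need to reshape the data.
First, split the variables into two groups: identifiers and measured variables.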
library(reshape2)
dat.m <- melt(dat, id.vars = c('son_id', 'mom_id'))  ## the other columns are the measured variables
str(dat.m)
'data.frame': 21 obs. of 4 variables:
$son_id : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 1 2 1 2 3 ...
$mom_id : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 3 3 1 1 1 ...
$variable: Factor w/ 3 levels "hispanic","mom_smoke",..: 1 1 1 1 1 1 1 2 2 2 ...
$value : num 1 1 1 0 0 0 0 1 0 0 ...
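If reshape2 is not available, the same long ("molten") layout can be built in base R by stacking the measured columns and repeating the identifier columns alongside them. This is a sketch of an equivalent, not what `melt()` does internally; the row order (all values of the first variable, then the second, and so on) is chosen to match the `str()` output above.

```r
dat <- data.frame(
  son_id          = c(1, 2, 3, 1, 2, 1, 2),
  mom_id          = c(1, 1, 1, 2, 2, 3, 3),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)
id_vars  <- c("son_id", "mom_id")
measured <- setdiff(names(dat), id_vars)   # every non-identifier column
n <- nrow(dat); k <- length(measured)
dat.m <- data.frame(
  son_id   = rep(dat$son_id, times = k),   # ids repeated once per variable
  mom_id   = rep(dat$mom_id, times = k),
  variable = factor(rep(measured, each = n), levels = measured),
  value    = unlist(dat[measured], use.names = FALSE)  # columns stacked
)
str(dat.m)
```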
Once the data is in "molten" form, you can "cast" it to rearrange it into whatever shape you want:
# per-mother means for every variable
acast(dat.m,variable~mom_id,mean)
1 2 3
hispanic 1.0000000 0 0.0
mom_smoke 0.3333333 1 0.5
son_birthweigth 3943.3333333 4160 2977.5
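Without reshape2, the per-mother means in the table above can be obtained with `tapply()` inside `sapply()`. The result is transposed so that, as in the `acast()` output, variables are rows and mothers are columns:

```r
dat <- data.frame(
  mom_id          = c(1, 1, 1, 2, 2, 3, 3),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)
measured <- c("hispanic", "mom_smoke", "son_birthweigth")
# tapply() gives one mean per mother; sapply() collects these into a
# mothers x variables matrix, and t() makes variables the rows
group_means <- t(sapply(dat[measured], function(v) tapply(v, dat$mom_id, mean)))
group_means
```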
# Within-mother sum of squared deviations for every variable
acast(dat.m,variable~mom_id,function(x) sum((x-mean(x))^2))
1 2 3
hispanic 0.0000000 0 0.0
mom_smoke 0.6666667 0 0.5
son_birthweigth 5066.6666667 3200 12.5
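The `acast()` call above stops at per-mother sums of squared deviations; adding them up over mothers and dividing by `N - 1` gives the within-mother variance from the question. A base-R sketch of both steps, using `rowsum()` to total the squared deviations by mother (so mothers end up as rows here, the transpose of the table above):

```r
dat <- data.frame(
  mom_id          = c(1, 1, 1, 2, 2, 3, 3),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)
measured <- c("hispanic", "mom_smoke", "son_birthweigth")
# Deviation of each son from his own mother's mean, one column per variable
dev <- sapply(dat[measured], function(v) v - ave(v, dat$mom_id))
# Per-mother sums of squared deviations (mothers as rows)
ss_by_mom <- rowsum(dev^2, dat$mom_id)
ss_by_mom
# Pool over mothers and divide by N - 1: within-mother variance
within_var <- colSums(ss_by_mom) / (nrow(dat) - 1)
within_var
```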
## overall mean of each variable
acast(dat.m,variable~.,mean)
[,1]
hispanic 0.4285714
mom_smoke 0.5714286
son_birthweigth 3729.2857143
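Putting the pieces together gives something close to what the question asks for: a table of overall, between and within standard deviations for every variable. `xtsum_like()` below is a hypothetical helper written for this example, not a function from any package. It uses the question's formulas (both denominators are "count minus one", and the between part is centred on the mean of all observations); whether its numbers match Stata's `xtsum` to the last digit is an assumption worth checking against Stata itself.

```r
# Hypothetical helper (not from any package): overall, between and within
# standard deviations of every column of `df` except the grouping column
xtsum_like <- function(df, group) {
  g    <- df[[group]]
  vars <- setdiff(names(df), group)
  t(sapply(df[vars], function(v) {
    grand  <- mean(v)                 # mean over all observations
    gmeans <- tapply(v, g, mean)      # one mean per group
    c(mean    = grand,
      overall = sd(v),
      between = sqrt(sum((gmeans - grand)^2) / (length(gmeans) - 1)),
      within  = sqrt(sum((v - ave(v, g))^2) / (length(v) - 1)))
  }))
}

dat <- data.frame(
  mom_id          = c(1, 1, 1, 2, 2, 3, 3),
  hispanic        = c(1, 1, 1, 0, 0, 0, 0),
  mom_smoke       = c(1, 0, 0, 1, 1, 0, 1),
  son_birthweigth = c(3950, 3890, 3990, 4200, 4120, 2975, 2980)
)
res <- xtsum_like(dat, "mom_id")
res
```

A variable such as hispanic, which is fixed within mothers, gets a within standard deviation of exactly zero, which is a quick sanity check on the output.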