I'm currently reading the Advanced Bash-Scripting Guide and came across the following:
# Generate binary choice, that is, "true" or "false" value.
BINARY=2
T=1
number=$RANDOM
let "number %= $BINARY"
# Note that let "number >>= 14" gives a better random distribution
#+ (right shifts out everything except last binary digit).
if [ "$number" -eq $T ]
then
  echo "TRUE"
else
  echo "FALSE"
fi
echo
Why is the 15th bit recommended over the 1st? A few runs of binary decisions showed no significant difference between the two.
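A quick way to eyeball that claim is to tally how often each method yields 1 over the same draws. The following is only a minimal sketch; `compare_bits` is a hypothetical helper name, and a simple count of ones says nothing about correlations between successive values.

```bash
#!/bin/bash
# compare_bits is a hypothetical helper, not part of the guide.
# It draws N values from $RANDOM and counts how often each method yields 1.
compare_bits() {
    local samples=$1 low=0 high=0 number i
    for (( i = 0; i < samples; i++ )); do
        number=$RANDOM
        low=$(( low + number % 2 ))      # bit 1 (least significant)
        high=$(( high + (number >> 14) )) # bit 15 (most significant of 15 bits)
    done
    echo "$low $high"
}

result=$(compare_bits 10000)
echo "ones via %2 and via >>14 (out of 10000): $result"
```

Both counts should land near half the sample size if the bits are unbiased.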
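// Update
Since I was asked how I evaluated the distribution, here goes. I generated a number of $RANDOM values, took the 15th bit and the 1st bit of each, and built two binary sequences. I then looped over these sequences looking for chains of 1s and 0s (runs), calculated how many such runs a maximum length sequence (MLS) would produce (for reference), and printed everything in a confusing table. Here is the code in all its glory (sorry for the dirty code...):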
#! /bin/bash
COUNT=10000
RUN=1
# generate 2 sequences based on the same $RANDOM numbers
# seq1 = modulo 2, seq2 = bitshift 14
while [ $RUN -le $COUNT ]
do
  number=$RANDOM
  let 'var1=number%2'
  var2=$number
  let 'var2 >>= 14'
  seq1="${seq1}${var1}"
  seq2="${seq2}${var2}"
  (( RUN+=1 ))
done
# loop through sequences and check for chains of 1 and 0 (runs)
length=${#seq1}
prevSym=${seq1:0:1}
currRun="${prevSym}"
for (( i=1; i<length; i++ )); do
  currSym=${seq1:$i:1}
  if (( currSym==prevSym )); then
    currRun="${currRun}${currSym}"
    (( i!=length-1 )) && continue
    (( runStat1[${#currRun}]++ )) #case: ends with run length > 1
    break
  fi
  (( runStat1[${#currRun}]++ ))
  (( prevSym=currSym ))
  (( i==length-1 )) && (( runStat1[1]++ )) #case: ends with run length = 1
  currRun="${currSym}"
done
length=${#seq2}
prevSym=${seq2:0:1}
currRun="${prevSym}"
for (( i=1; i<length; i++ )); do
  currSym=${seq2:$i:1}
  if (( currSym==prevSym )); then
    currRun="${currRun}${currSym}"
    (( i!=length-1 )) && continue
    (( runStat2[${#currRun}]++ )) #case: ends with run length > 1
    break
  fi
  (( runStat2[${#currRun}]++ ))
  (( prevSym=currSym ))
  (( i==length-1 )) && (( runStat2[1]++ )) #case: ends with run length = 1
  currRun="${currSym}"
done
# print results and expected frequency
# number of expected runs with runlength k:
# 1/2**k if k<n, 1/2**(k-1) if k=n
# $RANDOM generates random numbers in the range 0 to 32767 (15 bits) thus n=15
n=15
echo -e "Length L of run | # of runs with %2 | # of runs with >>14 | # of runs with MLS (calculated)\n "
echo -e "L\t|%2\t|>>14\t|MLS"
echo -e "-----------------------------------\n"
sorted="${!runStat1[*]} ${!runStat2[*]}"
sorted=$(echo $sorted | tr ' ' '\n' | sort -n | uniq)
for a in $sorted; do
  k=${a}
  (( ${a}==${n} )) && (( k=a-1 ))
  prob=$(awk -v k=${a} -v c=${COUNT} 'BEGIN { print (((1/2)**k)*c)/k}')
  echo -e "${a} \t| ${runStat1[$a]} \t| ${runStat2[$a]} \t| ${prob} "
done
Running it prints something like this:
Length L of run | # of runs with %2 | # of runs with >>14 | # of runs with MLS (calculated)
L   | %2   | >>14 | MLS
-----------------------------------
1   | 2495 | 2450 | 5000
2   | 1219 | 1212 | 1250
3   | 638  | 621  | 416.667
4   | 300  | 329  | 156.25
5   | 162  | 166  | 62.5
6   | 75   | 81   | 26.0417
7   | 46   | 34   | 11.1607
8   | 23   | 26   | 4.88281
9   | 13   | 7    | 2.17014
10  | 2    | 6    | 0.976562
11  | 1    | 1    | 0.443892
13  | 3    |      | 0.0939002
15  |      | 2    | 0.0203451
21  |      | 1    | 0.000227065
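This leads me to conclude that, without any doubt and as mentioned in every bash reference, $RANDOM is a horrible source of randomness... but also that "number >>= 14" does not give a better random distribution than "number %= 2" for binary choices.
...or I made a big mistake somewhere in this silly calculation. You tell me.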
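As a cross-check on the table above, here is a minimal sketch of a compact run counter together with a reference value that assumes independent fair coin flips rather than an MLS. Under that assumption, the expected number of runs of exactly length k in n flips is (n - k + 3) / 2^(k+1) for k < n. `run_histogram` and `expected_runs` are hypothetical helper names that do not appear in the script above.

```bash
#!/bin/bash
# run_histogram and expected_runs are hypothetical helpers for illustration.

# Print "length count" pairs for the runs found in a string of 0s and 1s.
run_histogram() {
    local seq=$1 i len=1
    local -a hist=()
    if [ -z "$seq" ]; then return 0; fi
    for (( i = 1; i <= ${#seq}; i++ )); do
        if (( i < ${#seq} )) && [ "${seq:i:1}" = "${seq:i-1:1}" ]; then
            len=$(( len + 1 ))                      # current run continues
        else
            hist[len]=$(( ${hist[len]:-0} + 1 ))    # run ended, or end of string
            len=1
        fi
    done
    for len in "${!hist[@]}"; do
        echo "$len ${hist[len]}"
    done
}

# Expected number of runs of exactly length k in n independent fair flips
# (assumption: k < n): (n - k + 3) / 2^(k+1).
expected_runs() {
    awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", (n - k + 3) / 2^(k + 1) }'
}

run_histogram "0011101"
for k in 1 2 3; do
    echo "k=$k expected: $(expected_runs 10000 "$k")"
done
```

For 10000 flips this gives roughly 2500, 1250 and 625 runs of length 1, 2 and 3, which can be compared directly with the %2 and >>14 columns.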
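Best answer: The advice to use the high-order bits exists because many random number generators are implemented as
linear congruential generators, which produce poor randomness in their low-order bits.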
For example, the following RNG implementation used to be very common. (I believe it appeared as an example in the C89 standard.)
static unsigned next = 1;  /* generator state; this seed value is arbitrary */

unsigned old_rand(void) {
  next = next * 1103515245 + 12345;
  return next;
}
Now look at the kind of numbers this produces.
2140733074 // even
3902869603 // odd
4012135520 // even
2255314201 // odd
3913576926 // even
2626310079 // odd
4159329932 // even
1903014357 // odd
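Bit 1 is not random at all: it simply alternates between 0 and 1.
Even higher-quality LCGs, such as the one used in Java, suffer from this effect, as a nice graphical demonstration shows. So don't trust the low-order bits of an unknown RNG.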
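The alternation is easy to reproduce in bash. The sketch below reuses the constants from old_rand and assumes 32-bit unsigned wraparound, emulated by masking because bash arithmetic is 64-bit. `lcg_low_bits` is a hypothetical helper name. Since both the multiplier and the increment are odd, every step flips the parity of the state.

```bash
#!/bin/bash
# lcg_low_bits is a hypothetical helper; the constants come from old_rand.
# Usage: lcg_low_bits SEED COUNT  -> prints the lowest bit of COUNT successive outputs
lcg_low_bits() {
    local state=$1 count=$2 bits="" i
    for (( i = 0; i < count; i++ )); do
        # mask to 32 bits to emulate unsigned overflow in C
        state=$(( (state * 1103515245 + 12345) & 0xFFFFFFFF ))
        bits+=$(( state & 1 ))
    done
    echo "$bits"
}

lcg_low_bits 1 16    # prints 0101010101010101
```

Starting from any seed, the output is a strict 0/1 alternation, which is why taking `% 2` of such a generator makes a useless coin flip.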