What is the time complexity of the expression
(doall (take n (distinct stream)))
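where stream is a lazily generated (possibly infinite) collection with duplicates?
I suppose this partly depends on the number or likelihood of duplicates in the stream? What if stream is (repeatedly #(rand-int m)), where m >= n?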
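My estimate:
For every element in the resulting list, at least one element must be realized from the stream, and more than one if the stream has duplicates. Each iteration does a set lookup and/or insertion, but since those are close to constant time, we get at least O(n * ~1) = O(n), plus some extra cost from the duplicates. My intuition is that the cost of the duplicates is also negligible, but I'm not sure how to formalize it. For example, we can't just say it is O(n * k * ~1) = O(n) for some constant k, because there is no obvious maximum number k of duplicates we might encounter in the stream.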
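To make the per-element cost concrete, here is a simplified, hypothetical `simple-distinct` (not the actual clojure.core source, although `clojure.core/distinct` likewise keeps a set of already-seen elements). Every element realized from the input costs one set lookup, and each new element also costs one set insertion, so the total work is roughly (number of realized elements) × (cost of one set operation). The open question is therefore how many elements have to be realized.

```clojure
;; Hypothetical simplified version of clojure.core/distinct, for illustration only.
;; Each element realized from coll costs one `contains?` check on a persistent
;; hash set, plus one `conj` when the element has not been seen before.
(defn simple-distinct
  [coll]
  (letfn [(step [xs seen]
            (lazy-seq
              (when-let [s (seq xs)]
                (let [x (first s)]
                  (if (contains? seen x)
                    ;; duplicate: skipped, but it still had to be realized
                    (step (rest s) seen)
                    ;; new element: emit it and remember it
                    (cons x (step (rest s) (conj seen x))))))))]
    (step coll #{})))

(take 3 (simple-distinct [1 1 2 1 3 4]))
```

Because the result is lazy, taking a prefix from an infinite input only realizes as much of the input as needed, e.g. `(take 2 (simple-distinct (cycle [1 2])))` terminates.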
Let me illustrate the problem with some data:
(defn stream [upper distinct-n]
  (let [counter (volatile! 0)]
    (doall (take distinct-n
                 (distinct
                  (repeatedly (fn []
                                (vswap! counter inc)
                                (rand-int upper))))))
    @counter))

(defn sample [times-n upper distinct-n]
  (->> (repeatedly times-n
                   #(stream upper distinct-n))
       frequencies
       (sort-by val)
       reverse))
(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])
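Note that for the last sample, the number of iterations needed to collect the distinct values can reach 66, although the chance is small.
Also note that as n increases in (sample 10000 n n), the most likely number of elements realized from the stream seems to grow faster than linearly.
The chart illustrates, for different values of n and m, the number of elements realized from the input in (take n (distinct (repeatedly #(rand-int m)))) (the most common value over 10000 samples).
For completeness, here is the code I used to generate the chart: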
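The samples above report the mode (most common count); to compare against an expected value it can help to estimate the mean instead. The following is a sketch under stated assumptions: it uses a seeded `java.util.Random` instead of `rand-int` so runs are reproducible, and it counts draws with an eager loop rather than through `take`/`distinct`, which should give the same count as the `counter` in `stream`. The names `realized-count` and `mean-realized` are hypothetical.

```clojure
;; Hypothetical helper: number of draws from rng until distinct-n
;; distinct values in [0, upper) have been seen.
(defn realized-count
  [^java.util.Random rng upper distinct-n]
  {:pre [(<= distinct-n upper)]} ; otherwise the loop would never finish
  (loop [seen #{}
         draws 0]
    (if (= (count seen) distinct-n)
      draws
      (recur (conj seen (.nextInt rng (int upper))) (inc draws)))))

;; Hypothetical helper: mean of realized-count over times-n runs,
;; all driven by one Random seeded with `seed`.
(defn mean-realized
  [times-n upper distinct-n seed]
  (let [rng (java.util.Random. (long seed))]
    (/ (double (reduce + (repeatedly times-n
                                     #(realized-count rng upper distinct-n))))
       times-n)))

(mean-realized 10000 5 5 42)
```

For `upper = 5` and `distinct-n = 5`, the mean should land near 11.4, noticeably above the mode of 8 seen in the last sample, because the distribution has a long right tail.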
(require '[com.hypirion.clj-xchart :as c])

(defn most-common [times-n upper distinct-n]
  (->> (repeatedly times-n
                   #(stream upper distinct-n))
       frequencies
       (sort-by #(- (val %)))
       ffirst))

(defn series [m]
  {(str "m = " m)
   (let [x (range 1 (inc m))]
     {:x x
      :y (map #(most-common 10000 m %)
              x)})})

(c/view
 (c/xy-chart
  (merge (series 10)
         (series 25)
         (series 50)
         (series 100))
  {:x-axis {:title "n"}
   :y-axis {:title "realized"}}))
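Answer: Your problem is known as the coupon collector's problem. The expected number of realized elements is simply the sum $\frac{m}{m} + \frac{m}{m-1} + \cdots + \frac{m}{m-n+1}$, i.e. you keep adding terms until you have n items: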
(defn general-coupon-collector-expect
  "n: Cardinality of resulting set (# of unique coupons to collect)
  m: Cardinality of set to draw from (# of coupons that exist)"
  [n m]
  {:pre [(<= n m)]}
  ;; Sum m/m + m/(m-1) + ... + m/(m-n+1): the expected wait for each new coupon
  (double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => about 95.4 (= 25 * H_25, where H_k is the k-th harmonic number)
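Why each term has the form m/(m-i): a short derivation, assuming every draw is independent and uniform over m values (as with (repeatedly #(rand-int m))). Here T is the total number of realized elements, T_i the number of draws spent while exactly i distinct values have been seen, and H_k the k-th harmonic number.

$$
\begin{aligned}
p_i &= \frac{m-i}{m} && \text{(probability that a draw is new, given } i \text{ distinct values seen)} \\
\mathbb{E}[T_i] &= \frac{1}{p_i} = \frac{m}{m-i} && \text{(mean of a geometric waiting time)} \\
\mathbb{E}[T] &= \sum_{i=0}^{n-1} \frac{m}{m-i} = m\,\bigl(H_m - H_{m-n}\bigr), \qquad H_k = \sum_{j=1}^{k} \frac{1}{j}
\end{aligned}
$$

For n = m this is $m H_m \approx m \ln m + \gamma m + \tfrac{1}{2}$, with $\gamma \approx 0.5772$ the Euler-Mascheroni constant, which is where the O(n log n) average case comes from.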
;; This generates the data for your plot (n = 10, 15, ..., 100 with m = 100):
(for [x (range 10 101 5)]
(general-coupon-collector-expect x 100))
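The same expectation can also be written in closed form as m(H_m - H_(m-n)). As a cross-check, the sketch below computes it with exact Clojure ratios instead of doubles; `harmonic` and `coupon-expect-closed` are hypothetical helper names, not part of the answer.

```clojure
;; Hypothetical helper: H_k = 1 + 1/2 + ... + 1/k as an exact ratio (H_0 = 0).
(defn harmonic
  [k]
  (reduce + 0 (map #(/ 1 %) (range 1 (inc k)))))

;; Hypothetical helper: expected draws to see n distinct values out of m
;; equally likely ones, via m * (H_m - H_(m-n)). Returns an exact number.
(defn coupon-expect-closed
  [n m]
  {:pre [(<= 0 n m)]}
  (* m (- (harmonic m) (harmonic (- m n)))))

(double (coupon-expect-closed 25 25))
```

Using exact ratios avoids floating-point rounding, so the closed form can be compared with the term-by-term sum using `=`.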
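The worst case is unbounded. The best case is n. The average case (for n = m) is O(n log n). This ignores the cost of checking whether an element has already been drawn; in practice, for Clojure sets (which distinct uses), that is log_32 n.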
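To see the n log n growth numerically, the sketch below compares the exact expectation m * H_m (the case n = m) with the standard asymptotic approximation m ln m + γm + 1/2. The function names are hypothetical; γ is the Euler-Mascheroni constant.

```clojure
;; Euler-Mascheroni constant (to double precision).
(def euler-gamma 0.5772156649015329)

;; Hypothetical helper: exact expected draws for n = m, i.e. m * H_m, in doubles.
(defn coupon-expect-full
  [m]
  (* m (reduce + (map #(/ 1.0 %) (range 1 (inc m))))))

;; Hypothetical helper: asymptotic approximation of m * H_m.
(defn coupon-expect-approx
  [m]
  (+ (* m (Math/log m)) (* euler-gamma m) 0.5))

[(coupon-expect-full 100) (coupon-expect-approx 100)]
```

For m = 100 both values come out near 518.7, and the gap shrinks as m grows, so the leading term m ln m dominates.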