Complexity of distinct over a randomly generated stream in Clojure

July 20, 2019, 244 views

What is the time complexity of the expression

(doall (take n (distinct stream)))

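where stream is a lazily generated (possibly infinite) collection that contains duplicates?

I suppose this partly depends on the number or probability of duplicates in the stream. What if stream is (repeatedly #(rand-int m)), where m >= n?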

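My estimate:

For each element in the resulting list, at least one element must be realized from the stream, and more than one if the stream has duplicates. Each iteration involves a set lookup and/or insert, but since those are close to constant time, we get at least O(n * ~1) = O(n), plus some complexity for the duplicates. My intuition is that the complexity from the duplicates is negligible as well, but I'm not sure how to formalize that. For example, we can't just say it is O(n * k * ~1) = O(n) for some constant k, since there is no obvious maximum number k of duplicates that we might encounter in the stream.

Let me demonstrate the problem with some data: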

(defn stream [upper distinct-n]
  (let [counter (volatile! 0)]
    (doall (take distinct-n
                 (distinct
                  (repeatedly (fn []
                                (vswap! counter inc)
                                (rand-int upper))))))
    @counter))

(defn sample [times-n upper distinct-n]
  (->> (repeatedly times-n
                  #(stream upper distinct-n))
       frequencies
       (sort-by val)
       reverse))

(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])

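Note that for the last sample, the number of iterations for distinct can be as high as 66, although the chance is small.
Also note that for increasing n in (sample 10000 n n), the most likely number of realized elements from the stream seems to grow faster than linearly.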

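This chart shows the number of elements realized from the input (the most common value over 10000 samples) in (take n (distinct (repeatedly #(rand-int m)))) for various values of n and m.

For completeness, here is the code I used to generate the chart: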

(require '[com.hypirion.clj-xchart :as c])

(defn most-common [times-n upper distinct-n]
  (->> (repeatedly times-n
                   #(stream upper distinct-n))
       frequencies
       (sort-by #(- (val %)))
       ffirst))

(defn series [m]
  {(str "m = " m)
   (let [x (range 1 (inc m))]
     {:x x
      :y (map #(most-common 10000 m %)
              x)})})

(c/view
 (c/xy-chart
  (merge (series 10)
         (series 25)
         (series 50)
         (series 100))
  {:x-axis {:title "n"}
   :y-axis {:title "realized"}}))

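Best answer

Your problem is known as the coupon collector's problem. The expected number of realized elements is the sum m/m + m/(m-1) + ... + m/(m-n+1), one term per distinct item, until you have n items: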

(defn general-coupon-collector-expect
  "n: Cardinality of resulting set (# of unique coupons to collect)
   m: Cardinality of set to draw from (# of coupons that exist)"
  [n m]
  {:pre [(<= n m)]}
  (double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => approximately 95.4
;; This generates the data for your plot:
(for [x (range 10 101 5)]
  (general-coupon-collector-expect x 100))
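
As a rough sanity check, the expected value from the sum above can be compared against a direct simulation. This is only a sketch: the names expected-draws, simulated-draws and mean-simulated-draws are hypothetical helpers introduced here, not part of the original answer, and the simulation assumes independent, uniform draws from rand-int.

```clojure
;; Hypothetical helpers for illustration; not from the original answer.

(defn expected-draws
  "Exact expected number of draws needed to see n distinct values
   when each draw is uniform over m values: sum of m/(m-i) for i in [0, n)."
  [n m]
  {:pre [(<= 0 n m)]}
  (reduce + (map #(/ m (- m %)) (range n))))

(defn simulated-draws
  "Draws random ints below m until n distinct values have been seen,
   and returns how many draws that took."
  [n m]
  (loop [seen #{} draws 0]
    (if (= (count seen) n)
      draws
      (recur (conj seen (rand-int m)) (inc draws)))))

(defn mean-simulated-draws
  "Average of simulated-draws over the given number of independent runs."
  [times n m]
  (double (/ (reduce + (repeatedly times #(simulated-draws n m))) times)))

(expected-draws 5 5)                ;; exact ratio 137/12, about 11.42
(mean-simulated-draws 10000 5 5)    ;; random, should land near the value above
```

Note that the formula gives the mean, whereas the chart in the question plots the most common value. The distribution has a long right tail, so the most common value (8 in the (sample 10000 5 5) run above) sits below the mean.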

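The worst case is unbounded. The best case is N. The average case is O(N log N). This ignores the cost of checking whether an element has already been drawn. In practice, that check is O(log_32 N) for Clojure sets (which distinct uses).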
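
The average-case bound can be derived from the sum used above. The following is a sketch assuming independent, uniform draws over m values and treating each set operation as constant time; T_{n,m} (the number of draws needed to collect n distinct values) and H_k (the k-th harmonic number) are notation introduced here.

```latex
\mathbb{E}[T_{n,m}]
  = \sum_{i=0}^{n-1} \frac{m}{m-i}
  = m \left( H_m - H_{m-n} \right),
  \qquad H_k = \sum_{j=1}^{k} \frac{1}{j}, \quad H_0 = 0.

% Case n = m (collect every value):
\mathbb{E}[T_{m,m}] = m H_m \approx m \ln m + \gamma m
  \quad\Longrightarrow\quad \Theta(m \log m).

% Case n \le m/2 (every term of the sum is at most 2):
n \le \mathbb{E}[T_{n,m}] \le 2n
  \quad\Longrightarrow\quad \Theta(n).
```

So the O(N log N) average applies when n is equal or close to m. When m is at least twice n, duplicates are rare enough that the expected number of realized elements stays linear in n, which matches the intuition in the question.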