[Leetcode] Find Median from Data Stream 数据流中位数

Data Stream Median

最新更新:https://yanjia.me/zh/2019/02/…

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Examples: [2,3,4] , the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) – Add a integer number from the data stream to the data structure.
double findMedian() – Return the median of all elements so far. For example:

add(1) add(2) findMedian() -> 1.5 add(3) findMedian() -> 2

最大最小堆

复杂度

时间 O(NlogN) 空间 O(N)

思路

维护一个最大堆,一个最小堆。最大堆存的是到目前为止较小的那一半数,最小堆存的是到目前为止较大的那一半数,这样中位数只有可能是堆顶或者堆顶两个数的均值。而维护两个堆的技巧在于判断堆顶数和新来的数的大小关系,还有两个堆的大小关系。我们将新数加入堆后,要保证两个堆的大小之差不超过1。先判断堆顶数和新数的大小关系,有如下三种情况:最小堆堆顶小于新数时,说明新数在所有数的上半部分。最小堆堆顶大于新数,但最大堆堆顶小于新数时,说明新数将处在最小堆堆顶或最大堆堆顶,也就是一半的位置。最大堆堆顶大于新数时,说明新数将处在所有数的下半部分。再判断两个堆的大小关系,如果新数不在中间,那目标堆不大于另一个堆时,将新数加入目标堆,否则将目标堆的堆顶加入另一个堆,再把新数加入目标堆。如果新数在中间,那加到大小较小的那个堆就行了(一样大的话随便,代码中是加入最大堆)。这样,每次新加进来一个数以后,如果两个堆一样大,则中位数是两个堆顶的均值,否则中位数是较大的那个堆的堆顶。

注意

Java中实现最大堆是在初始化优先队列时加入一个自定义的Comparator,默认初始堆大小是11。Comparator实现compare方法时,用arg1 – arg0来表示大的值在前面

代码

Leetcode

class MedianFinder {
    
    PriorityQueue<Integer> maxheap;
    PriorityQueue<Integer> minheap;
    
    public MedianFinder(){
        // 新建最大堆
        maxheap = new PriorityQueue<Integer>(11, new Comparator<Integer>(){
            public int compare(Integer i1, Integer i2){
                return i2 - i1;
            }
        });
        // 新建最小堆
        minheap = new PriorityQueue<Integer>();
    }

    // Adds a number into the data structure.
    public void addNum(int num) {
        // 如果最大堆为空,或者该数小于最大堆堆顶,则加入最大堆
        if(maxheap.size() == 0 || num <= maxheap.peek()){
            // 如果最大堆大小超过最小堆,则要平衡一下
            if(maxheap.size() > minheap.size()){
                minheap.offer(maxheap.poll());
            }
            maxheap.offer(num);
        // 数字大于最小堆堆顶,加入最小堆的情况
        } else if (minheap.size() == 0 || num > minheap.peek()){
            if(minheap.size() > maxheap.size()){
                maxheap.offer(minheap.poll());
            }
            minheap.offer(num);
        // 数字在两个堆顶之间的情况
        } else {
            if(maxheap.size() <= minheap.size()){
                maxheap.offer(num);
            } else {
                minheap.offer(num);
            }
        }
    }

    // Returns the median of current data stream
    public double findMedian() {
        // 返回大小较大的那个堆堆顶,如果大小一样说明是偶数个,则返回堆顶均值
        if(maxheap.size() > minheap.size()){
            return maxheap.peek();
        } else if (maxheap.size() < minheap.size()){
            return minheap.peek();
        } else if (maxheap.isEmpty() && minheap.isEmpty()){
            return 0;
        } else {
            return (maxheap.peek() + minheap.peek()) / 2.0;
        }
    }
};

简洁版

class MedianFinder {
    
    PriorityQueue<Integer> maxheap = new PriorityQueue<Integer>();
    PriorityQueue<Integer> minheap = new PriorityQueue<Integer>(Collections.reverseOrder());
    
    // Adds a number into the data structure.
    public void addNum(int num) {
        maxheap.offer(num);
        minheap.offer(maxheap.poll());
        if(maxheap.size() < minheap.size()){
            maxheap.offer(minheap.poll());
        }
    }

    // Returns the median of current data stream
    public double findMedian() {
        return maxheap.size() == minheap.size() ? (double)(maxheap.peek() + minheap.peek()) / 2.0 : maxheap.peek();
    }
};

Lintcode

public class Solution {
    public int[] medianII(int[] nums) {
        // write your code here
        if(nums.length <= 1) return nums;
        int[] res = new int[nums.length];
        PriorityQueue<Integer> minheap = new PriorityQueue<Integer>();
        PriorityQueue<Integer> maxheap = new PriorityQueue<Integer>(11, new Comparator<Integer>(){
            public int compare(Integer arg0, Integer arg1) {
                return arg1 - arg0;
            }
        });
        // 将前两个元素先加入堆中
        minheap.offer(Math.max(nums[0], nums[1]));
        maxheap.offer(Math.min(nums[0], nums[1]));
        res[0] = res[1] = Math.min(nums[0], nums[1]);
        for(int i = 2; i < nums.length; i++){
            int mintop = minheap.peek();
            int maxtop = maxheap.peek();
            int curr = nums[i];
            // 新数在较小的一半中
            if (curr < maxtop){
                if (maxheap.size() <= minheap.size()){
                    maxheap.offer(curr);
                } else {
                    minheap.offer(maxheap.poll());
                    maxheap.offer(curr);
                }
            // 新数在中间
            } else if (curr >= maxtop && curr <= mintop){
                if (maxheap.size() <= minheap.size()){
                    maxheap.offer(curr);
                } else {
                    minheap.offer(curr);
                }
            // 新数在较大的一半中
            } else {
                if(minheap.size() <= maxheap.size()){
                    minheap.offer(curr);
                } else {
                    maxheap.offer(minheap.poll());
                    minheap.offer(curr);
                }
            }
            if (maxheap.size() == minheap.size()){
                res[i] = (maxheap.peek() + minheap.peek()) / 2;
            } else if (maxheap.size() > minheap.size()){
                res[i] = maxheap.peek();
            } else {
                res[i] = minheap.peek();
            }
        }
        return res;
    }
}

后续 Follow Up

Q:如果要求第n/10个数字该怎么做?
A:改变两个堆的大小比例,当求n/2即中位数时,两个堆是一样大的。而n/10时,说明有n/10个数小于目标数,9n/10个数大于目标数。所以我们保证最小堆是最大堆的9倍大小就行了。

    原文作者:ethannnli
    原文地址: https://segmentfault.com/a/1190000003709954
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞