Medians and Order Statistics

November 4, 2019

Finding the i-th smallest (or largest) element of an array is what "order statistics" is about, or at least that is how I understand it.

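Once you have order statistics, the median comes easily. Suppose the array has n elements. If n is odd, the median is the ((n+1)/2)-th smallest element. If n is even, it is the average of the (n/2)-th and the (n/2+1)-th smallest elements.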
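
The odd/even rule above can be sketched with the standard library's own selection routine, std::nth_element, instead of a hand-written one. This is only a minimal sketch: the function name median_by_selection and the choice of std::vector<double> are my own assumptions, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median via selection rather than a full sort (hypothetical helper).
// The vector is taken by value because std::nth_element reorders its input.
double median_by_selection(std::vector<double> values)
{
    assert(!values.empty());
    const std::size_t n = values.size();
    const std::size_t mid = n / 2;  // 0-based index of the (n/2 + 1)-th smallest

    // Afterwards values[mid] is the element that would sit there in sorted
    // order, and every element before it is <= values[mid].
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];

    if (n % 2 == 1)
        return upper;  // odd n: this is the ((n + 1) / 2)-th smallest

    // Even n: the (n/2)-th smallest is the largest element of the lower half.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return (lower + upper) / 2.0;
}
```

For example, median_by_selection({3, 1, 2}) picks the 2nd smallest, while median_by_selection({4, 1, 3, 2}) averages the 2nd and 3rd smallest.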

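What is the median good for? It addresses a weakness of the mean: if the sample contains one extreme value while most of the others are ordinary, the mean ends up far from what is typical of the data. In that situation the median is the better choice. (What if the extreme value happens to sit in the middle of the array? No problem: the median is taken over the sorted order, not over array positions >_<)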
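
To make the effect concrete, here is a small sketch that computes both values for a made-up sample. The helper names mean_of and median_by_sort and the sample numbers are purely illustrative assumptions; the median is found by sorting a copy, which is fine at this size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample (hypothetical helper).
double mean_of(const std::vector<double>& v)
{
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Median by sorting a copy of a non-empty sample (hypothetical helper).
double median_by_sort(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}
```

With the sample {1, 2, 3, 4, 1000}, mean_of gives 202 while median_by_sort gives 3, which is much closer to the ordinary values.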

========================

Order Statistic Selection Algorithm

========================

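1. Basic idea
Borrow the divide-and-conquer partitioning from quicksort: each round moves the elements smaller than the pivot in front of it and the larger ones behind it. The number n of elements less than or equal to the pivot is tied to the order i we are looking for: if i <= n, the i-th smallest element is in the left part; otherwise it is the (i-n)-th smallest element of the right part. So after each partition we continue only in the side that contains the answer.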
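
The link between the count and the order is perhaps easiest to see in a version that is not in place. The sketch below copies the elements into "smaller" and "larger" buckets around the first element and uses the bucket sizes to decide where to continue. It is a simplified variant for illustration, not the in-place algorithm of this post; the name kth_smallest_by_buckets and the use of std::vector<int> are my assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest element (k is 1-based) of a non-empty vector.
// Not in place: every round copies the elements into buckets.
int kth_smallest_by_buckets(std::vector<int> values, std::size_t k)
{
    assert(k >= 1 && k <= values.size());
    while (true) {
        const int pivot = values.front();
        std::vector<int> smaller, larger;
        std::size_t equal = 0;  // how many elements equal the pivot (at least 1)
        for (int v : values) {
            if (v < pivot) smaller.push_back(v);
            else if (v > pivot) larger.push_back(v);
            else ++equal;
        }

        if (k <= smaller.size()) {
            values = std::move(smaller);   // the answer is among the smaller ones
        } else if (k <= smaller.size() + equal) {
            return pivot;                  // the answer is the pivot itself
        } else {
            k -= smaller.size() + equal;   // skip everything <= pivot
            values = std::move(larger);
        }
    }
}
```

Each round drops at least the pivot, so the loop always ends, and k always stays between 1 and the size of the remaining bucket.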

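2. Algorithm description

The algorithm is similar to quicksort; see another post on this blog, "Summary of Sorting Algorithms". There is one point I do not understand, which I have marked in the code. Answers are welcome, thanks.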

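3. Complexity

Apart from the recursion stack, no extra space is needed. The expected time complexity is O(n), because unlike quicksort only one side of each partition is processed further. With the first element as the pivot, however, the worst case degrades to O(n^2), for example when selecting the largest element of an already sorted array.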
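
A rough way to see where these bounds come from, assuming each partition costs about cn steps for some constant c (an idealized sketch, not a rigorous average-case proof):

```latex
% Balanced case: each partition leaves about half of the elements
T(n) = T\!\left(\tfrac{n}{2}\right) + cn
     \le cn \sum_{k \ge 0} 2^{-k} + O(1) = 2cn + O(1) = O(n)

% Worst case: each partition removes only one element
T(n) = T(n-1) + cn = c\,\bigl(n + (n-1) + \cdots + 1\bigr) + O(1) = O(n^2)
```

Quicksort has to recurse into both halves, which is what turns the balanced case into O(n log n) there.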

4. Implementation

template<typename T>
int partition(T * array, const int low, const int high)
{
	//use the first element as the pivot; a randomly chosen index would be better
	T pivot = array[low];
	int i = low, j = high;
	while(i<j)
	{
		//scan from the right for the first element smaller than the pivot
		while(j>i && array[j]>=pivot)
			j--;
		if(i<j)
		{
			//array[i] now holds array[j], which is smaller than the pivot,
			//so the next left-to-right scan can start at array[i+1]
			array[i++] = array[j];
		}

		//scan from the left for the first element bigger than the pivot
		while(i<j && array[i]<=pivot)
			i++;
		if(i<j)
		{
			array[j--] = array[i];
		}
	}

	//put the pivot into its final position; note that i equals j at this point
	array[i] = pivot;

	return i;
}

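template<typename T>
T order_select(T *array, const int low, const int high, const int order)
{
	//order is 1-based and must satisfy 1 <= order <= high - low + 1
	//Question: how should this recursion base case be understood logically?
	//A detailed explanation from an expert would be appreciated, thanks
	if(low == high)
		return array[low];

	int pivot_index = partition(array, low, high);
	//number of elements n to the left of the pivot, including the pivot itself
	int leftSegLength = pivot_index - low + 1;
	//if order <= n, the element we want lies in the left part (bounded by the pivot)
	if(order <= leftSegLength)
	{
		return order_select(array, low, pivot_index, order);
	}
	else
	{
		return order_select(array, pivot_index+1, high, order-leftSegLength);
	}
}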
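
One possible way to read the base case low == high in order_select: every call keeps 1 <= order <= high - low + 1, because the left call is only made when order <= leftSegLength and the right call subtracts exactly the number of elements it leaves behind. When a single element remains, order can only be 1, so that element is the answer. The loop version below is a sketch that makes this invariant explicit, avoids the recursion stack, and picks a random pivot as the comment in partition suggests. The name order_select_iterative, the use of std::vector and std::mt19937, and the Lomuto-style partition are my own assumptions and differ from the code above.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Returns the order-th smallest element (order is 1-based) of a non-empty
// vector. The vector is reordered in place.
template <typename T>
T order_select_iterative(std::vector<T>& a, int order)
{
    assert(order >= 1 && order <= static_cast<int>(a.size()));
    static std::mt19937 rng(std::random_device{}());

    int low = 0;
    int high = static_cast<int>(a.size()) - 1;

    // Invariant: the answer is the order-th smallest element of a[low..high],
    // so 1 <= order <= high - low + 1.
    while (low < high) {
        // Move a randomly chosen pivot to the end of the range.
        std::uniform_int_distribution<int> pick(low, high);
        std::swap(a[pick(rng)], a[high]);
        const T pivot = a[high];

        // Lomuto-style partition: elements smaller than the pivot go first.
        int store = low;
        for (int i = low; i < high; ++i) {
            if (a[i] < pivot)
                std::swap(a[i], a[store++]);
        }
        std::swap(a[store], a[high]);  // the pivot lands at its sorted position

        const int rank = store - low + 1;  // rank of the pivot within a[low..high]
        if (order == rank)
            return a[store];               // the pivot itself is the answer
        if (order < rank) {
            high = store - 1;              // continue on the left, pivot excluded
        } else {
            order -= rank;                 // skip the left part and the pivot
            low = store + 1;
        }
    }
    // One element left, so by the invariant order == 1.
    return a[low];
}
```

Returning as soon as order equals the rank of the pivot also lets both sides exclude the pivot, so the range shrinks on every round.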