平均数、中位数、众数常用来描述数据的集中程度,在mysql中,均值的计算较为简单,但中位数和众数尚不支持可以直接计算的函数,在这里向大家介绍一下中位数和众数的计算方法。
本文所使用的数据表为score,包含三个字段:
s_id 为学生id
c_id 为课程id
s_score 为课程成绩
1、中位数
查找课程编号为‘02’号的学生成绩中位数
课程编号为‘02’的课程成绩为
mysql> select s_score from score where c_id='02';
+---------+
| s_score |
+---------+
| 90 |
| 60 |
| 80 |
| 30 |
| 87 |
| 89 |
+---------+
6 rows in set (0.00 sec)
查找中位数
mysql> select avg(c.s_score) from(
-> select a.s_score from score a,score b
-> where a.c_id=b.c_id and a.c_id='02'
-> group by a.s_score
-> having sum(case when a.s_score=b.s_score then 1 else 0 end)
-> >= abs(sum(sign(a.s_score-b.s_score)))
-> )c;
+----------------+
| avg(c.s_score) |
+----------------+
| 83.5000 |
+----------------+
1 row in set (0.00 sec)
解析:
当把having条件作为查询内容时,便比较清晰了,结果如下
mysql> select a.s_score,
-> sum(case when a.s_score=b.s_score then 1 else 0 end )as if_equal,
-> abs(sum(sign(a.s_score-b.s_score)))
-> from score a,score b where a.c_id=b.c_id and a.c_id='02'
-> group by a.s_score;
+---------+----------+-------------------------------------+
| s_score | if_equal | abs(sum(sign(a.s_score-b.s_score))) |
+---------+----------+-------------------------------------+
| 30 | 1 | 5 |
| 60 | 1 | 3 |
| 80 | 1 | 1 |
| 87 | 1 | 1 |
| 89 | 1 | 3 |
| 90 | 1 | 5 |
+---------+----------+-------------------------------------+
6 rows in set (0.00 sec)
2、众数
一组数据可以存在多个众数
mysql> select s_score from score where c_id='02';
+---------+
| s_score |
+---------+
| 90 |
| 60 |
| 80 |
| 30 |
| 80 |
| 89 |
+---------+
6 rows in set (0.00 sec)
mysql> select s_score from score
-> where c_id='02'
-> group by s_score
-> having count(*)
-> >= all(select count(*) from score where c_id='02' group by s_score);
+---------+
| s_score |
+---------+
| 80 |
+---------+
1 row in set (0.00 sec)
解析:
mysql> select s_score,count(*) from score where c_id='02' group by s_score;
+---------+----------+
| s_score | count(*) |
+---------+----------+
| 30 | 1 |
| 60 | 1 |
| 80 | 2 |
| 89 | 1 |
| 90 | 1 |
+---------+----------+
5 rows in set (0.00 sec)
count() >= all(select count() from score where c_id=’02’ group by s_score)
表示score中分组后某个s_score出现的次数 >= 所有不同s_score出现的次数。