Optimizing a MySQL query with time zone conversion and grouping by hour

This is a table in MySQL 5.5 with 30 million records:

CREATE TABLE `campaign_logs` (
  `domain` varchar(50) DEFAULT NULL,
  `campaign_id` varchar(50) DEFAULT NULL,
  `subscriber_id` varchar(50) DEFAULT NULL,
  `message` varchar(21000) DEFAULT NULL,
  `log_time` datetime DEFAULT NULL,
  `log_type` varchar(50) DEFAULT NULL,
  `level` varchar(50) DEFAULT NULL,
  `campaign_name` varchar(500) DEFAULT NULL,
  KEY `subscriber_id_index` (`subscriber_id`),
  KEY `log_type_index` (`log_type`),
  KEY `log_time_index` (`log_time`),
  KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
  KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In the query below, I group the rows by hour of day.

QUERY

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date

UNION ALL

SELECT
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;

RESULT

The query above gives a result like this:

+---------------+-------+----------------+-------------+
| EMAIL_CLICKED | 1 AM  |             71 |          83 |
| EMAIL_CLICKED | 1 PM  |             25 |          27 |
| EMAIL_SENT    | 10 AM |             51 |          59 |
| EMAIL_OPENED  | 10 PM |             16 |          18 |
+---------------+-------+----------------+-------------+

This is the EXPLAIN output for the query above.

EXPLAIN

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL |   NULL |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+

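Optimization?

We have a covering index on this table.

This query takes a long time (more than 1 minute).

If I remove count(DISTINCT subscriber_id) from the query, we get the result in 1.5 seconds, but I need the distinct subscriber_id count in the query.

Is there any way to optimize this query?

Thanks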

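Best answer: You are not dealing with a large amount of data, so the GROUP BY should not take 40 seconds, assuming you are not on a very busy server with a lot of locking activity on the table.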

Try this version of the query (limited to one log_type):

SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE DOMAIN = 'xxxx' AND
      campaign_id='1234' AND
      log_type = 'EMAIL_SENT' AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;

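This should make optimal use of the index. If this is fast, then use UNION ALL to combine the rows for the other log types. It is ugly, but because of index optimization, UNION ALL is sometimes much faster than OR / IN.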
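For comparison, the OR / IN form that the UNION ALL version is weighed against might look like the sketch below. It is not part of the original answer. It reuses the same placeholder values ('xxxx', '1234') and date range, and it assumes the three log types from the question are the only ones of interest. Timing both forms on the real data is the only way to tell which one the optimizer handles better.

```sql
-- Sketch only: single-query alternative using IN instead of UNION ALL.
-- Placeholder values and the date range are copied from the query above.
SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE domain = 'xxxx' AND
      campaign_id = '1234' AND
      -- all three types in one pass, rather than one SELECT per type
      log_type IN ('EMAIL_OPENED', 'EMAIL_SENT', 'EMAIL_CLICKED') AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30')
                   AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
-- log_type must be part of the grouping so each type keeps its own hourly rows
GROUP BY log_type, time;
```

Prefixing either form with EXPLAIN shows which key MySQL chooses and how many rows it expects to examine, which makes the two plans easy to compare before running them.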
