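This post walks through a SIGMOD paper. "Automatic Database Management System Tuning Through Large-scale Machine Learning" is a paper on automatic knob tuning published at SIGMOD 2017 by CMU professor Andy Pavlo, his PhD student Dana, and others; the system is called OtterTune. The paper cites an earlier one, "Tuning Database Configuration Parameters with iTuned", published at VLDB 2009. In my personal view, the core ideas of the two do not differ much.
The rest of this post describes OtterTune's workflow and core ideas.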
Let’s take an example. Suppose I already have history data covering two metrics (innodb_pages_reads, innodb_io_reads), three workloads (TPCC, YCSB, wikipedia), and four configurations. That gives two matrices, one per metric, where NULL marks a workload/configuration pair that has not been run:
Matrix1 (innodb_pages_reads)
workload  conf1  conf2  conf3  conf4
TPCC         20     30     40     50
YCSB        100   NULL    300    400
WIKI         50     60   NULL     80
Matrix2 (innodb_io_reads)
workload  conf1  conf2  conf3  conf4
TPCC        200    300    400    500
YCSB        100   NULL    300    400
WIKI        500    600   NULL    800
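To make the layout of the history data concrete, here is a minimal sketch of the two matrices above as plain Python structures. The names `CONFIGS`, `HISTORY`, `metric_vector`, and `missing_cells` are hypothetical helpers chosen for this illustration, not part of OtterTune; `None` stands in for the NULL cells.

```python
# History from the example: one matrix per metric, rows = workloads,
# columns = configurations. None marks a (workload, config) pair never run.
CONFIGS = ["conf1", "conf2", "conf3", "conf4"]

HISTORY = {
    "innodb_pages_reads": {
        "TPCC": [20, 30, 40, 50],
        "YCSB": [100, None, 300, 400],
        "WIKI": [50, 60, None, 80],
    },
    "innodb_io_reads": {
        "TPCC": [200, 300, 400, 500],
        "YCSB": [100, None, 300, 400],
        "WIKI": [500, 600, None, 800],
    },
}


def metric_vector(workload, config):
    """Return all metric values of one workload under one configuration."""
    col = CONFIGS.index(config)
    return [HISTORY[metric][workload][col] for metric in HISTORY]


def missing_cells(metric):
    """List the (workload, config) pairs with no observation for a metric."""
    return [
        (workload, config)
        for workload, row in HISTORY[metric].items()
        for config, value in zip(CONFIGS, row)
        if value is None
    ]
```

Keeping one matrix per metric mirrors the example; `metric_vector` then reads "down" through the matrices to get the full metric vector of a single run, which is the shape needed when comparing workloads.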
The recommendation steps for the target workload (Aliworkload) would look like this:
- Run the current target workload (Xworkload) with conf1 (the default configuration). Suppose the 5 observed metrics are (11, 20, 31, 40, 51). Comparing these against the history data, the most similar workload is TPC-C.
- Take all of the previous data you have for TPC-C and combine it with all of the data collected so far from the current workload. Use this data to train a GP (Gaussian process) model: the configurations are the input matrix, and the target objective metric, such as latency, is the output matrix. Then, starting from a set of sample configurations (say, randomly generated for now), use the GP model together with gradient descent to predict the means and variances at the sample points and walk towards the nearest optimum (for latency, the nearest minimum). Use an exploration/exploitation tradeoff algorithm like UCB (upper confidence bound; for a metric like latency, where lower is better, lower confidence bound) to select the next configuration to run. Let’s call this conf2. See https://github.com/cmu-db/ott… and https://github.com/cmu-db/ott… for more details.
- Install conf2 on the DBMS and observe the workload for some minutes/hours.
- Repeat steps 1 – 3 until satisfied with the improvement. Note that in the next iteration of step 1, you will now predict the metrics for TPC-C, YCSB, & Wiki for both conf1 and conf2, so the workload that you first selected as being the most similar may change over the course of the tuning session.
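The first two steps above can be sketched in a few lines of plain Python. This is a simplified illustration under several assumptions, not OtterTune's actual code: workload matching here uses a raw Euclidean distance over the configurations both workloads have run (it skips missing cells rather than predicting them), the GP is a tiny zero-centered model with a fixed RBF kernel, and the next configuration is chosen by scoring a fixed candidate list with a lower confidence bound instead of running gradient descent. All function names, the `kappa` parameter, the target metrics `[22, 210]`, and the knob/latency values are made up for the example.

```python
import math


def nearest_workload(target, history):
    """Step 1: pick the past workload whose metrics are closest to the target's.
    target:  {config: [metric values]}
    history: {workload: {config: [metric values] or None}}"""
    def avg_dist(past):
        dists = [math.dist(target[c], past[c])
                 for c in target if past.get(c) is not None]
        return sum(dists) / len(dists) if dists else math.inf
    return min(history, key=lambda w: avg_dist(history[w]))


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two knob vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(X, y, x_new, noise=1e-6):
    """GP posterior mean and variance at x_new (y is centered on its mean)."""
    mu = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    alpha = solve(K, [yi - mu for yi in y])
    v = solve(K, k)
    mean = mu + sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v))
    return mean, max(var, 0.0)


def next_config(X, y, candidates, kappa=2.0):
    """Step 2: choose the candidate with the smallest lower confidence bound,
    mean - kappa * std (for latency, lower is better)."""
    def lcb(c):
        mean, var = gp_predict(X, y, c)
        return mean - kappa * math.sqrt(var)
    return min(candidates, key=lcb)


# History rebuilt from the two example matrices: [pages_reads, io_reads].
history = {
    "TPCC": {"conf1": [20, 200], "conf2": [30, 300], "conf3": [40, 400], "conf4": [50, 500]},
    "YCSB": {"conf1": [100, 100], "conf2": None, "conf3": [300, 300], "conf4": [400, 400]},
    "WIKI": {"conf1": [50, 500], "conf2": [60, 600], "conf3": None, "conf4": [80, 800]},
}
target = {"conf1": [22, 210]}  # made-up metrics observed for the target workload
print(nearest_workload(target, history))  # TPCC

# Made-up training data: one normalized knob value per run, latency as objective.
X = [[0.0], [1.0], [2.0]]
y = [10.0, 5.0, 8.0]
print(next_config(X, y, candidates=[[0.5], [1.5], [3.0]]))
```

The `kappa` parameter controls the tradeoff: a small value favors configurations the model already predicts to be good (exploitation), while a large value favors configurations far from the observed data, where the variance is high (exploration). In practice the metrics and knob values would need to be normalized before computing distances or kernels.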