PostgreSQL 9.6 parallel aggregation

November 1, 2023

PostgreSQL 9.6 adds support for parallel aggregation.

With 9.6, PostgreSQL introduces initial support for parallel execution
of large queries. Only strictly read-only queries where the driving
table is accessed via a sequential scan can be parallelized. Hash
joins and nested loops can be performed in parallel, as can
aggregation (for supported aggregates). Much remains to be done, but
this is already a useful set of features.

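> Which aggregates are the "supported aggregates" mentioned above?
> Are there any special considerations when designing an aggregate function so that it can use the parallel machinery?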

Best answer
The PostgreSQL 9.6 User-defined Aggregates documentation now covers parallel aggregation:

35.10.4. Partial Aggregation
Optionally, an aggregate function can support partial aggregation. The
idea of partial aggregation is to run the aggregate’s state transition
function over different subsets of the input data independently, and
then to combine the state values resulting from those subsets to
produce the same state value that would have resulted from scanning
all the input in a single operation. This mode can be used for
parallel aggregation by having different worker processes scan
different portions of a table. Each worker produces a partial state
value, and at the end those state values are combined to produce a
final state value. (In the future this mode might also be used for
purposes such as combining aggregations over local and remote tables;
but that is not implemented yet.)
To support partial aggregation, the aggregate definition must provide
a combine function, which takes two values of the aggregate’s state
type (representing the results of aggregating over two subsets of the
input rows) and produces a new value of the state type, representing
what the state would have been after aggregating over the combination
of those sets of rows. It is unspecified what the relative order of
the input rows from the two sets would have been. This means that it’s
usually impossible to define a useful combine function for aggregates
that are sensitive to input row order.
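The excerpt above describes partial aggregation in terms of a state transition function, a combine function and a final step. The Python sketch below only simulates that idea; it is not PostgreSQL code and uses no PostgreSQL API. It assumes each worker scans a contiguous, disjoint slice of the rows, and all helper names (`split_rows`, `run_partial`, `run_serial`, `avg_sfunc`, `avg_combine`, `avg_final`, `concat_sfunc`, `concat_combine`) are hypothetical, chosen to mirror the roles of the `SFUNC`, `COMBINEFUNC` and `FINALFUNC` options of `CREATE AGGREGATE`. An avg-style aggregate gives the same result whatever order the partial states arrive in, while naive string concatenation does not, which is the order-sensitivity problem the last paragraph warns about.

```python
from functools import reduce
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # aggregate state type (the "stype")
T = TypeVar("T")  # input row value type
R = TypeVar("R")  # final result type


def split_rows(rows: Sequence[T], n_workers: int) -> List[Sequence[T]]:
    """Hypothetical helper: split rows into n_workers contiguous, disjoint slices."""
    size, extra = divmod(len(rows), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])  # a slice may be empty
        start = end
    return chunks


def run_partial(rows: Sequence[T], init: S,
                sfunc: Callable[[S, T], S],
                combinefunc: Callable[[S, S], S],
                finalfunc: Callable[[S], R],
                n_workers: int, reverse_arrival: bool = False) -> R:
    # Each simulated worker runs the transition function over its own slice,
    # starting from the initial state, and produces one partial state.
    partial_states = [reduce(sfunc, chunk, init) for chunk in split_rows(rows, n_workers)]
    # Arrival order at the leader is not guaranteed; reverse_arrival simulates
    # the workers finishing in the opposite order.
    if reverse_arrival:
        partial_states.reverse()
    # The leader merges the partial states, then applies the final step once.
    return finalfunc(reduce(combinefunc, partial_states))


def run_serial(rows: Sequence[T], init: S,
               sfunc: Callable[[S, T], S], finalfunc: Callable[[S], R]) -> R:
    """Reference: a single pass over all input, as a non-parallel plan would do."""
    return finalfunc(reduce(sfunc, rows, init))


# Avg-style aggregate: state is (running sum, row count); order-insensitive.
def avg_sfunc(state: Tuple[float, int], value: float) -> Tuple[float, int]:
    return (state[0] + value, state[1] + 1)

def avg_combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state: Tuple[float, int]) -> Optional[float]:
    return state[0] / state[1] if state[1] else None  # no rows -> NULL-like None


# Naive string concatenation: state is the text so far; order-sensitive.
def concat_sfunc(state: str, value: str) -> str:
    return state + value

def concat_combine(a: str, b: str) -> str:
    return a + b


if __name__ == "__main__":
    nums = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(run_serial(nums, (0.0, 0), avg_sfunc, avg_final))                   # 3.0
    print(run_partial(nums, (0.0, 0), avg_sfunc, avg_combine, avg_final, 2))  # 3.0
    print(run_partial(nums, (0.0, 0), avg_sfunc, avg_combine, avg_final, 2,
                      reverse_arrival=True))                                  # 3.0

    letters = list("abcd")
    print(run_serial(letters, "", concat_sfunc, lambda s: s))                      # abcd
    print(run_partial(letters, "", concat_sfunc, concat_combine, lambda s: s, 2))  # abcd
    print(run_partial(letters, "", concat_sfunc, concat_combine, lambda s: s, 2,
                      reverse_arrival=True))                                       # cdab
```

Because the order in which partial states are merged is not fixed, a useful combine function has to give the same final result regardless of that order. Adding `(sum, count)` pairs does; string concatenation does not, so its combined result depends on which worker's state is merged first.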