Leading Hint
Leading Hint 是一种强大的查询优化技术,允许用户指导 PALO 优化器确定查询计划中的表连接顺序。正确使用 Leading Hint 可以显著提高复杂查询的性能。本文将详细介绍如何在 PALO 中使用 Leading Hint 来控制 join 顺序。
常规 Leading Hint
语法
Leading Hint 允许指定希望优化器遵循的表连接顺序。在 PALO 里面,Leading Hint 的基本语法如下:
1SELECT /*+ LEADING(tablespec [tablespec]...) */ ...
其中需要注意的是:
- Leading Hint 由
/*+
和*/
包围,并置于 SQL 语句中 SELECT 关键字之后。 tablespec
是表名或表别名,至少需要指定两个表。- 多个表之间用空格或','分隔。
- 可以使用大括号
{}
来显式地指定 Join Tree 的形状。
举例说明:
1mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2;
2+------------------------------------------------------------------------------+
3| Explain String(Nereids Planner) |
4+------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t2] |
10| --------PhysicalDistribute[DistributionSpecHash] |
11| ----------PhysicalOlapScan[t1] |
12| |
13| Hint log: |
14| Used: leading(t2 t1) |
15| UnUsed: |
16| SyntaxError: |
17+------------------------------------------------------------------------------+
当 Leading Hint 不生效的时候会走正常的流程生成计划,EXPLAIN 会显示使用的 Hint 是否生效,主要分三种来显示:
状态 | 描述 |
---|---|
Used |
Leading Hint 正常生效 |
Unused |
这里不支持的情况包含 Leading Hint 指定的 join order 与原 SQL 不等价或本版本暂不支持特性(详见限制) |
SyntaxError |
指 Leading Hint 语法错误,如找不到对应的表等 |
-
Leading Hint 语法默认构造出左深树:
SQL1mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3; 2+--------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+--------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute[DistributionSpecGather] | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 9| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 10| ----------PhysicalOlapScan[t1] | 11| ----------PhysicalDistribute[DistributionSpecHash] | 12| ------------PhysicalOlapScan[t2] | 13| --------PhysicalDistribute[DistributionSpecHash] | 14| ----------PhysicalOlapScan[t3] | 15| | 16| Hint log: | 17| Used: leading(t1 t2 t3) | 18| UnUsed: | 19| SyntaxError: | 20+--------------------------------------------------------------------------------+
-
同时允许使用大括号指定 Join 树形状:
SQL1mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on c1 = c2 join t3 on c2=c3; 2+----------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+----------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute[DistributionSpecGather] | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 9| --------PhysicalOlapScan[t1] | 10| --------PhysicalDistribute[DistributionSpecHash] | 11| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 12| ------------PhysicalOlapScan[t2] | 13| ------------PhysicalDistribute[DistributionSpecHash] | 14| --------------PhysicalOlapScan[t3] | 15| | 16| Hint log: | 17| Used: leading(t1 { t2 t3 }) | 18| UnUsed: | 19| SyntaxError: | 20+----------------------------------------------------------------------------------+
-
当有 View 作为别名参与 JoinReorder 的时候可以指定对应的 View 作为 Leading Hint 的参数。例:
SQL1mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2; 2 +--------------------------------------------------------------------------------------+ 3 | Explain String(Nereids Planner) | 4 +--------------------------------------------------------------------------------------+ 5 | PhysicalResultSink | 6 | --hashAgg[GLOBAL] | 7 | ----PhysicalDistribute[DistributionSpecGather] | 8 | ------hashAgg[LOCAL] | 9 | --------PhysicalProject | 10 | ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() | 11 | ------------PhysicalProject | 12 | --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 13 | ----------------PhysicalProject | 14 | ------------------PhysicalOlapScan[t2] | 15 | ----------------PhysicalDistribute[DistributionSpecHash] | 16 | ------------------PhysicalProject | 17 | --------------------PhysicalOlapScan[t3] | 18 | ------------PhysicalDistribute[DistributionSpecHash] | 19 | --------------PhysicalProject | 20 | ----------------PhysicalOlapScan[t1] | 21 | | 22 | Hint log: | 23 | Used: leading(alias t1) | 24 | UnUsed: | 25 | SyntaxError: | 26 +--------------------------------------------------------------------------------------+
案例
基础场景
-
建表语句如下:
SQL1CREATE DATABASE testleading; 2USE testleading; 3 4create table t1 (c1 int, c11 int) distributed by hash(c1) buckets 3 properties('replication_num' = '1'); 5create table t2 (c2 int, c22 int) distributed by hash(c2) buckets 3 properties('replication_num' = '1'); 6create table t3 (c3 int, c33 int) distributed by hash(c3) buckets 3 properties('replication_num' = '1'); 7create table t4 (c4 int, c44 int) distributed by hash(c4) buckets 3 properties('replication_num' = '1');
-
原始 plan:
SQL1mysql> explain shape plan select * from t1 join t2 on t1.c1 = c2; 2+-------------------------------------------+ 3| Explain String | 4+-------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) | 9| --------PhysicalOlapScan[t2] | 10| --------PhysicalDistribute | 11| ----------PhysicalOlapScan[t1] | 12+-------------------------------------------+
-
当我们需要交换 t1 和 t2 的 join 顺序时,只需在前面加上
leading(t2 t1)
即可。在执行explain
时,会显示是否使用了这个 hint。如下 Leading plan:Used
表示 Hint 正常生效SQL1mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on c1 = c2; 2+------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute[DistributionSpecGather] | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 9| --------PhysicalOlapScan[t2] | 10| --------PhysicalDistribute[DistributionSpecHash] | 11| ----------PhysicalOlapScan[t1] | 12| | 13| Hint log: | 14| Used: leading(t2 t1) | 15| UnUsed: | 16| SyntaxError: | 17+------------------------------------------------------------------------------+
-
如果 Leading Hint 存在语法错误,
explain
时会在SyntaxError
里显示相应信息,但计划仍能照常生成,只是不会使用 Leading 而已。例如:SQL1mysql> explain shape plan select /*+ leading(t2 t3) */ * from t1 join t2 on t1.c1 = c2; 2+--------------------------------------------------------+ 3| Explain String | 4+--------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) | 9| --------PhysicalOlapScan[t1] | 10| --------PhysicalDistribute | 11| ----------PhysicalOlapScan[t2] | 12| | 13| Used: | 14| UnUsed: | 15| SyntaxError: leading(t2 t3) Msg:can not find table: t3 | 16+--------------------------------------------------------+
扩展场景
-
左深树
上文我们提及,PALO 在查询语句不使用任何括号的情况下,Leading 会默认生成左深树。
SQL1mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3; 2+--------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+--------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute[DistributionSpecGather] | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 9| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 10| ----------PhysicalOlapScan[t1] | 11| ----------PhysicalDistribute[DistributionSpecHash] | 12| ------------PhysicalOlapScan[t2] | 13| --------PhysicalDistribute[DistributionSpecHash] | 14| ----------PhysicalOlapScan[t3] | 15| | 16| Hint log: | 17| Used: leading(t1 t2 t3) | 18| UnUsed: | 19| SyntaxError: | 20+--------------------------------------------------------------------------------+
-
右深树
当需要将计划的形状做成右深树、Bushy 树或者 zig-zag 树时,只需加上大括号来限制 plan 的形状即可,无需像 Oracle 使用 swap 从左深树一步步调整。
SQL1mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3; 2+-----------------------------------------------+ 3| Explain String | 4+-----------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN](t1.c1 = t2.c2) | 9| --------PhysicalOlapScan[t1] | 10| --------PhysicalDistribute | 11| ----------hashJoin[INNER_JOIN](t2.c2 = t3.c3) | 12| ------------PhysicalOlapScan[t2] | 13| ------------PhysicalDistribute | 14| --------------PhysicalOlapScan[t3] | 15| | 16| Used: leading(t1 { t2 t3 }) | 17| UnUsed: | 18| SyntaxError: | 19+-----------------------------------------------+
-
Bushy 树
SQL1mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4; 2+-----------------------------------------------+ 3| Explain String | 4+-----------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) | 9| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) | 10| ----------PhysicalOlapScan[t1] | 11| ----------PhysicalDistribute | 12| ------------PhysicalOlapScan[t2] | 13| --------PhysicalDistribute | 14| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) | 15| ------------PhysicalOlapScan[t3] | 16| ------------PhysicalDistribute | 17| --------------PhysicalOlapScan[t4] | 18| | 19| Used: leading({ t1 t2 } { t3 t4 }) | 20| UnUsed: | 21| SyntaxError: | 22+-----------------------------------------------+
-
zig-zag 树
SQL1mysql> explain shape plan select /*+ leading(t1 {t2 t3} t4) */ * from t1 join t2 on t1.c1 = c2 join t3 on c2 = c3 join t4 on c3 = c4; 2+--------------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+--------------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --PhysicalDistribute[DistributionSpecGather] | 7| ----PhysicalProject | 8| ------hashJoin[INNER_JOIN] hashCondition=((t3.c3 = t4.c4)) otherCondition=() | 9| --------PhysicalDistribute[DistributionSpecHash] | 10| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 11| ------------PhysicalOlapScan[t1] | 12| ------------PhysicalDistribute[DistributionSpecHash] | 13| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 14| ----------------PhysicalOlapScan[t2] | 15| ----------------PhysicalDistribute[DistributionSpecHash] | 16| ------------------PhysicalOlapScan[t3] | 17| --------PhysicalDistribute[DistributionSpecHash] | 18| ----------PhysicalOlapScan[t4] | 19| | 20| Hint log: | 21| Used: leading(t1 { t2 t3 } t4) | 22| UnUsed: | 23| SyntaxError: | 24+--------------------------------------------------------------------------------------+
-
Non-inner Join
当遇到非 inner-join(如 Outer Join 或 Semi/Anti Join)时,Leading Hint 会根据原始 SQL 语义自动推导各个 Join 的方式。若 Leading Hint 与原始 SQL 语义不同或无法生成,则会将其放入
UnUsed
中,但这并不影响计划正常流程的生成。以下是一个不能交换的例子:
SQL1-------- test outer join which can not swap 2-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23) 3mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3; 4+--------------------------------------------------------------------------------+ 5| Explain String(Nereids Planner) | 6+--------------------------------------------------------------------------------+ 7| PhysicalResultSink | 8| --PhysicalDistribute[DistributionSpecGather] | 9| ----PhysicalProject | 10| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 11| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() | 12| ----------PhysicalOlapScan[t1] | 13| ----------PhysicalDistribute[DistributionSpecHash] | 14| ------------PhysicalOlapScan[t2] | 15| --------PhysicalDistribute[DistributionSpecHash] | 16| ----------PhysicalOlapScan[t3] | 17| | 18| Hint log: | 19| Used: | 20| UnUsed: leading(t1 { t2 t3 }) | 21| SyntaxError: | 22+--------------------------------------------------------------------------------+
下面是一些可以交换的例子和不能交换的例子,读者可自行验证。
SQL1-------- test outer join which can swap 2-- (t1 leftjoin t2 on (P12)) innerjoin t3 on (P13) = (t1 innerjoin t3 on (P13)) leftjoin t2 on (P12) 3explain shape plan select * from t1 left join t2 on c1 = c2 join t3 on c1 = c3; 4explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 join t3 on c1 = c3; 5 6-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P13) = (t1 leftjoin t3 on (P13)) leftjoin t2 on (P12) 7explain shape plan select * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3; 8explain shape plan select /*+ leading(t1 t3 t2) */ * from t1 left join t2 on c1 = c2 left join t3 on c1 = c3; 9 10-- (t1 leftjoin t2 on (P12)) leftjoin t3 on (P23) = t1 leftjoin (t2 leftjoin t3 on (P23)) on (P12) 11select /*+ leading(t2 t3 t1) SWAP_INPUT(t1) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3; 12explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3; 13explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 left join t3 on c2 = c3; 14 15-------- test outer join which can not swap 16-- t1 leftjoin (t2 join t3 on (P23)) on (P12) != (t1 leftjoin t2 on (P12)) join t3 on (P23) 17-- eliminated to inner join 18explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 left join t2 on c1 = c2 join t3 on c2 = c3; 19explain graph select /*+ leading(t1 t2 t3) */ * from t1 left join (select * from t2 join t3 on c2 = c3) on c1 = c2; 20 21-- test semi join 22explain shape plan select * from t1 where c1 in (select c2 from t2); 23explain shape plan select /*+ leading(t2 t1) */ * from t1 where c1 in (select c2 from t2); 24 25-- test anti join 26explain shape plan select * from t1 where exists (select c2 from t2);
-
View
在涉及别名(Alias)的情况下,可以将别名作为一个完整独立的子树进行指定,并在这些子树内部根据文本序生成 Join 顺序
SQL1mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2; 2+--------------------------------------------------------------------------------------+ 3| Explain String(Nereids Planner) | 4+--------------------------------------------------------------------------------------+ 5| PhysicalResultSink | 6| --hashAgg[GLOBAL] | 7| ----PhysicalDistribute[DistributionSpecGather] | 8| ------hashAgg[LOCAL] | 9| --------PhysicalProject | 10| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() | 11| ------------PhysicalProject | 12| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() | 13| ----------------PhysicalProject | 14| ------------------PhysicalOlapScan[t2] | 15| ----------------PhysicalDistribute[DistributionSpecHash] | 16| ------------------PhysicalProject | 17| --------------------PhysicalOlapScan[t3] | 18| ------------PhysicalDistribute[DistributionSpecHash] | 19| --------------PhysicalProject | 20| ----------------PhysicalOlapScan[t1] | 21| | 22| Hint log: | 23| Used: leading(alias t1) | 24| UnUsed: | 25| SyntaxError: | 26+--------------------------------------------------------------------------------------+
Ordered Hint
Ordered hint 可以看做 leading hint 的一种特例,用于控制 join order 为文本序。
语法
Ordered Hint 的语法为 /*+ ORDERED */
,放置在 SELECT
语句中的 SELECT
关键字之后,紧接着查询的其余部分。
案例
以下是一个使用 Ordered Hint 的示例:
1mysql> explain shape plan select /*+ ORDERED */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
2+--------------------------------------------------------------------------------+
3| Explain String(Nereids Planner) |
4+--------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
9| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
10| ----------PhysicalProject |
11| ------------PhysicalOlapScan[t2] |
12| ----------PhysicalDistribute[DistributionSpecHash] |
13| ------------PhysicalProject |
14| --------------PhysicalOlapScan[t1] |
15| --------PhysicalDistribute[DistributionSpecHash] |
16| ----------PhysicalProject |
17| ------------PhysicalOlapScan[t3] |
18| |
19| Hint log: |
20| Used: ORDERED |
21| UnUsed: |
22| SyntaxError: |
23+--------------------------------------------------------------------------------+
与 Leading Hint 的关系:
当 Ordered Hint 和 Leading Hint 同时使用时,Ordered Hint 将优先于 Leading Hint。这意味着,即使指定了 Leading Hint,如果同时存在 Ordered Hint,查询计划将按照 Ordered Hint 的规则来执行,而 Leading Hint 将被忽略。以下是一个示例,展示了当两者同时使用时的情况:
1mysql> explain shape plan select /*+ ORDERED LEADING(t1 t2 t3) */ t1.c1 from t2 join t1 on t1.c1 = t2.c2 join t3 on c2 = c3;
2 +--------------------------------------------------------------------------------+
3 | Explain String(Nereids Planner) |
4 +--------------------------------------------------------------------------------+
5 | PhysicalResultSink |
6 | --PhysicalDistribute[DistributionSpecGather] |
7 | ----PhysicalProject |
8 | ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
9 | --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
10 | ----------PhysicalProject |
11 | ------------PhysicalOlapScan[t2] |
12 | ----------PhysicalDistribute[DistributionSpecHash] |
13 | ------------PhysicalProject |
14 | --------------PhysicalOlapScan[t1] |
15 | --------PhysicalDistribute[DistributionSpecHash] |
16 | ----------PhysicalProject |
17 | ------------PhysicalOlapScan[t3] |
18 | |
19 | Hint log: |
20 | Used: ORDERED |
21 | UnUsed: leading(t1 t2 t3) |
22 | SyntaxError: |
23 +--------------------------------------------------------------------------------+
总结
Leading Hint 是一个强大的手工控制 join order 的特性,在生产业务调优中应用广泛。使用好 leading hint 能够满足现场针对 join order 的调优需求,增加系统控制的灵活性。Ordered hint 是一种特殊的 leading hint,用于固定当前业务的 join order 为文本序,使用时需要注意和其他 Hint 之间的优先级关系。