使用 Leading Hint 控制 Join 顺序
概述
Leading Hint 特性允许用户手工指定查询中的表的连接顺序,在特定场景优化复杂查询性能。本文将详细介绍如何在 PALO 中使用 Leading Hint 来控制 join 的顺序。
注意: 当前 PALO 已经具备良好的开箱即用的能力,也就意味着在绝大多数场景下,PALO 会自适应的优化各种场景下的性能,无需用户来手工控制 hint 来进行业务调优。本章介绍的内容主要面向专业调优人员,业务人员仅做简单了解即可。
案例 1:调整左右表顺序
对于如下查询:
1mysql> explain shape plan select from t1 join t2 on t1.c1 = t2.c2;
2+------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner) |
4+------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t1] |
10| --------PhysicalDistribute[DistributionSpecHash] |
11| ----------PhysicalOlapScan[t2] |
12+------------------------------------------------------------------------------+
可以使用 Leading Hint,强制指定 join order 为 t2 join t1,调整原始连接顺序。
1mysql> explain shape plan select /*+ leading(t2 t1) */ * from t1 join t2 on t1.c1 = t2.c2;
2+------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner) |
4+------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t2] |
10| --------PhysicalDistribute[DistributionSpecHash] |
11| ----------PhysicalOlapScan[t1] |
12| |
13| Hint log: |
14| Used: leading(t2 t1) |
15| UnUsed: |
16| SyntaxError: |
17+------------------------------------------------------------------------------+
Hint log 展示了应用成功的 hint: Used: leading(t2 t1)
。
案例 2:强制生成左深树
1mysql> explain shape plan select /*+ leading(t1 t2 t3) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
2+--------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner) |
4+--------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
9| --------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
10| ----------PhysicalOlapScan[t1] |
11| ----------PhysicalDistribute[DistributionSpecHash] |
12| ------------PhysicalOlapScan[t2] |
13| --------PhysicalDistribute[DistributionSpecHash] |
14| ----------PhysicalOlapScan[t3] |
15| |
16| Hint log: |
17| Used: leading(t1 t2 t3) |
18| UnUsed: |
19| SyntaxError: |
20+--------------------------------------------------------------------------------+
同样,Hint log 展示了应用成功的 hint: Used: leading(t1 t2 t3)
。
案例 3:强制生成右深树
1mysql> explain shape plan select /*+ leading(t1 {t2 t3}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3;
2+----------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner) |
4+----------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute[DistributionSpecGather] |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
9| --------PhysicalOlapScan[t1] |
10| --------PhysicalDistribute[DistributionSpecHash] |
11| ----------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
12| ------------PhysicalOlapScan[t2] |
13| ------------PhysicalDistribute[DistributionSpecHash] |
14| --------------PhysicalOlapScan[t3] |
15| |
16| Hint log: |
17| Used: leading(t1 { t2 t3 }) |
18| UnUsed: |
19| SyntaxError: |
20+----------------------------------------------------------------------------------+
同样,Hint log 展示了应用成功的 hint: Used: leading(t1 { t2 t3 })
。
案例 4:强制生成 bushy 树
1mysql> explain shape plan select /*+ leading({t1 t2} {t3 t4}) */ * from t1 join t2 on t1.c1 = t2.c2 join t3 on t2.c2 = t3.c3 join t4 on t3.c3 = t4.c4;
2+-----------------------------------------------+
3| _Explain_ String |
4+-----------------------------------------------+
5| PhysicalResultSink |
6| --PhysicalDistribute |
7| ----PhysicalProject |
8| ------hashJoin[INNER_JOIN](t2.c2 = t3.c3) |
9| --------hashJoin[INNER_JOIN](t1.c1 = t2.c2) |
10| ----------PhysicalOlapScan[t1] |
11| ----------PhysicalDistribute |
12| ------------PhysicalOlapScan[t2] |
13| --------PhysicalDistribute |
14| ----------hashJoin[INNER_JOIN](t3.c3 = t4.c4) |
15| ------------PhysicalOlapScan[t3] |
16| ------------PhysicalDistribute |
17| --------------PhysicalOlapScan[t4] |
18| |
19| Used: leading({ t1 t2 } { t3 t4 }) |
20| UnUsed: |
21| SyntaxError: |
22+-----------------------------------------------+
同样,Hint log 展示了应用成功的 hint: Used: leading({ t1 t2 } { t3 t4 })
。
案例 5:view 作为整体参与连接
1mysql> explain shape plan select /*+ leading(alias t1) */ count(*) from t1 join (select c2 from t2 join t3 on t2.c2 = t3.c3) as alias on t1.c1 = alias.c2;
2+--------------------------------------------------------------------------------------+
3| _Explain_ String(Nereids Planner) |
4+--------------------------------------------------------------------------------------+
5| PhysicalResultSink |
6| --hashAgg[GLOBAL] |
7| ----PhysicalDistribute[DistributionSpecGather] |
8| ------hashAgg[LOCAL] |
9| --------PhysicalProject |
10| ----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = alias.c2)) otherCondition=() |
11| ------------PhysicalProject |
12| --------------hashJoin[INNER_JOIN] hashCondition=((t2.c2 = t3.c3)) otherCondition=() |
13| ----------------PhysicalProject |
14| ------------------PhysicalOlapScan[t2] |
15| ----------------PhysicalDistribute[DistributionSpecHash] |
16| ------------------PhysicalProject |
17| --------------------PhysicalOlapScan[t3] |
18| ------------PhysicalDistribute[DistributionSpecHash] |
19| --------------PhysicalProject |
20| ----------------PhysicalOlapScan[t1] |
21| |
22| Hint log: |
23| Used: leading(alias t1) |
24| UnUsed: |
25| SyntaxError: |
26+--------------------------------------------------------------------------------------+
同样,Hint log 展示了应用成功的 hint: Used: leading(alias t1)
。
案例 6:DistributeHint 与 LeadingHint 混用
1explain shape plan
2 select
3 nation,
4 o_year,
5 sum(amount) as sum_profit
6 from
7 (
8 select
9 /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
10 n_name as nation,
11 extract(year from o_orderdate) as o_year,
12 l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
13 from
14 part,
15 supplier,
16 lineitem,
17 partsupp,
18 orders,
19 nation
20 where
21 s_suppkey = l_suppkey
22 and ps_suppkey = l_suppkey
23 and ps_partkey = l_partkey
24 and p_partkey = l_partkey
25 and o_orderkey = l_orderkey
26 and s_nationkey = n_nationkey
27 and p_name like '%green%'
28 ) as profit
29 group by
30 nation,
31 o_year
32 order by
33 nation,
34 o_year desc;
上述 /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
hint 指定方式,混用了 leading 和 distribute hint 两种格式。leading 用于控制总体的表之间的相对 join 顺序,而 shuffle
和 broadcast
分别用于指定特定 join 使用何种 shuffle 方式。通过两种结合使用,可以灵活的控制连接顺序和连接方式,便于手工控制用户期望的计划行为。
使用建议:
- 建议使用 EXPLAIN 来仔细分析执行计划,以确保 Leading Hint 能达到预期的效果。
- PALO 版本升级或者业务数据变更时,应重新评估 Leading Hint 的效果,做到及时记录和调整。
总结
Leading Hint 是一种强大的可以手工控制连接顺序的功能,于此同时,也可以和 shuffle hint 结合使用,同时控制 join 分发方式,进而优化查询性能。注意这种高级特性,应当在充分理解查询特性及数据分布的基础上谨慎使用。