當前位置：首頁 > 运维知识 > 数据库 >内容正文

数据库

MySQL 8.0 新特性之哈希连接（Hash Join）

發布時間：2024/3/13 数据库 44 豆豆

生活随笔收集整理的這篇文章主要介紹了 MySQL 8.0 新特性之哈希连接（Hash Join）小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

MySQL 開發組于 2019 年 10 月 14 日正式發布了 MySQL 8.0.18 GA 版本，帶來了一些新特性和增強功能。其中最引人注目的莫過于多表連接查詢支持 hash join 方式了。我們先來看看官方的描述：

MySQL 實現了用于內連接查詢的 hash join 方式。例如，從 MySQL 8.0.18 開始以下查詢可以使用 hash join 進行連接查詢：

SELECT * FROM t1 JOIN t2 ON t1.c1=t2.c1;

Hash join 不需要索引的支持。大多數情況下，hash join 比之前的 Block Nested-Loop 算法在沒有索引時的等值連接更加高效。使用以下語句創建三張測試表：

CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); CREATE TABLE t3 (c1 INT, c2 INT);

使用EXPLAIN FORMAT=TREE命令可以看到執行計劃中的 hash join，例如：

mysql> EXPLAIN FORMAT=TREE-> SELECT * -> FROM t1 -> JOIN t2 -> ON t1.c1=t2.c1\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)-> Table scan on t2 (cost=0.35 rows=1)-> Hash-> Table scan on t1 (cost=0.35 rows=1)

必須使用 EXPLAIN 命令的 FORMAT=TREE 選項才能看到節點中的 hash join。另外，EXPLAIN ANALYZE命令也可以顯示 hash join 的使用信息。這也是該版本新增的一個功能。

多個表之間使用等值連接的的查詢也會進行這種優化。例如以下查詢：

SELECT * FROM t1JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)JOIN t3 ON (t2.c1 = t3.c1);

在以上示例中，任何其他非等值連接的條件將會在連接操作之后作為過濾器使用。可以通過EXPLAIN FORMAT=TREE命令的輸出進行查看：

mysql> EXPLAIN FORMAT=TREE-> SELECT * -> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)-> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)-> Table scan on t3 (cost=0.35 rows=1)-> Hash-> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)-> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)-> Table scan on t2 (cost=0.35 rows=1)-> Hash-> Table scan on t1 (cost=0.35 rows=1)

從以上輸出同樣可以看出，包含多個等值連接條件的查詢也可以（會）使用多個 hash join 連接。

但是，如果任何連接語句（ON）中沒有使用等值連接條件，將不會采用 hash join 連接方式。例如：

mysql> EXPLAIN FORMAT=TREE-> SELECT * -> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 < t3.c1)\G *************************** 1. row *************************** EXPLAIN: <not executable by iterator executor>

此時，將會采用性能更慢的 block nested loop 連接算法。這與 MySQL 8.0.18 之前版本中沒有索引時的情況一樣：

mysql> EXPLAIN-> SELECT * -> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 < t3.c1)\G *************************** 1. row ***************************id: 1select_type: SIMPLEtable: t1partitions: NULLtype: ALL possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 1filtered: 100.00Extra: NULL *************************** 2. row ***************************id: 1select_type: SIMPLEtable: t2partitions: NULLtype: ALL possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 1filtered: 100.00Extra: Using where; Using join buffer (Block Nested Loop) *************************** 3. row ***************************id: 1select_type: SIMPLEtable: t3partitions: NULLtype: ALL possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 1filtered: 100.00Extra: Using where; Using join buffer (Block Nested Loop)

Hash join 連接同樣適用于不指定查詢條件時的笛卡爾積（Cartesian product），例如：

mysql> EXPLAIN FORMAT=TREE-> SELECT *-> FROM t1-> JOIN t2-> WHERE t1.c2 > 50\G *************************** 1. row *************************** EXPLAIN: -> Inner hash join (cost=0.70 rows=1)-> Table scan on t2 (cost=0.35 rows=1)-> Hash-> Filter: (t1.c2 > 50) (cost=0.35 rows=1)-> Table scan on t1 (cost=0.35 rows=1)

默認配置時，MySQL 所有可能的情況下都會使用 hash join。同時提供了兩種控制是否使用 hash join 的方法：

在全局或者會話級別設置服務器系統變量 optimizer_switch 中的 hash_join=on 或者 hash_join=off 選項。默認為 hash_join=on。
在語句級別為特定的連接指定優化器提示 HASH_JOIN 或者 NO_HASH_JOIN。

可以通過系統變量 join_buffer_size 控制 hash join 允許使用的內存數量；hash join 不會使用超過該變量設置的內存數量。如果 hash join 所需的內存超過該閾值，MySQL 將會在磁盤中執行操作。需要注意的是，如果 hash join 無法在內存中完成，并且打開的文件數量超過系統變量 open_files_limit 的值，連接操作可能會失敗。為了解決這個問題，可以使用以下方法之一：

增加 join_buffer_size 的值，確保 hash join 可以在內存中完成。
增加 open_files_limit 的值。

接下來我們比較一下 hash join 和 block nested loop 的性能，首先分別為 t1、t2 和 t3 生成 1000000 條記錄：

set join_buffer_size=2097152000;SET @@cte_max_recursion_depth = 99999999;INSERT INTO t1 -- INSERT INTO t2 -- INSERT INTO t3 WITH RECURSIVE t AS (SELECT 1 AS c1, 1 AS c2UNION ALLSELECT t.c1 + 1, t.c1 * 2FROM tWHERE t.c1 < 1000000 ) SELECT *FROM t;

沒有索引情況下的 hash join：

mysql> EXPLAIN ANALYZE-> SELECT COUNT(*)-> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: count(0) (actual time=22993.098..22993.099 rows=1 loops=1)-> Inner hash join (t3.c1 = t1.c1) (cost=9952535443663536.00 rows=9952435908880402) (actual time=14489.176..21737.032 rows=1000000 loops=1)-> Table scan on t3 (cost=0.00 rows=998412) (actual time=0.103..3973.892 rows=1000000 loops=1)-> Hash-> Inner hash join (t2.c1 = t1.c1) (cost=99682753413.67 rows=99682653660) (actual time=5663.592..12236.984 rows=1000000 loops=1)-> Table scan on t2 (cost=0.01 rows=998412) (actual time=0.067..3364.105 rows=1000000 loops=1)-> Hash-> Table scan on t1 (cost=100539.40 rows=998412) (actual time=0.133..3395.799 rows=1000000 loops=1)1 row in set (23.22 sec)mysql> SELECT COUNT(*)-> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 = t3.c1); +----------+ | COUNT(*) | +----------+ | 1000000 | +----------+ 1 row in set (12.98 sec)

實際運行花費了 12.98 秒。這個時候如果使用 block nested loop：

mysql> EXPLAIN FORMAT=TREE-> SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)-> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: <not executable by iterator executor>1 row in set (0.00 sec)SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)FROM t1JOIN t2 ON (t1.c1 = t2.c1)JOIN t3 ON (t2.c1 = t3.c1);

EXPLAIN 顯示無法使用 hash join。查詢跑了幾十分鐘也沒有出結果，其中一個 CPU 使用率到了 100%；因為一直在執行嵌套循環（1000000 的 3 次方）。

再看有索引時的 block nested loop 方法，增加索引：

mysql> CREATE index idx1 ON t1(c1); Query OK, 0 rows affected (7.39 sec) Records: 0 Duplicates: 0 Warnings: 0mysql> CREATE index idx2 ON t2(c1); Query OK, 0 rows affected (6.77 sec) Records: 0 Duplicates: 0 Warnings: 0mysql> CREATE index idx3 ON t3(c1); Query OK, 0 rows affected (7.23 sec) Records: 0 Duplicates: 0 Warnings: 0

查看執行計劃并運行相同的查詢語句：

mysql> EXPLAIN ANALYZE-> SELECT COUNT(*)-> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 = t3.c1)\G *************************** 1. row *************************** EXPLAIN: -> Aggregate: count(0) (actual time=47684.034..47684.035 rows=1 loops=1)-> Nested loop inner join (cost=2295573.22 rows=998412) (actual time=0.116..46363.599 rows=1000000 loops=1)-> Nested loop inner join (cost=1198056.31 rows=998412) (actual time=0.087..25788.696 rows=1000000 loops=1)-> Filter: (t1.c1 is not null) (cost=100539.40 rows=998412) (actual time=0.050..5557.847 rows=1000000 loops=1)-> Index scan on t1 using idx1 (cost=100539.40 rows=998412) (actual time=0.043..3253.769 rows=1000000 loops=1)-> Index lookup on t2 using idx2 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)-> Index lookup on t3 using idx3 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)1 row in set (47.68 sec)mysql> SELECT COUNT(*)-> FROM t1-> JOIN t2 -> ON (t1.c1 = t2.c1)-> JOIN t3 -> ON (t2.c1 = t3.c1); +----------+ | COUNT(*) | +----------+ | 1000000 | +----------+ 1 row in set (19.56 sec)

實際運行花費了 19.56 秒。所以在我們這個場景中的測試結果如下：

Hash Join（無索引）Block Nested Loop（無索引）Block Nested Loop（有索引）

12.98 s

未返回

19.56 s

再增加一個 Oracle 12c 中無索引時 hash join 結果：1.282 s。

再增加一個 PostgreSQL 11.5 中無索引時 hash join 結果：6.234 s。

再增加一個 SQL 2017 中無索引時 hash join 結果：5.207 s。

總結

以上是生活随笔為你收集整理的MySQL 8.0 新特性之哈希连接（Hash Join）的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：在Android上应用PhoneGap和
下一篇： mongo数据库取并集