當前位置:
首頁 >
前端技术
> javascript
>内容正文
javascript
DLA SQL技巧:行、列转换和JSON数据列展开
生活随笔
收集整理的這篇文章主要介紹了
DLA SQL技巧:行、列转换和JSON数据列展开
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
1. 簡介
在數據庫SQL處理中,常常有行轉列(Pivot)和列轉行(Unpivot)的數據處理需求。本文以示例說明在Data Lake Analytics中,如何使用SQL的一些技巧,達到行轉列(Pivot)和列轉行(Unpivot)的目的。另外,DLA支持函數式表達式的處理邏輯、豐富的JSON數據處理函數和UNNEST的SQL語法,結合這些功能,能夠實現非常豐富、強大的SQL數據處理語義和能力,本文也以JSON數據列展開為示例,說明在DLA中使用這種SQL的技巧。
2. 行轉列(Pivot)
2.1 樣例數據
test_pivot表內容:
+------+----------+---------+--------+ | id | username | subject | source | +------+----------+---------+--------+ | 1 | 張三 | 語文 | 60 | | 2 | 李四 | 數學 | 70 | | 3 | 王五 | 英語 | 80 | | 4 | 王五 | 數學 | 75 | | 5 | 王五 | 語文 | 57 | | 6 | 李四 | 語文 | 80 | | 7 | 張三 | 英語 | 100 | +------+----------+---------+--------+2.2 方法一:通過CASE WHEN語句
SQL語句:
SELECT username,max(CASE WHEN subject = '語文' THEN source END) AS `語文`,max(CASE WHEN subject = '數學' THEN source END) AS `數學`,max(CASE WHEN subject = '英語' THEN source END) AS `英語` FROM test_pivot GROUP BY username ORDER BY username;結果:
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+2.3 方法二:通過map_agg函數
該方法思路上分為兩個步驟:
第一步,通過map_agg函數把兩個列的多行的值,映射為map;
第二步,通過map的輸出,達到多列輸出的目的。
第一步SQL:
SELECT username, map_agg(subject, source) kv FROM test_pivot GROUP BY username ORDER BY username;第一步輸出:
+----------+-----------------------------------+ | username | kv | +----------+-----------------------------------+ | 張三 | {語文=60, 英語=100} | | 李四 | {數學=70, 語文=80} | | 王五 | {數學=75, 語文=57, 英語=80} | +----------+-----------------------------------+可以看到map_agg的輸出效果。
最終,該方法的SQL:
SELECTusername,if(element_at(kv, '語文') = null, null, kv['語文']) AS `語文`,if(element_at(kv, '數學') = null, null, kv['數學']) AS `數學`,if(element_at(kv, '英語') = null, null, kv['英語']) AS `英語` FROM (SELECT username, map_agg(subject, source) kvFROM test_pivotGROUP BY username ) t ORDER BY username;結果:
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+3. 列轉行(Unpivot)
3.1 樣例數據
test_unpivot表內容:
+----------+--------+--------+--------+ | username | 語文 | 數學 | 英語 | +----------+--------+--------+--------+ | 張三 | 60 | NULL | 100 | | 李四 | 80 | 70 | NULL | | 王五 | 57 | 75 | 80 | +----------+--------+--------+--------+3.2 方法一:通過UNION語句
SQL語句:
SELECT username, subject, source FROM (SELECT username, '語文' AS subject, `語文` AS source FROM test_unpivot WHERE `語文` is not nullUNIONSELECT username, '數學' AS subject, `數學` AS source FROM test_unpivot WHERE `數學` is not nullUNIONSELECT username, '英語' AS subject, `英語` AS source FROM test_unpivot WHERE `英語` is not null ) ORDER BY username;結果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 張三 | 語文 | 60 | | 張三 | 英語 | 100 | | 李四 | 語文 | 80 | | 李四 | 數學 | 70 | | 王五 | 英語 | 80 | | 王五 | 語文 | 57 | | 王五 | 數學 | 75 | +----------+---------+--------+3.3 方法二:通過CROSS JOIN UNNEST語句
SQL語句:
SELECT t1.username, t2.subject, t2.source FROM test_unpivot t1 CROSS JOIN UNNEST (array['語文', '數學', '英語'],array[`語文`, `數學`, `英語`] ) t2 (subject, source) WHERE t2.source is not null結果:
+----------+---------+--------+ | username | subject | source | +----------+---------+--------+ | 張三 | 語文 | 60 | | 張三 | 英語 | 100 | | 李四 | 語文 | 80 | | 李四 | 數學 | 70 | | 王五 | 語文 | 57 | | 王五 | 數學 | 75 | | 王五 | 英語 | 80 | +----------+---------+--------+4. JSON數據列展開
JSON數據的表達能力非常靈活,因此在數據庫和SQL中,常常需要處理JSON數據,常常碰到稍復雜的需求,就是將JSON數據中的某些屬性字段,進行展開轉換,轉成行、列的關系型表達。
4.1 基本思路和步驟
- 使用JSON函數,對JSON字符串進行解析和數據提取;
- 提取、轉換為ARRAY或者MAP的數據結構,如有需要,可以使用Lambda函數式表達式進行轉換處理;
- 利用UNNEST語法進行列展開。
下面以多個示例說明。
4.2 用UNNEST對MAP進行關系型展開
SQL示例:
SELECT t.m, t.n FROM (SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data ) CROSS JOIN unnest(map_data) AS t(m, n);結果:
+------+------+ | m | n | +------+------+ | foo | 1 | | bar | 2 | +------+------+4.3 用UNNEST對JSON數據進行關系型展開
SQL示例:
SELECT json_extract(t.a, '$.a') AS a, json_extract(t.a, '$.b') AS b FROM (SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS package_array ) CROSS JOIN UNNEST(package_array) AS t(a);結果:
+------+------+ | a | b | +------+------+ | 1 | 2 | | 3 | 4 | +------+------+SQL示例:
SELECT t.m AS _col1, t.n AS _col2 FROM (SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);結果:
+---------------+-----------------+ | _col1 | _col2 | +---------------+-----------------+ | {"a":1,"b":2} | {"a":5,"b":6} | | {"a":3,"b":4} | {"a":7,"b":8} | | NULL | {"a":9,"b":10} | | NULL | {"a":11,"b":12} | +---------------+-----------------+SQL示例:
SELECT json_extract(t.m, '$.a') AS _col1, json_extract(t.m, '$.b') AS _col2, json_extract(t.n, '$.a') AS _col3, json_extract(t.n, '$.b') AS _col4 FROM (SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x') AS array<JSON>) AS array_1, cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x') AS array<JSON>) AS array_2 ) CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);結果:
+-------+-------+-------+-------+ | _col1 | _col2 | _col3 | _col4 | +-------+-------+-------+-------+ | 1 | 2 | 5 | 6 | | 3 | 4 | 7 | 8 | | NULL | NULL | 9 | 10 | | NULL | NULL | 11 | 12 | +-------+-------+-------+-------+4.4 結合Lambda表達式,用UNNEST對JSON數據進行關系型展開
SQL示例:
SELECT count(*) AS cnt, package_name FROM ( SELECT t.a AS package_name FROM ( SELECT transform(packages_map_array, x -> Element_at(x, 'packageName')) AS package_array FROM (SELECT cast(Json_extract(data_json, '$.packages') AS array<map<VARCHAR, VARCHAR>>) AS packages_map_arrayFROM (SELECT json_parse(data) AS data_jsonFROM ( SELECT '{"packages": [{"appName": "鐵路12306","packageName": "com.MobileTicket","versionName": "4.1.9","versionCode": "194"},{"appName": "QQ飛車","packageName": "com.tencent.tmgp.speedmobile","versionName": "1.11.0.13274","versionCode": "1110013274"},{"appName": "掌閱","packageName": "com.chaozh.iReaderFree","versionName": "7.11.0","versionCode": "71101"}]}'AS data ))) ) AS x (package_array)CROSS JOIN UNNEST(package_array) AS t (a) ) GROUP BY package_name ORDER BY cnt DESC;結果:
+------+------------------------------+ | cnt | package_name | +------+------------------------------+ | 1 | com.MobileTicket | | 1 | com.tencent.tmgp.speedmobile | | 1 | com.chaozh.iReaderFree | +------+------------------------------+
原文鏈接
本文為云棲社區原創內容,未經允許不得轉載。
總結
以上是生活随笔為你收集整理的DLA SQL技巧:行、列转换和JSON数据列展开的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 达摩院送你100万,请坚持“看月亮”
- 下一篇: Schedulerx2.0工作流支持数据