ClickHouse Query Statements: SAMPLE
Note:
The SAMPLE clause can only be used on tables of the MergeTree engine family, and the sampling expression must be declared with SAMPLE BY when the table is created.
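Overview:
The SAMPLE clause provides approximate computation through data sampling: the query processes and returns only a sample of the data rather than all of it, which effectively reduces query load.
Sampling is designed to be idempotent (deterministic): as long as the data has not changed, applying the same sampling rule always returns the same data. This makes it a good fit for scenarios that can accept approximate query results.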
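One way to see why the sample is repeatable: rows are chosen by the value of a hash of the sampling key, not by a random draw. The Python sketch below is a simplified model of that idea, not ClickHouse's implementation. The function sample_hash is a hypothetical stand-in built on CRC32 (it is not intHash32), and sample is an illustrative helper.

```python
import zlib

HASH_SPACE = 2**32  # the stand-in hash yields an unsigned 32-bit value


def sample_hash(user_id: int) -> int:
    """Hypothetical stand-in for the sampling expression (CRC32, not intHash32)."""
    return zlib.crc32(str(user_id).encode("ascii"))


def sample(user_ids, k):
    """Keep the rows whose hash falls in the lowest fraction k of the hash space."""
    upper = int(k * HASH_SPACE)
    return [u for u in user_ids if sample_hash(u) < upper]


users = list(range(1, 10001))
first = sample(users, 0.1)
second = sample(users, 0.1)

# Same data and same rule give the same sample on every run.
print(first == second)  # True
# A larger fraction keeps everything the smaller fraction kept.
print(set(first) <= set(sample(users, 0.2)))  # True
```

Because membership depends only on each row's own hash, repeating the query over unchanged data cannot produce a different sample.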
The official documentation lists the following use cases:
1. When you have strict timing requirements (like <100ms) but you can't justify the cost of additional hardware resources to meet them.
2. When your raw data is not accurate, so approximation doesn't noticeably degrade the quality.
3. Business requirements target approximate results (for cost-effectiveness, or to market exact results to premium users).

Create a table with a sampling key:

    Clickhouse> create table clicks(CounterID UInt64,EventDate DATE, UserID UInt64) engine=MergeTree() order by (CounterID,intHash32(UserID)) sample by intHash32(UserID);

    CREATE TABLE clicks
    (
        `CounterID` UInt64,
        `EventDate` DATE,
        `UserID` UInt64
    )
    ENGINE = MergeTree()
    ORDER BY (CounterID, intHash32(UserID))
    SAMPLE BY intHash32(UserID)

Insert test data:

    Clickhouse> insert into clicks select CounterID,EventDate,UserID from hits_v1;

    INSERT INTO clicks SELECT CounterID,EventDate,UserID FROM hits_v1

    Ok.

    0 rows in set. Elapsed: 1.003 sec. Processed 8.87 million rows, 124.23 MB (8.85 million rows/s., 123.88 MB/s.)

As defined, queries on the clicks table are sampled according to the distribution of intHash32(UserID).

Two points need attention when declaring the sample key:
1. The expression declared in SAMPLE BY must also be included in the primary key declaration.
2. The sample key must be of an unsigned integer (UInt) type. A key of another type can be defined, but queries that sample on it throw an exception.

The SAMPLE clause supports three forms:

1. SAMPLE k
k is the sampling factor, with a value range of [0, 1]. A fraction strictly between 0 and 1 enables sampling; 0 or 1 is equivalent to no sampling.

    select CounterID from clicks sample 0.1

is equivalent to:

    select CounterID from clicks sample 1/10

Querying for an approximate result:

    Clickhouse> select count() from clicks;

    SELECT count() FROM clicks

    ┌─count()─┐
    │ 8873898 │
    └─────────┘

    1 rows in set. Elapsed: 0.003 sec.

    Clickhouse> select count() from clicks sample 0.1;

    SELECT count() FROM clicks SAMPLE 1 / 10

    ┌─count()─┐
    │  839889 │
    └─────────┘

    1 rows in set. Elapsed: 0.029 sec. Processed 5.89 million rows, 94.27 MB (201.86 million rows/s., 3.23 GB/s.)

    Clickhouse> select CounterID,_sample_factor from clicks sample 0.1 limit 2;

    SELECT CounterID,_sample_factor FROM clicks SAMPLE 1 / 10 LIMIT 2

    ┌─CounterID─┬─_sample_factor─┐
    │        57 │             10 │
    │        57 │             10 │
    └───────────┴────────────────┘

    2 rows in set. Elapsed: 0.012 sec.
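The numbers in the session above can be tied together with a little arithmetic. The Python sketch below assumes that the factor reported for SAMPLE k is simply 1/k, which is consistent with the value 10 shown for SAMPLE 0.1. The helper name sample_factor is hypothetical, and the counts are copied from the console output.

```python
from fractions import Fraction


def sample_factor(k: str) -> Fraction:
    """Scale-up factor for SAMPLE k, assumed here to be 1/k.

    k is passed as text ("0.1" or "1/10") so that it is parsed exactly.
    Following the article, 0 and 1 both mean "no sampling", i.e. a factor of 1.
    """
    fraction = Fraction(k)
    if not 0 <= fraction <= 1:
        raise ValueError("k must lie in [0, 1]")
    if fraction in (0, 1):
        return Fraction(1)
    return 1 / fraction


# Counts copied from the console session above.
exact_count = 8873898
sampled_count = 839889

factor = sample_factor("0.1")
estimate = sampled_count * factor
relative_error = abs(exact_count - estimate) / exact_count

print(factor == sample_factor("1/10"))  # True
print(int(estimate))                    # 8398890
print(f"{float(relative_error):.2%}")   # 5.35%
```

Multiplying the sampled count by the factor gives an estimate of the full count. In this session the estimate is about 5% below the exact value.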
The sampling coefficient can be queried through the virtual column _sample_factor.

2. SAMPLE n
n is the number of rows to sample: the query runs on at least n rows. n = 1 means sampling is not used; n ranges from 2 up to the total number of rows in the table.

    Clickhouse> select count() from clicks sample 10000;

    SELECT count() FROM clicks SAMPLE 10000

    ┌─count()─┐
    │    9251 │
    └─────────┘

    1 rows in set. Elapsed: 0.025 sec. Processed 5.48 million rows, 87.72 MB (223.47 million rows/s., 3.58 GB/s.)

    Clickhouse> select count()*any(_sample_factor) from clicks sample 10000;

    SELECT count() * any(_sample_factor) FROM clicks SAMPLE 10000

    ┌─multiply(count(), any(_sample_factor))─┐
    │                      8154379.059200001 │
    └────────────────────────────────────────┘

    1 rows in set. Elapsed: 0.024 sec. Processed 5.48 million rows, 54.82 MB (229.44 million rows/s., 2.29 GB/s.)

    Clickhouse> select CounterID,_sample_factor from clicks sample 10000 limit 2;

    SELECT CounterID,_sample_factor FROM clicks SAMPLE 10000 LIMIT 2

    ┌─CounterID─┬────_sample_factor─┐
    │      1294 │ 881.4592000000001 │
    └───────────┴───────────────────┘
    ┌─CounterID─┬────_sample_factor─┐
    │      1366 │ 881.4592000000001 │
    └───────────┴───────────────────┘

    2 rows in set. Elapsed: 0.041 sec. Processed 7.69 thousand rows, 123.01 KB (187.84 thousand rows/s., 3.01 MB/s.)

The number of rows sampled is approximate, because the minimum granularity of sampling is determined by the index granularity (index_granularity). Setting n below the index granularity, or to any other very small value, is pointless.

3. SAMPLE k OFFSET n
Samples by a factor and an offset.

    Clickhouse> select CounterID,_sample_factor from clicks sample 0.4 offset 0.5 limit 1;

    SELECT CounterID,_sample_factor FROM clicks SAMPLE 4 / 10 OFFSET 5 / 10 LIMIT 1

    ┌─CounterID─┬─_sample_factor─┐
    │        57 │            2.5 │
    └───────────┴────────────────┘

    1 rows in set. Elapsed: 0.017 sec.

    Clickhouse> select CounterID,_sample_factor from clicks sample 0.6 offset 0.5 limit 1;

    SELECT CounterID,_sample_factor FROM clicks SAMPLE 6 / 10 OFFSET 5 / 10 LIMIT 1

    ┌─CounterID─┬─────_sample_factor─┐
    │        57 │ 1.6666666666666667 │
    └───────────┴────────────────────┘

    1 rows in set. Elapsed: 0.007 sec.

When the sampling range overflows (the offset plus the sampling factor is greater than 1), the overflowing part is automatically truncated.
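The truncation rule can be pictured as a window over the sampling key's range, normalized to [0, 1]. The Python sketch below is a simplified model under that assumption, not ClickHouse's implementation; sample_window is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Tuple


def sample_window(k: str, offset: str) -> Tuple[Fraction, Fraction]:
    """Return the [start, end) slice of the normalized key range that is read.

    Simplified model: the window starts at the offset and spans k,
    and anything past 1 is cut off rather than wrapped around.
    """
    start, width = Fraction(offset), Fraction(k)
    if not (0 <= start <= 1 and 0 <= width <= 1):
        raise ValueError("k and offset must lie in [0, 1]")
    end = min(start + width, Fraction(1))  # overflow past 1 is truncated
    return start, end


for k in ("0.4", "0.6"):
    start, end = sample_window(k, "0.5")
    print(f"SAMPLE {k} OFFSET 0.5 -> [{float(start)}, {float(end)}), "
          f"width {float(end - start)}")
```

In this model SAMPLE 0.4 OFFSET 0.5 covers [0.5, 0.9), while SAMPLE 0.6 OFFSET 0.5 is cut to [0.5, 1.0) and so covers only half of the range. The session above reports a _sample_factor of about 1.67 (that is, 1/0.6) for the second query, so the factor appears to follow the requested k rather than the truncated width.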
Reference:
https://clickhouse.tech/docs/en/sql-reference/statements/select/sample/