elasticsearch in query: Python Elasticsearch DSL query, filter, and aggregation examples

Published: 2024/1/23

Tech blog: https://github.com/yongxinz/tech-blog

You are also welcome to follow my WeChat official account, AlwaysBeta, for more content.

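Elasticsearch basic concepts

Index: the logical area in which Elasticsearch stores data, similar to a database in a relational database system. An index can be spread across one or more shards, and each shard can have multiple replicas.

Document: the unit of data stored in Elasticsearch, similar to a row in a table of a relational database.

A document consists of multiple fields, and fields with the same name in different documents must have the same type. A field can appear more than once in a document, meaning that one field holds several values (a multivalued field).

Document type: to make querying easier, an index can hold several kinds of documents, each kind being a document type. It is similar to a table in a relational database. Note, however, that fields with the same name in different document types must still have the same type. Mapping types were deprecated in Elasticsearch 6.x and removed in later versions, so this concept applies only to older clusters.

Mapping: similar to a schema definition in a relational database. It stores the mapping information for fields, and each document type has its own mapping.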
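To make these concepts concrete, here is a minimal sketch of an index definition and one document, written as plain Python dictionaries. The field names, the shard and replica counts, and the sample document are assumptions chosen for illustration, and the body uses the typeless mapping layout of recent Elasticsearch versions.

```python
import json

# Hypothetical index definition: one primary shard with one replica,
# plus a mapping that fixes the type of each field.
index_body = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "account_number": {"type": "integer"},
            "firstname": {"type": "text"},
            "tags": {"type": "keyword"},
        }
    },
}

# A document is a JSON object; `tags` holds several values (multivalued).
document = {"account_number": 1, "firstname": "Amber", "tags": ["vip", "new"]}

# List the document fields that the mapping does not declare.
declared = index_body["mappings"]["properties"]
undeclared = [field for field in document if field not in declared]

print(json.dumps(index_body, indent=2))
print("undeclared fields:", undeclared)
```

A body of this shape is what would typically be passed when creating an index; the check at the end only illustrates the relationship between a document and its mapping.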

A brief introduction to using Python Elasticsearch DSL

Connecting to ES:

    import elasticsearch

    es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

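Start with a basic search. q is the search text (surrounding spaces in q do not affect the result), size sets the number of hits to return, from_ sets the starting offset, and filter_path selects which parts of the response are returned; in this example only _id and _type appear in the final result.

    res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
    res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000,
                      filter_path=['hits.hits._id', 'hits.hits._type'])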
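The effect of filter_path can be pictured with a small local emulation. This is only a sketch of the idea (keep the listed dotted paths, drop everything else), not the client's or the server's implementation, and it ignores wildcards. The sample response and the helper names `build_tree` and `prune` are made up for illustration.

```python
def build_tree(paths):
    """Turn dotted paths such as 'hits.hits._id' into a nested dict of keys."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def prune(value, tree):
    """Keep only the keys of `value` that appear in `tree`."""
    if not tree:  # end of a path: keep the whole value
        return value
    if isinstance(value, list):
        return [prune(item, tree) for item in value]
    if isinstance(value, dict):
        return {k: prune(v, tree[k]) for k, v in value.items() if k in tree}
    return value


# Made-up response in the general shape of a search result.
response = {
    "took": 3,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_type": "account", "_source": {"balance": 39225}},
            {"_id": "6", "_type": "account", "_source": {"balance": 5686}},
        ],
    },
}

filtered = prune(response, build_tree(["hits.hits._id", "hits.hits._type"]))
print(filtered)
```

Everything outside the two requested paths, including took and _source, is absent from the filtered result.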

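Querying all data in a given index:

The index argument selects the index. A string names a single index; a list names several, such as index=["bank", "banner", "country"]; a wildcard pattern selects every matching index, such as index=["apple*"], which matches all indices whose names start with apple.

A search can also be restricted to a specific doc-type.

    from elasticsearch_dsl import Search

    s = Search(using=es, index="index-test")
    response = s.execute()
    print(response.to_dict())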
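The three forms of the index argument described above can be illustrated locally. The sketch below approximates Elasticsearch's wildcard matching with Python's `fnmatchcase`, which is an assumption made for illustration (it also interprets `?` and `[...]`, which are not part of the original description). The helper `resolve_indices` and the index names are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_indices(index, existing):
    """Return the names in `existing` selected by `index` (a string or a list)."""
    patterns = [index] if isinstance(index, str) else list(index)
    return [
        name
        for name in existing
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical indices present in the cluster.
existing = ["apple-2023", "apple-2024", "bank", "banner", "country"]

single = resolve_indices("bank", existing)
several = resolve_indices(["bank", "banner", "country"], existing)
wildcard = resolve_indices(["apple*"], existing)

print(single)
print(several)
print(wildcard)
```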

Querying by a field; several query conditions can be chained:

    s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
    s = s.query("match", dip="192.168.1.2")
    response = s.execute()
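Chained query conditions are combined with AND, which in the request body corresponds to a bool query with must clauses. The sketch below builds that body by hand with plain dictionaries; the helpers `match` and `search_body` are hypothetical and only mirror the structure, they are not part of elasticsearch_dsl.

```python
import json


def match(field, value):
    """Build a single match clause."""
    return {"match": {field: value}}


def search_body(*queries):
    """Wrap one query as-is, or several queries in a bool/must clause."""
    if len(queries) == 1:
        return {"query": queries[0]}
    return {"query": {"bool": {"must": list(queries)}}}


body = search_body(match("sip", "192.168.1.1"), match("dip", "192.168.1.2"))
print(json.dumps(body, indent=2))
```

Comparing this output with `s.to_dict()` on the chained search is a quick way to check what a DSL expression will send.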

Multi-field query:

    from elasticsearch_dsl.query import MultiMatch

    multi_match = MultiMatch(query='hello', fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(multi_match)
    response = s.execute()
    print(response.to_dict())

A Q() object can also be used for multi-field queries; fields is a list, and query is the value to search for.

    from elasticsearch_dsl import Q

    q = Q("multi_match", query="hello", fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(q)
    response = s.execute()
    print(response.to_dict())

The first argument to Q() is the query type, which can also be bool.

    s = Search(using=es, index="index-test")

    q = Q('bool', must=[Q('match', ), Q('match', content='world')])
    response = s.query(q).execute()
    print(response.to_dict())

Queries can also be combined with operators on Q() objects, which is another way of writing the query above.

    q = Q("match", ) | Q("match", )
    print(q.to_dict())
    # {"bool": {"should": [...]}}
    response = s.query(q).execute()

    q = Q("match", ) & Q("match", )
    print(q.to_dict())
    # {"bool": {"must": [...]}}
    response = s.query(q).execute()

    q = ~Q("match", )
    print(q.to_dict())
    # {"bool": {"must_not": [...]}}
    response = s.query(q).execute()
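How the operators |, & and ~ translate into bool clauses can be shown with a tiny stand-in class. This is an illustration of the mapping only, not the elasticsearch_dsl implementation; the class `Clause`, the helper `match`, and the field values (borrowed from the multi-field example) are assumptions.

```python
class Clause:
    """Tiny stand-in for a query object; not the elasticsearch_dsl class."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b  ->  either clause may match
        return Clause({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b  ->  both clauses must match
        return Clause({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a  ->  the clause must not match
        return Clause({"bool": {"must_not": [self.body]}})


def match(field, value):
    return Clause({"match": {field: value}})


title = match("title", "hello")
content = match("content", "world")

either = (title | content).body
both = (title & content).body
negated = (~title).body

print(either)
print(both)
print(negated)
```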

Filtering. The example here is a range filter: range is the query type, timestamp is the name of the field being queried, gte means greater than or equal to, and lt means less than; set them as needed.

On the difference between term and match: term matches the value exactly, without analyzing it, whereas match analyzes the query text into tokens and returns a relevance score. (On an analyzed text field whose tokens are stored in lowercase, a term query that contains uppercase letters returns no hits, while match finds the same documents regardless of case.)

    import time

    # Range query
    s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
    # Plain filter
    res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()

Other ways to write filters:

    from elasticsearch_dsl import Q, Search

    s = Search().filter('terms', tags=['search', 'python'])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # Equivalent form using a bool query
    s = Search().query('bool', filter=[Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # Exclude matching documents
    s = Search().exclude('terms', tags=['search', 'python'])
    # or
    s = Search().query('bool', filter=[~Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}
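The difference between term and match described above comes down to whether the query value is analyzed before lookup. The sketch below emulates this with a toy inverted index; the `analyze` function is a rough stand-in for a lowercasing analyzer, and the documents and helper names are made up. Relevance scoring is left out.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Hypothetical documents with a single analyzed text field.
docs = {1: "Sherlock Holmes", 2: "Mycroft Holmes", 3: "John Watson"}

# Inverted index: token -> ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)


def term_query(value):
    """Look the value up exactly as given, without analysis."""
    return sorted(index.get(value, set()))


def match_query(text):
    """Analyze the query text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term_query("Holmes"))   # no hit: the index only holds "holmes"
print(term_query("holmes"))
print(match_query("Holmes"))
print(match_query("HOLMES watson"))
```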

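Aggregations can be chained after queries, filters, and other operations, through aggs.

bucket groups documents. Its first argument is the name of the group, which you choose yourself, the second is the aggregation type, and the third specifies the field.

metric works the same way. Available metric types include sum, avg, max, and min; in addition, two types return several of these values at once, stats and extended_stats, and the latter also returns values such as the variance.

    from elasticsearch_dsl import A

    # Example 1
    s.aggs.bucket("per_country", "terms", field="timestamp") \
        .metric("sum_click", "stats", field="click") \
        .metric("sum_request", "stats", field="request")

    # Example 2
    s.aggs.bucket("per_age", "terms", field="click.keyword") \
        .metric("sum_click", "stats", field="click")

    # Example 3
    s.aggs.metric("sum_age", "extended_stats", field="impression")

    # Example 4
    s.aggs.bucket("per_age", "terms", field="country.keyword")

    # Example 5: aggregate by ranges
    a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])
    # Attach the aggregation under a name of your choice
    s.aggs.bucket("per_account_range", a)

    res = s.execute()

Finally, execute() still has to be called. Note that s.aggs calls modify the Search object in place, so their result should not be assigned to a variable in place of the search (writing res = s.aggs, for example, is a mistake). The aggregation results are returned in res.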
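What a terms bucket with a nested stats metric computes can be reproduced locally on a handful of documents. This is a plain-Python emulation for illustration, not a call to Elasticsearch; the documents and the helper `stats` are hypothetical, and the returned keys follow the fields a stats aggregation reports (count, min, max, avg, sum).

```python
from collections import defaultdict


def stats(values):
    """Summary fields corresponding to those of a stats aggregation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Hypothetical documents.
docs = [
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 10},
]

# terms bucket: group the click values by the value of `country`.
buckets = defaultdict(list)
for doc in docs:
    buckets[doc["country"]].append(doc["click"])

# stats metric computed inside each bucket.
per_country = {key: stats(clicks) for key, clicks in buckets.items()}
print(per_country)
```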

Sorting

    s = Search().sort('category', '-title', {"lines": {"order": "asc", "mode": "avg"}})

Pagination

    s = s[10:20]
    # {"from": 10, "size": 10}
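The relation between a Python slice and the from and size values shown in the comment above is simple arithmetic. The helper below is a hypothetical sketch of that conversion, not the library's code; in particular, requiring an explicit stop is a choice made for this sketch.

```python
def slice_to_paging(window):
    """Translate a Python slice into Elasticsearch `from` and `size` values."""
    start = window.start or 0
    if window.stop is None:
        raise ValueError("an explicit stop is needed to compute a size")
    return {"from": start, "size": max(0, window.stop - start)}


second_page = slice_to_paging(slice(10, 20))
first_five = slice_to_paging(slice(None, 5))

print(second_page)
print(first_five)
```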

Some additional methods that may be of interest:

    from elasticsearch_dsl import Search

    s = Search()
    # Set extra properties with the `.extra()` method
    s = s.extra(explain=True)
    # Set request parameters with `.params()`
    s = s.params(search_type="count")
    # To restrict the returned fields, use the `source()` method
    # only return the selected fields
    s = s.source(['title', 'body'])
    # don't return any fields, just the metadata
    s = s.source(False)
    # explicitly include/exclude fields
    s = s.source(include=["title"], exclude=["user.*"])
    # reset the field selection
    s = s.source(None)
    # Build a search from a dict
    s = Search.from_dict({"query": {"match": {"title": "python"}}})
    # Modify an existing search
    s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})
