Elasticsearch in Plain Language 16 - A Deep Dive into Search: Using Native cross_fields to Fix the Drawbacks of Cross-Field Search

Published: 2025/3/21

Contents

Overview
Example

Overview

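Continuing to study ES with instructor 中華石杉; this is part 15.

Course link: https://www.roncoo.com/view/55

Elasticsearch in Plain Language 14 - Drawbacks of cross-fields search using multi_match with the most_fields strategy

Elasticsearch in Plain Language 15 - Using copy_to to build a combined field and fix the drawbacks of cross-fields search

Following on from those two posts, let's look at how the native cross_fields type fixes the same search problems.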

Example

The query DSL is shown below. The key settings are "type": "cross_fields" together with "operator": "and":

GET /forum/article/_search
{
  "query": {
    "multi_match": {
      "query": "Peter Smith",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["author_first_name", "author_last_name"]
    }
  }
}

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 2.3258216,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 2.3258216,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": ["java", "hadoop"],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 1.7770995,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": ["elasticsearch"],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      }
    ]
  }
}
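
If the request is assembled in application code rather than typed into a console, the body might be built roughly as follows. This is only a sketch: the helper name cross_fields_query is made up for illustration, and it just constructs and prints the JSON body without calling any Elasticsearch client.

```python
import json


def cross_fields_query(text, fields, operator="and"):
    """Build a multi_match request body of type cross_fields.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "operator": operator,
                "fields": list(fields),
            }
        }
    }


body = cross_fields_query(
    "Peter Smith", ["author_first_name", "author_last_name"]
)
# Serialize to the JSON that would be sent to /forum/article/_search
print(json.dumps(body, indent=2))
```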

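So how does this fix the drawbacks of cross-fields search? Let's go through them one by one.

Problem 1: the search only finds docs where as many fields as possible match, rather than docs where the terms are fully matched.

Answer: solved. Every term is required to appear in at least one of the fields.

Peter, Smith

Peter must appear in author_first_name or author_last_name
Smith must appear in author_first_name or author_last_name

Peter Smith may be spread across several fields, so each term must appear in some field; only when combined do they form the identifier we are after, the complete name.

With the earlier most_fields approach, a doc such as Smith Williams could also be returned, because most_fields only needs any one field to match, and the more fields match, the higher the score.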
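
The difference between the two behaviours can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: the analyzer is reduced to lowercasing and whitespace splitting, the function names are invented for illustration, and doc "x" (Smith Williams) is a hypothetical document added to show the contrast.

```python
def tokenize(text):
    # Simplified analyzer: lowercase and split on whitespace
    return text.lower().split()


def cross_fields_and_match(doc, query, fields):
    """Term-centric: every query term must occur in at least one field."""
    field_tokens = [set(tokenize(doc.get(f, ""))) for f in fields]
    return all(
        any(term in tokens for tokens in field_tokens)
        for term in tokenize(query)
    )


def any_field_match(doc, query, fields):
    """Loose, field-centric: one query term in one field is enough."""
    terms = set(tokenize(query))
    return any(terms & set(tokenize(doc.get(f, ""))) for f in fields)


fields = ["author_first_name", "author_last_name"]
docs = {
    "1": {"author_first_name": "Peter", "author_last_name": "Smith"},
    "5": {"author_first_name": "Tonny", "author_last_name": "Peter Smith"},
    # Hypothetical doc, not part of the response above
    "x": {"author_first_name": "Smith", "author_last_name": "Williams"},
}

strict = [i for i, d in docs.items()
          if cross_fields_and_match(d, "Peter Smith", fields)]
loose = [i for i, d in docs.items()
         if any_field_match(d, "Peter Smith", fields)]

print(strict)  # ['1', '5']
print(loose)   # ['1', '5', 'x']
```

Under the term-centric rule the Smith Williams doc is rejected because Peter appears in neither field, while docs 1 and 5 pass, which agrees with the two hits in the response.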

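Problem 2: with most_fields there is no way to use minimum_should_match to cut off the long tail, that is, results that match only very little. --> Solved: since every term is required to appear, the long tail is necessarily removed.

Answer: java hadoop spark --> all three terms must each appear in at least one field.

For example, a document in which only one field contains a single java is dropped; being part of the long tail, it no longer shows up.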

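Problem 3: the TF/IDF algorithm. Take Peter Smith and Smith Williams: when searching for Peter Smith, Smith is rare in first_name, so the term's frequency across all documents for that field is very low and its score is very high. As a result, Smith Williams may actually rank ahead of Peter Smith.

Answer: when computing IDF, the IDF of each query term in every field is collected and the minimum is taken, so the extreme high values no longer occur.

Peter Smith

Peter
Smith

Smith occurs rarely in the author_first_name field across all docs, which gives it a very high IDF score there. For the author_last_name field, Smith's frequency across all docs yields another IDF score; since Smith is generally common as a last name, that IDF is normal and not too high. For Smith, the smaller of the two IDF scores is then used, so the IDF can no longer be inflated.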
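
The "take the smaller IDF" step can be illustrated numerically. The sketch below rests on several assumptions: it uses the classic TF/IDF-style formula 1 + ln(N / (df + 1)), the document count and document frequencies are made-up numbers, and the real scoring inside Elasticsearch is more involved than this.

```python
import math


def idf(doc_count, doc_freq):
    # Classic TF/IDF-style inverse document frequency (assumed formula)
    return 1.0 + math.log(doc_count / (doc_freq + 1))


doc_count = 1000  # hypothetical number of docs in the index

# Hypothetical document frequencies of the term "smith" per field
smith_doc_freq = {
    "author_first_name": 2,    # rare as a first name
    "author_last_name": 200,   # common as a last name
}

per_field_idf = {
    field: idf(doc_count, df) for field, df in smith_doc_freq.items()
}
# As described above: use the smaller IDF across the fields
blended_idf = min(per_field_idf.values())

for field, value in per_field_idf.items():
    print(f"{field}: idf = {value:.3f}")
print(f"blended idf = {blended_idf:.3f}")
```

With these numbers the first-name IDF is far larger than the last-name IDF, and taking the minimum means the rare occurrence of Smith as a first name no longer dominates the score.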
