08.suggester02term_suggester
Table of contents
- 1. Term suggester
- 1. Common parameters
- 2. Other parameters
- 3. Request examples
1. Term suggester
In order to understand the format of suggestions, please read the Suggesters page first.
The term suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested, and the suggested terms are provided per analyzed suggest text token. The term suggester does not take into account the query that is part of the request.
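To make the "per token, by edit distance" idea concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: the whitespace tokenizer stands in for a real analyzer, and `suggest_per_token` and the tiny `vocabulary` list are hypothetical names made up for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_per_token(text, vocabulary, max_edits=2):
    """Return candidate terms for each token of `text` (hypothetical helper).

    Assumption: lowercasing plus whitespace splitting stands in for the analyzer.
    """
    suggestions = {}
    for token in text.lower().split():
        candidates = [
            term for term in vocabulary
            if term != token and levenshtein(token, term) <= max_edits
        ]
        # Closest candidates first; ties are broken alphabetically.
        candidates.sort(key=lambda term: (levenshtein(token, term), term))
        suggestions[token] = candidates
    return suggestions


# Hypothetical index vocabulary, for illustration only.
vocabulary = ["trying", "out", "elasticsearch", "string"]
result = suggest_per_token("tring out Elasticsearch", vocabulary)
```

Each token gets its own candidate list, and tokens that already match an indexed term exactly simply get no alternatives in this sketch.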
1. Common parameters
Common suggest options (the values below belong to suggest_mode, which controls which suggestions are returned):
missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
popular: Only suggest terms that occur in more documents than the original suggest text term.
always: Suggest any matching suggestions based on terms in the suggest text.
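The three modes can be read as a filter over candidate terms and their document frequencies. The sketch below is a simplified interpretation of the descriptions above, not Elasticsearch code; `filter_by_suggest_mode` and the frequencies are hypothetical.

```python
def filter_by_suggest_mode(term_freq, candidates, mode="missing"):
    """Filter candidates according to a simplified reading of suggest_mode.

    term_freq  -- document frequency of the original suggest text term
    candidates -- mapping of candidate term -> document frequency
    """
    if mode == "missing":
        # Only suggest when the original term is absent from the index.
        return sorted(candidates) if term_freq == 0 else []
    if mode == "popular":
        # Only keep candidates found in more documents than the original term.
        return sorted(t for t, freq in candidates.items() if freq > term_freq)
    if mode == "always":
        # Every matching candidate is returned.
        return sorted(candidates)
    raise ValueError("unknown suggest_mode: " + mode)


# Hypothetical document frequencies, for illustration only.
candidates = {"trying": 12, "string": 3}
missing_hit = filter_by_suggest_mode(0, candidates, "missing")
missing_skip = filter_by_suggest_mode(5, candidates, "missing")
popular = filter_by_suggest_mode(5, candidates, "popular")
always = filter_by_suggest_mode(5, candidates, "always")
```

With an original term that appears in 5 documents, missing returns nothing, popular keeps only the more frequent candidate, and always returns both.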
2. Other parameters
Other term suggest options:
1.max_edits
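The maximum edit distance a candidate can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error. Defaults to 2.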
2.prefix_length
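The minimum number of prefix characters that must match for a term to be a suggestion candidate. Defaults to 1. Increasing this number improves spellcheck performance. This works because misspellings usually do not occur in the first few characters of a term, for example in English words. (The old name "prefix_len" is deprecated.)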
3.min_word_length
The minimum length a suggest text term must have in order to be included. Defaults to 4. (The old name "min_word_len" is deprecated.)
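As a rough illustration of how prefix_length and min_word_length narrow the candidate set, consider the following sketch. `is_candidate` is a hypothetical helper written only from the descriptions above; the real checks happen inside Lucene and may differ in detail.

```python
def is_candidate(token, term, prefix_length=1, min_word_length=4):
    """Simplified eligibility check for one (token, candidate term) pair."""
    if len(token) < min_word_length:
        # The suggest text term is too short to be spellchecked at all.
        return False
    # The first `prefix_length` characters must match exactly.
    return token[:prefix_length] == term[:prefix_length]


same_prefix = is_candidate("tring", "trying")
different_prefix = is_candidate("tring", "string")
too_short = is_candidate("out", "oust")
no_prefix_required = is_candidate("tring", "string", prefix_length=0)
```

With the defaults, "string" is rejected for "tring" because the first character differs, while a prefix_length of 0 lets it through at the cost of checking more terms.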
4.shard_size
設(shè)置要從每個(gè)單獨(dú)的分片中檢索的suggest的最大數(shù)量。在reduce匯總階段,僅根據(jù)size選項(xiàng)返回前N個(gè)suggest。默認(rèn)為size選項(xiàng)。將此值設(shè)置為大于size的值可能很有用,以便以性能為代價(jià)獲得更準(zhǔn)確的文檔頻率以進(jìn)行拼寫更正。由于term在分片之間是獨(dú)立的,因此分片級(jí)別文檔的頻率可能不準(zhǔn)確。增大這個(gè)設(shè)置將使這些文檔搜索更加準(zhǔn)確。
5.max_inspections
一個(gè)因子,用于與shards_size相乘,以便在shard級(jí)別上檢查更多的候選拼寫更正??梢砸孕阅転榇鷥r(jià)提高準(zhǔn)確性。默認(rèn)為5。
6.min_doc_freq
suggest應(yīng)出現(xiàn)的最小文檔數(shù)閾值??梢詫⑵渲付榻^對(duì)數(shù)量或相對(duì)數(shù)量的文檔數(shù)。通過僅suggest高頻項(xiàng)可以提高質(zhì)量。默認(rèn)為0f且未啟用。如果指定的值大于1,則數(shù)字不能為小數(shù)。分片級(jí)別文檔頻率用于此選項(xiàng)。
7.max_term_freq
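The maximum threshold for the number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage (for example 0.4) or an absolute number representing a document frequency. If a value higher than 1 is specified, it cannot be fractional. Defaults to 0.01f. This can be used to exclude high-frequency terms, which are usually spelled correctly, from being spellchecked. It also improves spellcheck performance. Shard-level document frequencies are used for this option.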
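Both min_doc_freq and max_term_freq accept either a fraction or an absolute count. One way to read that rule is sketched below. `resolve_threshold` and `should_spellcheck` are hypothetical helpers, and the rounding with `math.ceil` is an assumption rather than a documented guarantee.

```python
import math


def resolve_threshold(value, max_doc):
    """Turn a fractional or absolute setting into a document count."""
    if value > 1 and value != int(value):
        raise ValueError("values greater than 1 must be whole numbers")
    if value >= 1:
        return int(value)  # absolute document frequency
    # Assumption: a fraction is scaled by the shard's document count and rounded up.
    return math.ceil(value * max_doc)


def should_spellcheck(term_doc_freq, max_doc, max_term_freq=0.01):
    """Skip terms that are frequent enough to be assumed correctly spelled."""
    return term_doc_freq <= resolve_threshold(max_term_freq, max_doc)


relative = resolve_threshold(0.01, 1000)  # 1% of 1000 documents
absolute = resolve_threshold(5, 1000)
rare_term = should_spellcheck(3, 1000)
frequent_term = should_spellcheck(50, 1000)
```

On a shard with 1000 documents, the default max_term_freq of 0.01 means a term found in more than 10 documents is left alone in this reading.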
8.string_distance
Which string distance implementation to use for comparing how similar suggested terms are. Five possible values can be specified:
internal: The default based on damerau_levenshtein but highly optimized for comparing string distance for terms inside the index.
damerau_levenshtein: String distance algorithm based on Damerau-Levenshtein algorithm.
levenshtein: String distance algorithm based on Levenshtein edit distance algorithm.
jaro_winkler: String distance algorithm based on Jaro-Winkler algorithm.
ngram: String distance algorithm based on character n-grams.
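The practical difference between levenshtein and damerau_levenshtein is that the latter counts a swap of two adjacent characters as a single edit. The sketch below implements the restricted variant (optimal string alignment) in Python to show the idea. It is an illustration only and is not taken from the Lucene implementations behind these option values.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # Adjacent transposition counts as one edit.
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


swapped = osa_distance("teh", "the")
missing_letter = osa_distance("tring", "trying")
```

Here "teh" is one edit away from "the", whereas plain Levenshtein would need two substitutions. With max_edits limited to 1 or 2, that difference can decide whether a transposed typo is corrected at all.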
3. 請(qǐng)求樣例
POST _search
{
  "suggest": {
    "text": "tring out Elasticsearch",
    "my-suggest-1": {
      "term": {
        "field": "message"
      }
    },
    "my-suggest-2": {
      "term": {
        "field": "user"
      }
    }
  }
}

POST twitter/_search
{
  "query": {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest": {
    "my-suggestion": {
      "text": "tring out Elasticsearch",
      "term": {
        "field": "message"
      }
    }
  }
}
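For readers who build these requests from application code, the following Python sketch assembles the same kind of body and reads a suggest response. No HTTP call is made. `build_term_suggest_body` and `best_corrections` are hypothetical helpers, and `sample_response` is a hand-written example of the response shape (one entry per token, each with a list of options), not output captured from a real cluster.

```python
import json


def build_term_suggest_body(text, field, name="my-suggestion", **options):
    """Build a search body containing a single term suggester."""
    term = {"field": field}
    term.update(options)  # e.g. suggest_mode, max_edits, prefix_length
    return {"suggest": {name: {"text": text, "term": term}}}


def best_corrections(response, name):
    """Replace each token with its first option, if it has one."""
    corrected = []
    for entry in response["suggest"][name]:
        options = entry["options"]
        corrected.append(options[0]["text"] if options else entry["text"])
    return " ".join(corrected)


body = build_term_suggest_body(
    "tring out Elasticsearch", "message", suggest_mode="popular"
)
payload = json.dumps(body)  # string that would be sent as the request body

# Hand-written sample of the response shape; scores and freqs are made up.
sample_response = {
    "suggest": {
        "my-suggestion": [
            {"text": "tring", "offset": 0, "length": 5,
             "options": [{"text": "trying", "score": 0.8, "freq": 1}]},
            {"text": "out", "offset": 6, "length": 3, "options": []},
            {"text": "elasticsearch", "offset": 10, "length": 13, "options": []},
        ]
    }
}
corrected = best_corrections(sample_response, "my-suggestion")
```

Tokens with an empty options list are kept as they are, which matches the default missing mode where terms already in the index receive no suggestions.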