How to Use a Trie to Design and Build Google-Style Input Suggestions
Source | 搜索技術 (Search Technology)
Editor | 小白
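Both Google and Baidu offer input suggestions that help you type what you are looking for quickly and accurately.
For example, typing "五一" (May Day) brings up suggestions such as "五一勞動節" (May Day Labor Day).
So how can we implement input suggestions like Google's?
Analyzing the requirements for input suggestions
When the user types a prefix A, we want to suggest all highly relevant terms that start with A. This is prefix matching. A trie, also known as a prefix tree, is an ordered search tree, which makes it a good fit for implementing input suggestions.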
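To make the requirement concrete, here is a minimal sketch of prefix matching done as a plain linear scan. The function name prefix_matches and the short sample list are illustrative assumptions, not part of the original article.

```python
def prefix_matches(terms, prefix):
    """Return every term that starts with prefix, in input order (linear scan)."""
    return [t for t in terms if t.startswith(prefix)]


# Small sample vocabulary, assumed for illustration only.
terms = ["五一", "五一勞動節", "五花肉", "五行"]
print(prefix_matches(terms, "五一"))
```

A trie returns the same set of terms, but it only walks the characters of the prefix and then the matching subtree, rather than testing every term in the vocabulary.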
Below, using Python 3 as an example, we build an input suggestion service with a trie.
# Python3 program to demonstrate the auto-complete
# feature using a Trie data structure.
# Note: This is a basic implementation of a Trie
# and not the most optimized one.


class TrieNode():
    def __init__(self):
        # Initialising one node for the trie
        self.children = {}
        self.last = False


class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        self.word_list = []

    def formTrie(self, keys):
        # Forms a trie from the given set of strings, extending
        # the existing structure as required.
        for key in keys:
            self.insert(key)  # inserting one key into the trie.

    def insert(self, key):
        # Inserts a key into the trie if it does not exist already.
        # If the key is a prefix of an existing entry, this just
        # marks the final node as the end of a word.
        node = self.root
        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()
            node = node.children[a]
        node.last = True

    def search(self, key):
        # Searches the given key in the trie for a full match
        # and returns True on success, else False.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return False
            node = node.children[a]
        return node.last

    def suggestionsRec(self, node, word):
        # Recursively traverses the trie below node and
        # collects every complete word into self.word_list.
        if node.last:
            self.word_list.append(word)
        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints all the words in the trie that share the
        # given key as a prefix, i.e. the autocomplete suggestions.
        # Returns 0 if the prefix is absent, -1 if the prefix is a
        # word with no longer completions, and 1 otherwise.
        node = self.root
        temp_word = ''
        for a in key:
            if not node.children.get(a):
                return 0
            temp_word += a
            node = node.children[a]
        if node.last and not node.children:
            return -1
        self.word_list = []  # discard results from earlier calls
        self.suggestionsRec(node, temp_word)
        for s in self.word_list:
            print(s)
        return 1


# Driver Code
keys = ["五一", "五一勞動節", "五一放假安排", "五一勞動節圖片",
        "五一勞動節圖片 2020", "五一勞動節快樂", "五一晚會", "五一假期",
        "五一快樂", "五一節快樂", "五花肉", "五行",
        "五行相生"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating the trie object
t = Trie()

# creating the trie structure with the given set of strings.
t.formTrie(keys)

# autocompleting the given key using our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia

With the input "五一", the program prints every stored term that begins with "五一".
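All the expected results are there, but our suggestions appear in a somewhat different order from Google's. What can we do about that?
The data source for input suggestions is usually the log of queries that users have entered. The number of times each query was entered is counted, so that suggestions can be ranked by popularity.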
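As a rough sketch of that counting step, the snippet below tallies a raw query log with collections.Counter. The list query_log and the function name count_queries are hypothetical; the article does not show how its own counts were produced.

```python
from collections import Counter

# Hypothetical raw query log: one entry per search a user submitted.
query_log = ["五一假期", "五一勞動節", "五一假期", "五一", "五一勞動節", "五一勞動節"]


def count_queries(log):
    """Return (query, count) pairs sorted by descending count."""
    return Counter(log).most_common()


for query, count in count_queries(query_log):
    print(query, count)
```

The resulting pairs have the same shape as the query/count table used in the rest of the article.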
Now let's add counts to the logged query vocabulary to simulate Google's behavior.
Sample queries and their counts from the log:
五一勞動節 10
五一勞動節圖片 9
五一假期 8
五一勞動節快樂 7
五一放假安排 6
五一晚會 5
五一 4
五一快樂 3
五一勞動節圖片 2020 2
五一節快樂 1
Next, adjust the input suggestion code so that it supports query counts:
# Python3 program to demonstrate the auto-complete
# feature using a Trie data structure.
# Note: This is a basic implementation of a Trie
# and not the most optimized one.
import operator


class TrieNode():
    def __init__(self):
        # Initialising one node for the trie
        self.children = {}
        self.last = False


class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        # Maps each suggested query to its count.
        self.word_list = {}

    def formTrie(self, keys):
        # Forms a trie from the given set of strings, extending
        # the existing structure as required.
        for key in keys:
            self.insert(key)  # inserting one key into the trie.

    def insert(self, key):
        # Inserts a key into the trie if it does not exist already.
        # Each key has the form "query,count", so the comma and the
        # digits of the count are stored as ordinary characters.
        node = self.root
        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()
            node = node.children[a]
        node.last = True

    def search(self, key):
        # Searches the given key in the trie for a full match
        # and returns True on success, else False.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return False
            node = node.children[a]
        return node.last

    def suggestionsRec(self, node, word):
        # Recursively traverses the trie below node. For every
        # complete key, splits "query,count" and records the count.
        if node.last:
            ll = word.split(',')
            if len(ll) >= 2:
                self.word_list[ll[0]] = int(ll[1])
            else:
                self.word_list[ll[0]] = 0
        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints all the queries in the trie that share the given
        # key as a prefix, ordered by count from high to low.
        # Returns 0 if the prefix is absent, -1 if the prefix is a
        # complete key with no longer completions, and 1 otherwise.
        node = self.root
        temp_word = ''
        for a in key:
            if not node.children.get(a):
                return 0
            temp_word += a
            node = node.children[a]
        if node.last and not node.children:
            return -1
        self.word_list = {}  # discard results from earlier calls
        self.suggestionsRec(node, temp_word)
        # sort by count, highest first
        sorted_d = dict(sorted(self.word_list.items(),
                               key=operator.itemgetter(1), reverse=True))
        for s in sorted_d.keys():
            print(s)
        return 1


# Driver Code
keys = ["五一,4", "五一勞動節,10", "五一放假安排,6", "五一勞動節圖片,9",
        "五一勞動節圖片 2020,2", "五一勞動節快樂,7", "五一晚會,5", "五一假期,8",
        "五一快樂,3", "五一節快樂,1", "五花肉,0", "五行,0",
        "五行相生,0"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating the trie object
t = Trie()

# creating the trie structure with the given set of strings.
t.formTrie(keys)

# autocompleting the given key using our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia

The suggestions now come out in exactly the same order as Google's.
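The listing above stores each count inside the key string itself ("query,count"). One alternative, shown here only as a sketch and not as the article's method, is to keep the count on the node where a query ends and pick the top k with heapq.nlargest. The names CountedTrieNode, CountedTrie, suggest and the parameter k are all assumptions made for this illustration.

```python
import heapq


class CountedTrieNode:
    def __init__(self):
        self.children = {}
        self.count = None  # None means no query ends at this node


class CountedTrie:
    def __init__(self):
        self.root = CountedTrieNode()

    def insert(self, word, count):
        # Walk or create one node per character, then store the count.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, CountedTrieNode())
        node.count = count

    def _collect(self, node, word, out):
        # Depth-first walk gathering (count, word) for every stored query.
        if node.count is not None:
            out.append((node.count, word))
        for ch, child in node.children.items():
            self._collect(child, word + ch, out)

    def suggest(self, prefix, k=5):
        # Return up to k queries starting with prefix, highest count first.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        self._collect(node, prefix, found)
        return [w for _, w in heapq.nlargest(k, found, key=lambda p: p[0])]


trie = CountedTrie()
for word, count in [("五一", 4), ("五一勞動節", 10), ("五一假期", 8), ("五花肉", 0)]:
    trie.insert(word, count)
print(trie.suggest("五一", k=2))
```

Keeping the count out of the key means queries that themselves contain a comma are handled without special parsing, and the trie holds no extra nodes for the digits.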
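Summary:
The above shows how to implement Google-style input suggestions with a trie. Are there other approaches besides a trie? Is there another index used in search that can implement input suggestions well?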
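One possible answer, which the article itself does not give, is a sorted list searched with binary search: all terms sharing a prefix sit next to each other in sorted order. The sketch below uses the standard bisect module; the function names build_index and prefix_range are hypothetical, and it assumes no term contains the code point U+10FFFF, which serves as the upper bound.

```python
from bisect import bisect_left


def build_index(terms):
    """Return the distinct terms in sorted (code point) order."""
    return sorted(set(terms))


def prefix_range(index, prefix):
    """Return all terms in the sorted index that start with prefix."""
    lo = bisect_left(index, prefix)
    # Assumption: U+10FFFF never appears in a term, so prefix + U+10FFFF
    # sorts after every term that starts with prefix.
    hi = bisect_left(index, prefix + chr(0x10FFFF))
    return index[lo:hi]


index = build_index(["五行", "五一", "五一假期", "五花肉"])
print(prefix_range(index, "五一"))
```

Each lookup costs two binary searches plus the slice, but inserting a new term into the list is slower than inserting into a trie, so this layout suits vocabularies that are rebuilt in batches.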