Inverted index in Python

Published: 2024/1/18 · python · 豆豆


1. Simple version

from collections import defaultdict


class inverted_index:
    def __init__(self, docs):
        # Map each term to the set of ids of the documents that contain it
        self.doc = defaultdict(set)
        for index, doc in enumerate(docs):
            for term in doc.split():
                self.doc[term].add(index)

    def search(self, term):
        # .get() avoids creating an empty entry for a term that was never indexed
        return self.doc.get(term, set())


if __name__ == "__main__":
    docs = [
        "new home sales top forecasts june june june",
        "home sales rise in july june",
        "increase in home sales in july",
        "july new home sales new rise",
    ]
    i = inverted_index(docs)
    print(i.search('sales'))
    # Result: {0, 1, 2, 3}
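The class above answers single-term queries only. As a minimal sketch (not part of the original post), a multi-term AND query can be answered by intersecting the posting sets of the individual terms. The helper names `build_index` and `search_all` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each whitespace-separated term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc.split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every term in the query (AND)."""
    terms = query.split()
    if not terms:
        return set()
    # .get() avoids inserting empty entries into the defaultdict for unknown terms
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = [
    "new home sales top forecasts june june june",
    "home sales rise in july june",
    "increase in home sales in july",
    "july new home sales new rise",
]
index = build_index(docs)
print(search_all(index, "new sales"))  # documents 0 and 3
print(search_all(index, "july rise"))  # documents 1 and 3
print(search_all(index, "missing"))    # empty set
```

A term that was never indexed yields an empty posting set, so any query containing it returns an empty result.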

2. NLTK word-list version: store the inverted index, then search

# [2] Build a character-level inverted index over the NLTK word list
# Timings reported by the author for the lookup below:
#   Result 1 (storing indices): run_time = 0.012042999267578125
#   Result 2 (storing words):   run_time = 0.02428269386291504
# Conclusion: storing indices is roughly twice as fast as storing words
import time
from collections import defaultdict

from nltk.corpus import words  # requires the corpus: nltk.download('words')

# Use a set: a word that repeats a char is recorded only once for that char.
# defaultdict(list) would store duplicates.
inverted_index = defaultdict(set)
word_list = words.words()

# Result 1: store the word's index (faster than storing the word itself)
for i, word in enumerate(word_list):
    for char in word.lower():
        inverted_index[char].add(i)

# Result 2: store the word itself
# for i, word in enumerate(word_list):
#     for char in word.lower():
#         inverted_index[char].add(word)

# To find the entries that contain every queried char, intersect their index sets
start = time.time()
result = set.intersection(*(inverted_index[char] for char in "aej"))
end = time.time()
print('run_time = ', end - start)
print('result = ', result)
print('result_item = ', [word_list[i] for i in result])


# A hand-written equivalent of the set intersection, for reference
def intersection(*args):
    left = args[0]
    # Perform len(args)-1 pairwise intersections
    for right in args[1:]:
        # Tests take O(N) time, so minimize N by choosing the smaller set
        if len(left) > len(right):
            left, right = right, left
        # Do the pairwise intersection
        result = set()
        for element in left:
            if element in right:
                result.add(element)
        left = result  # Use as the start for the next intersection
    return left
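The lookup above intersects the index sets of each queried character, and the hand-written version notes that iterating over the smaller set keeps the number of membership tests low. The following is a minimal sketch of that idea on a tiny hard-coded word list, which is an assumption standing in for `words.words()` so the example runs without the NLTK corpus. The names `char_index` and `words_containing` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for nltk.corpus.words.words(), so the sketch runs without NLTK
word_list = ["jade", "jackal", "major", "apple", "eject", "Jean", "banana"]

# Map each lowercase character to the set of indices of the words containing it
char_index = defaultdict(set)
for i, word in enumerate(word_list):
    for char in word.lower():
        char_index[char].add(i)


def words_containing(chars):
    """Return the words that contain every character in `chars`."""
    # Sort the index sets by size so the smallest one drives the intersection
    postings = sorted((char_index.get(c, set()) for c in chars.lower()), key=len)
    if not postings:
        return []
    result = set(postings[0])
    for other in postings[1:]:
        result &= other
        if not result:  # early exit: nothing can match any more
            break
    return [word_list[i] for i in sorted(result)]


print(words_containing("aej"))  # ['jade', 'Jean']
```

Storing indices rather than the words themselves keeps the sets small and lets the original casing be recovered from `word_list` at the end.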

