Inverted index in Python
1. Simple version
from collections import defaultdict


class inverted_index:
    def __init__(self, docs):
        # Map each term to the set of document indices that contain it
        self.doc = defaultdict(set)
        for index, doc in enumerate(docs):
            for term in doc.split():
                self.doc[term].add(index)

    def search(self, term):
        # .get avoids inserting an empty entry when the term is unknown
        return self.doc.get(term, set())


if __name__ == "__main__":
    docs = ["new home sales top forecasts june june june",
            "home sales rise in july june",
            "increase in home sales in july",
            "july new home sales new rise"]
    i = inverted_index(docs)
    print(i.search('sales'))
    # Result: {0, 1, 2, 3}
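The simple version looks up a single term. A natural next step is a query with several terms, where a document matches only if it contains all of them. The following is a minimal sketch of that idea and not part of the original post; the names `build_index` and `search_all` are hypothetical, and it assumes plain whitespace tokenization as in the class above.

```python
from collections import defaultdict


def build_index(docs):
    """Map each whitespace-separated term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc.split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every term in the query (AND semantics)."""
    terms = query.split()
    if not terms:
        return set()
    # .get avoids inserting empty entries for unknown terms into the defaultdict
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = [
    "new home sales top forecasts june june june",
    "home sales rise in july june",
    "increase in home sales in july",
    "july new home sales new rise",
]
index = build_index(docs)
print(search_all(index, "new july"))      # {3}
print(search_all(index, "rise june"))     # {1}
print(search_all(index, "home missing"))  # set()
```

Because each term maps to a set of document ids, the AND query reduces to a set intersection, which is the same operation used in the character-level index later in the post.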
2. NLTK word-list version: build the inverted index, then search
# [2] Build an inverted index over the words in the NLTK word list
"""
Result 1 (storing indices): run_time = 0.012042999267578125
Result 2 (storing words):   run_time = 0.02428269386291504
Conclusion: storing indices is roughly twice as fast as storing words
"""
import time
from nltk.corpus import words  # requires a one-time nltk.download('words')
from collections import defaultdict

# Use a set: a character repeated within one word is recorded only once
# for that word, which defaultdict(list) would not guarantee.
inverted_index = defaultdict(set)
word_list = words.words()

# Result 1: storing the word's index is faster than storing the word itself
for i, word in enumerate(word_list):
    for char in word.lower():
        inverted_index[char].add(i)

# Result 2
# for i, word in enumerate(word_list):
#     for char in word.lower():
#         inverted_index[char].add(word)

# To find which words (by index) contain all of the given characters,
# intersect their sets with set.intersection()
start = time.time()
result = set.intersection(*(inverted_index[char] for char in "aej"))
end = time.time()
print('run_time = ', end - start)
print('result = ', result)
print('result_item = ', [word_list[i] for i in result])


# Pure-Python illustration of a multi-set intersection (not called above)
def intersection(*args):
    left = args[0]
    # Perform len(args)-1 pairwise intersections
    for right in args[1:]:
        # Tests take O(N) time, so minimize N by choosing the smaller set
        if len(left) > len(right):
            left, right = right, left
        # Do the pairwise intersection
        result = set()
        for element in left:
            if element in right:
                result.add(element)
        left = result  # Use as the start for the next intersection
    return left
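To try the character-level index without downloading the NLTK corpus, the same idea can be run on a small hand-written word list. This is only a sketch under that assumption: the `sample` list and the function names `build_char_index` and `words_with_all_chars` are made up for illustration. It also sorts the sets by size before intersecting, following the "choose the smaller set" remark in the `intersection` function above.

```python
from collections import defaultdict


def build_char_index(word_list):
    """Map each lowercase character to the set of word positions containing it."""
    index = defaultdict(set)
    for i, word in enumerate(word_list):
        for char in word.lower():
            index[char].add(i)
    return index


def words_with_all_chars(index, word_list, chars):
    """Return the words containing every character in `chars`, in list order."""
    # Sort the sets by size so the running result starts as small as possible
    postings = sorted((index.get(c, set()) for c in chars.lower()), key=len)
    if not postings:
        return []
    result = set(postings[0])  # copy, so the index itself is never mutated
    for posting in postings[1:]:
        result &= posting
        if not result:  # nothing can match any more, stop early
            break
    return [word_list[i] for i in sorted(result)]


# Hypothetical sample data standing in for nltk.corpus.words.words()
sample = ["jade", "Jeans", "major", "eject", "banana", "apple"]
index = build_char_index(sample)
print(words_with_all_chars(index, sample, "aej"))  # ['jade', 'Jeans']
```

Storing positions rather than the words themselves keeps the sets small and cheap to hash, and the words are recovered only at the end by indexing back into the list.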