Understanding the Faiss Library

Published: 2024/1/1

The Faiss search library

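Faiss (Facebook AI Similarity Search) is an open-source library from the Facebook AI team for clustering and similarity search. It provides efficient similarity search and clustering for dense vectors, supports search over billions of vectors, and is currently one of the more mature approximate nearest neighbor search libraries.
References: [Usage 1], [Recommended], [Usage 3]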
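To make "similarity search over dense vectors" concrete, here is a minimal pure-Python sketch of exhaustive nearest-neighbor search using squared Euclidean distance, which is the kind of result an exact L2 index returns. The function names `squared_l2` and `brute_force_search` and the toy vectors are illustrative assumptions, not part of the Faiss API, and the sketch makes no attempt at Faiss's performance.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(database, queries, k):
    """Return (distances, indices) of the k nearest database vectors per query."""
    all_distances, all_indices = [], []
    for q in queries:
        # Score every database vector, then keep the k smallest distances.
        scored = [(squared_l2(q, v), i) for i, v in enumerate(database)]
        nearest = heapq.nsmallest(k, scored)
        all_distances.append([d for d, _ in nearest])
        all_indices.append([i for _, i in nearest])
    return all_distances, all_indices


# Toy data (hypothetical): four 2-D database vectors and two queries.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [2.5, 2.5]]
distances, indices = brute_force_search(database, queries, k=2)
print(indices)  # [[1, 0], [3, 2]]
```

The cost grows with the number of queries times the number of database vectors, which is why approximate indexes become attractive for very large collections.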

In the CosPlace project, test.py contains the following code:

    import logging
    import time

    import faiss

    # Compute R@1, R@5, R@10, R@20
    RECALL_VALUES = [1, 5, 10, 20]
    # ...
    queries_descriptors = all_descriptors[eval_ds.database_num:]
    database_descriptors = all_descriptors[:eval_ds.database_num]

    # Use a kNN to find predictions
    tic = time.time()
    faiss_index = faiss.IndexFlatL2(args.fc_output_dim)
    faiss_index.add(database_descriptors)
    print('Index built in {} sec'.format(time.time() - tic))
    # Only all_descriptors is deleted: database_descriptors is reused by the IVF indexes below
    del all_descriptors

    logging.debug("Calculating recalls")
    _, predictions = faiss_index.search(queries_descriptors, max(RECALL_VALUES))
    # tic is not reset, so this time also includes the index build
    print('Searched in {} sec'.format(time.time() - tic))
    print(predictions.shape)
    print(predictions[:5])

    # IndexIVFFlat index
    nlist = 100  # number of cells
    tic = time.time()
    quantizer = faiss.IndexFlatL2(args.fc_output_dim)  # the other index; the argument is the vector dimension d
    index = faiss.IndexIVFFlat(quantizer, args.fc_output_dim, nlist, faiss.METRIC_L2)  # METRIC_L2 specified explicitly
    # assert not index.is_trained
    index.train(database_descriptors)
    # assert index.is_trained
    index.add(database_descriptors)  # add may be a bit slower as well
    print('Index built in {} sec'.format(time.time() - tic))
    index.nprobe = 10  # number of cells visited per search (out of nlist); default nprobe is 1, try a few more
    D, I = index.search(queries_descriptors, max(RECALL_VALUES))  # actual search
    print('Searched in {} sec'.format(time.time() - tic))
    # print("D.shape: ", D.shape)
    # print("D[:5]", D[:5])
    print("I.shape: ", I.shape)
    print("I[:5]", I[:5])  # neighbors of the first 5 queries

    # IndexIVFPQ index
    nlist = 100
    m = 64  # number of sub-quantizers
    tic = time.time()
    quantizer = faiss.IndexFlatL2(args.fc_output_dim)  # this remains the same
    # To scale to very large datasets, Faiss provides variants that compress the stored
    # vectors with lossy compression based on product quantization. This costs some
    # accuracy: even the distance from a vector to itself is no longer 0.
    index = faiss.IndexIVFPQ(quantizer, args.fc_output_dim, nlist, m, 8)  # 8 specifies that each sub-vector is encoded as 8 bits
    index.train(database_descriptors)
    index.add(database_descriptors)
    print('Index built in {} sec'.format(time.time() - tic))
    # D, I = index.search(xb[:5], k)  # sanity check
    # print(I)
    # print(D)
    index.nprobe = 10  # make comparable with experiment above
    _, I = index.search(queries_descriptors, max(RECALL_VALUES))  # search
    print('Searched in {} sec'.format(time.time() - tic))
    # print(I[:5])
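The listing stops once the predictions are obtained and does not show how R@1, R@5, R@10, and R@20 are derived from them. The following is a minimal sketch of one common way to do it, assuming a query counts as correct at N if any of its top-N predictions is a known positive. The function `compute_recalls`, the `positives_per_query` input, and the toy data are hypothetical and not taken from the article's code.

```python
RECALL_VALUES = [1, 5, 10, 20]


def compute_recalls(predictions, positives_per_query, recall_values=RECALL_VALUES):
    """Percentage of queries with at least one correct match in the top N."""
    hits = [0] * len(recall_values)
    for preds, positives in zip(predictions, positives_per_query):
        positives = set(positives)
        for i, n in enumerate(recall_values):
            if any(p in positives for p in preds[:n]):
                # A hit within the top n is also a hit for every larger n.
                for j in range(i, len(recall_values)):
                    hits[j] += 1
                break
    num_queries = len(predictions)
    return [100.0 * h / num_queries for h in hits]


# Toy example (hypothetical data): two queries, top-3 predictions each.
predictions = [[4, 7, 1], [2, 9, 5]]
positives_per_query = [[4], [5, 6]]
recalls = compute_recalls(predictions, positives_per_query, recall_values=[1, 2, 3])
print(recalls)  # [50.0, 50.0, 100.0]
```

Running the same function on the output of each index type would show how much accuracy the approximate indexes give up relative to the exact one.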

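The code above implements three index types: IndexFlatL2, IndexIVFFlat, and IndexIVFPQ. They were tested on a dataset with 1,700 database images and 10,000 query images, retrieving the top 20 results per query. Because `tic` is not reset between building and searching, each "searched" time below is cumulative and includes the index build time. The results were: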

IndexFlatL2: index built (0.0231 sec), searched (0.1628 sec)

IndexIVFFlat: index built (0.2696 sec), searched (0.7498 sec)

IndexIVFPQ: index built (6.7583 sec), searched (6.8314 sec). Take care when setting the parameter m; for the related error, see [Question 9].
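One constraint on m that is easy to trip over is that product quantization splits each vector into m equal sub-vectors, so m must evenly divide the vector dimension. Whether this is the exact error the linked question describes is an assumption. The sketch below lists valid values of m and estimates the per-vector storage; the helper names and the dimension 512 are hypothetical, and the storage figures ignore vector ids and other index overhead.

```python
def valid_pq_m(d, max_m=None):
    """Sub-quantizer counts m that evenly divide the vector dimension d."""
    upper = d if max_m is None else min(d, max_m)
    return [m for m in range(1, upper + 1) if d % m == 0]


def bytes_per_vector(d, m=None, nbits=8):
    """Approximate storage per vector: float32 if uncompressed, else PQ codes."""
    if m is None:
        return 4 * d  # d float32 components
    return (m * nbits + 7) // 8  # m codes of nbits bits, rounded up to whole bytes


d = 512  # hypothetical descriptor dimension; the article uses args.fc_output_dim
print(valid_pq_m(d, max_m=64))    # [1, 2, 4, 8, 16, 32, 64]
print(bytes_per_vector(d))        # 2048
print(bytes_per_vector(d, m=64))  # 64
```

Checking `args.fc_output_dim % m == 0` before constructing the index is a cheap way to catch an invalid m early.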

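Summary: in theory IndexIVFPQ should be more efficient, but on a small database the brute-force search of IndexFlatL2, which simply computes Euclidean distances, is actually faster, whereas IndexIVFFlat and IndexIVFPQ both require a training step.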
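A rough way to reason about this trade-off is to count distance computations. The sketch below compares exhaustive search with an inverted-file search for the dataset sizes and the nlist and nprobe values used above. It assumes perfectly balanced cells and ignores training and all implementation overhead, so it is only a back-of-the-envelope estimate; the function names are hypothetical.

```python
def flat_distance_count(n_db, n_queries):
    """Exhaustive search compares every query with every database vector."""
    return n_queries * n_db


def ivf_distance_count(n_db, n_queries, nlist, nprobe):
    """Per query: nlist centroid comparisons plus the vectors in nprobe cells.

    Assumes cells are perfectly balanced (n_db / nlist vectors each).
    """
    per_query = nlist + nprobe * n_db / nlist
    return int(n_queries * per_query)


n_db, n_queries = 1700, 10000  # sizes reported in the article
flat = flat_distance_count(n_db, n_queries)
ivf = ivf_distance_count(n_db, n_queries, nlist=100, nprobe=10)
print(flat, ivf)  # 17000000 2700000
```

With only 1,700 database vectors, the inverted file saves at most 1,430 comparisons per query, so the fixed cost of training can easily outweigh the saving. The gap widens in favor of the inverted file as the database grows, since the flat count scales with the full database size while the inverted-file count scales with only the probed fraction.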
