Repost: Some interesting facts about SharePoint 2007 Search

Reposted: 2024/7/5

Published 14 November 08 06:21 PM | harikumh

Can we search in any language other than English? Do we need a language pack for that?

Language packs have nothing to do with searching in languages other than English or the language in which SharePoint is deployed. Out of the box, MOSS already ships with the major word breakers/stemmers, although their quality is very poor for some languages, such as Chinese and German.

Regardless of the quality of the word breakers, you may run into two problems that are by design:

1. At index time, the IFilters should emit the correct LCID for the language of the content. However, this is not possible for some file types. For example, when processing file types such as TXT, XLS (XLSX) and RTF, the IFilters return 1033 (en-us) instead of the correct LCID. The result is that a search for any longer word in these files may return nothing, and only single-character searches find Japanese/Chinese content. Other languages may have the same problem, but it is less obvious there.

2. At query time, when a user submits a keyword through the browser, MOSS detects the language setting of the user's browser and uses that value to call the corresponding word breaker. If this word breaker does not match the one used at index time, you are in trouble again. For example, if you use an English client to search for something in Chinese without changing the browser language setting to Chinese, you will still get no results for a word, even if the files were indexed in the right language.
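
Both problems come down to the same thing: the terms produced at index time and at query time have to agree. A minimal sketch of that effect follows. The two breaker functions, the tiny dictionary and the file name are hypothetical stand-ins for illustration, not the word breakers that ship with MOSS.

```python
# Toy illustration only: these functions stand in for word breakers and are
# NOT the ones shipped with MOSS. DICTIONARY and "a.txt" are made up.

DICTIONARY = {"搜索", "引擎"}  # hypothetical two-entry lexicon


def break_wrong_lcid(text):
    """Stand-in for breaking CJK text under the wrong LCID: one term per character."""
    return [c for c in text if not c.isspace()]


def break_chinese(text):
    """Stand-in for a Chinese breaker: greedy match of two-character dictionary words."""
    terms, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for length in (2, 1):
            piece = text[i:i + length]
            if length == 1 or piece in DICTIONARY:
                terms.append(piece)
                i += len(piece)
                break
    return terms


def build_index(docs, breaker):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in breaker(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, breaker):
    """Return ids of documents that contain every term of the broken query."""
    terms = breaker(query)
    if not terms:
        return set()
    return set.intersection(*[index.get(t, set()) for t in terms])


docs = {"a.txt": "搜索引擎"}
wrong_index = build_index(docs, break_wrong_lcid)  # indexed under the wrong LCID
right_index = build_index(docs, break_chinese)     # indexed with the matching breaker

word_on_wrong_index = search(wrong_index, "搜索", break_chinese)
char_on_wrong_index = search(wrong_index, "搜", break_chinese)
word_on_right_index = search(right_index, "搜索", break_chinese)
```

In this toy setup the two-character word finds nothing in the index built with the mismatched breaker, while the single character still hits, which mirrors the symptom described in point 1.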

The space needed for the index on the query machine is approximately 2.8 times the size of the actual index. What is the logic behind this?

Let's say the index size is X.

During crawls, we accumulate more shadow indexes because of items that are indexed. When these shadow indexes cover about 0.1 times X (10% of X), we do a master merge.

A master merge takes the 1.1X (X + 0.1X) and creates a new index with that in the same location as the old index. The size of the new index is roughly 1.1X. So before the old index is deleted, the requirement is for at least 2.2X (for both indexes).

However, since query servers are expected to be online at all times, the master merge should have minimal impact on query latencies. To achieve this, we use more than the 1.1X space by creating temp files during the master merge.

This leads to the worst case number of 2.8X so that master merges can succeed while not impacting query latencies.

The old index is then deleted on both the indexer and the query machines immediately after the master merge is complete.
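
The arithmetic above can be written out as a small helper. This is only a sketch of the numbers quoted in this post; the function name and its parameters are made up for illustration, and attributing the whole gap between 2.2X and 2.8X to temp files is an inference from the figures above, not a documented breakdown.

```python
def master_merge_disk_estimate(index_size_gb, shadow_fraction=0.1, worst_case_factor=2.8):
    """Disk space figures around a master merge, for an index of size X (in GB).

    shadow_fraction and worst_case_factor default to the 10% and 2.8X
    quoted in the text; both are parameters so other values can be tried.
    """
    merged_index = index_size_gb * (1 + shadow_fraction)  # new index, about 1.1X
    both_indexes = 2 * merged_index                       # old + new side by side, about 2.2X
    worst_case = index_size_gb * worst_case_factor        # quoted worst case, 2.8X
    temp_headroom = worst_case - both_indexes             # assumed to be temp files
    return {
        "merged_index": merged_index,
        "both_indexes": both_indexes,
        "worst_case": worst_case,
        "temp_headroom": temp_headroom,
    }


estimate = master_merge_disk_estimate(100.0)
for name, size in estimate.items():
    print(f"{name}: {size:.1f} GB")
```

For a 100 GB index this works out to roughly 110 GB for the merged index, 220 GB while both indexes exist, and 280 GB in the worst case, leaving about 60 GB for temp files under the assumption stated above.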

How are duplicate documents identified when we do a search?

Document similarity for the purpose of identifying duplicates is based only on a hash of the content of the document. No file properties (e.g. file name, type, author, create and modify dates) are input to this hash. The MSSDuplicateHashes table in the SSP’s search database holds, for each document, all the 64-bit hashes necessary to determine whether one document is a near-duplicate of another. This table is read during a search if duplicate collapsing is enabled.
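
The post does not say how MOSS computes these hashes. As a rough illustration of the general idea (several 64-bit hashes derived from content only, compared for overlap), here is a sketch that uses a generic shingle-and-keep-smallest scheme. The scheme, the function names, the sample documents and the 0.75 overlap threshold are all assumptions chosen for the example, not the MOSS algorithm.

```python
import hashlib


def hash64(text):
    """64-bit hash of a string (blake2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def content_hashes(content, shingle_words=3, keep=8):
    """Hash every run of `shingle_words` words and keep the `keep` smallest hashes."""
    words = content.lower().split()
    if len(words) < shingle_words:
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + shingle_words])
            for i in range(len(words) - shingle_words + 1)
        ]
    return sorted({hash64(s) for s in shingles})[:keep]


def looks_like_duplicate(hashes_a, hashes_b, min_overlap=0.75):
    """Treat two documents as near-duplicates if enough of their hashes coincide."""
    a, b = set(hashes_a), set(hashes_b)
    if not a or not b:
        return False
    return len(a & b) / min(len(a), len(b)) >= min_overlap


# Hypothetical documents: same content, different file properties.
doc_a = {"name": "report.docx", "author": "alice",
         "content": "The quarterly report covers search usage across all site collections"}
doc_b = {"name": "copy of report.docx", "author": "bob",
         "content": "The quarterly report covers search usage across all site collections"}
doc_c = {"name": "menu.docx", "author": "alice",
         "content": "Lunch options for Friday include soup salad and sandwiches"}

# Only the content field is hashed; name and author never enter the hash.
hashes_a = content_hashes(doc_a["content"])
hashes_b = content_hashes(doc_b["content"])
hashes_c = content_hashes(doc_c["content"])

same_content = looks_like_duplicate(hashes_a, hashes_b)
different_content = looks_like_duplicate(hashes_a, hashes_c)
```

Because the file name and author are never hashed, the two copies of the report compare as duplicates even though their properties differ.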

What are discovered definitions and how does search find those?

Discovered Definitions are a feature in MOSS that can be enabled or disabled in the properties of the SearchCoreResults web part. When enabled, the results web part displays not only document matches for a term, but also any definitions it has discovered for that word during crawling.

Definition extraction in MOSS 2007 is a feature that extracts definitions of terms from indexed text.

Definition extraction is done during the crawl. The crawler looks for verb phrases such as ‘is a’ or ‘is the’ and then, when a nebulous threshold is reached, it extracts the definition of the related word for later use in the search results display, under the words “What people are saying about <term>”.

At query time, the search token that was passed in is compared with the existing entries in the definitions database. If a match is found, the definitions link is populated at the bottom of the search results page. Collapsing the link shows the number of definitions.
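
The crawl-time and query-time halves of this feature can be sketched in a few lines. This is a toy under stated assumptions, not the MOSS extractor: the regular expression, the `min_mentions` threshold (standing in for the unspecified "nebulous threshold") and the sample texts are all invented for illustration.

```python
import re
from collections import defaultdict

# A term of one to three words, then 'is a' or 'is the', then the definition body.
DEFINITION_RE = re.compile(
    r"^(?P<term>[\w-]+(?: [\w-]+){0,2}) (?P<verb>is a|is the) (?P<body>.+)$",
    re.IGNORECASE,
)


def extract_definitions(documents, min_mentions=2):
    """Crawl-time step: collect candidate sentences per term, keep terms seen often enough."""
    candidates = defaultdict(list)
    for text in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            cleaned = sentence.strip()
            match = DEFINITION_RE.match(cleaned.rstrip("."))
            if match:
                candidates[match["term"].lower()].append(cleaned)
    return {term: found for term, found in candidates.items()
            if len(found) >= min_mentions}


def lookup(definitions, query_token):
    """Query-time step: compare the search token with the stored definition entries."""
    return definitions.get(query_token.lower(), [])


crawled = [
    "MOSS is a server product. It ships with search.",
    "MOSS is the successor to SPS 2003.",
    "Paris is a city.",
]
definitions = extract_definitions(crawled)
moss_definitions = lookup(definitions, "MOSS")
paris_definitions = lookup(definitions, "Paris")
```

In this toy, "MOSS" clears the threshold with two candidate sentences, while "Paris" has only one and is dropped, so a query for it gets no definitions link.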

Pasted from <http://blogs.technet.com/harikumh/archive/2008/11/14/some-interesting-facts-about-sharepoint-2007-search.aspx>

Reposted from: https://www.cnblogs.com/wenjielee/archive/2010/12/29/1921154.html
