
[Solr Basics Tutorial, Part 2] Indexing

Published: 2024/6/18 (Programming Q&A, posted by 豆豆)


I. Ways to submit documents to Solr for indexing

1. Indexing with post.jar

(1) Create an XML file describing the document:

<add>
  <doc>
    <field name="id">test4</field>
    <field name="title">testagain</field>
    <field name="url">http://www.163.com</field>
  </doc>
</add>
(2) Post the file with java -jar post.jar, pointing -Durl at the Solr update URL:

[root@jediael44 exampledocs]# java -Durl=http://ip:8080/solr/update -jar post.jar test.xml
SimplePostTool version 1.5
Posting files to base url http://ip:8080/solr/update using content-type application/xml..
POSTing file test.xml
1 files indexed.
COMMITting Solr index changes to http://localhost:8080/solr/update..
Time spent: 0:00:00.135
(3) View the post.jar usage help:

[root@jediael44 exampledocs]# java -jar post.jar --help
SimplePostTool version 1.5
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Supported System Properties and their defaults:
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

This is a simple command line tool for POSTing raw data to a Solr port.
Data can be read from files specified as commandline args, URLs specified
as args, as raw commandline arg strings or via STDIN.
Examples:
  java -jar post.jar *.xml
  java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
  java -Ddata=web -jar post.jar http://example.com/
  java -Dtype=text/csv -jar post.jar *.csv
  java -Dtype=application/json -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
  java -Dauto -jar post.jar *
  java -Dauto -Drecursive -jar post.jar afolder
  java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
The options controlled by System Properties include the Solr URL to POST to,
the Content-Type of the data, whether a commit or optimize should be executed,
and whether the response should be written to STDOUT. If auto=yes the tool
will try to set type and url automatically from file name. When posting rich
documents the file name will be propagated as "resource.name" and also used
as "literal.id". You may override these or any other request parameter through
the -Dparams property. To do a commit only, use "-" as argument. The web mode
is a simple crawler following links within domain, default delay=10s.
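(4) By default post.jar treats its input files as XML (content type application/xml). To post another format, set the content type explicitly, for example for JSON:

java -Dtype=application/json -jar post.jar *.json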
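post.jar is a convenience wrapper around an HTTP POST, so the same JSON upload can be sketched with the JDK's own HTTP client. This assumes Java 11 or later, a Solr 4.x update handler at http://localhost:8080/solr/update that accepts application/json (as the -Dtype example above suggests), and a JSON array of documents as the request body; the class name JsonUpdateRequest is hypothetical. The commit=true parameter asks Solr to commit after the update, roughly what post.jar does by default with -Dcommit=yes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: POSTs a JSON array of documents to a Solr update handler.
final class JsonUpdateRequest {

    // Build (but do not send) the request; commit=true asks Solr to commit afterwards.
    static HttpRequest build(String updateUrl, String jsonDocs) {
        return HttpRequest.newBuilder(URI.create(updateUrl + "?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default URL; pass your own update URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/solr/update";
        String docs = "[{\"id\":\"test5\",\"title\":\"json doc\",\"url\":\"http://www.163.com\"}]";
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(build(url, docs), HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

Only build() can be exercised without a running server; main() actually sends the request and prints Solr's status code and response body.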
2. Submitting documents from the Documents page of the admin UI

3. Indexing with SolrJ

(1) Simple indexing with SolrJ

package org.ljh.test.solr;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BasicSolrJIndexDemo {

    public static void main(String[] args) throws Exception {
        /*
         * Note: although the admin page is opened at http://ip:8080/solr/#/collection1,
         * documents must be submitted to http://ip:8080/solr/collection1.
         */
        String serverUrl = (args != null && args.length > 0) ? args[0]
                : "http://localhost:8080/solr/collection1";
        SolrServer solrServer = new HttpSolrServer(serverUrl);

        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.setField("id", "solrJTest3");
        doc1.setField("url", "http://www.163.com/");
        solrServer.add(doc1);

        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.setField("id", "solrJTest4");
        doc2.setField("url", "http://www.sina.com/");
        solrServer.add(doc2);

        // commit(waitFlush, waitSearcher): wait until the changes are flushed and searchable
        solrServer.commit(true, true);
        solrServer.shutdown(); // release the underlying HTTP client
    }
}
(2) Simple querying with SolrJ

package org.ljh.test.solr;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class BasicSolrJSearchDemo {

    public static void main(String[] args) throws Exception {
        String serverUrl = (args != null && args.length > 0) ? args[0]
                : "http://localhost:8080/solr/collection1";
        SolrServer solrServer = new HttpSolrServer(serverUrl);

        // Use the second argument as the query string; if none is given,
        // fall back to url:163 (use "*:*" to match every document).
        String queryString = (args != null && args.length > 1) ? args[1] : "url:163";
        SolrQuery solrQuery = new SolrQuery(queryString);
        solrQuery.setRows(5); // return at most 5 hits

        QueryResponse resp = solrServer.query(solrQuery);
        SolrDocumentList hits = resp.getResults();
        for (SolrDocument doc : hits) {
            System.out.println(doc.getFieldValue("id") + " : " + doc.getFieldValue("url"));
        }
        solrServer.shutdown();
    }
}
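SolrJ ultimately sends an HTTP request to the core's select handler. A JDK-only sketch of an equivalent URL for the query above, assuming the default /select request handler and the JSON response writer (wt=json); SelectUrl is a hypothetical name and Java 10 or later is assumed for the URLEncoder overload. Pasting the printed URL into a browser is a quick way to check a query before writing SolrJ code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the /select URL equivalent to the SolrJ query above.
final class SelectUrl {

    static String build(String coreUrl, String q, int rows) {
        // Parameter values must be URL-encoded (':' becomes %3A, a space becomes '+').
        return coreUrl + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        // Prints http://localhost:8080/solr/collection1/select?q=url%3A163&rows=5&wt=json
        System.out.println(build("http://localhost:8080/solr/collection1", "url:163", 5));
    }
}
```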
4. Using other tools

(1) DIH (DataImportHandler)


(2) ExtractingRequestHandler, a.k.a. Solr Cell


(3) Nutch


II. schema.xml: defining the document structure

schema.xml defines which fields an indexed document may contain, the type of each field, and other related information.

1. Example

Here is the schema.xml that Nutch provides for Solr:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="nutch" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
               omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
               omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
               omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
               omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="url" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>

    <!-- core fields -->
    <field name="batchId" type="string" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
    <field name="boost" type="float" stored="true" indexed="false"/>

    <!-- fields for index-basic plugin -->
    <field name="host" type="url" stored="false" indexed="true"/>
    <field name="url" type="url" stored="true" indexed="true"
           required="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="cache" type="string" stored="true" indexed="false"/>
    <field name="tstamp" type="date" stored="true" indexed="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>

    <!-- fields for index-anchor plugin -->
    <field name="anchor" type="string" stored="true" indexed="true"
           multiValued="true"/>

    <!-- fields for index-more plugin -->
    <field name="type" type="string" stored="true" indexed="true"
           multiValued="true"/>
    <field name="contentLength" type="long" stored="true"
           indexed="false"/>
    <field name="lastModified" type="date" stored="true"
           indexed="false"/>
    <field name="date" type="date" stored="true" indexed="true"/>

    <!-- fields for languageidentifier plugin -->
    <field name="lang" type="string" stored="true" indexed="true"/>

    <!-- fields for subcollection plugin -->
    <field name="subcollection" type="string" stored="true"
           indexed="true" multiValued="true"/>

    <!-- fields for feed plugin (tag is also used by microformats-reltag)-->
    <field name="author" type="string" stored="true" indexed="true"/>
    <field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
    <field name="feed" type="string" stored="true" indexed="true"/>
    <field name="publishedDate" type="date" stored="true"
           indexed="true"/>
    <field name="updatedDate" type="date" stored="true"
           indexed="true"/>

    <!-- fields for creativecommons plugin -->
    <field name="cc" type="string" stored="true" indexed="true"
           multiValued="true"/>

    <!-- fields for tld plugin -->
    <field name="tld" type="string" stored="false" indexed="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>

The schema above consists of five parts:

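(1) fieldType: the types that fields can have.
(2) field: which fields exist, whether each one is indexed, stored, and so on, and which type it uses.
(3) uniqueKey: which field serves as the id, i.e. the unique identifier of a document.
(4) defaultSearchField: the field searched when a query does not name one.
(5) solrQueryParser: defaultOperator="OR" means that query terms are combined with OR by default.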
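To see how these parts fit together, here is a sketch that reads a trimmed-down copy of the schema with the JDK's DOM parser and reports the uniqueKey, the default search field, and each declared field with its type. It only illustrates the file's structure and is not how Solr itself loads a schema; SchemaSummary and its SAMPLE string are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper that summarizes the main parts of a schema.xml document.
final class SchemaSummary {

    // A trimmed-down version of the Nutch schema above.
    static final String SAMPLE =
            "<schema name=\"nutch\" version=\"1.5\">"
            + "<types><fieldType name=\"string\" class=\"solr.StrField\"/></types>"
            + "<fields>"
            + "<field name=\"id\" type=\"string\" stored=\"true\" indexed=\"true\"/>"
            + "<field name=\"url\" type=\"url\" stored=\"true\" indexed=\"true\" required=\"true\"/>"
            + "</fields>"
            + "<uniqueKey>id</uniqueKey>"
            + "<defaultSearchField>content</defaultSearchField>"
            + "<solrQueryParser defaultOperator=\"OR\"/>"
            + "</schema>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed schema", e);
        }
    }

    // Trimmed text of the first element with this tag name, or null if there is none.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // "name:type" for every <field> element (fieldType elements are not matched).
    static List<String> fields(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element f = (Element) nodes.item(i);
            out.add(f.getAttribute("name") + ":" + f.getAttribute("type"));
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("uniqueKey = " + text(doc, "uniqueKey"));
        System.out.println("defaultSearchField = " + text(doc, "defaultSearchField"));
        System.out.println("fields = " + fields(doc));
    }
}
```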


2. The field element

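One or more field elements are grouped inside a fields element. The Nutch schema uses this structure, but the schema in the Solr example has no fields element: its field elements sit directly under the schema element. fieldType elements are handled the same way.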

Here is an example field:

<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>

The basic attributes of a field are:

(1) name

The name of the field.

(2) type

The type of the field; its value is the name of a fieldType.

(3) stored

Whether the field's value is stored. Only stored fields can have their full content returned in search results.

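(4) indexed

Whether the field is indexed. An indexed field can be searched. Beyond that, even if you never search on a field, you still need to index it if you want to sort, group, provide query suggestions, facet, or run function queries on it.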

For example, when looking for a book you rarely search by the number of copies sold, but you may well sort by it.

In addition to enabling searching, you will also need to mark your field as indexed if you need to sort, facet, group by, provide query suggestions for, or execute function queries on values within a field.

(5) multiValued

If a field may contain more than one value, set multiValued to true.

<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>

In that case a document being indexed may contain several field elements with the same name:

<add>
  <doc>
    ...
    <field name="tag">lucene</field>
    <field name="tag">solr</field>
  </doc>
</add>
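With SolrJ, call addField instead of setField (setField replaces any value already set for the field, while addField appends another one):

doc.addField("tag", "lucene");
doc.addField("tag", "solr");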
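The difference between setField and addField can be modeled with a map from field name to a list of values. FieldBag below is a hypothetical stand-in, not SolrInputDocument itself: setField discards any earlier values, while addField appends, which is what a multiValued field such as tag needs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a document's fields; NOT the SolrJ class.
final class FieldBag {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Like setField: drop any previous values and keep only this one.
    void setField(String name, Object value) {
        List<Object> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    // Like addField: append to the existing values, creating the list if needed.
    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<Object> values(String name) {
        return fields.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        FieldBag doc = new FieldBag();
        doc.addField("tag", "lucene");
        doc.addField("tag", "solr");
        System.out.println(doc.values("tag"));   // [lucene, solr]
        doc.setField("tag", "search");
        System.out.println(doc.values("tag"));   // [search]
    }
}
```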

(6) required

The required attribute tells Solr that every submitted document must provide this field. Note that the field named in the uniqueKey element is implicitly required="true".

<field name="url" type="url" stored="true" indexed="true" required="true"/>
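A client can catch missing required fields before sending a batch. The sketch below is a hypothetical pre-flight check (RequiredFieldCheck) that hard-codes the Nutch schema's required fields: url, plus id because it is the uniqueKey. Solr enforces the rule on the server regardless and rejects such documents with an error; this only saves a round trip.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side check mirroring the Nutch schema's required fields.
final class RequiredFieldCheck {

    // Assumption for this example: id (the uniqueKey) and url (required="true").
    static final Set<String> REQUIRED = Set.of("id", "url");

    // Returns the required fields that the document does not contain, sorted by name.
    static List<String> missing(Map<String, String> doc) {
        List<String> out = new ArrayList<>();
        for (String field : REQUIRED) {
            if (!doc.containsKey(field)) out.add(field);
        }
        out.sort(null);   // Set.of has no defined order, so sort for stable output
        return out;
    }

    public static void main(String[] args) {
        System.out.println(missing(Map.of("id", "test4", "title", "testagain")));   // [url]
    }
}
```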



3. The dynamicField element

<dynamicField name="*_ti" type="tint" indexed="true" stored="true"/>
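With this declaration, any field whose name ends in _ti is indexed and stored as type tint without being declared individually.

(1) As a rule, avoid dynamic fields except in the three situations below. Dynamic fields help address common problems that occur when building search applications, including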
■ Modeling documents with many fields
■ Supporting documents from diverse sources
■ Adding new document sources
See section 5.3.3 of Solr in Action for details.
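To make the *_ti declaration concrete: a dynamic field pattern has a single * at its start or end, an explicitly declared field always takes precedence, and when several patterns match, Solr uses the longer pattern (among equal-length patterns, the one declared first). A plain-Java sketch of that lookup; DynamicFieldResolver and the sample patterns *_s and attr_* are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver: which type a schema would assign to a given field name.
final class DynamicFieldResolver {
    private final Map<String, String> explicitFields;   // field name -> type
    private final Map<String, String> dynamicFields;    // pattern such as "*_ti" -> type

    DynamicFieldResolver(Map<String, String> explicitFields, Map<String, String> dynamicFields) {
        this.explicitFields = explicitFields;
        this.dynamicFields = dynamicFields;
    }

    // A pattern has one '*', either at the start ("*_ti") or at the end ("attr_*").
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    // Explicit fields win; otherwise the longest matching pattern (first declared on ties).
    Optional<String> typeOf(String name) {
        if (explicitFields.containsKey(name)) return Optional.of(explicitFields.get(name));
        String best = null;
        for (String p : dynamicFields.keySet()) {   // iteration order = declaration order
            if (matches(p, name) && (best == null || p.length() > best.length())) best = p;
        }
        return best == null ? Optional.empty() : Optional.of(dynamicFields.get(best));
    }

    public static void main(String[] args) {
        Map<String, String> dynamic = new LinkedHashMap<>();
        dynamic.put("*_ti", "tint");
        dynamic.put("*_s", "string");
        dynamic.put("attr_*", "text");
        DynamicFieldResolver r = new DynamicFieldResolver(Map.of("id", "string"), dynamic);
        System.out.println(r.typeOf("pages_ti"));       // Optional[tint]
        System.out.println(r.typeOf("attr_color_s"));   // Optional[text]
    }
}
```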

4. copyField

copyField is used in two situations:

copy fields support two use cases that are common in most search applications:
■ Populate a single catch-all field with the contents of multiple fields.
■ Apply different text analysis to the same field content to create a new searchable field.

(1) Copy several fields into a single field to make searching easier, for example:

<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="content_type" dest="text"/>
<copyField source="resourcename" dest="text"/>
<copyField source="url" dest="text"/>
Searches then only need to query the text field.

(2) Analyze the same content in different ways, for example:

<field name="text" type="stemmed_text" indexed="true" stored="true"/>
<field name="auto_suggest" type="unstemmed_text" indexed="true" stored="false" multiValued="true"/>
...
<copyField source="text" dest="auto_suggest" />

In this example the text field is stemmed when it is indexed, while its copy in auto_suggest keeps the original word forms, so query suggestions are not stemmed.
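copyField copies the raw input values before any analysis; the destination then applies its own field type, and copies do not chain. A plain-Java model of the copying step only (CopyFieldModel and the sample rules are hypothetical). Because several sources can land in one destination, a catch-all field such as text normally has to be multiValued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of copyField: raw source values are appended to the destination field.
final class CopyFieldModel {

    // rules: each element is {source, dest}. Sources are read from the original document,
    // so copies do not chain (a dest that is also a source is not copied again).
    static Map<String, List<String>> apply(Map<String, List<String>> doc, List<String[]> rules) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        doc.forEach((field, values) -> out.put(field, new ArrayList<>(values)));
        for (String[] rule : rules) {
            List<String> source = doc.get(rule[0]);
            if (source == null) continue;   // absent source field: nothing to copy
            out.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(source);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title", List.of("Solr indexing"));
        doc.put("author", List.of("alice", "bob"));
        List<String[]> rules = List.of(new String[] {"title", "text"}, new String[] {"author", "text"});
        System.out.println(apply(doc, rules).get("text"));   // [Solr indexing, alice, bob]
    }
}
```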


5. The fieldType element

(1) A fieldType defines a type of field; it is referenced from the type attribute of a field.

(2) Solr's built-in field types include the following:


(3) There are two broad kinds of field type:

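One kind is for unstructured data that is analyzed before it is indexed, such as the body text of an article; this is mainly TextField.

The other kind is for structured data that is indexed as-is, without analysis, such as URLs, ids, and person names; examples include StrField and TrieLongField.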
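The difference between the two kinds shows up in the terms that reach the index. A rough sketch: textTerms mimics a whitespace tokenizer followed by a lowercase filter (a subset of the Nutch text type above, without stop words or word-delimiter splitting), while stringTerms keeps the whole value as one term, as StrField does. TermsDemo is hypothetical; real analysis is performed by the Lucene analyzers configured in schema.xml.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical demo contrasting an unanalyzed (StrField-like) value with a
// simplified analyzed (TextField-like) value.
final class TermsDemo {

    // StrField-like: the whole value is indexed as a single, unchanged term.
    static List<String> stringTerms(String value) {
        return List.of(value);
    }

    // TextField-like, simplified: split on whitespace, then lowercase each token.
    static List<String> textTerms(String value) {
        return Arrays.stream(value.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(stringTerms("Apache Solr"));  // [Apache Solr]
        System.out.println(textTerms("Apache Solr"));    // [apache, solr]
    }
}
```

A query for solr would therefore match the analyzed version, while the string version only matches the exact full value Apache Solr.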

(4) In a fieldType's class attribute, the solr. prefix is shorthand for the package org.apache.solr.schema. For example,

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

declares a type implemented by org.apache.solr.schema.StrField.

(5) StringField

A string field holds structured data, and its content is not analyzed.

String fields are implemented by the class org.apache.solr.schema.StrField.

(6) DateField

Dates are usually indexed with TrieDateField, whose Trie encoding makes range searches efficient.

Default date format: In general, Solr expects your dates to be in the ISO-8601 Date/Time format (yyyy-MM-ddTHH:mm:ssZ); the date in our tweet (2012-05-22T09:30:22Z) breaks down to
yyyy = 2012
MM = 05
dd = 22
HH = 09 (24-hr clock)
mm = 30
ss = 22
Z = UTC Timezone (Z is for Zulu)

A date value can be rounded down when it is indexed, using date math:

<field name="timestamp">2012-05-22T09:30:00Z/HOUR</field>

/HOUR rounds down to the hour, so the indexed value becomes 2012-05-22T09:00:00Z.
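On the client side, java.time can produce and round values in this format. Instant.toString() prints an ISO-8601 UTC timestamp ending in Z, and truncatedTo(ChronoUnit.HOURS) gives the same result as the /HOUR rounding above. A sketch only (SolrDates is hypothetical): it mirrors the HOUR unit and is not Solr's date-math parser.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helpers for ISO-8601 UTC timestamps in the format Solr expects.
final class SolrDates {

    // Parse a timestamp such as 2012-05-22T09:30:22Z.
    static Instant parse(String iso) {
        return Instant.parse(iso);
    }

    // Round down to the start of the hour, like appending /HOUR in Solr date math.
    static String roundToHour(String iso) {
        return Instant.parse(iso).truncatedTo(ChronoUnit.HOURS).toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("2012-05-22T09:30:22Z"));        // 2012-05-22T09:30:22Z
        System.out.println(roundToHour("2012-05-22T09:30:00Z"));  // 2012-05-22T09:00:00Z
    }
}
```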

(7) NumericField

There are several implementations, such as TrieDoubleField, TrieFloatField, TrieIntField, and TrieLongField.

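(8) The main attributes of a fieldType include:

sortMissingFirst: when sorting on a field of this type, documents that have no value in the field are placed first.

sortMissingLast: when sorting on a field of this type, documents that have no value in the field are placed last.

precisionStep: for Trie numeric and date types, the number of bits between the extra precision levels indexed for each value. Smaller values index more terms per value, which makes the index slightly larger and range queries faster; 0 disables indexing at multiple precision levels.

positionIncrementGap: the gap in token positions placed between successive values of a multiValued field, so that phrase queries do not match across two values; see section 5.4.4 of Solr in Action.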
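sortMissingFirst and sortMissingLast correspond to Java's Comparator.nullsFirst and Comparator.nullsLast when a missing value is treated as null. A sketch echoing the earlier sort-by-sales example; MissingValueSort and the sample ids and sales figures are hypothetical. One difference: Solr keeps missing values first (or last) even in descending sorts, whereas this sketch only covers ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical demo: ascending sort where documents lacking the field go first or last.
final class MissingValueSort {

    static List<String> sortByField(List<String> ids, Map<String, Integer> values, boolean missingLast) {
        Comparator<Integer> byValue = missingLast
                ? Comparator.nullsLast(Comparator.<Integer>naturalOrder())
                : Comparator.nullsFirst(Comparator.<Integer>naturalOrder());
        // Map.get returns null for ids without a value; the null-aware comparator places them.
        Comparator<String> byField = Comparator.comparing(id -> values.get(id), byValue);
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(byField);
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> sales = Map.of("a", 30, "b", 10);
        List<String> ids = List.of("a", "x", "b");
        System.out.println(sortByField(ids, sales, true));    // [b, a, x]
        System.out.println(sortByField(ids, sales, false));   // [x, b, a]
    }
}
```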



6. The uniqueKey element

(1) Solr uses the <uniqueKey> element to declare a unique identifier, similar to the primary key of a database table, for example:

<uniqueKey>id</uniqueKey>

The element must name one of the declared fields, and once a uniqueKey is declared, every document submitted for indexing must supply a value for that field.

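(2) Solr does not strictly require a unique identifier, but it is recommended that every document have one, for example to avoid indexing duplicates.

(3) When a document is submitted whose id already exists in the index, the new document replaces the existing one.

(4) If Solr is distributed across multiple servers, a uniqueKey is required.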

(5) Use a primitive field type for the uniqueKey, not a complex (analyzed) one. One thing to note is that it’s best to use a primitive field type, such as string or long, for the field you indicate as being the <uniqueKey/> as that ensures Solr doesn’t make any changes to the value during indexing.
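The overwrite rule in (3) behaves like a map keyed by the uniqueKey value: adding a document with an existing id replaces the old document as a whole, so fields present only in the old version are gone. A hypothetical in-memory model (InMemoryIndex) that ignores commits, versioning, and partial updates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of uniqueKey semantics (no commits, no versioning).
final class InMemoryIndex {
    private final Map<String, Map<String, String>> docsById = new LinkedHashMap<>();

    // Adding a document whose id already exists replaces the old document as a whole.
    void add(Map<String, String> doc) {
        String id = doc.get("id");
        if (id == null) throw new IllegalArgumentException("missing uniqueKey field: id");
        docsById.put(id, Map.copyOf(doc));
    }

    int size() { return docsById.size(); }

    Map<String, String> get(String id) { return docsById.get(id); }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.add(Map.of("id", "test4", "title", "testagain", "url", "http://www.163.com"));
        index.add(Map.of("id", "test4", "url", "http://www.sina.com"));
        System.out.println(index.size());                     // 1
        System.out.println(index.get("test4").get("title"));  // null: the old title did not survive
    }
}
```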


III. Index-related settings in solrconfig.xml

Here is an example:

<!-- The default high-performance update handler -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Enables a transaction log, used for real-time get, durability, and
       solr cloud replica recovery. The log can grow as big as
       uncommitted changes to the index, so use of a hard autoCommit
       is recommended (see below).
       "dir" - the target directory for transaction logs, defaults to the
               solr data directory. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- AutoCommit

       Perform a hard commit automatically under certain conditions.
       Instead of enabling autoCommit, consider using "commitWithin"
       when adding documents.

       http://wiki.apache.org/solr/UpdateXmlMessages

       maxDocs - Maximum number of documents to add since the last
                 commit before automatically triggering a new commit.

       maxTime - Maximum amount of time in ms that is allowed to pass
                 since a document was added before automatically
                 triggering a new commit.
       openSearcher - if false, the commit causes recent index changes
                      to be flushed to stable storage, but does not cause a new
                      searcher to be opened to make those changes visible.

       If the updateLog is enabled, then it's highly recommended to
       have some sort of hard autoCommit to limit the log size.
    -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- softAutoCommit is like autoCommit except it causes a
       'soft' commit which only ensures that changes are visible
       but does not ensure that data is synced to disk. This is
       faster and more near-realtime friendly than a hard commit.
    -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>

  <!-- Update Related Event Listeners

       Various IndexWriter related events can trigger Listeners to
       take actions.

       postCommit - fired after every commit or optimize command
       postOptimize - fired after every optimize command
    -->
  <!-- The RunExecutableListener executes an external command from a
       hook such as postCommit or postOptimize.

       exe - the name of the executable to run
       dir - dir to use as the current working directory. (default=".")
       wait - the calling thread waits until the executable returns.
              (default="true")
       args - the arguments to pass to the program. (default is none)
       env - environment variables to set. (default is none)
    -->
  <!-- This example shows how RunExecutableListener could be used
       with the script based replication...
       http://wiki.apache.org/solr/CollectionDistribution
    -->
  <!--
     <listener event="postCommit" class="solr.RunExecutableListener">
       <str name="exe">solr/bin/snapshooter</str>
       <str name="dir">.</str>
       <bool name="wait">true</bool>
       <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
       <arr name="env"> <str>MYVAR=val1</str> </arr>
     </listener>
    -->
</updateHandler>
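The ${...} values above use property substitution: ${solr.autoCommit.maxTime:15000} takes the value of the property solr.autoCommit.maxTime if one is defined (for example with -Dsolr.autoCommit.maxTime=60000 when starting Solr) and falls back to 15000 otherwise. With no overrides, this configuration therefore issues a hard commit at most 15 s after a document is added, without opening a new searcher, and leaves automatic soft commits off (maxTime -1). A hypothetical sketch of the substitution rule (PropertySubstitution), covering only the ${name} and ${name:default} forms; it is not Solr's own code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for the ${name:default} placeholders used in solrconfig.xml.
final class PropertySubstitution {
    // Group 1 = property name, group 2 = default value (may be empty or absent).
    private static final Pattern REF = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = REF.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) value = m.group(2);   // fall back to the default after ':'
            if (value == null) {
                throw new IllegalArgumentException("no value or default for " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String maxTime = "${solr.autoCommit.maxTime:15000}";
        System.out.println(resolve(maxTime, Map.of()));                                    // 15000
        System.out.println(resolve(maxTime, Map.of("solr.autoCommit.maxTime", "60000")));  // 60000
    }
}
```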











Reprinted from: https://www.cnblogs.com/jediael/p/4304091.html
