
The Complete Solr Getting Started Tutorial



Search Engine Technology Tutorial Series (1) - Solr - Getting Started with Solr

 

Step 1: About the JDK version

Use at least JDK 8. Please download JDK 8 or a later version: Download and configure the JDK environment

Step 2: What is Solr

Having covered Lucene earlier, we now move on to Solr.
By analogy with database access: Lucene is like JDBC, the low-level way of doing things.
Solr is like MyBatis: it makes configuration, access, and invocation easier for developers.
Solr also ships as a webapp, started the way a Tomcat application is, and provides a visual configuration interface.

Step 3: Download Solr

Download solr-7.2.1.rar from the download area (click to enter) and unzip it.
As of this writing (2018-03-10), the latest version in use is 7.2.1.

Step 4: Start the server

In this tutorial the archive is unzipped to D:\software\solr-7.2.1

cd D:\software\solr-7.2.1\bin

d:

solr.cmd start


This starts the server, which occupies port 8983. If the port is already taken, startup will fail; for how to diagnose this, see "Port already in use".


Step 5: Access the server

Open the following address in a browser:

http://127.0.0.1:8983/solr/#/


and the Solr admin console appears.
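If you would rather check from code than from a browser, the short sketch below (not part of the original tutorial; the class name PingSolr is made up for illustration) sends a request to Solr's CoreAdmin STATUS endpoint, which returns HTTP 200 once the server is up:

import java.net.HttpURLConnection;
import java.net.URL;

public class PingSolr {
    public static void main(String[] args) throws Exception {
        // The CoreAdmin STATUS endpoint answers as soon as Solr is running
        URL url = new URL("http://127.0.0.1:8983/solr/admin/cores?action=STATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode()); // expect 200
        conn.disconnect();
    }
}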


Step 6: Next steps

This is only the beginning; more work is needed before the Solr server is actually usable, so please continue with the next part.

Search Engine Technology Tutorial Series (2) - Solr - Creating a Core


 

Step 1: The Core concept

If Solr is analogous to a database, then a Core is analogous to a table.

Step 2: Do not create a Core through the web UI

Creating a Core through the web UI fails; create the Core from the command line instead.


Step 3: Create the Core from the command line

The following commands create the Core:

cd D:\software\solr-7.2.1\bin

d:

solr.cmd create -c how2java


Step 4: Delete new_core

If you clicked Add Core in the web UI during the step "Do not create a Core through the web UI", an error message will keep appearing. Delete new_core as follows and the error message will stop:

cd D:\software\solr-7.2.1\bin

d:

solr.cmd delete -c new_core


Search Engine Technology Tutorial Series (3) - Solr - IKAnalyzer6.5.0.jar, a Chinese Analyzer That Works with Solr 7.2


 

Step 1: No Chinese word segmentation by default

Out of the box there is no Chinese word segmentation. Click how2java -> Analysis on the left and enter 四川省成都市动物园: the text is tokenized character by character.


Step 2: Configure Chinese word segmentation

Next, set up a Chinese analyzer for Solr.

Step 3: Download IKAnalyzer6.5.0.jar

Download IKAnalyzer6.5.0.jar from the download area (click to enter), then copy it to the following directory:

D:\software\solr-7.2.1\server\solr-webapp\webapp\WEB-INF\lib


Step 4: Add a new field type

Edit the configuration file managed-schema:

D:\software\solr-7.2.1\server\solr\how2java\conf\managed-schema


Add the following inside the <schema...> tag:

<fieldType name="text_ik" class="solr.TextField"> 

        <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/> 

</fieldType> 

<field name="text_ik"  type="text_ik" indexed="true"  stored="true"  multiValued="false" />


The complete modified file is shown below; if you are unsure, simply copy it in full and overwrite yours.

D:\software\solr-7.2.1\server\solr\how2java\conf\managed-schema


<?xml version="1.0" encoding="UTF-8" ?>

<!--

 Licensed to the Apache Software Foundation (ASF) under one or more

 contributor license agreements.  See the NOTICE file distributed with

 this work for additional information regarding copyright ownership.

 The ASF licenses this file to You under the Apache License, Version 2.0

 (the "License"); you may not use this file except in compliance with

 the License.  You may obtain a copy of the License at

 

     http://www.apache.org/licenses/LICENSE-2.0

 

 Unless required by applicable law or agreed to in writing, software

 distributed under the License is distributed on an "AS IS" BASIS,

 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

 See the License for the specific language governing permissions and

 limitations under the License.

-->

 

<!--

 

 This example schema is the recommended starting point for users.

 It should be kept correct and concise, usable out-of-the-box.

 

 For more information, on how to customize this file, please see

 http://lucene.apache.org/solr/guide/documents-fields-and-schema-design.html

 

 PERFORMANCE NOTE: this schema includes many optional features and should not

 be used for benchmarking.  To improve performance one could

  - set stored="false" for all fields possible (esp large fields) when you

    only need to search on the field but don't need to return the original

    value.

  - set indexed="false" if you don't need to search on the field, but only

    return the field as a result of searching on other indexed fields.

  - remove all unneeded copyField statements

  - for best index size and searching performance, set "index" to false

    for all general text fields, use copyField to copy them to the

    catchall "text" field, and use that for searching.

-->

 

<schema name="default-config" version="1.6">

    <fieldType name="text_ik" class="solr.TextField"> 

            <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/> 

    </fieldType> 

    <field name="text_ik"  type="text_ik" indexed="true"  stored="true"  multiValued="false" />

 

    <!-- attribute "name" is the name of this schema and is only used for display purposes.

       version="x.y" is Solr's version number for the schema syntax and

       semantics.  It should not normally be changed by applications.

 

       1.0: multiValued attribute did not exist, all fields are multiValued

            by nature

       1.1: multiValued attribute introduced, false by default

       1.2: omitTermFreqAndPositions attribute introduced, true by default

            except for text fields.

       1.3: removed optional field compress feature

       1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser

            behavior when a single string produces multiple tokens.  Defaults

            to off for version >= 1.4

       1.5: omitNorms defaults to true for primitive field types

            (int, float, boolean, string...)

       1.6: useDocValuesAsStored defaults to true.

    -->

 

    <!-- Valid attributes for fields:

     name: mandatory - the name for the field

     type: mandatory - the name of a field type from the

       fieldTypes section

     indexed: true if this field should be indexed (searchable or sortable)

     stored: true if this field should be retrievable

     docValues: true if this field should have doc values. Doc Values is

       recommended (required, if you are using *Point fields) for faceting,

       grouping, sorting and function queries. Doc Values will make the index

       faster to load, more NRT-friendly and more memory-efficient.

       They are currently only supported by StrField, UUIDField, all

       *PointFields, and depending on the field type, they might require

       the field to be single-valued, be required or have a default value

       (check the documentation of the field type you're interested in for

       more information)

     multiValued: true if this field may contain multiple values per document

     omitNorms: (expert) set to true to omit the norms associated with

       this field (this disables length normalization and index-time

       boosting for the field, and saves some memory).  Only full-text

       fields or fields that need an index-time boost need norms.

       Norms are omitted for primitive (non-analyzed) types by default.

     termVectors: [false] set to true to store the term vector for a

       given field.

       When using MoreLikeThis, fields used for similarity should be

       stored for best performance.

     termPositions: Store position information with the term vector. 

       This will increase storage costs.

     termOffsets: Store offset information with the term vector. This

       will increase storage costs.

     required: The field is required.  It will throw an error if the

       value does not exist

     default: a value that should be used if no value is specified

       when adding a document.

    -->

 

    <!-- field names should consist of alphanumeric or underscore characters only and

      not start with a digit.  This is not currently strictly enforced,

      but other field names will not have first class support from all components

      and back compatibility is not guaranteed.  Names with both leading and

      trailing underscores (e.g. _version_) are reserved.

    -->

 

    <!-- In this _default configset, only four fields are pre-declared:

         id, _version_, and _text_ and _root_. All other fields will be type guessed and added via the

         "add-unknown-fields-to-the-schema" update request processor chain declared in solrconfig.xml.

          

         Note that many dynamic fields are also defined - you can use them to specify a

         field's type via field naming conventions - see below.

   

         WARNING: The _text_ catch-all field will significantly increase your index size.

         If you don't need it, consider removing it and the corresponding copyField directive.

    -->

 

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    <!-- docValues are enabled by default for long type so we don't need to index the version field  -->

    <field name="_version_" type="plong" indexed="false" stored="false"/>

    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

    <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

 

    <!-- This can be enabled, in case the client does not know what fields may be searched. It isn't enabled by default

         because it's very expensive to index everything twice. -->

    <!-- <copyField source="*" dest="_text_"/> -->

 

    <!-- Dynamic field definitions allow using convention over configuration

       for fields via the specification of patterns to match field names.

       EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)

       RESTRICTION: the glob-like pattern in the name attribute must have a "*" only at the start or the end.  -->

    

    <dynamicField name="*_i"  type="pint"    indexed="true"  stored="true"/>

    <dynamicField name="*_is" type="pints"    indexed="true"  stored="true"/>

    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />

    <dynamicField name="*_ss" type="strings"  indexed="true"  stored="true"/>

    <dynamicField name="*_l"  type="plong"   indexed="true"  stored="true"/>

    <dynamicField name="*_ls" type="plongs"   indexed="true"  stored="true"/>

    <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

    <dynamicField name="*_b"  type="boolean" indexed="true" stored="true"/>

    <dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>

    <dynamicField name="*_f"  type="pfloat"  indexed="true"  stored="true"/>

    <dynamicField name="*_fs" type="pfloats"  indexed="true"  stored="true"/>

    <dynamicField name="*_d"  type="pdouble" indexed="true"  stored="true"/>

    <dynamicField name="*_ds" type="pdoubles" indexed="true"  stored="true"/>

 

    <!-- Type used for data-driven schema, to add a string copy for each text field -->

    <dynamicField name="*_str" type="strings" stored="false" docValues="true" indexed="false" />

 

    <dynamicField name="*_dt"  type="pdate"    indexed="true"  stored="true"/>

    <dynamicField name="*_dts" type="pdate"    indexed="true"  stored="true" multiValued="true"/>

    <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>

    <dynamicField name="*_srpt"  type="location_rpt" indexed="true" stored="true"/>

     

    <!-- payloaded dynamic fields -->

    <dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true"  stored="true"/>

    <dynamicField name="*_dpi" type="delimited_payloads_int" indexed="true"  stored="true"/>

    <dynamicField name="*_dps" type="delimited_payloads_string" indexed="true"  stored="true"/>

 

    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

 

    <!-- Field to use to determine and enforce document uniqueness.

      Unless this field is marked with required="false", it will be a required field

    -->

    <uniqueKey>id</uniqueKey>

 

    <!-- copyField commands copy one field to another at the time a document

       is added to the index.  It's used either to index the same field differently,

       or to add multiple fields to the same field for easier/faster searching.

 

    <copyField source="sourceFieldName" dest="destinationFieldName"/>

    -->

 

    <!-- field type definitions. The "name" attribute is

       just a label to be used by field definitions.  The "class"

       attribute and any other attributes determine the real

       behavior of the fieldType.

         Class names starting with "solr" refer to java classes in a

       standard package such as org.apache.solr.analysis

    -->

 

    <!-- sortMissingLast and sortMissingFirst attributes are optional attributes are

         currently supported on types that are sorted internally as strings

         and on numeric types.

       This includes "string", "boolean", "pint", "pfloat", "plong", "pdate", "pdouble".

       - If sortMissingLast="true", then a sort on this field will cause documents

         without the field to come after documents with the field,

         regardless of the requested sort order (asc or desc).

       - If sortMissingFirst="true", then a sort on this field will cause documents

         without the field to come before documents with the field,

         regardless of the requested sort order.

       - If sortMissingLast="false" and sortMissingFirst="false" (the default),

         then default lucene sorting will be used which places docs without the

         field first in an ascending sort and last in a descending sort.

    -->

 

    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />

    <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true"docValues="true" />

 

    <!-- boolean type: "true" or "false" -->

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true"/>

 

    <!--

      Numeric field types that index values using KD-trees.

      Point fields don't support FieldCache, so they must have docValues="true" if needed for sorting, faceting, functions, etc.

    -->

    <fieldType name="pint" class="solr.IntPointField" docValues="true"/>

    <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>

    <fieldType name="plong" class="solr.LongPointField" docValues="true"/>

    <fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

     

    <fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true"/>

    <fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true"/>

    <fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true"/>

    <fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true"/>

 

    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and

         is a more restricted form of the canonical representation of dateTime

         http://www.w3.org/TR/xmlschema-2/#dateTime   

         The trailing "Z" designates UTC time and is mandatory.

         Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z

         All other components are mandatory.

 

         Expressions can also be used to denote calculations that should be

         performed relative to "NOW" to determine the value, ie...

 

               NOW/HOUR

                  ... Round to the start of the current hour

               NOW-1DAY

                  ... Exactly 1 day prior to now

               NOW/DAY+6MONTHS+3DAYS

                  ... 6 months and 3 days in the future from the start of

                      the current day

                       

      -->

    <!-- KD-tree versions of date fields -->

    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

    <fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true"/>

     

    <!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->

    <fieldType name="binary" class="solr.BinaryField"/>

 

    <!-- solr.TextField allows the specification of custom text analyzers

         specified as a tokenizer and a list of token filters. Different

         analyzers may be specified for indexing and querying.

 

         The optional positionIncrementGap puts space between multiple fields of

         this type on the same document, with the purpose of preventing false phrase

         matching across fields.

 

         For more info on customizing your analyzer chain, please see

         http://lucene.apache.org/solr/guide/understanding-analyzers-tokenizers-and-filters.html#understanding-analyzers-tokenizers-and-filters

     -->

 

    <!-- One can also specify an existing Analyzer class that has a

         default constructor via the class attribute on the analyzer element.

         Example:

    <fieldType name="text_greek" class="solr.TextField">

      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>

    </fieldType>

    -->

 

    <!-- A text field that only splits on whitespace for exact matching of words -->

    <dynamicField name="*_ws" type="text_ws"  indexed="true"  stored="true"/>

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- A general text field that has reasonable, generic

         cross-language defaults: it tokenizes with StandardTokenizer,

           removes stop words from case-insensitive "stopwords.txt"

           (empty by default), and down cases.  At query time only, it

           also applies synonyms.

      -->

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">

      <analyzer type="index">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <!-- in this example, we will only use synonyms at query time

        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true"expand="false"/>

        <filter class="solr.FlattenGraphFilterFactory"/>

        -->

        <filter class="solr.LowerCaseFilterFactory"/>

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="true"/>

        <filter class="solr.LowerCaseFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- A text field with defaults appropriate for English: it tokenizes with StandardTokenizer,

         removes English stop words (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and

         finally applies Porter's stemming.  The query time analyzer also applies synonyms from synonyms.txt. -->

    <dynamicField name="*_txt_en" type="text_en"  indexed="true"  stored="true"/>

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">

      <analyzer type="index">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- in this example, we will only use synonyms at query time

        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true"expand="false"/>

        <filter class="solr.FlattenGraphFilterFactory"/>

        -->

        <!-- Case insensitive stop word removal.

        -->

        <filter class="solr.StopFilterFactory"

                ignoreCase="true"

                words="lang/stopwords_en.txt"

            />

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:

        <filter class="solr.EnglishMinimalStemFilterFactory"/>

          -->

        <filter class="solr.PorterStemFilterFactory"/>

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="true"/>

        <filter class="solr.StopFilterFactory"

                ignoreCase="true"

                words="lang/stopwords_en.txt"

        />

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.EnglishPossessiveFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:

        <filter class="solr.EnglishMinimalStemFilterFactory"/>

          -->

        <filter class="solr.PorterStemFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- A text field with defaults appropriate for English, plus

         aggressive word-splitting and autophrase features enabled.

         This field is just like text_en, except it adds

         WordDelimiterGraphFilter to enable splitting and matching of

         words on case-change, alpha numeric boundaries, and

         non-alphanumeric chars.  This means certain compound word

         cases will work, for example query "wi fi" will match

         document "WiFi" or "wi-fi".

    -->

    <dynamicField name="*_txt_en_split" type="text_en_splitting"  indexed="true"  stored="true"/>

    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100"autoGeneratePhraseQueries="true">

      <analyzer type="index">

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <!-- in this example, we will only use synonyms at query time

        <filter class="solr.SynonymGraphFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true"expand="false"/>

        -->

        <!-- Case insensitive stop word removal.

        -->

        <filter class="solr.StopFilterFactory"

                ignoreCase="true"

                words="lang/stopwords_en.txt"

        />

        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1"catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <filter class="solr.PorterStemFilterFactory"/>

        <filter class="solr.FlattenGraphFilterFactory" />

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="true"/>

        <filter class="solr.StopFilterFactory"

                ignoreCase="true"

                words="lang/stopwords_en.txt"

        />

        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1"catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <filter class="solr.PorterStemFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- Less flexible matching, but less false matches.  Probably not ideal for product names,

         but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->

    <dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight"  indexed="true"  stored="true"/>

    <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100"autoGeneratePhraseQueries="true">

      <analyzer type="index">

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="false"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>

        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="0"catenateWords="1" catenateNumbers="1" catenateAll="0"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <filter class="solr.EnglishMinimalStemFilterFactory"/>

        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes

             possible with WordDelimiterGraphFilter in conjunction with stemming. -->

        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

        <filter class="solr.FlattenGraphFilterFactory" />

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="false"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>

        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="0" generateNumberParts="0"catenateWords="1" catenateNumbers="1" catenateAll="0"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>

        <filter class="solr.EnglishMinimalStemFilterFactory"/>

        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes

             possible with WordDelimiterGraphFilter in conjunction with stemming. -->

        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- Just like text_general except it reverses the characters of

           each token, to enable more efficient leading wildcard queries.

    -->

    <dynamicField name="*_txt_rev" type="text_general_rev"  indexed="true"  stored="true"/>

    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">

      <analyzer type="index">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"

                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"expand="true"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

        <filter class="solr.LowerCaseFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <dynamicField name="*_phon_en" type="phonetic_en"  indexed="true"  stored="true"/>

    <fieldType name="phonetic_en" stored="false" indexed="true" class="solr.TextField" >

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>

      </analyzer>

    </fieldType>

 

    <!-- lowercases the entire field value, keeping it as a single token.  -->

    <dynamicField name="*_s_lower" type="lowercase"  indexed="true"  stored="true"/>

    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.KeywordTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory" />

      </analyzer>

    </fieldType>

 

    <!--

      Example of using PathHierarchyTokenizerFactory at index time, so

      queries for paths match documents at that path, or in descendent paths

    -->

    <dynamicField name="*_descendent_path" type="descendent_path"  indexed="true"  stored="true"/>

    <fieldType name="descendent_path" class="solr.TextField">

      <analyzer type="index">

        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.KeywordTokenizerFactory" />

      </analyzer>

    </fieldType>

 

    <!--

      Example of using PathHierarchyTokenizerFactory at query time, so

      queries for paths match documents at that path, or in ancestor paths

    -->

    <dynamicField name="*_ancestor_path" type="ancestor_path"  indexed="true"  stored="true"/>

    <fieldType name="ancestor_path" class="solr.TextField">

      <analyzer type="index">

        <tokenizer class="solr.KeywordTokenizerFactory" />

      </analyzer>

      <analyzer type="query">

        <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />

      </analyzer>

    </fieldType>

 

    <!-- This point type indexes the coordinates as separate fields (subFields)

      If subFieldType is defined, it references a type, and a dynamic field

      definition is created matching *___<typename>.  Alternately, if

      subFieldSuffix is defined, that is used to create the subFields.

      Example: if subFieldType="double", then the coordinates would be

        indexed in fields myloc_0___double,myloc_1___double.

      Example: if subFieldSuffix="_d" then the coordinates would be indexed

        in fields myloc_0_d,myloc_1_d

      The subFields are an implementation detail of the fieldType, and end

      users normally should not need to know about them.

     -->

    <dynamicField name="*_point" type="point"  indexed="true"  stored="true"/>

    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

 

    <!-- A specialized field for geospatial search filters and distance sorting. -->

    <fieldType name="location" class="solr.LatLonPointSpatialField" docValues="true"/>

 

    <!-- A geospatial field type that supports multiValued and polygon shapes.

      For more information about this and other spatial fields see:

      http://lucene.apache.org/solr/guide/spatial-search.html

    -->

    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"

               geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />

 

    <!-- Payloaded field types -->

    <fieldType name="delimited_payloads_float" stored="false" indexed="true" class="solr.TextField">

      <analyzer>

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>

      </analyzer>

    </fieldType>

    <fieldType name="delimited_payloads_int" stored="false" indexed="true" class="solr.TextField">

      <analyzer>

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="integer"/>

      </analyzer>

    </fieldType>

    <fieldType name="delimited_payloads_string" stored="false" indexed="true" class="solr.TextField">

      <analyzer>

        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity"/>

      </analyzer>

    </fieldType>

 

    <!-- some examples for different languages (generally ordered by ISO code) -->

 

    <!-- Arabic -->

    <dynamicField name="*_txt_ar" type="text_ar"  indexed="true"  stored="true"/>

    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- for any non-arabic -->

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />

        <!-- normalizes ﻯ to ﻱ, etc -->

        <filter class="solr.ArabicNormalizationFilterFactory"/>

        <filter class="solr.ArabicStemFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- Bulgarian -->

    <dynamicField name="*_txt_bg" type="text_bg"  indexed="true"  stored="true"/>

    <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_bg.txt" />

        <filter class="solr.BulgarianStemFilterFactory"/>      

      </analyzer>

    </fieldType>

     

    <!-- Catalan -->

    <dynamicField name="*_txt_ca" type="text_ca"  indexed="true"  stored="true"/>

    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- removes l', etc -->

        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ca.txt"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" />

        <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>      

      </analyzer>

    </fieldType>

     

    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->

    <dynamicField name="*_txt_cjk" type="text_cjk"  indexed="true"  stored="true"/>

    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- normalize width before bigram, as e.g. half-width dakuten combine  -->

        <filter class="solr.CJKWidthFilterFactory"/>

        <!-- for any non-CJK -->

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.CJKBigramFilterFactory"/>

      </analyzer>

    </fieldType>

 

    <!-- Czech -->

    <dynamicField name="*_txt_cz" type="text_cz"  indexed="true"  stored="true"/>

    <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt" />

        <filter class="solr.CzechStemFilterFactory"/>      

      </analyzer>

    </fieldType>

     

    <!-- Danish -->

    <dynamicField name="*_txt_da" type="text_da"  indexed="true"  stored="true"/>

    <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_da.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>      

      </analyzer>

    </fieldType>

     

    <!-- German -->

    <dynamicField name="*_txt_de" type="text_de"  indexed="true"  stored="true"/>

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"format="snowball" />

        <filter class="solr.GermanNormalizationFilterFactory"/>

        <filter class="solr.GermanLightStemFilterFactory"/>

        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->

        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Greek -->

    <dynamicField name="*_txt_el" type="text_el"  indexed="true"  stored="true"/>

    <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- greek specific lowercase for sigma -->

        <filter class="solr.GreekLowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" />

        <filter class="solr.GreekStemFilterFactory"/>

      </analyzer>

    </fieldType>

     

    <!-- Spanish -->

    <dynamicField name="*_txt_es" type="text_es"  indexed="true"  stored="true"/>

    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt"format="snowball" />

        <filter class="solr.SpanishLightStemFilterFactory"/>

        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Basque -->

    <dynamicField name="*_txt_eu" type="text_eu"  indexed="true"  stored="true"/>

    <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_eu.txt" />

        <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>

      </analyzer>

    </fieldType>

     

    <!-- Persian -->

    <dynamicField name="*_txt_fa" type="text_fa"  indexed="true"  stored="true"/>

    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <!-- for ZWNJ -->

        <charFilter class="solr.PersianCharFilterFactory"/>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.ArabicNormalizationFilterFactory"/>

        <filter class="solr.PersianNormalizationFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />

      </analyzer>

    </fieldType>

     

    <!-- Finnish -->

    <dynamicField name="*_txt_fi" type="text_fi"  indexed="true"  stored="true"/>

    <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fi.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>

        <!-- less aggressive: <filter class="solr.FinnishLightStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- French -->

    <dynamicField name="*_txt_fr" type="text_fr"  indexed="true"  stored="true"/>

    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- removes l', etc -->

        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt"format="snowball" />

        <filter class="solr.FrenchLightStemFilterFactory"/>

        <!-- less aggressive: <filter class="solr.FrenchMinimalStemFilterFactory"/> -->

        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Irish -->

    <dynamicField name="*_txt_ga" type="text_ga"  indexed="true"  stored="true"/>

    <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- removes d', etc -->

        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ga.txt"/>

        <!-- removes n-, etc. position increments is intentionally false! -->

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/hyphenations_ga.txt"/>

        <filter class="solr.IrishLowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ga.txt"/>

        <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>

      </analyzer>

    </fieldType>

     

    <!-- Galician -->

    <dynamicField name="*_txt_gl" type="text_gl"  indexed="true"  stored="true"/>

    <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_gl.txt" />

        <filter class="solr.GalicianStemFilterFactory"/>

        <!-- less aggressive: <filter class="solr.GalicianMinimalStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Hindi -->

    <dynamicField name="*_txt_hi" type="text_hi"  indexed="true"  stored="true"/>

    <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <!-- normalizes unicode representation -->

        <filter class="solr.IndicNormalizationFilterFactory"/>

        <!-- normalizes variation in spelling -->

        <filter class="solr.HindiNormalizationFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hi.txt" />

        <filter class="solr.HindiStemFilterFactory"/>

      </analyzer>

    </fieldType>

     

    <!-- Hungarian -->

    <dynamicField name="*_txt_hu" type="text_hu"  indexed="true"  stored="true"/>

    <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hu.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>

        <!-- less aggressive: <filter class="solr.HungarianLightStemFilterFactory"/> -->  

      </analyzer>

    </fieldType>

     

    <!-- Armenian -->

    <dynamicField name="*_txt_hy" type="text_hy"  indexed="true"  stored="true"/>

    <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />

        <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>

      </analyzer>

    </fieldType>

     

    <!-- Indonesian -->

    <dynamicField name="*_txt_id" type="text_id"  indexed="true"  stored="true"/>

    <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt" />

        <!-- for a less aggressive approach (only inflectional suffixes), set stemDerivational to false -->

        <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>

      </analyzer>

    </fieldType>

     

    <!-- Italian -->

  <dynamicField name="*_txt_it" type="text_it"  indexed="true"  stored="true"/>

  <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <!-- removes l', etc -->

        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt"format="snowball" />

        <filter class="solr.ItalianLightStemFilterFactory"/>

        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Italian"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Japanese using morphological analysis (see text_cjk for a configuration using bigramming)

 

         NOTE: If you want to optimize search for precision, use default operator AND in your request

         handler config (q.op) Use OR if you would like to optimize for recall (default).

    -->

    <dynamicField name="*_txt_ja" type="text_ja"  indexed="true"  stored="true"/>

    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100"autoGeneratePhraseQueries="false">

      <analyzer>

        <!-- Kuromoji Japanese morphological analyzer/tokenizer (JapaneseTokenizer)

 

           Kuromoji has a search mode (default) that does segmentation useful for search.  A heuristic

           is used to segment compounds into its parts and the compound itself is kept as synonym.

 

           Valid values for attribute mode are:

              normal: regular segmentation

              search: segmentation useful for search with synonyms compounds (default)

            extended: same as search mode, but unigrams unknown words (experimental)

 

           For some applications it might be good to use search mode for indexing and normal mode for

           queries to reduce recall and prevent parts of compounds from being matched and highlighted.

           Use <analyzer type="index"> and <analyzer type="query"> for this and mode normal in query.

 

           Kuromoji also has a convenient user dictionary feature that allows overriding the statistical

           model with your own entries for segmentation, part-of-speech tags and readings without a need

           to specify weights.  Notice that user dictionaries have not been subject to extensive testing.

 

           User dictionary attributes are:

                     userDictionary: user dictionary filename

             userDictionaryEncoding: user dictionary encoding (default is UTF-8)

 

           See lang/userdict_ja.txt for a sample user dictionary file.

 

           Punctuation characters are discarded by default.  Use discardPunctuation="false" to keep them.

        -->

        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>

        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search"userDictionary="lang/userdict_ja.txt"/>-->

        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->

        <filter class="solr.JapaneseBaseFormFilterFactory"/>

        <!-- Removes tokens with certain part-of-speech tags -->

        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />

        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->

        <filter class="solr.CJKWidthFilterFactory"/>

        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />

        <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->

        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>

        <!-- Lower-cases romaji characters -->

        <filter class="solr.LowerCaseFilterFactory"/>

      </analyzer>

    </fieldType>

     

    <!-- Latvian -->

    <dynamicField name="*_txt_lv" type="text_lv"  indexed="true"  stored="true"/>

    <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_lv.txt" />

        <filter class="solr.LatvianStemFilterFactory"/>

      </analyzer>

    </fieldType>

     

    <!-- Dutch -->

    <dynamicField name="*_txt_nl" type="text_nl"  indexed="true"  stored="true"/>

    <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt"format="snowball" />

        <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt"ignoreCase="false"/>

        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>

      </analyzer>

    </fieldType>

     

    <!-- Norwegian -->

    <dynamicField name="*_txt_no" type="text_no"  indexed="true"  stored="true"/>

    <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_no.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>

        <!-- less aggressive: <filter class="solr.NorwegianLightStemFilterFactory"/> -->

        <!-- singular/plural: <filter class="solr.NorwegianMinimalStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Portuguese -->

  <dynamicField name="*_txt_pt" type="text_pt"  indexed="true"  stored="true"/>

  <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt"format="snowball" />

        <filter class="solr.PortugueseLightStemFilterFactory"/>

        <!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->

        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->

        <!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Romanian -->

    <dynamicField name="*_txt_ro" type="text_ro"  indexed="true"  stored="true"/>

    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" />

        <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>

      </analyzer>

    </fieldType>

     

    <!-- Russian -->

    <dynamicField name="*_txt_ru" type="text_ru"  indexed="true"  stored="true"/>

    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>

        <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Swedish -->

    <dynamicField name="*_txt_sv" type="text_sv"  indexed="true"  stored="true"/>

    <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_sv.txt"format="snowball" />

        <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>

        <!-- less aggressive: <filter class="solr.SwedishLightStemFilterFactory"/> -->

      </analyzer>

    </fieldType>

     

    <!-- Thai -->

    <dynamicField name="*_txt_th" type="text_th"  indexed="true"  stored="true"/>

    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.ThaiTokenizerFactory"/>

        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt" />

      </analyzer>

    </fieldType>

     

    <!-- Turkish -->

    <dynamicField name="*_txt_tr" type="text_tr"  indexed="true"  stored="true"/>

    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">

      <analyzer>

        <tokenizer class="solr.StandardTokenizerFactory"/>

        <filter class="solr.TurkishLowerCaseFilterFactory"/>

        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt" />

        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>

      </analyzer>

    </fieldType>

 

    <!-- Similarity is the scoring routine for each document vs. a query.

       A custom Similarity or SimilarityFactory may be specified here, but

       the default is fine for most applications. 

       For more info: http://lucene.apache.org/solr/guide/other-schema-elements.html#OtherSchemaElements-Similarity

    -->

    <!--

     <similarity class="com.example.solr.CustomSimilarityFactory">

       <str name="paramkey">param value</str>

     </similarity>

    -->

 

</schema>

Step 5: Restart Solr

Restart with the following commands:

cd D:\software\solr-7.2.1\bin

d:

solr.cmd stop -all

solr.cmd start


Step 6: Test the analyzer again

With the Chinese analyzer in place, the Analysis page now shows proper word segmentation.
Note: in the FieldType dropdown, remember to select text_ik, the type created in "Add a new field type".


Search Engine Technology Tutorial Series (4) - Solr - Setting Up Fields


 

Step 1: The field concept

The Core created in "Creating a Core" is analogous to a table; next, define the fields of that table, which will hold the data.

Step 2: Create the name field

On the left, select how2java -> Schema -> Add Field, then enter name: name and field type: text_ik. Be sure to use the text_ik type newly created in the Chinese word segmentation step; otherwise, later queries in Chinese will fail.
Then click the Add Field button to add it.


Step 3: Create the other fields

In the same way as the name field, create the following fields (a scripted alternative is sketched below):
category text_ik,
price pfloat,
place text_ik,
code text_ik
Note: price has the type pfloat.
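If you prefer scripting these schema changes to clicking through the UI, SolrJ (introduced in part 5 of this series) also exposes the Schema API. The following is only a sketch, not part of the original tutorial: the class name AddFieldsDemo is made up, and it assumes Solr is running at the default address with the how2java core already created.

package how2java;

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddFieldsDemo {

    public static void main(String[] args) throws Exception {
        // Point the client directly at the how2java core
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/how2java").build();
        addField(client, "name", "text_ik");
        addField(client, "category", "text_ik");
        addField(client, "price", "pfloat");
        addField(client, "place", "text_ik");
        addField(client, "code", "text_ik");
        client.close();
    }

    static void addField(SolrClient client, String name, String type) throws Exception {
        // Same attributes as entered in the Add Field form
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("name", name);
        attrs.put("type", type);
        attrs.put("indexed", true);
        attrs.put("stored", true);
        new SchemaRequest.AddField(attrs).process(client);
    }
}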

Step 4: About the id field

The id field exists by default; there is no need to create it yourself.

Step 5: View the created fields

Once created, all of these field names are visible in the Schema view.
Why these particular field names? Because the product objects added to the index later use exactly these fields.


Search Engine Technology Tutorial Series (5) - Solr - SolrJ Tutorial - Adding 140,000 Index Records to Solr with SolrJ


 

Step 1: How to create the index

Solr does provide a web interface for adding documents to the index, but it is not very convenient, and it does not match real working environments.
In practice, data is usually read from a database and then added to the index. Adding documents through a UI is rare, because maintaining, updating, and deleting them that way is awkward, especially with larger data volumes.
This tutorial therefore shows how to add data to the Solr index from a program.


Step 2: SolrJ

Solr accepts index updates from many languages (PHP, JavaScript, C#, and so on). Since this tutorial is Java-based, we use the client library SolrJ to add data to the index from Java.
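The runnable project below already ships the jars it needs. If you manage dependencies with Maven instead (an assumption; the original project does not use Maven), SolrJ and the commons-io library used later can be declared like this:

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>7.2.1</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>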


Step 3: Run it first, see the result, then study it

As usual, first download the runnable project from the download area (click to enter), get it configured and running, and confirm it works before studying the steps that produce this result.
Download solr4j.rar from the top right and run the TestSolr4j inside it. It imports 140,000 records in total, which takes quite a while, so please be patient.
When it finishes, open

http://127.0.0.1:8983/solr/#/how2java


On the left, click Query and then Execute Query; the results appear on the right, with a total of 147,939 records.


Step 4: Imitate and troubleshoot

Once the runnable project runs correctly, follow the tutorial steps strictly and reproduce the code yourself.
Your copy will inevitably differ somewhere and fail to produce the expected result; when that happens, locate the problem by comparing the reference answer (the runnable project) against your own code.
Learning this way is effective and troubleshooting is efficient, which noticeably speeds up progress past the various hurdles along the way.

The DiffMerge tool is recommended for comparing folders: compare your own project folder against the runnable project folder.
It is an excellent tool that identifies exactly which files in the two folders differ and highlights the differences clearly.
A portable download and usage guide is provided here: DiffMerge download and usage tutorial

Step 5: The 140,000 records

To simulate a realistic environment, a great deal of effort went into collecting 140,000 Tmall product records. Next we will add these 140,000 records to Solr and then observe the search behavior.
The records are in 140k_products.rar in the download area (click to enter); how to parse the file is explained below.

Step 6: About the database

Strictly speaking, these 140,000 records should first be saved into a database and then read back out. However, not every reader has a JDBC background, loading a database is tedious, and reading 140,000 rows back out takes time. So instead we read the records directly from a file and convert them into a List with Product as its type parameter, which is equivalent to reading them from a database, only much faster.

Step 7: 140k_products.txt

First download 140k_products.rar, extract it to 140k_products.txt, and place it in the project directory. The file contains 140,000 product records.


Step 8: Product.java

An entity class that holds the product information.
Note: every field carries the @Field annotation, which tells Solr that these fields map to the fields of the how2java core.

package how2java;

import org.apache.solr.client.solrj.beans.Field;

public class Product {

    @Field
    int id;
    @Field
    String name;
    @Field
    String category;
    @Field
    float price;
    @Field
    String place;
    @Field
    String code;

    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getCategory() {
        return category;
    }
    public void setCategory(String category) {
        this.category = category;
    }
    public float getPrice() {
        return price;
    }
    public void setPrice(float price) {
        this.price = price;
    }
    public String getPlace() {
        return place;
    }
    public void setPlace(String place) {
        this.place = place;
    }
    public String getCode() {
        return code;
    }
    public void setCode(String code) {
        this.code = code;
    }
    @Override
    public String toString() {
        return "Product [id=" + id + ", name=" + name + ", category=" + category + ", price=" + price + ", place="
                + place + ", code=" + code + "]";
    }
}

Step 9: ProductUtil

A utility class that converts the 140k_products.txt text file into a List<Product>.

package how2java;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ProductUtil {

    public static void main(String[] args) throws IOException {

        String fileName = "140k_products.txt";

        List<Product> products = file2list(fileName);

        System.out.println(products.size());

    }

    public static List<Product> file2list(String fileName) throws IOException {
        File f = new File(fileName);
        List<String> lines = FileUtils.readLines(f, "UTF-8");
        List<Product> products = new ArrayList<>();
        for (String line : lines) {
            Product p = line2product(line);
            products.add(p);
        }
        return products;
    }

    private static Product line2product(String line) {
        Product p = new Product();
        String[] fields = line.split(",");
        p.setId(Integer.parseInt(fields[0]));
        p.setName(fields[1]);
        p.setCategory(fields[2]);
        p.setPrice(Float.parseFloat(fields[3]));
        p.setPlace(fields[4]);
        p.setCode(fields[5]);
        return p;
    }

}
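
Note that line2product assumes every line has exactly six comma-separated fields; a malformed line throws NumberFormatException or ArrayIndexOutOfBoundsException and aborts the whole import. If you feed in your own data, a defensive variant like the hedged sketch below can skip bad lines instead (line2productSafe is my own name, not part of the downloadable project):

    // Slots into ProductUtil; returns null for lines that cannot be parsed,
    // so callers should skip null results instead of adding them to the list.
    private static Product line2productSafe(String line) {
        String[] fields = line.split(",");
        if (fields.length < 6) {
            return null; // not enough columns
        }
        Product p = new Product();
        try {
            p.setId(Integer.parseInt(fields[0].trim()));
            p.setPrice(Float.parseFloat(fields[3].trim()));
        } catch (NumberFormatException e) {
            return null; // non-numeric id or price
        }
        p.setName(fields[1]);
        p.setCategory(fields[2]);
        p.setPlace(fields[4]);
        p.setCode(fields[5]);
        return p;
    }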

Step 10: SolrUtil

A utility class that adds a collection of products to Solr in bulk. This is where the SolrJ API comes in.

package how2java;

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrUtil {
    public static SolrClient client;
    private static String url;
    static {
        url = "http://localhost:8983/solr/how2java";
        client = new HttpSolrClient.Builder(url).build();
    }

    public static <T> boolean batchSaveOrUpdate(List<T> entities) throws SolrServerException, IOException {

        DocumentObjectBinder binder = new DocumentObjectBinder();
        int total = entities.size();
        int count = 0;
        for (T t : entities) {
            SolrInputDocument doc = binder.toSolrInputDocument(t);
            client.add(doc);
            System.out.printf("Adding to index: %d records in total, now adding record %d%n", total, ++count);
        }
        client.commit();
        return true;
    }

}
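
A note on throughput: batchSaveOrUpdate sends one HTTP request per document. SolrJ's client.add also accepts a whole collection of documents at once, which cuts the number of round trips dramatically for 140,000 records. Below is a hedged sketch of a chunked variant; batchSaveOrUpdateChunked and chunkSize are my own names, not from the downloadable project, and the method slots into SolrUtil (it additionally needs java.util.ArrayList imported):

    public static <T> boolean batchSaveOrUpdateChunked(List<T> entities) throws SolrServerException, IOException {
        DocumentObjectBinder binder = new DocumentObjectBinder();
        List<SolrInputDocument> buffer = new ArrayList<>();
        final int chunkSize = 1000; // documents per HTTP request
        for (T t : entities) {
            buffer.add(binder.toSolrInputDocument(t));
            if (buffer.size() == chunkSize) {
                client.add(buffer); // one request for the whole chunk
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            client.add(buffer); // flush the final partial chunk
        }
        client.commit(); // a single commit at the end makes everything visible at once
        return true;
    }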

Step 11: TestSolr4j

Build the 140,000 product objects, then submit them to the Solr server through the SolrUtil utility class.

package how2java;

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;

public class TestSolr4j {
    public static void main(String[] args) throws SolrServerException, IOException {
        List<Product> products = ProductUtil.file2list("140k_products.txt");
        SolrUtil.batchSaveOrUpdate(products);
    }
}

Step 12: Verify the import

Open

http://127.0.0.1:8983/solr/#/how2java


On the left click Query -> Execute Query; the right side shows the query results, with a total of 147,939 records.

[Figure: verifying the import]

Search Engine Technology Series (6) - solr - SolrJ paginated queries


Step 1: Run it first, then study it

As usual, download the runnable project from the download area (click to enter), configure and run it, and confirm it works before studying the steps behind it.
Run TestSolr4j and you will see 10 results queried out, as in the screenshot.

[Figure: run it first, then study it]

Step 2: Imitate and troubleshoot

Once the runnable project works, follow the tutorial steps strictly and reproduce the code yourself.
Your copy will inevitably differ somewhere and fail to produce the expected result; compare it against the known-good runnable project to locate the problem.
This approach makes learning effective and troubleshooting efficient, and noticeably speeds you past the usual stumbling blocks.

We recommend the diffmerge tool for folder comparison: compare your own project folder against the runnable one.
It pinpoints exactly which files differ and marks the differences clearly.
A portable install and usage guide is here: diffmerge download and usage tutorial

Step 3: SolrUtil

Add a paginated query method to SolrUtil:

public static QueryResponse query(String keywords, int startOfPage, int numberOfPage) throws SolrServerException, IOException {
    SolrQuery query = new SolrQuery();
    query.setStart(startOfPage);
    query.setRows(numberOfPage);

    query.setQuery(keywords);
    QueryResponse rsp = client.query(query);
    return rsp;
}

package how2java;

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class SolrUtil {
    public static SolrClient client;
    private static String url;
    static {
        url = "http://localhost:8983/solr/how2java";
        client = new HttpSolrClient.Builder(url).build();
    }

    public static <T> boolean batchSaveOrUpdate(List<T> entities) throws SolrServerException, IOException {

        DocumentObjectBinder binder = new DocumentObjectBinder();
        int total = entities.size();
        int count = 0;
        for (T t : entities) {
            SolrInputDocument doc = binder.toSolrInputDocument(t);
            client.add(doc);
            System.out.printf("Adding to index: %d records in total, now adding record %d%n", total, ++count);
        }
        client.commit();
        return true;
    }

    public static QueryResponse query(String keywords, int startOfPage, int numberOfPage) throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery();
        query.setStart(startOfPage);
        query.setRows(numberOfPage);

        query.setQuery(keywords);
        QueryResponse rsp = client.query(query);
        return rsp;
    }

}
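
Note that setStart takes a zero-based record offset, not a page number. If your caller works in 1-based page numbers, convert first; a small hedged helper sketch (queryByPage is my own name, not part of the downloadable project):

    // Slots into SolrUtil: translate a 1-based page number and page size
    // into the record offset Solr expects.
    public static QueryResponse queryByPage(String keywords, int pageNumber, int pageSize)
            throws SolrServerException, IOException {
        int start = (pageNumber - 1) * pageSize; // page 1 -> offset 0, page 2 -> offset pageSize, ...
        return query(keywords, start, pageSize);
    }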

Step 4: TestSolr4j

Fetch the paginated query result and iterate over it.

package how2java;

import java.io.IOException;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class TestSolr4j {

    public static void main(String[] args) throws SolrServerException, IOException {
        // query
        QueryResponse queryResponse = SolrUtil.query("name:手机", 0, 10);
        SolrDocumentList documents = queryResponse.getResults();
        System.out.println("Total hits: " + documents.getNumFound());
        if (!documents.isEmpty()) {
            // print the field names once, as a header row
            Collection<String> fieldNames = documents.get(0).getFieldNames();
            for (String fieldName : fieldNames) {
                System.out.print(fieldName + "\t");
            }
            System.out.println();
        }

        for (SolrDocument solrDocument : documents) {
            Collection<String> fieldNames = solrDocument.getFieldNames();
            for (String fieldName : fieldNames) {
                System.out.print(solrDocument.get(fieldName) + "\t");
            }
            System.out.println();
        }
    }

}

Search Engine Technology Series (7) - solr - SolrJ highlighting


Step 1: Run it first, then study it

As usual, download the runnable project from the download area (click to enter), configure and run it, and confirm it works before studying the steps behind it.
Run TestSolr4j and you will see the keyword 手机 highlighted, as in the screenshot.

[Figure: run it first, then study it]

Step 2: Imitate and troubleshoot

Once the runnable project works, follow the tutorial steps strictly and reproduce the code yourself.
Your copy will inevitably differ somewhere and fail to produce the expected result; compare it against the known-good runnable project to locate the problem.
This approach makes learning effective and troubleshooting efficient, and noticeably speeds you past the usual stumbling blocks.

We recommend the diffmerge tool for folder comparison: compare your own project folder against the runnable one.
It pinpoints exactly which files differ and marks the differences clearly.
A portable install and usage guide is here: diffmerge download and usage tutorial

Step 3: SolrUtil

Add a queryHighlight method:

public static void queryHighlight(String keywords) throws SolrServerException, IOException {
    SolrQuery q = new SolrQuery();
    // start offset
    q.setStart(0);
    // rows per page
    q.setRows(10);
    // the query keywords
    q.setQuery(keywords);
    // enable highlighting
    q.setHighlight(true);
    // field to highlight
    q.addHighlightField("name");
    // prefix inserted before each highlighted term
    q.setHighlightSimplePre("<span style='color:red'>");
    // suffix appended after each highlighted term
    q.setHighlightSimplePost("</span>");
    // snippets are at most 100 characters long
    q.setHighlightFragsize(100);
    // run the query
    QueryResponse query = client.query(q);

    // read the highlighting results for the name field
    NamedList<Object> response = query.getResponse();
    NamedList<?> highlighting = (NamedList<?>) response.get("highlighting");
    for (int i = 0; i < highlighting.size(); i++) {
        System.out.println(highlighting.getName(i) + ":" + highlighting.getVal(i));
    }

    // read the plain query results
    SolrDocumentList results = query.getResults();
    for (SolrDocument result : results) {
        System.out.println(result.toString());
    }
}

package how2java;

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class SolrUtil {
    public static SolrClient client;
    private static String url;
    static {
        url = "http://localhost:8983/solr/how2java";
        client = new HttpSolrClient.Builder(url).build();
    }

    public static void queryHighlight(String keywords) throws SolrServerException, IOException {
        SolrQuery q = new SolrQuery();
        // start offset
        q.setStart(0);
        // rows per page
        q.setRows(10);
        // the query keywords
        q.setQuery(keywords);
        // enable highlighting
        q.setHighlight(true);
        // field to highlight
        q.addHighlightField("name");
        // prefix inserted before each highlighted term
        q.setHighlightSimplePre("<span style='color:red'>");
        // suffix appended after each highlighted term
        q.setHighlightSimplePost("</span>");
        // snippets are at most 100 characters long
        q.setHighlightFragsize(100);
        // run the query
        QueryResponse query = client.query(q);

        // read the highlighting results for the name field
        NamedList<Object> response = query.getResponse();
        NamedList<?> highlighting = (NamedList<?>) response.get("highlighting");
        for (int i = 0; i < highlighting.size(); i++) {
            System.out.println(highlighting.getName(i) + ":" + highlighting.getVal(i));
        }

        // read the plain query results
        SolrDocumentList results = query.getResults();
        for (SolrDocument result : results) {
            System.out.println(result.toString());
        }
    }

    public static <T> boolean batchSaveOrUpdate(List<T> entities) throws SolrServerException, IOException {

        DocumentObjectBinder binder = new DocumentObjectBinder();
        int total = entities.size();
        int count = 0;
        for (T t : entities) {
            SolrInputDocument doc = binder.toSolrInputDocument(t);
            client.add(doc);
            System.out.printf("Adding to index: %d records in total, now adding record %d%n", total, ++count);
        }
        client.commit();
        return true;
    }

    public static QueryResponse query(String keywords, int startOfPage, int numberOfPage) throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery();
        query.setStart(startOfPage);
        query.setRows(numberOfPage);

        query.setQuery(keywords);
        QueryResponse rsp = client.query(query);
        return rsp;
    }

}
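
Walking the raw NamedList works, but SolrJ also exposes a typed accessor, QueryResponse.getHighlighting(), which maps document id -> field name -> highlighted snippets. A hedged alternative sketch for the loop inside queryHighlight above (the variable names are mine; it additionally needs java.util.Map and java.util.List imported):

    // Typed alternative to the NamedList walk, placed after client.query(q).
    Map<String, Map<String, List<String>>> highlights = query.getHighlighting();
    for (Map.Entry<String, Map<String, List<String>>> doc : highlights.entrySet()) {
        // key: document id; value: highlighted snippets per field
        List<String> nameSnippets = doc.getValue().get("name");
        System.out.println(doc.getKey() + ":" + nameSnippets);
    }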

Step 4: TestSolr4j

Call the queryHighlight method.

package how2java;

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;

public class TestSolr4j {

    public static void main(String[] args) throws SolrServerException, IOException {
        // highlighted query
        SolrUtil.queryHighlight("name:手机");
    }

}

Search Engine Technology Series (8) - solr - SolrJ updating and deleting index entries


Step 1: Run it first, then study it

As usual, download the runnable project from the download area (click to enter), configure and run it, and confirm it works before studying the steps behind it.
Run TestSolr4j: before the update, the keyword 鞭 finds one result; after the update, the result changes; after the delete, nothing is found.

[Figure: run it first, then study it]

Step 2: Imitate and troubleshoot

Once the runnable project works, follow the tutorial steps strictly and reproduce the code yourself.
Your copy will inevitably differ somewhere and fail to produce the expected result; compare it against the known-good runnable project to locate the problem.
This approach makes learning effective and troubleshooting efficient, and noticeably speeds you past the usual stumbling blocks.

We recommend the diffmerge tool for folder comparison: compare your own project folder against the runnable one.
It pinpoints exactly which files differ and marks the differences clearly.
A portable install and usage guide is here: diffmerge download and usage tutorial

Step 3: SolrUtil

SolrUtil gets a method that adds or updates a single object (one method handles both):

public static <T> boolean saveOrUpdate(T entity) throws SolrServerException, IOException {
    DocumentObjectBinder binder = new DocumentObjectBinder();
    SolrInputDocument doc = binder.toSolrInputDocument(entity);
    client.add(doc);
    client.commit();
    return true;
}


And a method that deletes a document from the index by id:

public static boolean deleteById(String id) {
    try {
        client.deleteById(id);
        client.commit();
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
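
deleteById removes a single document by its unique key. To remove everything matching a query instead, SolrJ also offers client.deleteByQuery; a hedged sketch in the same style (adding deleteByQuery as a SolrUtil method is my own addition, not from the downloadable project):

    // Delete every document matching a Solr query string. Use with care:
    // the query "*:*" would wipe the entire core.
    public static boolean deleteByQuery(String queryString) {
        try {
            client.deleteByQuery(queryString); // e.g. "name:鞭"
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }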

package how2java;

import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class SolrUtil {
    public static SolrClient client;
    private static String url;
    static {
        url = "http://localhost:8983/solr/how2java";
        client = new HttpSolrClient.Builder(url).build();
    }

    public static void queryHighlight(String keywords) throws SolrServerException, IOException {
        SolrQuery q = new SolrQuery();
        // start offset
        q.setStart(0);
        // rows per page
        q.setRows(10);
        // the query keywords
        q.setQuery(keywords);
        // enable highlighting
        q.setHighlight(true);
        // field to highlight
        q.addHighlightField("name");
        // prefix inserted before each highlighted term
        q.setHighlightSimplePre("<span style='color:red'>");
        // suffix appended after each highlighted term
        q.setHighlightSimplePost("</span>");
        // snippets are at most 100 characters long
        q.setHighlightFragsize(100);
        // run the query
        QueryResponse query = client.query(q);

        // read the highlighting results for the name field
        NamedList<Object> response = query.getResponse();
        NamedList<?> highlighting = (NamedList<?>) response.get("highlighting");
        for (int i = 0; i < highlighting.size(); i++) {
            System.out.println(highlighting.getName(i) + ":" + highlighting.getVal(i));
        }

        // read the plain query results
        SolrDocumentList results = query.getResults();
        for (SolrDocument result : results) {
            System.out.println(result.toString());
        }
    }

    public static <T> boolean batchSaveOrUpdate(List<T> entities) throws SolrServerException, IOException {

        DocumentObjectBinder binder = new DocumentObjectBinder();
        int total = entities.size();
        int count = 0;
        for (T t : entities) {
            SolrInputDocument doc = binder.toSolrInputDocument(t);
            client.add(doc);
            System.out.printf("Adding to index: %d records in total, now adding record %d%n", total, ++count);
        }
        client.commit();
        return true;
    }

    public static QueryResponse query(String keywords, int startOfPage, int numberOfPage) throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery();
        query.setStart(startOfPage);
        query.setRows(numberOfPage);

        query.setQuery(keywords);
        QueryResponse rsp = client.query(query);
        return rsp;
    }

    public static <T> boolean saveOrUpdate(T entity) throws SolrServerException, IOException {
        DocumentObjectBinder binder = new DocumentObjectBinder();
        SolrInputDocument doc = binder.toSolrInputDocument(entity);
        client.add(doc);
        client.commit();
        return true;
    }

    public static boolean deleteById(String id) {
        try {
            client.deleteById(id);
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }

}

Step 4: TestSolr4j

Query once before the update,
once after the update,
and once after the delete,
to observe the effect of updating and deleting.

[Figure: TestSolr4j output]

package how2java;

import java.io.IOException;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class TestSolr4j {

    public static void main(String[] args) throws SolrServerException, IOException {
        String keyword = "name:鞭";
        System.out.println("Before update");
        query(keyword);

        Product p = new Product();
        p.setId(51173);
        p.setName("修改后的神鞭");
        SolrUtil.saveOrUpdate(p);
        System.out.println("After update");
        query(keyword);

        SolrUtil.deleteById("51173");
        System.out.println("After delete");
        query(keyword);

    }

    private static void query(String keyword) throws SolrServerException, IOException {
        QueryResponse queryResponse = SolrUtil.query(keyword, 0, 10);
        SolrDocumentList documents = queryResponse.getResults();
        System.out.println("Total hits: " + documents.getNumFound());
        if (!documents.isEmpty()) {
            // print the field names once, as a header row
            Collection<String> fieldNames = documents.get(0).getFieldNames();
            for (String fieldName : fieldNames) {
                System.out.print(fieldName + "\t");
            }
            System.out.println();
        }

        for (SolrDocument solrDocument : documents) {
            Collection<String> fieldNames = solrDocument.getFieldNames();
            for (String fieldName : fieldNames) {
                System.out.print(solrDocument.get(fieldName) + "\t");
            }
            System.out.println();
        }
    }

}

