Difference between revisions of "Search.conf"

From CollectiveAccess Documentation

Latest revision as of 17:27, 9 August 2018

IN PROGRESS

Indexing Tokenizer Regex

This is the regex character class used when indexing saved text; characters it matches are used as token delimiters (in other words, the text being indexed is broken into words wherever the matched characters occur). Note that the default class, as displayed in the example below, starts with a caret ("^"), which has the effect of negating the class. In other words, the default class lists the characters that will not be treated as token delimiters.

indexing_tokenizer_regex = ^\pL\pN\pNd/_#\@\&\.
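The effect of the negated class can be sketched in a few lines of Python. This is only an illustration, not CollectiveAccess code: the function name `tokenize` and the sample text are made up, and because Python's standard `re` module does not support `\pL`/`\pN`, the class is approximated with `\w` (Unicode letters, digits and the underscore).

```python
import re

# Approximation (assumption) of the default class ^\pL\pN\pNd/_#\@\&\.
# Everything that is NOT a letter, digit, "_", "/", "#", "@", "&" or "."
# is treated as a token delimiter.
DELIMITERS = re.compile(r"[^\w/#@&.]+")


def tokenize(text):
    """Split text into tokens wherever one or more delimiter characters occur."""
    return [token for token in DELIMITERS.split(text) if token]


print(tokenize("Vase, blue-glazed (acc. 2018.12/3) by J&J #7"))
# ['Vase', 'blue', 'glazed', 'acc.', '2018.12/3', 'by', 'J&J', '#7']
```

Commas, spaces, hyphens and parentheses split the text, while the characters named in the class (such as ".", "/", "&" and "#") stay inside their tokens.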

Search Tokenizer Regex

This is the regex character class used when searching; characters it matches are used as token delimiters (this is the same thing as indexing_tokenizer_regex, except that it is used to break user searches into words rather than text to be indexed).

search_tokenizer_regex = ^\pL\pN\pNd/_#\@\&\.

"As Is" Regex Matching for Accession Numbers

Here you may enter a list of regular expressions that, if matched, cause search input to be treated "as-is", that is, searched without being broken up into tokens. This is useful for preventing tokenization of accession numbers and other values that rely upon punctuation being kept intact when searched.

asis_regexes = [
	"^[\d]+[\.\-][A-Za-z0-9\.\-]+$"
]
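As a rough sketch of how an "as-is" check might interact with tokenization, the Python below tests the search input against the example expression before splitting it. The function names and the simplified delimiter class are assumptions made for illustration; they do not come from the CollectiveAccess source.

```python
import re

# The example expression from asis_regexes above.
ASIS_REGEXES = [re.compile(r"^[\d]+[\.\-][A-Za-z0-9\.\-]+$")]

# Simplified stand-in (assumption) for search_tokenizer_regex.
DELIMITERS = re.compile(r"[^\w/#@&.]+")


def is_as_is(search_input):
    """Return True if any "as-is" expression matches the whole input."""
    return any(rx.search(search_input) for rx in ASIS_REGEXES)


def search_tokens(search_input):
    """Keep matching input intact; otherwise break it into tokens."""
    if is_as_is(search_input):
        return [search_input]
    return [t for t in DELIMITERS.split(search_input) if t]


print(search_tokens("1999-12.A"))         # ['1999-12.A']
print(search_tokens("blue-glazed vase"))  # ['blue', 'glazed', 'vase']
```

Without the "as-is" check, the hyphen in "1999-12.A" would act as a delimiter and the accession number would be searched as two separate tokens.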

Changing the layout of quicksearch results

The format of the quick search results can be altered by setting a display template for a specific type within a table:

ca_<table>_<type>_quicksearch_result_display_template = 

or for a whole table:

ca_<table>_quicksearch_result_display_template = 

The value of the template uses the same syntax as bundle displays. The following example adds "artists" to an "artwork" search result layout:

ca_objects_artwork_quicksearch_result_display_template = 
<unit relativeTo='ca_entities' restrictToRelationshipTypes='artist'><u>^ca_entities.preferred_labels.surname, ^ca_entities.preferred_labels.forename</u>:</unit>
<em>^ca_objects.preferred_labels.name</em> (<l>^ca_objects.idno</l>) [^ca_objects.type_id]

SqlSearch Plugin Configuration

Set to 0 if you don't want search input stemmed (i.e. suffixes removed) prior to search.

The plugin uses the English Snowball stemmer (http://snowball.tartarus.org/) and can give poor results with non-English content. If you are cataloguing non-English material, you will probably want to turn this off.

search_sql_search_do_stemming = 1
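To make "suffixes removed" concrete, here is a deliberately naive suffix stripper in Python. It is not the Snowball algorithm and not what the plugin runs; the suffix list and the function name `naive_stem` are invented for this illustration. It only shows the general idea, and why English suffix rules can mangle words that are not English inflections.

```python
# Toy English suffix list (assumption); the real Snowball rules are far richer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


print(naive_stem("painting"))  # paint
print(naive_stem("painted"))   # paint
print(naive_stem("Paris"))     # Pari  (an English rule misapplied to a name)
```

With stemming on, "painting" and "painted" reduce to the same stem and so match each other in a search; the last line shows the kind of false reduction that makes stemming a poor fit for non-English material.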


ElasticSearch Plugin Configuration

Enter the ElasticSearch base URL here (without any index names):

search_elasticsearch_base_url = http://localhost:9200/

This is the name of the ElasticSearch index used by CollectiveAccess. You probably don't need to change this unless you're using a single ElasticSearch setup for multiple CollectiveAccess instances and/or other applications.

search_elasticsearch_index_name = collectiveaccess
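ElasticSearch addresses an index at a path directly under the server's base URL, which is why the base URL is configured without an index name. The short Python sketch below shows how the two settings would combine; the helper `index_url` is hypothetical, and how CollectiveAccess joins the values internally is an assumption.

```python
def index_url(base_url, index_name):
    """Join a base URL and an index name with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + index_name


# Values taken from the two settings above.
print(index_url("http://localhost:9200/", "collectiveaccess"))
# http://localhost:9200/collectiveaccess
```

Several CollectiveAccess instances sharing one ElasticSearch server would keep the same base URL and differ only in the index name.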
