ElasticSearch

From CollectiveAccess Documentation
Revision as of 22:10, 21 November 2012 by Jonathan

About ElasticSearch

From the ElasticSearch website [http://www.elasticsearch.org/]: "It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene."

Setup

Please refer to the ElasticSearch website for installation and setup notes. Once ElasticSearch is running, you have to set aside an index for CollectiveAccess to use. By default that index is called "collectiveaccess", but you can change the name in the search.conf configuration file, for example if you want several CollectiveAccess instances to share one ElasticSearch setup. You also have to configure the communication endpoint (the base URL) that CollectiveAccess should use. If you are running ElasticSearch locally with its default settings, the default values in the config file should work as is:

search_elasticsearch_base_url = http://localhost:9200/
search_elasticsearch_index_name = collectiveaccess
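
To make the role of the two settings concrete, here is a minimal Python sketch of how a base URL and an index name combine into the address of the index. It is not CollectiveAccess's own configuration parser: the helper names `parse_conf` and `index_url` are hypothetical, and the sketch assumes plain `key = value` lines with `#` comments.

```python
def parse_conf(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments.

    Assumption: this is a simplified stand-in for the real search.conf parser.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def index_url(settings):
    """Join the configured base URL and index name into one index address."""
    base = settings["search_elasticsearch_base_url"].rstrip("/")
    name = settings["search_elasticsearch_index_name"]
    return base + "/" + name


# The default values shown above.
SAMPLE = """
search_elasticsearch_base_url = http://localhost:9200/
search_elasticsearch_index_name = collectiveaccess
"""

if __name__ == "__main__":
    print(index_url(parse_conf(SAMPLE)))
```

A second CollectiveAccess instance sharing the same ElasticSearch setup would keep the base URL and differ only in the index name, so the two instances would address two separate indexes.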

Once ElasticSearch and CollectiveAccess are configured properly, the next step is to execute the following script:

<collectiveaccess_base_dir>/support/utils/createElasticSearchSchema.php

The script first deletes the configured index and recreates it, which wipes all existing data in that index. It then creates, as its name indicates, an ElasticSearch mapping: a description of how the indexed data is structured, derived from the current CollectiveAccess metadata schema (the fields you configured for your system). The mapping is needed because data such as date ranges, geographic points and numbers has to be treated differently from plain text if we want to take full advantage of ElasticSearch capabilities like range searches.
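
As a rough illustration of what a typed mapping makes possible, the Python sketch below builds a small mapping and a range query as plain dictionaries. The field names (`creation_date`, `location`, `height_cm`) are hypothetical, and the structure is not what createElasticSearchSchema.php actually generates; it only assumes the ElasticSearch field types `date`, `geo_point` and `double` and the `range` query.

```python
import json


def build_mapping():
    """Return an illustrative mapping with explicitly typed fields.

    Assumption: the field names are made up for this example; the real
    script derives its fields from your CollectiveAccess metadata schema.
    """
    return {
        "properties": {
            "creation_date": {"type": "date"},
            "location": {"type": "geo_point"},
            "height_cm": {"type": "double"},
        }
    }


def range_query(field, lower, upper):
    """Build a range query body for a field (bounds are inclusive)."""
    return {"query": {"range": {field: {"gte": lower, "lte": upper}}}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(range_query("creation_date", "1900-01-01", "1950-12-31"), indent=2))
```

A range query like this only behaves sensibly when the field is typed as a date or a number. On a field mapped as a plain string, the bounds would be compared as text.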

Operation

ElasticSearch can change its schema dynamically when new fields are added. However, when you make extensive changes to your metadata schema in CollectiveAccess, it is recommended to recreate the index and the mapping with the script mentioned above. ElasticSearch's guesses about the data structure are not always optimal and can, for example, lead to dates being treated as plain strings.
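
The kind of misjudgement described above can be illustrated with a toy type guesser in Python. This is a deliberately simplified model and not ElasticSearch's actual detection logic: the function `guess_type` and the list of recognised date formats are assumptions made for this example.

```python
from datetime import datetime

# Assumption: the only date formats this toy guesser recognises.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")


def guess_type(value):
    """Guess a field type from a single sample value (toy model only)."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if not isinstance(value, str):
        return "object"
    for fmt in KNOWN_DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue
    # Anything that does not match a known format falls back to a string.
    return "string"


if __name__ == "__main__":
    for sample in ("2012-11-21", "21 November 2012", 42, 3.5):
        print(repr(sample), "->", guess_type(sample))
```

In this model the value "21 November 2012" is a perfectly good date to a human reader but ends up typed as a string, which is the situation an explicit mapping avoids.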
